The Python Package Index (PyPI) ended 2022 under siege. In the fourth quarter alone, security researchers identified hundreds of malicious packages uploaded to PyPI, ranging from crude credential stealers to sophisticated multi-stage malware campaigns. The volume and variety of attacks demonstrated that PyPI had become a primary target for supply chain attackers, and the registry's defenses were struggling to keep pace.
The Scale of the Problem
By late 2022, the numbers were staggering:
- Over 400 malicious packages were identified on PyPI in Q4 2022 alone
- Typosquatting remained the dominant distribution technique
- Multiple distinct threat actors were operating simultaneously
- Time between upload and detection ranged from hours to weeks
- Some malicious packages accumulated thousands of downloads before removal
These weren't isolated incidents by individual troublemakers. They were organized campaigns with clear objectives: steal credentials, establish persistence, exfiltrate data, and mine cryptocurrency.
Notable Campaigns
The W4SP Stealer Campaign
One of the most prolific campaigns of Q4 2022 involved the W4SP Stealer, an information-stealing malware distributed through dozens of typosquatted PyPI packages. Researchers at Phylum and Checkmarx tracked over 30 packages in this campaign, including names like:
- typesutil (mimicking types-utils)
- typestring (mimicking typing)
- sutiltype (scrambled name)
- duaborern, staborern (obfuscated names)
The W4SP Stealer targeted Discord tokens, browser-stored passwords, cryptocurrency wallets, and other sensitive data. The malware was embedded in the packages' setup.py files, executing during installation rather than requiring the victim to import the package in their code.
What made this campaign particularly effective was its persistence. When packages were detected and removed, the attacker quickly uploaded new ones with different names. The cat-and-mouse game continued throughout the quarter.
The "Colour" Typosquatting Campaign
In November 2022, Spectralops discovered a campaign of 12 malicious packages that typosquatted popular color and logging libraries:
- colourfull (mimicking colorful)
- colour-science (mimicking colour-science)
- loglib-modules (mimicking logging libraries)
These packages contained code that downloaded and executed additional payloads from remote servers, establishing reverse shells and installing keyloggers. The campaign was notable for its attention to detail — packages included realistic README files, version histories, and even documentation to appear legitimate.
Cryptocurrency-Focused Attacks
Multiple campaigns specifically targeted cryptocurrency developers and users:
- Packages mimicking popular Web3 libraries like web3 and eth-utils
- Malicious packages that modified clipboard contents to replace cryptocurrency wallet addresses
- Packages that scanned for and exfiltrated cryptocurrency wallet files
One package, discovered by Phylum in October, intercepted cryptocurrency transactions by monitoring the system clipboard for wallet addresses and replacing them with attacker-controlled addresses — a technique known as clipboard hijacking.
The WASP Nest
In late November, researchers discovered a campaign that distributed malware through PyPI packages that appeared to offer cracked versions of popular tools or access to premium content. These "too good to be true" packages attracted developers looking for free alternatives, only to install credential-stealing malware.
Attack Techniques
The Q4 2022 campaigns revealed increasingly sophisticated techniques:
Installation-Time Execution
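Most malicious packages ran their payload during pip install by placing it in setup.py, which pip executes when it builds a package from a source distribution. Simply installing such a package, even if you never imported it in your code, was enough to trigger the malware. Other packages hid the payload in __init__.py, which runs the first time the package is imported rather than at install time.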
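Because setup.py is ordinary Python, one cheap precaution is to inspect it before it ever runs. The sketch below parses a setup script with the standard ast module (it never executes the script) and reports imports and calls from a small watch-list. The function name flag_setup_script and both watch-lists are illustrative assumptions, not rules used by PyPI or any particular scanner, and a determined attacker can evade checks this simple.

```python
import ast

# Hypothetical watch-lists for illustration only; real scanners use richer rules.
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}
SUSPICIOUS_MODULES = {"subprocess", "socket", "urllib", "requests", "base64", "ctypes"}


def flag_setup_script(source: str) -> list[tuple[int, str]]:
    """Return (line number, finding) pairs for a setup.py source string.

    The source is parsed into a syntax tree, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Collect the top-level module names pulled in by import statements.
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        for name in modules:
            if name in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"imports {name}"))
        # Flag direct calls such as exec(...) or eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in SUSPICIOUS_CALLS
        ):
            findings.append((node.lineno, f"calls {node.func.id}()"))
    return sorted(findings)
```

A finding is a reason to read the file by hand, not proof of malice: plenty of legitimate build scripts import subprocess.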
Obfuscation
Attackers used multiple layers of obfuscation to evade detection:
- Base64-encoded payloads nested multiple levels deep
- String reversal and character substitution
- Encrypted payloads that downloaded decryption keys from remote servers
- Steganography — hiding malicious code in image files included in the package
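To see what a nested Base64 payload hides, an analyst can peel the layers one at a time without executing anything. This is a minimal sketch under the assumption that each layer is standard Base64 with no other transformation in between; the name peel_base64 and the max_layers cap are illustrative choices. Short plain text can occasionally happen to be valid Base64, so the layer count is a hint rather than a guarantee.

```python
import base64
import binascii


def peel_base64(blob: bytes, max_layers: int = 10) -> tuple[bytes, int]:
    """Strip nested Base64 layers; return (innermost bytes, layers removed)."""
    layers = 0
    while layers < max_layers:
        try:
            # validate=True rejects input containing non-Base64 characters.
            decoded = base64.b64decode(blob, validate=True)
        except (binascii.Error, ValueError):
            break  # not Base64 any more, so this is the innermost layer
        if not decoded:
            break  # empty result: nothing further to peel
        blob = decoded
        layers += 1
    return blob, layers
```

The cap on layers keeps the loop bounded when it is fed hostile or accidental input.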
Staged Payloads
Rather than including the complete malware in the package, many campaigns used a small loader that downloaded the full payload from an external server. This kept the package size small and allowed the attacker to update the payload without re-uploading the package.
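A loader has to know where to fetch its second stage, so hard-coded URLs in package source are a useful triage signal. The sketch below, assuming the URL appears as a plain string literal, walks the syntax tree and collects anything that looks like an HTTP(S) address. The helper name embedded_urls and the regular expression are assumptions for illustration; URLs assembled at runtime or hidden behind obfuscation will not be found this way.

```python
import ast
import re

# Deliberately loose pattern: good enough for triage, not for validation.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>]+", re.IGNORECASE)


def embedded_urls(source: str) -> list[str]:
    """Return the sorted, de-duplicated URLs found in string literals."""
    urls = set()
    for node in ast.walk(ast.parse(source)):
        # Only str constants are inspected; the source is never executed.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            urls.update(URL_PATTERN.findall(node.value))
    return sorted(urls)
```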
Dependency Chain Exploitation
Some campaigns created chains of packages where the malicious code was split across multiple packages. The initial package appeared benign and depended on a second package that contained the actual malware. This made static analysis of individual packages less effective.
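Catching a split payload means checking the whole dependency chain rather than one package at a time. The sketch below assumes the dependency graph has already been resolved into a plain dictionary and that a blocklist of known-bad names is available; both inputs, and every package name in the usage example, are hypothetical.

```python
from collections import deque


def transitive_dependencies(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every package reachable from root via breadth-first search."""
    seen: set[str] = set()
    queue = deque(graph.get(root, ()))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(name)
        queue.extend(graph.get(name, ()))
    return seen


def flagged_in_chain(root: str, graph: dict[str, list[str]], blocklist: set[str]) -> set[str]:
    """Return blocklisted packages among root and all of its dependencies."""
    return ({root} | transitive_dependencies(root, graph)) & blocklist
```

For example, with graph = {"benign-looking": ["helper-lib"], "helper-lib": ["payload-pkg"]} and a blocklist containing "payload-pkg", calling flagged_in_chain("benign-looking", graph, blocklist) surfaces the package two hops away even though the top-level package itself looks clean.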
PyPI's Response
PyPI and the Python Software Foundation took several steps to address the malware problem:
Mandatory 2FA for critical projects. Following npm's lead, PyPI began requiring two-factor authentication for maintainers of critical projects, reducing the risk of account takeover for the most impactful packages.
Malware detection improvements. PyPI enhanced its automated scanning capabilities, though the details were (understandably) not publicly disclosed.
Temporary new user registration suspension. In a dramatic move that underscored the severity of the problem, PyPI temporarily suspended new user registrations and new project creation during particularly intense waves of malicious uploads.
Community reporting. PyPI streamlined the process for security researchers to report malicious packages, and the response team improved their turnaround time for removal.
Why PyPI Is Particularly Vulnerable
Several characteristics of the PyPI ecosystem make it an attractive target:
pip executes arbitrary code during installation. Unlike some other package managers, pip runs setup.py when it installs from a source distribution, and that file can contain any Python code (prebuilt wheels, by contrast, are unpacked without running package code). This gives malicious packages a reliable execution opportunity.
No package signing. PyPI packages are not signed by default, and there's no built-in mechanism for users to verify that a package was uploaded by its claimed author.
Flat namespace. Anyone can register any available package name, making typosquatting trivially easy.
Limited automated review. The volume of package uploads — over 500,000 packages and growing — makes comprehensive manual review impossible, and automated detection has significant blind spots.
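The flat namespace is what makes typosquats possible, and a simple string-similarity check catches many of them. The sketch below compares a candidate name against a list of well-known names using difflib from the standard library, after normalizing names the way PyPI does (case-insensitive, with runs of "-", "_" and "." treated as one separator). The function names and the 0.85 threshold are illustrative assumptions; a production check would tune the threshold and use a much larger list of popular projects.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Normalize a project name (lowercase, collapse -, _ and . runs)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def likely_typosquats(candidate: str, popular: list[str], threshold: float = 0.85) -> list[str]:
    """Return the popular names that candidate closely resembles."""
    norm = normalize(candidate)
    hits = []
    for name in popular:
        target = normalize(name)
        if target == norm:
            continue  # same project after normalization, not a lookalike
        if SequenceMatcher(None, norm, target).ratio() >= threshold:
            hits.append(name)
    return hits
```

With names from the campaigns above, likely_typosquats("typesutil", ["types-utils", "typing"]) flags types-utils. Similarity alone produces false positives, so a hit is a prompt for review rather than a verdict.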
Defensive Measures
For Python developers, the Q4 2022 campaigns reinforced several defensive practices:
- Verify package names carefully. Double-check every pip install command, especially when copying from documentation or tutorials.
- Use --require-hashes with pip. pip then refuses to install any file whose hash does not match one recorded in your requirements file, so only artifacts you have already vetted get installed.
- Pin dependencies in requirements files. Don't use version ranges that could pull in malicious new versions.
- Use a private PyPI mirror. Curate and scan packages before making them available to developers.
- Monitor for typosquats of your packages. If you maintain popular packages, watch for similarly-named packages that might be targeting your users.
- Run pip install in sandboxed environments. Virtual environments keep dependencies separate but do not contain malicious code, so consider running installations in containers to limit blast radius.
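The idea behind hash pinning can be sketched in a few lines. In a requirements file, each pinned entry carries one or more --hash=sha256:... options, and a downloaded file is accepted only if its digest matches one of them. The code below is a simplified illustration of that check, not pip's actual implementation: the function names are hypothetical, and only sha256 is handled.

```python
import hashlib
import re

# Matches the sha256 form of the hash option used in requirements files.
HASH_OPTION = re.compile(r"--hash=sha256:([0-9a-fA-F]{64})")


def pinned_hashes(requirement_line: str) -> set[str]:
    """Extract the sha256 digests pinned on one requirements line."""
    return {digest.lower() for digest in HASH_OPTION.findall(requirement_line)}


def artifact_is_allowed(artifact: bytes, requirement_line: str) -> bool:
    """Accept the artifact only if its sha256 digest is one of the pinned values."""
    return hashlib.sha256(artifact).hexdigest() in pinned_hashes(requirement_line)
```

A line with no pinned hash yields an empty set, so the check fails closed: nothing is accepted unless a digest was recorded for it in advance.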
How Safeguard.sh Helps
Safeguard.sh provides automated protection against the PyPI malware campaigns that plagued Q4 2022 and continue to this day. Our platform scans Python dependencies against known malicious package databases, detects typosquatting attempts, analyzes package behavior for suspicious patterns, and monitors for newly-published packages that target your existing dependencies. By integrating with your CI/CD pipeline, Safeguard.sh prevents malicious packages from entering your software supply chain before they can cause damage.