Throughout 2023, the Python Package Index (PyPI) faced an unrelenting wave of malicious package uploads. By September, researchers had identified hundreds of malicious packages that collectively had been downloaded tens of thousands of times. The campaigns used typosquatting, dependency confusion, and social engineering to trick Python developers into installing credential-stealing malware.
The Scale of the Problem
The numbers tell a grim story. Multiple security research teams tracked the campaigns:
- Phylum identified over 100 malicious packages in a single campaign in August 2023
- Checkmarx reported hundreds of packages targeting developers with information stealers
- ReversingLabs documented campaigns spanning months with evolving evasion techniques
- Sonatype tracked cumulative tallies exceeding 400 malicious packages removed from PyPI in 2023
The volume was so overwhelming that on May 20, 2023, PyPI temporarily suspended new user registrations and new project creation to deal with the flood of malicious uploads. This was an unprecedented step for a major package registry.
Attack Techniques
The campaigns employed several techniques, often in combination:
Typosquatting
The most common technique: creating packages with names similar to popular legitimate packages. Examples from 2023 campaigns include:
- requessts instead of requests
- beautifulsoup5 instead of beautifulsoup4
- python-binance targeting users of the binance package
- Variations with prefixes like py- or python-, or suffixes like -utils or -sdk
Attackers registered dozens of variations for each popular package, casting a wide net for typos.
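To make the idea concrete, here is a minimal sketch of how a reviewer might flag candidate typosquats. It normalizes names with the PEP 503 rule PyPI uses, strips the prefixes and suffixes listed above, and compares what remains against a small allowlist of popular names by edit distance. The POPULAR set, the affix lists, and the distance threshold are illustrative assumptions, not a real blocklist.

```python
import re

# Illustrative allowlist (an assumption); a real check would use a much
# larger list, such as the most-downloaded packages on PyPI.
POPULAR = {"requests", "beautifulsoup4", "numpy", "django", "boto3"}

# Affixes typosquatters commonly bolt onto legitimate names.
PREFIXES = ("py-", "python-")
SUFFIXES = ("-utils", "-sdk")


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of -, _ and . into '-', lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_affixes(name: str) -> str:
    """Remove at most one known prefix and one known suffix."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def typosquat_targets(candidate: str, max_distance: int = 2) -> list:
    """Return popular names that `candidate` looks like an imitation of."""
    name = normalize(candidate)
    if name in POPULAR:
        return []  # the candidate is the popular package itself
    core = strip_affixes(name)
    return [target for target in sorted(POPULAR)
            if min(levenshtein(name, target), levenshtein(core, target)) <= max_distance]


print(typosquat_targets("requessts"))     # ['requests']
print(typosquat_targets("python-numpy"))  # ['numpy']
```

Short names are the weak spot: with max_distance=2, a legitimate package such as numba would be flagged as a near-miss of numpy, so results like these are better fed into a review queue than used to block installs automatically.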
Starjacking
Some malicious packages set the "homepage" and "repository" URLs in their metadata to point to popular legitimate projects. Because PyPI's web interface displays statistics for the linked repository without verifying that the package actually comes from it, the malicious package appeared to have a high GitHub star count and recent activity, all borrowed from the legitimate project.
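One cheap heuristic against starjacking is sketched below, assuming you have already collected each package's declared repository URL from its metadata: if several distinct package names claim the same GitHub repository, at most one is likely the real owner, so the others deserve scrutiny. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional
from urllib.parse import urlparse


def canonical_repo(url: str) -> Optional[str]:
    """Reduce a GitHub URL to a lowercase 'owner/repo' key, or None if not GitHub."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}".lower()


def shared_repositories(declared: Dict[str, str]) -> Dict[str, List[str]]:
    """Return repos claimed by more than one package, with the claiming names."""
    claims = defaultdict(set)
    for package, url in declared.items():
        repo = canonical_repo(url)
        if repo is not None:
            claims[repo].add(package)
    return {repo: sorted(pkgs) for repo, pkgs in claims.items() if len(pkgs) > 1}


# Hypothetical metadata: the second entry borrows requests' repository.
declared = {
    "requests": "https://github.com/psf/requests",
    "requessts": "https://github.com/psf/requests.git",
    "numpy": "https://github.com/numpy/numpy",
}
print(shared_repositories(declared))  # {'psf/requests': ['requessts', 'requests']}
```

A shared repository is not proof of abuse, since monorepos legitimately publish several packages, but it turns a vague "looks popular" impression into a concrete question: which of these names does that repository actually publish?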
Obfuscated Payloads
The malicious code ranged from obvious to highly obfuscated:
Simple exfiltration: Some packages placed plaintext HTTP requests to attacker-controlled servers directly in setup.py, which runs during pip install when a package is installed from a source distribution.
Multi-stage loading: More sophisticated packages included encrypted or encoded payloads that were decoded at runtime, with the actual malware downloaded from external servers.
Steganography: Some campaigns hid payloads in image files included in the package, extracting and executing code from pixel data.
Import hooks: A few packages installed Python import hooks that intercepted future import statements, allowing them to inject code into other packages.
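Several of these techniques leave recognizable traces in install-time code. The sketch below uses Python's standard ast module to parse a setup.py (or any Python source) without executing it, and reports calls often seen in malicious install scripts, plus any reference to sys.meta_path, the list that import hooks are added to. The SUSPICIOUS_CALLS set is an illustrative assumption: heavily obfuscated or steganographic payloads can evade it, and legitimate packages use some of these calls, so hits are prompts for manual review.

```python
import ast
from typing import List, Optional, Tuple

# Illustrative, non-exhaustive set of call names that merit a closer look
# when they appear in install-time code.
SUSPICIOUS_CALLS = {
    "exec", "eval", "compile", "__import__",  # dynamic code execution
    "b64decode", "decompress",                # decoding hidden payloads
    "urlopen", "urlretrieve", "connect",      # network access
    "Popen", "system", "check_output",        # spawning processes
}


def call_name(node: ast.Call) -> Optional[str]:
    """exec(...) -> 'exec'; base64.b64decode(...) -> 'b64decode'; else None."""
    if isinstance(node.func, ast.Name):
        return node.func.id
    if isinstance(node.func, ast.Attribute):
        return node.func.attr
    return None


def scan_source(source: str) -> List[Tuple[int, str]]:
    """Parse (never execute) Python source and return sorted (line, finding) pairs."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and call_name(node) in SUSPICIOUS_CALLS:
            findings.append((node.lineno, f"call to {call_name(node)}"))
        elif isinstance(node, ast.Attribute) and node.attr == "meta_path":
            findings.append((node.lineno, "references sys.meta_path (import hook)"))
    return sorted(findings)


sample = '''
import base64
from setuptools import setup
payload = base64.b64decode("cHJpbnQoJ2hpJyk=")
exec(payload)
setup(name="demo", version="0.1")
'''
print(scan_source(sample))  # [(4, 'call to b64decode'), (5, 'call to exec')]
```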
What the Malware Did
The payloads varied, but common functionalities included:
Browser credential theft. Extracting saved passwords, cookies, and autofill data from Chrome, Firefox, Edge, and other browsers.
Cryptocurrency wallet theft. Scanning for wallet files and seed phrases for Bitcoin, Ethereum, and other cryptocurrencies. Some variants replaced clipboard contents when a crypto address was copied, substituting the attacker's address.
Discord token theft. Stealing Discord authentication tokens, which can be used to take over accounts and spread malware through trusted channels.
SSH key and cloud credential theft. Exfiltrating ~/.ssh/, AWS credentials, GCP service accounts, and other authentication material from developer machines.
System reconnaissance. Collecting hostname, username, IP address, installed software, and running processes to profile victims.
Developer Machines Are High-Value Targets
The targeting of developers is deliberate. Developer machines typically contain:
- Source code for the organization's products
- Cloud credentials with broad permissions for development and deployment
- SSH keys that provide access to production servers and internal infrastructure
- API keys and secrets for third-party services
- VPN credentials for accessing corporate networks
Compromising a developer's machine often provides a path to compromising the organization's entire infrastructure. It's also a potential vector for supply chain attacks—if the developer has commit access to open-source projects or private repositories, the attacker can inject malicious code.
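A quick way to gauge what a compromised developer machine would hand over is to inventory the well-known credential locations. The sketch below only checks whether default paths exist and never reads their contents. The paths are common Linux and macOS defaults (assumptions): they differ on Windows, and tool settings or environment variables such as AWS_SHARED_CREDENTIALS_FILE can relocate them.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Common default locations (assumptions); OS, tool settings and environment
# variables can move them.
CREDENTIAL_PATHS = {
    "SSH keys": ".ssh",
    "AWS credentials": ".aws/credentials",
    "GCP application default credentials": ".config/gcloud/application_default_credentials.json",
    "PyPI upload credentials": ".pypirc",
}


def exposed_credentials(home: Optional[Path] = None) -> List[str]:
    """List which credential stores exist under `home`; contents are never read."""
    home = home if home is not None else Path.home()
    return [label for label, rel in CREDENTIAL_PATHS.items() if (home / rel).exists()]


# Demonstration against a throwaway directory standing in for a home folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp)
    (fake_home / ".aws").mkdir()
    (fake_home / ".aws" / "credentials").write_text("[default]\n")
    found = exposed_credentials(fake_home)
print(found)  # ['AWS credentials']
```

Running exposed_credentials() with no argument checks the real home directory; every label it returns is material that install-time malware running as that user could read.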
PyPI's Response
PyPI implemented several measures in 2023:
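Mandatory 2FA. PyPI had required two-factor authentication for maintainers of critical projects (the top 1% most-downloaded) since 2022, and in May 2023 it announced that every account maintaining projects on PyPI must enable 2FA by the end of 2023, reducing the risk of account takeover.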
Malware detection improvements. PyPI invested in automated scanning to detect malicious packages before they reach users, though the false positive/negative tradeoff remains challenging.
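Trusted publishers. Introduced in April 2023, PyPI's trusted publishing feature lets a CI workflow (initially GitHub Actions; other CI systems have since been added) publish packages by exchanging a short-lived OpenID Connect (OIDC) identity token for a short-lived upload token. No long-lived API token has to be stored, which reduces the risk of credential theft leading to package compromise.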
Community reporting. PyPI improved its malware reporting workflow, making it easier for researchers and users to flag suspicious packages.
Protecting Your Organization
Use a private registry or mirror. Don't let developers install packages directly from public PyPI. Use a private registry that scans packages before making them available.
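Pin dependencies with hashes. Pin each dependency to an exact version with a --hash=sha256:... entry and install with pip install --require-hashes (or your tool's lock-file equivalent), so pip refuses any downloaded artifact whose digest does not match the known-good value.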
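pip's hash-checking mode compares each downloaded artifact's SHA-256 digest against the `--hash=sha256:<hex>` entries in the requirements file. The sketch below reproduces that comparison for a single local file with hashlib, which can help when auditing a lock file or a mirror's contents; the package name, version, and stand-in file are placeholders. In practice, tools such as pip-tools (`pip-compile --generate-hashes`) write these entries for you.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large artifacts need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def requirement_line(name: str, version: str, artifact: Path) -> str:
    """Format an exact pin with a hash in the syntax pip's hash-checking mode reads."""
    return f"{name}=={version} --hash=sha256:{sha256_of(artifact)}"


def matches_pin(artifact: Path, expected_hex: str) -> bool:
    """True only if the artifact's digest equals the pinned value."""
    return sha256_of(artifact) == expected_hex.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "demo-1.0-py3-none-any.whl"  # stand-in bytes, not a real wheel
    artifact.write_bytes(b"hello")
    line = requirement_line("demo", "1.0", artifact)
    ok = matches_pin(artifact, sha256_of(artifact))
    tampered = matches_pin(artifact, "0" * 64)
print(line)
```

A real lock file typically lists several --hash values per requirement, one for each published artifact (each wheel and the sdist), and pip accepts a download that matches any of them.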
Review new dependencies before adoption. Adding a new dependency should be a reviewed decision, not something a developer does casually.
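One way to make adding a dependency a reviewed decision is to have CI compare the requirements file on a branch with the one on the main branch and request review when new names appear. A minimal sketch, assuming simple requirement lines such as `name==version`; it skips option lines (`-r`, `--hash`, `--index-url`) rather than following them, and it only sees transitive dependencies if your lock file lists them.

```python
import re
from typing import List, Set

# Leading distribution name of a requirement line, e.g. "numpy" in "numpy>=1.26".
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def normalize(name: str) -> str:
    """PEP 503 normalization, so 'Requests' and 'requests' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(text: str) -> Set[str]:
    """Collect normalized package names from requirements-file text."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # blank lines and option lines such as -r or --hash
        match = NAME_RE.match(line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def new_dependencies(base: str, head: str) -> List[str]:
    """Names present in `head` but not in `base`: the ones that need review."""
    return sorted(requirement_names(head) - requirement_names(base))


base = "Requests==2.31.0\nnumpy==1.26.0\n"
head = "requests==2.31.0\nnumpy==1.26.0\nbeautifulsoup4==4.12.2  # HTML parsing\n"
print(new_dependencies(base, head))  # ['beautifulsoup4']
```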
Monitor for unusual install-time behavior. Packages shouldn't make network requests or access sensitive files during installation. Tools that sandbox pip install can detect this.
Educate developers about typosquatting. A few seconds of verification before pip install can prevent a compromise.
How Safeguard.sh Helps
Safeguard.sh scans your Python dependencies against known malware databases and performs behavioral analysis to detect suspicious packages before they enter your environment. Our platform monitors PyPI for typosquatting attempts against packages in your dependency tree, alerts you when new dependencies are added without review, and provides a curated view of your Python supply chain that highlights risk factors like recent ownership changes, obfuscated code, and unusual install-time behavior. By integrating with your CI/CD pipeline, Safeguard.sh catches malicious packages before they reach developer machines.