
Python Packaging Authority and the Security of pip install

Every pip install is a trust decision. The Python Packaging Authority has spent years hardening the ecosystem, but the attack surface remains vast and the threat actors are persistent.

Alex
Application Security Engineer

Every time a developer runs pip install, they are making a trust decision. They are trusting that the package name they typed maps to the software they intend to install. They are trusting that the package has not been tampered with since the maintainer uploaded it. They are trusting that the maintainer's account has not been compromised. They are trusting that the package's dependencies are equally trustworthy.

Most of the time, this trust is justified. Sometimes it is not.

The Python Packaging Authority (PyPA) is the group responsible for maintaining the tools and infrastructure that make Python packaging work. That includes pip, PyPI (the Python Package Index), setuptools, and the packaging standards that hold it all together. Their work sits at the intersection of developer experience and supply chain security, and the tradeoffs they navigate are instructive for the entire open source ecosystem.

The PyPI Attack Surface

PyPI hosts over 400,000 packages. Anyone can create an account and upload a package with almost any name. This openness is a feature, because it enables the permissionless innovation that makes the Python ecosystem vibrant. It is also a massive attack surface.

The most common attack patterns against PyPI include:

Typosquatting. Registering package names that are close to popular packages, such as reqeusts instead of requests, or a subtle variation on python-dateutil. Users who mistype a package name install the attacker's code instead.

Dependency confusion. Uploading packages to PyPI with the same names as private internal packages. If an organization's pip configuration is not locked down, pip may pull from PyPI instead of the internal registry, executing the attacker's code.

Account compromise. Taking over a maintainer's PyPI account and uploading malicious versions of legitimate packages. This is particularly dangerous because the package name and history look legitimate.

Malicious new packages. Uploading packages that claim to provide useful functionality but include hidden malicious code. Code that runs at install time, such as a setup.py executed when pip builds a source distribution, is a common vector.

Each of these attack patterns has been exploited in the wild. Repeatedly.
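
As an illustration of the typosquatting pattern above, here is a minimal sketch of a name-similarity check using Python's standard difflib module. The KNOWN_PACKAGES list and the 0.85 threshold are assumptions chosen for the example, not values used by PyPI or pip, and a real check would need a much larger reference list.

```python
import re
from difflib import SequenceMatcher

# Hypothetical allowlist: the packages a team actually intends to depend on.
KNOWN_PACKAGES = ["requests", "python-dateutil", "numpy", "setuptools"]


def normalize(name):
    """Lowercase and collapse runs of '-', '_' and '.' (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def likely_typosquat(candidate, known=KNOWN_PACKAGES, threshold=0.85):
    """Return the known package that `candidate` closely resembles, or None.

    A name that matches a known package exactly (after normalization)
    is treated as legitimate and returns None.
    """
    cand = normalize(candidate)
    best_name, best_ratio = None, 0.0
    for name in known:
        target = normalize(name)
        if cand == target:
            return None
        ratio = SequenceMatcher(None, cand, target).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= threshold else None
```

Calling likely_typosquat("reqeusts") returns "requests", while likely_typosquat("requests") returns None. A similarity ratio is a blunt instrument: it will miss creative lookalikes and can flag legitimate packages with similar names, so it is better suited to prompting a human review than to blocking installs outright.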

PyPA's Security Improvements

To their credit, PyPA has steadily improved PyPI's security posture over the past several years. The pace has accelerated since 2022, driven by both increased attack activity and increased funding from sources like the OpenSSF.

Mandatory two-factor authentication. In 2022, PyPI began requiring 2FA for maintainers of critical projects. The requirement was later expanded to every account that maintains a project, with enforcement for all such accounts beginning at the start of 2024. This single change dramatically reduces the account compromise attack vector.

Trusted publishers. PyPI's trusted publisher feature allows packages to be published directly from CI/CD systems like GitHub Actions without storing long-lived API tokens. The publishing workflow is tied to a specific repository and workflow, making it much harder for an attacker to hijack the publishing process.

Malware detection. PyPI has implemented automated scanning for known malware patterns in uploaded packages. This catches the most obvious attacks but is inherently limited, since sophisticated malicious code can be obfuscated beyond automated detection.

Package signing with Sigstore. PyPI has integrated with Sigstore for package signing, providing a cryptographic link between a package and the identity that produced it. This is still rolling out, but when fully adopted, it will make it possible to verify that a package was published by the expected entity.

Rate limiting and abuse detection. PyPI has improved its ability to detect and respond to bulk upload attacks, such as when an attacker registers hundreds of typosquatting packages in a short period.

What pip install Actually Does

Understanding the security of Python packaging requires understanding what happens when you run pip install somepackage.

  1. pip resolves the package name to a distribution on PyPI (or configured indices)
  2. pip downloads the distribution (wheel or sdist)
  3. For sdist distributions, pip runs the project's build backend to build the package, which for setuptools-based projects typically means executing setup.py
  4. pip installs the built package into the environment

Step 3 is where the security risk concentrates. Executing setup.py means running arbitrary Python code with the permissions of the installing user. A malicious setup.py can do anything: exfiltrate environment variables, install backdoors, modify other packages, or establish persistence.

Wheel distributions (.whl files) avoid this risk by not requiring build-time code execution. The push toward wheel-only distribution is partly a security measure. But many packages still distribute as sdist, and pip will fall back to sdist when a compatible wheel is not available.
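
To make the wheel versus sdist distinction concrete, the following sketch classifies distribution filenames by extension and reports which ones would need a build step. The function names are hypothetical, and the extension list is an assumption that covers the common cases (.tar.gz for current sdists, .zip and .tar.bz2 for some older ones).

```python
def distribution_kind(filename):
    """Classify a distribution file as 'wheel', 'sdist', or 'unknown'."""
    name = filename.lower()
    if name.endswith(".whl"):
        return "wheel"
    if name.endswith((".tar.gz", ".zip", ".tar.bz2")):
        return "sdist"
    return "unknown"


def needs_build_step(filenames):
    """Return the files that would require running build code to install."""
    return [f for f in filenames if distribution_kind(f) == "sdist"]
```

For example, needs_build_step(["requests-2.31.0-py3-none-any.whl", "somepkg-1.0.tar.gz"]) returns ["somepkg-1.0.tar.gz"]. Teams that want pip to refuse source distributions entirely can pass --only-binary :all: to pip install, at the cost of failing on packages that publish no compatible wheel.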

The Resolver Problem

pip's dependency resolver determines which versions of which packages to install. This resolver operates in a complex constraint space where different packages require different versions of shared dependencies.

From a security perspective, the resolver is a critical control point. If an attacker can influence which version of a dependency gets installed, they can substitute a vulnerable or malicious version.

pip's resolver has improved significantly. The new resolver, which became the default in pip 20.3, handles conflicts more correctly than the legacy resolver. But resolver behavior is still influenced by factors that users may not fully understand, including index ordering, version constraints, and platform markers.

Organizations that care about supply chain security should lock their dependency versions. Tools like pip-compile (from pip-tools), Poetry, and PDM generate lock files that pin exact versions of all direct and transitive dependencies. This greatly narrows the resolver as an attack vector by ensuring that builds are reproducible and that version selection is determined in advance rather than at install time.
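
A simple way to enforce this in CI is to reject requirements files that contain floating constraints. The sketch below is an assumption-laden illustration, not a full requirements parser: it handles backslash line continuations, comments, and option lines, and it only recognizes plain name==version pins.

```python
import re

# Matches "name==version" or "name[extra]==version" at the start of a line.
PIN_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def logical_lines(text):
    """Join backslash-continued lines, then drop comments and blank lines."""
    joined = re.sub(r"\\\r?\n", " ", text)
    for raw in joined.splitlines():
        line = raw.split(" #", 1)[0].strip()
        if line and not line.startswith("#"):
            yield line


def unpinned_requirements(text):
    """Return (line, reason) pairs for requirements that are not locked down."""
    problems = []
    for line in logical_lines(text):
        if line.startswith("-"):
            # Option lines such as --index-url are not requirements.
            continue
        if not PIN_RE.match(line):
            problems.append((line, "not pinned with =="))
        elif "--hash=" not in line:
            problems.append((line, "no --hash"))
    return problems
```

An empty result means every requirement is pinned to an exact version and carries at least one hash. A line such as urllib3>=1.26 would be reported as not pinned, and idna==3.6 without a hash would be reported as missing one.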

The Private Index Problem

Many organizations run private PyPI indices for internal packages. The interaction between private and public indices creates a dependency confusion attack surface.

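If pip is configured to search both a private index and public PyPI, and a package name exists on both, pip does not prefer one index over the other. It gathers candidates from every configured index and picks the best matching version, so an attacker who publishes a higher version number on public PyPI can cause pip to install the public package (which may be malicious) instead of the private one.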
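
The selection logic at the heart of dependency confusion can be modeled in a few lines. This is a deliberately simplified sketch and not pip's actual code: it assumes plain dotted numeric versions and ignores wheel compatibility tags and full version-specifier semantics. The index names and version numbers are made up for the example.

```python
def parse_version(version):
    """Parse a dotted numeric version like '1.4.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pick_candidate(candidates):
    """Pick the highest version from (index_name, version) pairs.

    The index a candidate came from plays no part in the choice,
    which is the property an attacker exploits.
    """
    return max(candidates, key=lambda c: parse_version(c[1]))


candidates = [
    ("private-index", "1.4.0"),   # the real internal package
    ("public-pypi", "99.0.0"),    # attacker's upload under the same name
]
chosen = pick_candidate(candidates)
```

Here chosen is ("public-pypi", "99.0.0"): the attacker wins simply by choosing a larger version number.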

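The mitigation is straightforward but requires explicit configuration. Organizations should ensure that internal package names can only resolve from their private index and that public packages come only from public PyPI. The --index-url and --extra-index-url flags, and the equivalent settings in the [global] section of pip.conf, control which indices pip consults. Note that --extra-index-url adds an index without giving either one priority, so it does not provide this separation by itself.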

In practice, many organizations get this wrong. The default behavior is not secure, and developers who are focused on making their code work are not always thinking about index resolution order.
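
One way to catch risky configurations early is to audit pip.conf files automatically. The sketch below parses the INI-style configuration with Python's configparser and flags the two settings discussed above. The function name, the wording of the findings, and the example URL are all hypothetical.

```python
import configparser


def audit_pip_config(config_text):
    """Return a list of findings for a pip.conf given as a string."""
    # Disable interpolation because index URLs may contain '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    findings = []
    for section in parser.sections():
        if parser.has_option(section, "extra-index-url"):
            findings.append(
                f"[{section}] sets extra-index-url: "
                "packages may resolve from more than one index"
            )
    if not any(parser.has_option(s, "index-url") for s in parser.sections()):
        findings.append("no index-url set: pip will use public PyPI by default")
    return findings


risky = """\
[global]
index-url = https://pypi.internal.example/simple
extra-index-url = https://pypi.org/simple
"""
findings = audit_pip_config(risky)
```

With the risky configuration above, findings contains a single entry about extra-index-url. A configuration with only index-url pointing at one index produces no findings. A common arrangement, offered here as an assumption rather than a recommendation from PyPA, is a single internal index that proxies public PyPI and is the only index pip knows about.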

Namespace Security

PyPI's flat namespace, where any user can register any available name, is a fundamental design decision with deep security implications.

In contrast, npm supports scoped packages (@organization/package) that tie package names to a specific user or organization account. Go modules use domain-based paths that link packages to their source repositories. These namespacing approaches make typosquatting harder and provide inherent provenance information.

PyPI has discussed but not implemented namespacing. The tradeoff is real. Namespacing adds friction to the publishing process. It changes the developer experience. And migrating a massive existing ecosystem to a new naming scheme is technically and politically complex.

The absence of namespacing means that PyPI's defense against name-based attacks relies on reactive measures (malware scanning, takedowns) rather than structural protections. This is a conscious design choice, and it places additional burden on consumers to verify that the packages they install are what they expect.

Best Practices for Consumers

Given the current state of Python packaging security, organizations consuming PyPI packages should:

Pin dependencies with lock files. Use pip-compile, Poetry, or PDM to generate deterministic lock files. Never deploy from floating version constraints.

Verify package integrity. Use pip's hash checking mode (--require-hashes) to ensure that installed packages match expected content.

Audit new dependencies. Before adding a new dependency, check its maintenance status, download count, contributor history, and source repository. A package with ten downloads and no source repository is a red flag.

Separate index configurations. Configure private and public indices explicitly to prevent dependency confusion.

Monitor for compromises. Subscribe to PyPI's security advisories and monitor your dependencies for unexpected version changes or maintainer transfers.
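
The integrity check behind hash pinning is conceptually simple. The sketch below shows the idea using Python's hashlib: compute the SHA-256 digest of a downloaded file and compare it with the hashes recorded in a lock file. It illustrates the principle only and is not pip's implementation; the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hashes):
    """Check a file against 'sha256:<hex>' strings, as found in lock files."""
    actual = "sha256:" + sha256_of(path)
    return actual in set(expected_hashes)
```

If a single byte of the distribution changes after the lock file was generated, the digest no longer matches and the install should be refused. This protects against tampering after the hash was recorded, not against a package that was malicious when it was first locked.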

How Safeguard.sh Helps

Safeguard.sh provides continuous monitoring of your Python dependency tree. We track every package in your lock files against known vulnerability databases, malware feeds, and maintainer compromise indicators. When a PyPI package in your dependency tree is flagged for malicious content or a maintainer account is compromised, Safeguard.sh alerts your security team immediately, with affected project context and remediation guidance. Our SBOM generation captures the full transitive dependency graph, including the exact versions and hashes of every package, giving you the foundation for hash verification and reproducible builds.
