Incident Analysis

Hugging Face Model Hub Supply Chain Risks in 2025

Pickle deserialization, malicious Spaces, and namespace squatting: what 2024-2025 taught us about the Hugging Face model supply chain.

Shadab Khan
Security Engineer
7 min read

Hugging Face hosts over a million models and datasets. For AI teams, huggingface.co has replaced GitHub as the default distribution layer for model weights, tokenizers, and inference configs. That shift made the Hub a high-value supply chain target. Through 2024 and into 2025, multiple incident classes surfaced: weaponized pickle files hidden in model repos, typosquatted model names, secret-exfiltrating Spaces, and CI pipelines that blindly downloaded and executed model artifacts.

This post covers the dominant risk patterns, what has actually been observed in the wild, and which controls meaningfully reduce exposure for ML pipelines.

Why is pickle deserialization still the core problem?

Python's pickle format allows arbitrary code execution during deserialization, and a substantial fraction of Hugging Face models ship as pickle files.

Pickle is the historical serialization format for PyTorch checkpoints, and for a long time torch.load() defaulted to full, unrestricted unpickling (weights_only=False). A model file can trivially include a __reduce__ hook that runs any Python code the attacker wants the moment the file is loaded. This is not a bug; it is how pickle is designed. The consequence is that calling AutoModel.from_pretrained("some/model") on an untrusted pickle-format model is equivalent to running exec() on attacker-controlled code.
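A minimal sketch of the mechanism, using a deliberately harmless payload (the class name is illustrative):

# Why unpickling untrusted data is code execution: __reduce__ lets the
# serialized object choose a callable to invoke during deserialization.
import os
import pickle

class MaliciousStub:
    def __reduce__(self):
        # A real attacker would return a reverse shell or a command that
        # exfiltrates environment variables instead of this echo.
        return (os.system, ("echo payload ran during unpickling",))

blob = pickle.dumps(MaliciousStub())

# The command executes as soon as the bytes are deserialized, before any
# model code ever sees the result. torch.load() on a pickle checkpoint
# ultimately reaches this same code path.
pickle.loads(blob)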

Hugging Face has made significant progress mitigating this. The default checkpoint format has shifted toward safetensors, which is a header-plus-tensor-blob format with no code execution surface. Many popular models now ship safetensors variants alongside legacy pickle files. But pickle has not disappeared: a substantial portion of long-tail community models are still pickle-only, and from_pretrained falls back to the pickle checkpoint silently when no safetensors file is present.

JFrog and other researchers have documented dozens of pickle-weaponized models on the Hub through 2024 and 2025. Some established reverse shells back to attacker infrastructure on load. Others stole environment variables or SSH keys.

How do namespace typosquats work on the Hub?

Attackers publish models with names one character off from a legitimate popular model, hoping downstream users will load the squatted version by mistake.

The Hub uses organization/model-name identifiers. Attackers have registered namespaces like mistraI-ai (with a capital I instead of lowercase L) or added hyphens and numbers to look like plausible variants. Downstream code that constructs model names dynamically (for example, reading from a config or user input) can resolve to a typosquatted model. Because the Hub does not enforce strict namespace collision rules beyond exact match, these variants can coexist with the real projects.

The attack has two variants. The low-skill version relies on users typo-ing a name in their own code. The higher-skill version targets automated pipelines where a model selector, prompt template, or configuration file is manipulated upstream to substitute a malicious model, and the pipeline loads it without verification.
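One inexpensive mitigation for both variants is to resolve model identifiers against an explicit allowlist before anything is downloaded. A sketch; the allowlist contents and helper name are illustrative:

# Resolve dynamically constructed model IDs against an approved list so a
# typosquatted namespace fails closed instead of silently loading.
APPROVED_MODELS = {
    "mistralai/Mistral-7B-Instruct-v0.2",
    "meta-llama/Llama-3.1-8B-Instruct",
}

def resolve_model_id(requested: str) -> str:
    if requested not in APPROVED_MODELS:
        raise ValueError(f"model {requested!r} is not on the approved list")
    return requested

# A config-supplied name such as "mistraI-ai/Mistral-7B-Instruct-v0.2"
# raises here rather than ever reaching from_pretrained.
model_id = resolve_model_id("mistralai/Mistral-7B-Instruct-v0.2")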

What is the Spaces attack surface?

Hugging Face Spaces run user-provided code in a shared environment with privileged access to models, datasets, and sometimes user tokens.

A Space is essentially a cloud notebook or Gradio app defined by a repo on the Hub. Visitors interact with it through a web UI. Under the hood, a Space can install arbitrary Python dependencies, load models, and, when the visitor signs in via OAuth, act with that user's Hugging Face permissions. Several incidents have involved malicious Spaces that requested elevated OAuth scopes and used them to access private models or API keys belonging to the visiting user.

In one disclosed 2024 incident, researchers demonstrated that a Space could read tokens stored in a visitor's browser under certain configurations. Hugging Face has since tightened Space isolation and reduced token scoping, but the underlying pattern (a user runs code written by someone else, with privileged access) is structural.

How do CI/CD pipelines actually get compromised?

ML pipelines commonly pull models during build or deployment, and any pipeline that does so without pinning revisions and verifying hashes is exposed.

A typical ML CI workflow looks like: base image includes PyTorch, CI step downloads a model from Hugging Face at build time, model gets baked into a container image, image gets deployed. If the model name is resolved as org/model without a revision, the build pulls the latest commit. If a maintainer's account is compromised and a malicious commit is pushed, the next CI build pulls the poisoned weights. If those weights are pickle, loading them during testing or deployment runs attacker code inside the build environment. Cloud credentials, secrets, and deploy keys are then reachable.

The fix is to pin by revision: AutoModel.from_pretrained("org/model", revision="<full git commit hash>"). The revision argument accepts a branch, tag, or commit hash, but only a commit hash is immutable, so pinning to one prevents silent upstream tampering. Very few pipelines actually do this, because the ergonomics are poor and the tooling does not encourage it.
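A sketch of what pinning looks like inside a CI step; the model identifier and commit hash below are placeholders:

# Pin the model to an exact commit so the build pulls immutable content.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "org/model"                                        # placeholder
PINNED_REVISION = "5c2b1a0e9d8f7c6b5a4e3d2c1b0a9f8e7d6c5b4a"  # full commit SHA, placeholder

model = AutoModel.from_pretrained(MODEL_ID, revision=PINNED_REVISION)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=PINNED_REVISION)

# Record MODEL_ID and PINNED_REVISION in a committed lockfile so a revision
# bump shows up in code review like any other dependency upgrade.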

What controls actually reduce exposure?

Five controls, in priority order. A short code sketch combining the first and third follows the list.

Prefer safetensors and refuse pickle. Configure transformers to require safetensors by default, and fail loudly when a model only ships pickle. This is a one-line configuration change and it eliminates the single most common code-execution attack vector on the Hub.

Pin models by revision hash. Treat model downloads like package installs: exact version, hash-verified, lockfile-committed. The huggingface_hub library supports revision pinning directly; use it.

Scan models before loading. picklescan and commercial tools analyze pickle files without executing them and flag dangerous opcodes. Integrate scanning into CI before any torch.load runs. This catches weaponized pickle before it executes.

Isolate model-loading steps. Run model loads in a sandbox with no network egress beyond the Hub endpoint, no access to cloud credentials, no mounted secrets. If the model is compromised, the blast radius is the sandbox. Kubernetes security contexts, gVisor, or lightweight VMs are all viable.

Inventory every model your organization uses. Most ML teams do not have this list. Building an SBOM-equivalent for model artifacts (provenance, revision hash, format, license, scan status) is the foundation for responding to any upstream Hugging Face incident in under an hour instead of under a week.
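Pulling the first and third controls together, a minimal pre-load gate might look like the sketch below. The model identifier and commit hash are placeholders, and the picklescan invocation assumes its CLI flags and nonzero-exit-on-findings behavior; check both against the version you install:

# Download the pinned snapshot, scan it before anything is loaded, and
# refuse to fall back to pickle at load time.
import subprocess
from huggingface_hub import snapshot_download
from transformers import AutoModel

MODEL_ID = "org/model"                                        # placeholder
PINNED_REVISION = "5c2b1a0e9d8f7c6b5a4e3d2c1b0a9f8e7d6c5b4a"  # placeholder

# 1. Fetch files only; nothing is deserialized yet.
local_dir = snapshot_download(MODEL_ID, revision=PINNED_REVISION)

# 2. Scan for dangerous pickle opcodes; check=True fails the pipeline if
#    picklescan exits nonzero on a finding.
subprocess.run(["picklescan", "--path", local_dir], check=True)

# 3. Require safetensors weights; error out instead of loading pickle.
model = AutoModel.from_pretrained(local_dir, use_safetensors=True)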

How does model-level supply chain differ from code-level?

Model artifacts are binary blobs that encode learned behavior, and the supply chain questions they raise only partially overlap with traditional code dependencies.

A compromised Python package has a clear attack surface: read the code, run the code, observe the effects. A compromised model is murkier. The weights are a few gigabytes of floating-point numbers. They encode a decision function, but reading them directly tells you nothing. You can run the model in a sandbox and observe its outputs on probe inputs, but the behavior that matters may only emerge on specific rare inputs, or in adversarial contexts the defender does not think to test. Model poisoning, backdoor triggers, prompt-injection vulnerabilities, and data exfiltration via encoded outputs are real attack categories that traditional SCA tooling was never designed to address.

The pickle and typosquat risks this post focused on are the immediate, executable-code attack surface. They are the easy half. The harder half is what happens when the model itself is the attack: trained with a backdoor that activates on specific inputs, shipped as safetensors so there is no code-execution path to flag, embedded in your inference pipeline under a name you trust. Detection here is closer to ML research than traditional security engineering. Techniques like activation-pattern analysis, input-triggered behavior testing, and provenance verification of training data are evolving, but they are not mature.

The pragmatic stance for 2025: solve the easy half first. Disable pickle. Pin revisions. Scan for known-bad patterns. That addresses the most common and highest-volume attacks. Track the harder half, invest in it as the tooling matures, but do not let the hard problem block the easy wins. An ML team that has done the basics is already in the top decile of supply chain posture.

How Safeguard.sh Helps

Safeguard.sh's reachability analysis treats Hugging Face models as first-class supply chain artifacts and applies its 60-80% noise reduction to surface the model-loading paths that actually execute attacker-controlled code, not just every model in the inventory. Griffin AI scans pickle artifacts for dangerous opcodes, monitors revision drift across pinned models, and flags new network behaviors during model initialization, independently of public disclosure. The SBOM pipeline includes model artifacts to 100 levels of dependency depth, with provenance and revision tracking across pipelines, and TPRM workflows place AI/ML suppliers that ship model-hub-sourced artifacts under explicit review. Container self-healing rolls back affected inference images when upstream model integrity signals shift, so compromised model updates do not silently propagate into production.
