AI Security

Hugging Face Pickle Backdoor Research 2025

Pickle-serialized model files remain a live attack surface on Hugging Face. Here is what 2025 research disclosed about persistent backdoors and what defenders should do about it.

Shadab Khan
Security Engineer
7 min read

Python's pickle format is the single most persistent foothold in the AI model supply chain. It has been known to be dangerous since before large language models existed, it has been flagged in Hugging Face's own documentation for years, and it keeps shipping in production pipelines. In 2025 a run of public research made the risk concrete again: ReversingLabs's nullifAI disclosure, JFrog's ongoing scanner findings, and academic work on backdoor persistence through fine-tuning all landed within a few months of each other. This post covers what that body of work actually demonstrated and what defenders should take away from it.

Why is pickle still a live attack surface in 2026?

Because torch.load still parses the pickle format by default (even the hardened weights_only path is a restricted unpickler, not a different format), and the existing Hugging Face scanning tooling is coverage-limited rather than coverage-complete. Pickle's serialization format interleaves data with opcodes that invoke Python callables during deserialization; the REDUCE opcode is the canonical sink, but GLOBAL and BUILD have appeared in real payloads too. Any format that lets an attacker run arbitrary Python on load is, definitionally, a code execution channel pretending to be a data channel.
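To make that concrete, here is a minimal, self-contained illustration of the mechanism, with a harmless id command standing in for the reverse shells seen in real payloads:

```python
import os
import pickle
import pickletools

# A class whose __reduce__ instructs the unpickler to call os.system("id").
# Real payloads substitute a reverse shell; the mechanism is identical.
class Payload:
    def __reduce__(self):
        return (os.system, ("id",))

blob = pickle.dumps(Payload())
pickletools.dis(blob)  # the disassembly shows the opcodes resolving os.system, then REDUCE calling it
# pickle.loads(blob)   # would execute `id` on load; never unpickle untrusted data
```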

The industry response since 2023 has been SafeTensors, and SafeTensors is genuinely a solution when it is used. The problem is that pickle has not gone away. Quantized checkpoints, older community models, GGUF-adjacent pipelines, and anything that persists Python objects beyond raw tensors still lean on pickle. ReversingLabs's February 2025 writeup on nullifAI documented malicious models on Hugging Face that evaded the platform's Picklescan checks by using broken PyTorch archives that errored out before the scanner ever reached its opcode-enumeration stage. The attacker's lesson was simple: corrupt the archive enough to bypass the scanner, then rely on PyTorch's permissive loader to execute the payload anyway.

What did the nullifAI disclosure actually demonstrate?

Two malicious repositories on Hugging Face containing PyTorch checkpoints that, when loaded, executed a reverse shell to a hard-coded endpoint. The payload itself was not novel; reverse shells in pickle streams have been in public demos since at least 2018. What was novel was the evasion: the attacker used 7z-compressed PyTorch archives with trailing data that Picklescan choked on. Because the pickle virtual machine executes opcodes sequentially, a payload placed before the corrupted region runs to completion before the loader ever hits the error. Picklescan returned an error and the model was not flagged as dangerous. A user who downloaded and loaded the model without additional checks ran the payload.

Hugging Face's Picklescan was patched after the disclosure, and the malicious models were removed. The broader point, though, is that signature-free scanners that enumerate pickle opcodes are defeating themselves whenever they fall back to "could not parse, assume clean." The right default is "could not parse, refuse to load." That is a policy decision, not a technical limit, and individual consumers need to enforce it on their side because the platform is not yet strict enough.
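A minimal sketch of that consumer-side posture, using only the standard library. Note it is deliberately stricter than Picklescan: legitimate PyTorch checkpoints also use GLOBAL and REDUCE for tensor storage classes, so a production version would allowlist known-safe globals rather than flat-refusing.

```python
import pickletools

# Opcodes that can resolve or invoke Python callables during unpickling.
DANGEROUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "BUILD", "INST", "OBJ",
             "NEWOBJ", "NEWOBJ_EX"}

def preflight(path: str) -> None:
    """Raise unless the stream parses cleanly AND contains no dangerous opcodes."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        ops = list(pickletools.genops(data))
    except Exception as exc:
        # The nullifAI lesson: a stream the scanner cannot parse is a deny,
        # not a pass, because PyTorch's loader may still execute its prefix.
        raise RuntimeError(f"refusing to load {path}: unparseable pickle ({exc})") from exc
    hits = {op.name for op, _, _ in ops} & DANGEROUS
    if hits:
        raise RuntimeError(f"refusing to load {path}: dangerous opcodes {sorted(hits)}")
```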

How does persistence through fine-tuning work?

Academic work in 2025 extended the existing BadNets line of research to show that backdoors planted in foundation weights survive typical fine-tuning procedures. The insight is that a small-magnitude trigger-to-target association can be distributed across many weight positions, such that the fine-tuning loss on a new task does not have a strong gradient pointing toward erasing it. Researchers showed this for vision classifiers and, more worryingly, for small language models where a specific token sequence in the prompt flips the model into an attacker-chosen output distribution.

For supply-chain defenders, the takeaway is that the "we fine-tuned it ourselves" answer is not a refutation of base-model compromise. If the base is compromised and the trigger is chosen carefully, the downstream fine-tune inherits the behavior. This elevates base-model provenance to a first-order security property rather than a nice-to-have.

What did JFrog and other scanners report about prevalence?

JFrog's 2025 reporting on Hugging Face scanning found on the order of a hundred actively malicious models on the platform at any given time, with the count rising whenever a popular new base model is released and attracts typosquatters. Protect AI's ModelScan project reported a similar steady-state figure. The absolute counts are small relative to the millions of models on the platform, but the rates suggest this is an ongoing background threat rather than a single past incident.

The structure of the findings is what matters. Most of the malicious models are unsophisticated: opportunistic reverse shells or credential-theft payloads, often typosquatting popular names. A smaller fraction are crafted to evade scanners, which is the category nullifAI fell into. The most concerning category, which public reporting has only hinted at, is long-dwell backdoors in checkpoints downloaded into corporate environments where the payload waits for a trigger rather than popping immediately. Those are the ones a signature scanner by design cannot find.

What is the right loading posture for a consuming team?

Prefer SafeTensors for every new model, refuse pickle by default, and when pickle is unavoidable load it in an isolated process with no network egress and no credentials. PyTorch's weights_only=True mode for torch.load, which became the default in PyTorch 2.6, restricts which globals deserialization can invoke. It is not a full sandbox, and researchers have demonstrated bypasses against restricted-unpickler configurations before, but it substantially narrows the attack surface and should be the baseline.
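A sketch of that baseline, assuming the safetensors package is installed; the helper name is ours:

```python
import torch
from safetensors.torch import load_file  # pip install safetensors

def load_weights(path: str):
    """SafeTensors first; pickle only via the restricted unpickler, never raw."""
    if path.endswith(".safetensors"):
        return load_file(path)  # data-only parse, no code execution path
    # weights_only=True is the default in PyTorch >= 2.6; passing it explicitly
    # keeps older versions safe. It narrows the surface; it is not a sandbox.
    return torch.load(path, map_location="cpu", weights_only=True)
```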

The second layer is a pre-flight scan. Picklescan, ModelScan, and Fickling each catch the obvious payloads. Stacking two of them reduces false negatives because they use different parsing strategies. None of them substitutes for a policy that refuses to load on parse errors.
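A sketch of the stacking-plus-deny policy, assuming both CLIs are on PATH; the flag and exit-code conventions below are assumptions, so check the documentation for the versions you pin:

```python
import subprocess

def scan_or_refuse(path: str) -> None:
    """Run two scanners with different parsers; any flag or failure is a deny."""
    # Assumption: both tools accept --path and exit 0 only on a clean scan.
    # Treating every nonzero exit as a refusal deliberately collapses
    # "malicious" and "could not parse" into the same deny outcome.
    for cmd in (["picklescan", "--path", path], ["modelscan", "--path", path]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"{cmd[0]} refused {path}:\n{result.stdout or result.stderr}")
```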

The third layer is execution isolation. If your pipeline loads arbitrary third-party pickles for a research reason that cannot be avoided, run the load in a container with no outbound network, no mounted secrets, and no shared filesystem writable by the process. Then export the model to SafeTensors and discard the original. The one-way conversion turns "trusted during load" into "never loaded again."
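A sketch of the convert-and-discard step, assuming a checkpoint whose state dict is a flat name-to-tensor mapping; tied or shared weights need extra handling that is elided here:

```python
import torch
from safetensors.torch import save_file

def quarantine_convert(pickle_path: str, out_path: str) -> None:
    """Run inside the no-network, no-secrets container: load once, export, discard."""
    state = torch.load(pickle_path, map_location="cpu", weights_only=True)
    # save_file wants a flat {name: tensor} dict of contiguous, unshared tensors.
    tensors = {k: v.detach().contiguous() for k, v in state.items()
               if torch.is_tensor(v)}
    save_file(tensors, out_path)
    # The pipeline deletes the original pickle after this returns; from here
    # on, only the data-only SafeTensors artifact is ever loaded.
```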

How should procurement and registry policy change?

Model registries should enforce format-allowlisting the same way container registries enforce image signing. A model pushed in pickle format gets rejected unless it passes scanning, has a signed provenance record, and is flagged as "legacy pickle." Everything new is SafeTensors or an equivalent data-only format. This is a process control, not a technical one, and it works because the registry is the chokepoint every consumer pulls through.
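A sketch of the chokepoint check, classifying uploads by magic bytes rather than file extension; any exception means refuse, and error handling is elided:

```python
import json
import struct

def sniff_format(path: str) -> str:
    """Classify an uploaded artifact by its leading bytes, not its name."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head[:4] == b"PK\x03\x04":
        return "zip-pickle"   # modern torch.save archive, pickle inside
    if head[:1] == b"\x80":
        return "raw-pickle"   # pickle protocol 2+ PROTO marker
    # SafeTensors layout: little-endian u64 header length, then a JSON header.
    header_len = struct.unpack("<Q", head)[0]
    with open(path, "rb") as f:
        f.seek(8)
        json.loads(f.read(header_len))  # raises unless it is valid SafeTensors
    return "safetensors"

def admit(path: str) -> bool:
    """Registry gate: only the data-only format passes unconditionally."""
    return sniff_format(path) == "safetensors"
```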

Procurement for commercial model vendors should ask, in writing, what format their models ship in, whether they sign weights, and what their incident response looks like for a compromised checkpoint. Several commercial providers in 2025 still default to pickle for historical reasons; that is a data point worth pricing into the contract.

What does the research trajectory suggest for 2026?

Two directions. First, more work on backdoors that are semantically hidden rather than opcode-hidden. If a backdoor lives in weight space and activates on a specific natural-language trigger, no pickle-level scanner will find it; you need behavioral evaluation against trigger dictionaries and adversarial probes. Expect more academic work and tooling here over the next year.
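One way such an evaluation can look, sketched against a hypothetical model_logits wrapper (a callable from prompt to next-token logits for whatever inference stack you run):

```python
import torch
import torch.nn.functional as F

def trigger_shift(model_logits, prompts, triggers):
    """Score how much appending each candidate trigger shifts the output distribution.

    model_logits: hypothetical callable, prompt -> 1-D tensor of next-token logits.
    Returns the mean KL(triggered || clean) per trigger; outliers are candidates
    for a planted trigger and deserve manual inspection.
    """
    scores = {}
    for trig in triggers:
        divs = []
        for prompt in prompts:
            clean_logp = F.log_softmax(model_logits(prompt), dim=-1)
            trig_p = F.softmax(model_logits(prompt + " " + trig), dim=-1)
            divs.append(F.kl_div(clean_logp, trig_p, reduction="sum").item())
        scores[trig] = sum(divs) / len(divs)
    return scores
```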

Second, more work on provenance as the primary defense rather than scanning as the primary defense. Scanning is a fallback for when you cannot trust the publisher. Signed, attested, and reproducible model artifacts change the trust model such that you do not need to scan unknown pickles because you do not load unknown pickles. That transition is underway at the enterprise tier and will slowly pull open-source norms along with it.
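In practice that trust model bottoms out in a hash check against a signed manifest. A minimal sketch, assuming a hypothetical manifest schema of {"artifacts": {name: sha256}} whose signature has already been verified out of band, for example with your Sigstore tooling:

```python
import hashlib
import json

def verify_manifest(manifest_path: str, artifacts: dict[str, str]) -> None:
    """Bind local artifact bytes to the names recorded in an attested manifest."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    for name, path in artifacts.items():
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if manifest["artifacts"].get(name) != digest:
            raise RuntimeError(f"hash mismatch for {name}: refusing to load")
```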

How Safeguard.sh Helps

Safeguard.sh's pickle-payload detection combines multiple open-source scanners with a refuse-on-parse-error policy that catches the nullifAI class of bypass that Picklescan alone misses. Eagle extends that coverage to the full model weight archive, including tokenizer and processor components that pickle-specific tools often skip, and produces an AI-BOM that records every artifact, hash, and format across your inventory. Model-weight signing and attestation workflows enforce Sigstore-backed signatures at the registry boundary so a swapped or resubmitted checkpoint is rejected before it reaches inference, and Griffin AI monitors Hugging Face publisher behavior and model churn so suspicious reupload patterns surface before your team pulls the bad version. Lino compliance maps these controls to the EU AI Act article on technical documentation and traceability, so the same pipeline that keeps pickles out of production also produces the evidence an auditor will ask for.
