Open Source Security

Hugging Face's Guardian-Plus-Picklescan Stack: How the Model Hub Scanning Posture Evolved Through 2025-2026

Following NullifAI and the broken-pickle bypass campaigns, Hugging Face layered Protect AI's Guardian on top of Picklescan, ClamAV, and secrets scanning across 1.5 million public models. Here is the defender view of the new pipeline.

Yukti Singhal
ML Security Researcher
7 min read

In February 2025, ReversingLabs published the NullifAI write-up demonstrating that an attacker could ship malicious code inside a PyTorch model archive by using a 7z-wrapped pickle stream rather than the conventional ZIP-wrapped one, evading Hugging Face's then-current picklescan tool. JFrog followed with its own bypass research, and over the following months independent disclosures from JFrog, Sonatype, and Socket brought the total to seven confirmed picklescan bypass classes. From a defender's perspective, the registry's response is more interesting than the bypasses: by mid-2026, Hugging Face runs a layered scanner pipeline that combines its in-house picklescan, Protect AI's Guardian commercial scanner, a malware/AV scanner, and a secrets scanner on every public repository push. This post walks through what the Model Hub now enforces, what a consumer can verify directly, and where the policy still has gaps.

What did Hugging Face change in its scanning posture?

Three changes are visible. First, every public model repository is now scanned automatically by Protect AI's Guardian on push, on top of the existing picklescan, ClamAV-based malware scanning, and TruffleHog-based secrets scanning. The Hugging Face docs describe Guardian as a "third-party scanner" reported through the security file viewer on the model card. Second, picklescan itself was updated to handle the 7z-wrapped and other archive-format variants that the NullifAI bypass exploited, and the upstream project added CI gating for known opcode obfuscation patterns. Third, Hugging Face surfaces a scanner verdict directly on the file browser: every weight file shows a green check, a yellow "warning" badge for unsafe but allowed content, or a red block for files quarantined for confirmed malware. Importantly, a yellow verdict does not block download; it informs the consumer that the file uses an unsafe serialization format, leaving the decision in the user's hands. That design choice is intentional and contested, and we will come back to it.

How was the response coordinated across vendors and the Hub?

The post-NullifAI response is a useful case study in cross-vendor incident coordination. ReversingLabs reported the bypass to Hugging Face under a private disclosure window before publishing the public write-up. Hugging Face shipped a picklescan update, the malicious models were quarantined, and the Hub team published a corresponding note explaining what was missed and why. JFrog later partnered with Hugging Face to feed its threat-intel data into the scanner pipeline, and Sonatype's research on opcode obfuscation flowed into upstream picklescan patches. The Hugging Face security team also expanded the public security documentation to call out explicitly that pickle remains an inherently unsafe format and that users should prefer the safetensors format for new model uploads. The transition to safetensors as the default for many widely-used model families has been the most durable defender win of the year.

What signals are now visible to a downstream consumer?

A consumer pulling a model can read four signals without leaving the Hub. The first is the file-level scanner verdict shown alongside every artifact. The second is the model card's "files and versions" tab, which surfaces whether the repository contains any *.bin, *.pt, *.pkl, or *.ckpt files (pickle-based) versus *.safetensors (which loads without executing arbitrary code). The third is the repository's commit history, which can be inspected to confirm that the weights you are about to pull match the commit the maintainer signed. The fourth, newer in 2026, is a "verified" badge for organization-published models from select partner orgs, signaling that the publisher's identity has been verified outside the Hub's own automated checks. Together these are not a guarantee, but they let a consuming organization write a policy that prefers safetensors, accepts pickle only with passing scanner verdicts, and pins to a specific commit SHA rather than a moving main ref.
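The format signal is also scriptable. Here is a minimal sketch using the huggingface_hub client's list_repo_files helper; the repository id is a placeholder and the extension list mirrors the one above:

# Check a repository's weight formats before downloading anything
from huggingface_hub import list_repo_files

PICKLE_EXTS = (".bin", ".pt", ".pkl", ".ckpt")

files = list_repo_files("org/model-name")  # placeholder repo id
has_safetensors = any(f.endswith(".safetensors") for f in files)
has_pickle = any(f.endswith(PICKLE_EXTS) for f in files)

if has_pickle and not has_safetensors:
    print("pickle-only repository: apply the stricter review path")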

How do you verify a model before letting it touch a GPU?

A defender pulling a model should treat the operation like installing an arbitrary package: check format, check scanner status, pin to a content hash, and load in an isolated environment first. The huggingface_hub Python client and CLI support all of these.

# Prefer safetensors and pin to a commit SHA
huggingface-cli download org/model-name \
  --revision 4c89ad9c8f3b... \
  --include "*.safetensors" "tokenizer*" \
  --local-dir ./model

# Scan locally with Protect AI ModelScan before loading
modelscan --path ./model

For pickle-based weights that cannot yet be avoided, pairing a sandboxed torch.load with the safetensors save_file helper gives a short conversion path: transcode the weights in a trusted, throwaway environment, then load only the safe artifact in production. The picklescan CLI remains useful as defense in depth, even though it is no longer the registry's only signal. For organizations operating their own model mirror, JFrog and Sonatype both ship pre-ingestion scanners that block known-bad hashes from the Hugging Face quarantine feed and can apply org-local policies such as "no pickle in this tenant."
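A minimal sketch of that transcode, assuming the weights are a PyTorch state dict in a file named pytorch_model.bin; run it only inside the isolated environment, since loading a pickle executes its opcodes:

# Transcode pickle weights to safetensors inside a sandbox
import torch
from safetensors.torch import save_file

# weights_only=True restricts the unpickler to tensor payloads, but it is
# not a substitute for isolation
state_dict = torch.load("pytorch_model.bin", map_location="cpu",
                        weights_only=True)

# save_file expects a flat dict of tensors; state dicts with shared
# tensors may need safetensors.torch.save_model instead
save_file(state_dict, "model.safetensors")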

What policy gates catch the next bypass class?

Three gates address the structural risk rather than the specific bypass. Gate one is "deny any model whose primary weight format is pickle-based when a safetensors artifact is available in the same revision," which preserves the option to use older models while preferring the safe format for everything new. Gate two is "pin model loads to a commit SHA rather than a tag or branch," which neutralizes a class of attack where a maintainer's compromised account silently rewrites main. Gate three is "block on Hugging Face scanner verdicts of red and gate on yellow with human review," accepting that scanner false negatives exist while treating any positive verdict as a hard signal. For organizations running an internal Guardian or ModelScan instance, those verdicts can be combined with org-local rules such as "no model larger than X GB from a publisher younger than Y days."
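A minimal sketch of gates one and two using the huggingface_hub client; model_info with its sha and siblings fields is real API surface, while the repository id and the deny behavior are illustrative:

# Resolve a branch to an immutable commit SHA and enforce gate one
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("org/model-name", revision="main")  # placeholder repo id
commit_sha = info.sha  # gate two: pin downstream loads to this, never to main

files = [s.rfilename for s in info.siblings]
pickle_files = [f for f in files if f.endswith((".bin", ".pt", ".pkl", ".ckpt"))]
safetensors_available = any(f.endswith(".safetensors") for f in files)

if pickle_files and safetensors_available:
    # Gate one: a safe artifact exists in this revision, so deny the pickle path
    raise SystemExit("policy: download the safetensors artifact instead")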

What still has to mature?

Three gaps are visible to defenders in mid-2026. The first is that the Hub's "yellow" verdict for unsafe-but-allowed pickle still permits download by default, and a significant fraction of widely-used research models remain pickle-only. Until the safetensors migration is essentially complete for popular families, the format risk is not eliminated. The second is that picklescan bypasses keep being found at a steady rate; the SafePickle and PickleBall academic projects published in late 2025 and early 2026 propose more robust static and runtime detection but neither is yet the default. The third is that CVE-2026-25874, disclosed in April 2026 in Hugging Face's LeRobot framework, showed that pickle's unsafety extends beyond the Hub itself to derived inference servers. Closing that gap requires both registry and downstream-tool authors to give up pickle as a transport format entirely, which is a multi-year process.

How Safeguard Helps

Safeguard treats Hugging Face models as first-class supply-chain components alongside npm and PyPI packages. The ML-BOM workflow inventories every model loaded by your applications, links it to the Hugging Face repository and commit SHA, and tracks whether the weights use pickle or safetensors plus the upstream scanner verdict. The malicious-package feed includes the Hugging Face quarantine stream and surfaces a finding the moment a quarantined model appears anywhere in your inventory. Policy gates can require safetensors-only loads for production, require commit-SHA pinning, and refuse models from publishers below an OpenSSF Scorecard threshold. The provenance verification engine handles SLSA attestations for model build pipelines, so internally produced fine-tunes carry the same chain-of-custody story your code dependencies already have, closing the loop between npm-style and model-style supply-chain risk in a single policy framework.
