A set of model weights is an executable artifact that your inference runtime loads into memory and acts on. It determines every downstream output of the system. In any other context, an artifact with that much influence would be signed, attested, and provenance-tracked end to end. For model weights, the practice is still spotty. This post covers what the signing and attestation story should look like and where the practical gaps are.
Why is signing weights more important than signing binaries?
Because the typical review depth on a set of weights is zero, while the typical review depth on a binary is at least some. A developer pulling a new build of a library probably runs tests, reads a changelog, and occasionally skims the diff. A developer pulling a new checkpoint from Hugging Face often runs nothing more than from_pretrained(). The trust decision is compressed into the act of typing the repo name.
When the PyTorch nightly repository was compromised in late 2022, the payload rode a dependency-confusion attack on the torchtriton package and got into a small but real number of environments. That compromise was caught relatively quickly. A similar compromise targeting a popular model repo rather than a code package would have a longer dwell time because nobody is watching model weights the way they are watching code dependencies. Signing gives you the first rung of defense: a reason to notice when the artifact changed unexpectedly.
What does a real signing scheme look like for weights?
Sigstore's cosign, applied to the weight file or to the manifest that references it, is the pragmatic answer in 2026. Cosign supports both key-based and keyless signing; for public model releases, keyless (OIDC-tied) is usually the right default because it removes the key management problem and gives you an audit trail tied to a real identity. For internal models, key-based with HSM-backed keys gives stronger guarantees.
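As a concrete sketch (assuming cosign 2.x is on the PATH and an OIDC identity is available, either interactively or from a CI provider; file names are placeholders), keyless signing of a weight file is a single command, wrapped here in Python so it can sit in a promotion pipeline:

```python
import subprocess

# Hypothetical paths; the OIDC flow itself is handled by cosign
# (browser prompt when run locally, ambient identity in CI).
WEIGHTS = "model.safetensors"
BUNDLE = "model.safetensors.sigstore.json"

# Keyless sign the raw weight file and write the signature bundle next to it.
subprocess.run(
    ["cosign", "sign-blob", "--yes", "--bundle", BUNDLE, WEIGHTS],
    check=True,
)
```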
The signing covers the specific weight bytes, not "a model with this name." Hashes are over the raw file (SafeTensors, GGUF, whatever format), not over metadata that can be regenerated. If you sign only a manifest that does not pin the weight hash, an attacker who swaps the weight file without touching the manifest goes undetected.
What is not sufficient: relying on the hash displayed in a web UI. Hugging Face and similar platforms display hashes, but unless your pipeline verifies the signature against a known public key (or a known identity in keyless mode), the hash is only as trustworthy as the display. Several Hugging Face incidents in 2025 involved either UI inconsistencies or weights that were replaced transiently. Verify locally, programmatically, every pull.
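A minimal local verification sketch, assuming you pinned the expected hash and the publisher's identity when the model was first approved (function and argument names here are illustrative):

```python
import hashlib
import subprocess

def sha256_file(path: str) -> str:
    """Hash the raw weight bytes in chunks; metadata is never part of the digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_weights(path: str, bundle: str, expected_sha256: str,
                   identity: str, issuer: str) -> None:
    # 1. Hash check against the value pinned when the model was approved.
    actual = sha256_file(path)
    if actual != expected_sha256:
        raise RuntimeError(f"hash mismatch for {path}: got {actual}")
    # 2. Signature check against a known identity, not a hash shown in a UI.
    subprocess.run(
        ["cosign", "verify-blob",
         "--bundle", bundle,
         "--certificate-identity", identity,
         "--certificate-oidc-issuer", issuer,
         path],
        check=True,
    )
```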
How should attestation extend beyond signing?
Signing says "this artifact came from this identity." Attestation says "this artifact was produced by this process on this input." For models, the attestation chain you want is: the training data hashes (or a Merkle root over the dataset), the training code commit hash, the hyperparameter configuration, the compute environment (typically a container image hash), and the resulting weight hash. Each of those is a claim, and the full chain is signed.
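As an illustration of what that chain can look like as data (field names are hypothetical, not a formal SLSA or in-toto schema), every claim reduces to a digest, and the serialized document is what gets signed:

```python
import hashlib
import json

def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Placeholder values standing in for real pipeline outputs.
hyperparams = {"lr": 2e-5, "epochs": 3, "seed": 1234}

attestation = {
    # The subject: the exact weight bytes the claims are about.
    "subject": {"name": "acme/summarizer-7b",
                "sha256": "<weight file sha256>"},
    # The claims: each one is a digest, not free-form prose.
    "training": {
        "dataset_merkle_root": "<merkle root over dataset shards>",
        "code_commit": "<training repo commit hash>",
        "config_sha256": sha256_bytes(
            json.dumps(hyperparams, sort_keys=True).encode()),
        "builder_image": "<container image digest>",
    },
}

# The serialized document is what gets signed, e.g. written to disk and
# passed to cosign, or wrapped in an in-toto statement.
provenance_bytes = json.dumps(attestation, sort_keys=True).encode()
```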
This is essentially SLSA applied to ML pipelines, and there is active work around it. In practice, teams that get this right are the ones who treat model training as a build system, with reproducible (or at least attested) runs, rather than as a researcher's notebook session. A run that cannot produce an attestation is a run that cannot be audited later, which is a problem the moment something goes wrong.
For teams consuming third-party models, the attestation questions are: does the model publisher provide a provenance document, does that document chain back to identifiable training data and code, and is there a way to verify the document was not generated after the fact. Most public models fail all three. That is not a reason to avoid using them; it is a reason to price the risk correctly.
How does this interact with the broader SBOM story?
An AI SBOM is a superset of a traditional SBOM that includes model components: the weights file, the tokenizer, the processor, the fine-tuning base, the datasets, and the licenses on each. The signing and attestation data attach to each component. When a vulnerability or compromise is reported against a base model, the SBOM tells you which of your downstream fine-tunes inherit it, and the attestation chain tells you whether the fine-tuning process had the opportunity to clean it.
The Hugging Face transformers library makes it easy to download and use models, and it makes the SBOM story harder because the lineage is often implicit in the model card prose rather than structured. Teams serious about model SBOMs pull this data out of prose into structured fields, keyed off the model hash, so that an automated system can reason about it.
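A sketch of what those structured fields might look like (the field names are illustrative, not a standard SBOM schema); the important property is that everything joins on hashes rather than on prose:

```python
# Lineage record keyed off the weight hash, so an automated system can
# answer "which fine-tunes inherit this base model?" without parsing a
# model card. All values are placeholders.
model_component = {
    "sha256": "<fine-tuned weight hash>",
    "name": "acme/summarizer-7b-ft",
    "license": "apache-2.0",
    "tokenizer_sha256": "<tokenizer file hash>",
    "base_model_sha256": "<base checkpoint hash>",
    "datasets": [
        {"name": "acme/support-tickets-v3", "sha256": "<dataset snapshot hash>"},
    ],
}
```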
What about runtime verification?
Inference runtimes should verify model signatures before loading, fail closed on verification errors, and log every model load with the hash. This is where the operational rubber meets the road. A model that gets swapped on disk (accidentally, or maliciously) and then loaded by an inference server that does not verify is the canonical undetected compromise. The mitigation is two lines of code in the model-loading path, and it is usually absent.
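A minimal version of that loading path, reusing the verify_weights sketch from above and assuming a SafeTensors checkpoint, looks like this:

```python
import logging

from safetensors.torch import load_file  # assumes a SafeTensors checkpoint

logger = logging.getLogger("inference")

def load_model(path: str, bundle: str, expected_sha256: str,
               identity: str, issuer: str):
    # Fail closed: any verification error aborts the load entirely.
    verify_weights(path, bundle, expected_sha256, identity, issuer)
    # Log every load with the hash so swaps on disk show up in audit logs.
    logger.info("model load: path=%s sha256=%s", path, expected_sha256)
    return load_file(path)
```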
For hosted inference, the question becomes whether the provider verifies. Some do; some give you knobs to require it; some are silent on the topic. Procurement teams should ask, in writing, and should weigh the answer as a real security property rather than a checkbox.
What gaps remain in 2026?
Three big ones. First, the tokenizer and processor components are often distributed alongside weights and are often not signed even when weights are. A poisoned tokenizer can cause subtle misbehavior. Second, the provenance chain for training data is basically nonexistent for most public models; the best you get is a named dataset and a hash at a moment in time. Third, fine-tuning workflows often rebase onto new versions of the base model silently, which can break your signature chain. These are solvable but not solved, and teams should plan around them rather than pretend they are.
How do you operationalize this without grinding releases to a halt?
Automation at the registry and CI layer. Teams that get this right have two things: a model registry that rejects unsigned artifacts at push time, and a CI check that rejects unsigned model references at build time. Everything else (hash calculation, signature verification, attestation assembly) happens in the background when a trainer promotes a checkpoint. Humans get involved only for exceptions.
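The CI side can be as small as a gate over a lock file of model references. This sketch assumes a hypothetical models.lock.json with one entry per model, each expected to carry a pinned hash and a signature bundle path:

```python
import json
import pathlib
import sys

# Hypothetical CI gate: every model reference must be pinned and signed,
# or the build fails. The lock file format is illustrative.
def check_model_lock(lockfile: str = "models.lock.json") -> int:
    entries = json.loads(pathlib.Path(lockfile).read_text())
    failures = [e.get("name", "<unnamed>") for e in entries
                if not e.get("sha256") or not e.get("bundle")]
    for name in failures:
        print(f"unsigned or unpinned model reference: {name}", file=sys.stderr)
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(check_model_lock())
```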
The registry-level rejection is the control that scales. If every model consumer in your org pulls through a registry that enforces signatures, you do not need every consumer to get signing right. The registry gets it right once, and drift is bounded. Without that chokepoint, every team ends up reinventing verification badly, and most of them end up not doing it at all. Container registries taught the same lesson a decade ago; only the artifact has changed.
How Safeguard.sh Helps
Safeguard.sh extends SBOM generation to cover model weights, tokenizers, and training-time dependencies, with reachability analysis cutting 60 to 80 percent of the downstream alert noise that would otherwise bury real signal. Griffin AI continuously monitors model provenance and flags hash mismatches, signature failures, and suspicious publisher behavior on platforms like Hugging Face before models reach production. TPRM workflows track each model vendor's attestation practices and score them alongside code vendors, and dependency analysis to 100 levels of depth surfaces transitive compromises in model-adjacent Python packages such as those seen in the Ultralytics incident. Container self-healing rebuilds inference images whose model or runtime dependencies ship fixes, keeping production aligned with the current patch posture.