AI Security

AI Model Watermarking and Provenance

Watermarking and provenance are the two most confused terms in AI security. A practical breakdown of what each actually does, where the 2025 techniques break, and what to ship in the meantime.

Shadab Khan
Security Engineer
7 min read

Every few months another regulator, standards body, or vendor announces the latest "solution" for AI content authenticity, and every few months security engineers have to relearn why the announcement does not actually solve the problem. The EU AI Act's provenance requirements under Article 50 take effect in August 2026, NIST's AI 100-4 guidance on content authentication landed in January 2025, and the White House EO on AI from late 2023 started the watermarking conversation in earnest. Meanwhile every major model vendor has shipped something they call watermarking or provenance.

The problem is that watermarking and provenance are doing different jobs, and the vendors and the press routinely conflate them. If you are building or buying an AI system and need to answer "where did this come from" or "is this from my model" defensibly, you need to be precise about which technique you are using and what threat model it actually survives.

Watermarking is a signal inside the output; provenance is a claim outside the output

Watermarking embeds a detectable signal into the model's output such that a detector can later recover the signal from a candidate sample. The signal can be in the token distribution of generated text, the pixel distribution of generated images, or the spectral content of generated audio. The detector takes a candidate sample and returns "yes, this came from the model" or "no, it did not" with some confidence. The signal has to survive reasonable transformations like cropping, recompression, paraphrasing, or resampling.
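To make the detector side concrete, here is a minimal sketch of the statistical test behind token-level text watermarking. It follows the generic green-list scheme from the academic literature (Kirchenbauer et al.), not any vendor's production algorithm; the function names, the key derivation, and the 0.5 green fraction are illustrative choices.

```python
import hashlib
import random

GREEN_FRACTION = 0.5  # fraction of the vocabulary on the "green list" at each position

def is_green(prev_token: int, token: int, key: bytes) -> bool:
    """Pseudorandomly assign `token` to the green list for the position after
    `prev_token`, keyed on the secret watermark key."""
    seed = hashlib.sha256(
        key + prev_token.to_bytes(8, "big") + token.to_bytes(8, "big")
    ).digest()
    return random.Random(seed).random() < GREEN_FRACTION

def detection_z_score(tokens: list[int], key: bytes) -> float:
    """How far does the observed green-token count sit above what unwatermarked
    text would produce by chance? A z-score above roughly 4 is a strong
    'this came from the model' signal."""
    n = len(tokens) - 1
    if n <= 0:
        return 0.0  # too short to test
    hits = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    expected = n * GREEN_FRACTION
    stddev = (n * GREEN_FRACTION * (1.0 - GREEN_FRACTION)) ** 0.5
    return (hits - expected) / stddev
```

The same construction shows why paraphrasing hurts: a rewrite replaces tokens wholesale, and every replacement is a fresh coin flip that drags the green rate back toward chance.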

Provenance is a separate claim, usually a cryptographic assertion, attached to a piece of content that says what system produced it, when, and under what conditions. The signature is verified against a public key. The claim is outside the content, not inside it. C2PA is the best-known provenance standard, adopted across Adobe, Microsoft, Sony, BBC, and the major camera manufacturers, and it is what most "provenance" announcements in 2025 mean.
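The shape of that external claim can be shown with a bare detached signature. This is a stand-in sketch using pyca/cryptography's Ed25519 primitives, not C2PA's actual COSE/JUMBF manifest encoding, and the claim fields are hypothetical:

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

def verify_claim(content: bytes, claim: dict, signature: bytes,
                 vendor_key: ed25519.Ed25519PublicKey) -> bool:
    """A provenance claim lives outside the content, so it has to bind itself
    to the content by hash, and a verifier has to check both the binding and
    the vendor's signature over the claim."""
    if claim.get("content_sha256") != hashlib.sha256(content).hexdigest():
        return False  # signature aside, this claim describes some other content
    try:
        vendor_key.verify(signature, json.dumps(claim, sort_keys=True).encode())
        return True
    except InvalidSignature:
        return False  # claim was tampered with or signed by a different key
```

Note what this function cannot do: a stripped claim simply never reaches it, which is exactly the gap discussed below.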

The two are complementary, not interchangeable. Watermarking survives stripping of external metadata but can be weakened by heavy transformation. Provenance survives any transformation, as long as the editing tool carries the manifest forward, but it is trivially stripped by anyone who wants to remove it. A defensible authenticity story needs both.

What the 2025 watermarking landscape actually looks like

Google DeepMind shipped SynthID for text in October 2024 and published the Nature paper in the same window. SynthID modifies the sampling distribution during generation to bias certain tokens in a way that is imperceptible to humans but detectable statistically across a long enough passage. The detector has access to the watermarking key and runs a statistical test. For images, SynthID uses a neural network to embed a watermark into the pixels at generation time, and the watermark survives typical editing operations.
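The generation side of the generic text scheme is equally small: shift the logits of green-listed tokens by a small delta before sampling. This is the published academic mechanism, not SynthID's exact algorithm (the Nature paper describes a tournament-sampling variant); `is_green` is redefined from the detector sketch above so this block stands alone.

```python
import hashlib
import math
import random

def is_green(prev_token: int, token: int, key: bytes) -> bool:
    seed = hashlib.sha256(
        key + prev_token.to_bytes(8, "big") + token.to_bytes(8, "big")
    ).digest()
    return random.Random(seed).random() < 0.5

def biased_sample(logits: dict[int, float], prev_token: int, key: bytes,
                  delta: float = 2.0) -> int:
    """Add a small bonus `delta` to green-listed logits, then sample from the
    softmax. The per-token shift is imperceptible; over a long passage the
    green-token rate drifts measurably above 50%."""
    adjusted = {t: v + (delta if is_green(prev_token, t, key) else 0.0)
                for t, v in logits.items()}
    m = max(adjusted.values())  # subtract the max for numerical stability
    weights = {t: math.exp(v - m) for t, v in adjusted.items()}
    r = random.uniform(0.0, sum(weights.values()))
    for t, w in weights.items():
        r -= w
        if r <= 0.0:
            return t
    return t  # float-rounding fallback: return the last token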

OpenAI talked publicly about a text watermarking tool in August 2024 but declined to ship it after internal testing showed it could be defeated by paraphrasing the output through a competitor's model. Anthropic has not publicly shipped watermarking on its production endpoints as of late 2025. Meta shipped Stable Signature for images in 2023 and has iterated since.

The honest read on 2025 text watermarking is that it works against low-effort attackers and does not work against any attacker who runs the output through a second LLM. The Stanford HELM team and several academic groups have published strong evidence through 2024 and 2025 that paraphrase attacks degrade text-watermark detection below usable thresholds. Image and audio watermarking hold up better against common editing operations but also fall to targeted adversarial attacks.

What the 2025 provenance landscape actually looks like

C2PA reached version 2.0 in 2024 and is now integrated into the default camera apps on several flagship devices, Adobe's creative suite, and Microsoft's content creation tools. The standard defines manifest formats, signing chains, and trust lists. When a C2PA-signed image is edited, the edit history is appended to the manifest as a chain of signed operations, so a downstream verifier can see every tool that touched the content.
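A simplified picture of that append-only chain, with plain JSON records and Ed25519 signatures standing in for C2PA's actual JUMBF/COSE manifest encoding:

```python
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric import ed25519

def append_edit(chain: list[dict], tool: str, action: str,
                signer: ed25519.Ed25519PrivateKey) -> list[dict]:
    """Append a signed edit record that commits to the previous record by hash,
    so a downstream verifier can replay the full chain of tools that touched
    the content and spot any break in it."""
    prev_hash = (hashlib.sha256(json.dumps(chain[-1], sort_keys=True).encode())
                 .hexdigest() if chain else None)
    record = {"tool": tool, "action": action, "prev_sha256": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    return chain + [dict(record, signature=signer.sign(payload).hex())]

# Usage: each editing tool signs its own entry with its own key.
camera_key = ed25519.Ed25519PrivateKey.generate()
editor_key = ed25519.Ed25519PrivateKey.generate()
chain = append_edit([], "camera-app", "capture", camera_key)
chain = append_edit(chain, "photo-editor", "crop", editor_key)
```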

For model outputs specifically, OpenAI, Google, Meta, and Adobe all sign their generated content with C2PA manifests by default as of 2025. The manifest asserts which model generated the content and when, and the claim is cryptographically verifiable against the vendor's public keys.

The provenance problem is not the cryptography, it is the threat model. C2PA is a positive-assertion system. Content that has a valid C2PA manifest and passes verification is proven to come from the claimed source. Content that lacks a manifest is unclaimed, which is operationally indistinguishable from authentic human content without a manifest. Attackers who want to pass AI content off as human simply strip the manifest. The standard has no way to stop them.

Model signing and weight provenance as a related but separate problem

Watermarking and C2PA both address the output side. There is a parallel and equally important problem on the input side: is this actually the model I think it is? That problem is model signing and weight provenance.

Sigstore's model-signing specification landed in a stable state in mid-2025 and is now supported by Hugging Face Hub for uploads. The specification uses standard Sigstore tooling to sign model weights and publish attestations that tie a specific hash to a specific builder identity. The supply-chain framework is SLSA-compliant and integrates with cosign.
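At deployment time, verification can be as small as a wrapper around cosign's blob verification; the model-signing specification layers multi-file manifests on top of the same primitives. A sketch, assuming cosign v2 is on the PATH, with the identity and issuer values as placeholders a verifier would pin for their own builders:

```python
import subprocess

def weights_verified(path: str, bundle: str, identity: str, issuer: str) -> bool:
    """Verify a Sigstore bundle over a weight file with cosign. Returns True
    only if the signature chains back to the expected builder identity."""
    result = subprocess.run(
        ["cosign", "verify-blob", path,
         "--bundle", bundle,
         "--certificate-identity", identity,
         "--certificate-oidc-issuer", issuer],
        capture_output=True, text=True,
    )
    return result.returncode == 0

# Placeholder identity and issuer; pin these to your actual release pipeline.
ok = weights_verified("model.safetensors", "model.bundle",
                      "release-bot@example.com", "https://accounts.google.com")
```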

Separately, the Coalition for Secure AI published its model-provenance framework in 2025, and in-toto has ML-specific attestation types that cover training-data provenance, fine-tuning provenance, and inference-time attestation. These are the tools you use to prove "this model was built from this training run by this team with these dependencies."
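The envelope those attestations travel in is the standard in-toto Statement. The sketch below uses the real in-toto v1 statement fields, but the predicateType URI and predicate contents are illustrative placeholders, not a ratified ML attestation type:

```python
import hashlib
from pathlib import Path

def training_provenance_statement(weights_path: str, run_id: str) -> dict:
    """Build an in-toto v1 Statement whose subject is the weight file."""
    digest = hashlib.sha256(Path(weights_path).read_bytes()).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": Path(weights_path).name,
                     "digest": {"sha256": digest}}],
        # Placeholder predicate type; real ML attestation types are still settling.
        "predicateType": "https://example.com/ml-training-provenance/v1",
        "predicate": {
            "training_run": run_id,
            "dataset_digests": [],  # hashes of the training data snapshots
            "base_model": None,     # or the digest of the fine-tune parent
        },
    }
```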

Model signing is more mature than output watermarking in late 2025. If you are shipping a model, sign it. The tooling is there, the standards are there, and the ecosystem is adopting them. The remaining gap is in model registries that do not yet verify signatures by default.

What to ship in late 2025 and through 2026

If you are a model provider, ship C2PA signatures on every output by default, and ship your best-effort watermarking with full disclosure that it is defeatable by paraphrase. Do not oversell watermarking to your customers or to regulators. The legal exposure from a watermark that was presented as robust and turns out to be defeatable is not worth the marketing win.

If you are a model consumer, verify C2PA signatures on all inbound AI-labeled content and treat unsigned AI-like content as if it could be anything. Do not rely on watermark detection as your primary authenticity signal because it will miss the adversarial cases.
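That triage policy is mechanical enough to encode directly. A sketch; the verdict labels are mine, not from any standard:

```python
from enum import Enum

class Verdict(Enum):
    VERIFIED_AI = "verified_ai"      # valid manifest from a trusted signer
    INVALID_CLAIM = "invalid_claim"  # manifest present but fails verification
    UNCLAIMED = "unclaimed"          # no manifest: human, stripped AI, anything

def triage(has_manifest: bool, signature_valid: bool, signer_trusted: bool) -> Verdict:
    """Only a valid manifest from a trusted signer upgrades content to
    'verified'. The absence of a manifest proves nothing either way."""
    if not has_manifest:
        return Verdict.UNCLAIMED
    if signature_valid and signer_trusted:
        return Verdict.VERIFIED_AI
    return Verdict.INVALID_CLAIM
```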

If you are building internal models, sign your weights with Sigstore, publish attestations, and require signature verification at inference deployment. If your MLOps pipeline does not refuse to load an unsigned model into production, your supply chain has a gap that any upstream compromise will walk through.
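A minimal version of that refusal gate, assuming attestations were verified at ingest and their digests recorded; the digest shown is a placeholder, and the hand-off to the framework loader is hypothetical:

```python
import hashlib
from pathlib import Path

# Digests copied from Sigstore attestations verified at ingest time.
# The entry below is a placeholder value, not a real model hash.
VERIFIED_WEIGHT_DIGESTS = {
    "fraud-scorer-v3": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def load_weights(name: str, path: str) -> bytes:
    """Refuse to hand unverified weights to the framework. A compromised
    upstream artifact fails the digest check here instead of reaching inference."""
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if VERIFIED_WEIGHT_DIGESTS.get(name) != digest:
        raise RuntimeError(f"refusing to load unverified weights for {name!r}")
    return data  # pass these bytes to your framework's deserializer
```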

If you are a regulator or on a policy team reading this, the important thing to know is that "watermarking" as a requirement without a defined threat model will produce compliance checkboxes that do not protect against the actual attacks. Provenance via C2PA plus model signing is a stronger regulatory foundation, and it is the one the standards bodies are converging on through 2026.

How Safeguard Helps

Safeguard treats model provenance as a first-class supply chain property. Our AI-BOM inventories every model, weight file, and fine-tune in your environment, with Sigstore and in-toto attestations verified at ingest and at deployment. Griffin AI maps the provenance chain from training data through weights to inference endpoint, flagging gaps where signatures are missing or signers are not on your trust list. Our container self-healing refuses to load an unsigned or unverifiable model into production, and C2PA verification is wired into the outbound content path so that any AI content you publish carries a valid manifest. The result is a provenance story that survives audit and adversarial review.
