Training Data Provenance for Enterprise Fine-Tuning

Fine-tuning corpora are supply chain artifacts. We cover the provenance signals, attestations, and drift controls enterprises need before pushing weights to prod.

Nayan Dey
Senior Security Engineer

A regional bank we audited in October 2025 had fine-tuned a Llama 3.1 70B checkpoint on 14 million internal tickets to power an internal assistant. Six months later, nobody could answer a basic question: which tickets were in the training set? The data engineering team had pulled from a Snowflake view that had been redefined twice, the view definition wasn't versioned, and the underlying tables had GDPR-driven deletions applied after training. The model had almost certainly memorized content that should no longer exist. Proving otherwise would have required re-running the extraction, which was impossible because the source state was gone.

This is the default posture for enterprise fine-tuning today. The corpus is treated as a one-shot input, not a versioned artifact with a provenance chain. When a regulator asks "what's in the model," the honest answer is "we don't know."

Why is fine-tuning data a supply chain problem?

Because a fine-tuned model is a derived artifact whose behavior is a function of inputs that are rarely signed, hashed, or content-addressed. Pre-training corpora (the Pile, RefinedWeb, FineWeb, C4) get the attention, and they at least have published checksums and data cards. Enterprise fine-tuning corpora are ad-hoc SQL dumps, SharePoint exports, and CSVs passed between teams. The SLSA v1.1 provenance model, the in-toto attestation spec, and the CycloneDX ML-BOM schema (introduced in 1.5 and carried into 1.6) all have the primitives to describe training inputs. Few teams use them. The result is that when a prompt injection later extracts a verbatim customer record, nobody can tell whether the record came from the training set, the RAG index, or the prompt.

What provenance fields actually matter for fine-tuning?

The minimum viable set is: source system identifier, query or extraction definition, extraction timestamp, row count, content hash of the serialized corpus, transformations applied, and the identity that performed the extraction. CycloneDX formulation blocks (available since 1.5 and retained in 1.6) carry this cleanly; MLflow 2.17 has dataset lineage that maps to the same fields. The important property is that the content hash covers the final, post-transform corpus: the exact bytes fed to the trainer. Hashing the source query is insufficient because two runs of the same query against a mutable source can produce different corpora. We've seen teams rely on S3 object versioning, which is better than nothing but doesn't cover in-memory transformations in a Ray or Spark job that never persists intermediate state.
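As a rough illustration of the fields listed above, here is a minimal sketch of a corpus manifest whose hash covers the post-transform rows. The `CorpusManifest` class, `serialize_row`, `build_manifest`, and the canonical-JSON serialization are all hypothetical choices made for this example; they are not part of CycloneDX or MLflow, and a real pipeline would hash whatever bytes its trainer actually reads.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class CorpusManifest:
    # Field names are illustrative, mirroring the minimum viable set in the text.
    source_system: str
    extraction_definition: str
    extracted_at: str
    extracted_by: str
    transformations: list
    row_count: int = 0
    content_sha256: str = ""


def serialize_row(row: dict) -> bytes:
    # Canonical JSON (sorted keys, fixed separators) so an identical row
    # always serializes to identical bytes. This format is an assumption.
    text = json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8") + b"\n"


def build_manifest(rows, **provenance) -> CorpusManifest:
    # Hash the final, post-transform rows in the order they will be fed
    # to the trainer. The digest is order-sensitive by design.
    digest = hashlib.sha256()
    count = 0
    for row in rows:
        digest.update(serialize_row(row))
        count += 1
    return CorpusManifest(row_count=count, content_sha256=digest.hexdigest(), **provenance)
```

Streaming rows through the digest keeps memory flat for large corpora, and because the hash is taken after transformation, it also covers in-memory steps that never persist intermediate state.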

How do you handle deletions after the model ships?

You can't delete from weights, but you can maintain an obligation ledger. When a record is deleted from the source system, an obligation is written: "record X was present in training corpus Y with hash Z, and was deleted from source on date D." At audit time, you can answer "what's in the model that shouldn't be?" by joining the obligation ledger against the corpus manifest. This doesn't unlearn the data, but it is what regulators under the EU AI Act and the GDPR-AI guidance published in September 2025 are starting to ask for. Unlearning research, including work from Microsoft and Google through 2025, has advanced approximate unlearning, and exact approaches such as SISA sharded training predate it, but the machine-verifiable audit trail matters more for compliance than the unlearning itself.
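The join described above can be sketched in a few lines. This is a minimal illustration, assuming each corpus manifest records the set of source record IDs it contained and that deletions arrive as a mapping from record ID to deletion date; the `Obligation` class and `outstanding_obligations` function are hypothetical names, not an existing API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Obligation:
    record_id: str
    corpus_hash: str
    deleted_on: date


def outstanding_obligations(manifests: dict, deletions: dict) -> list:
    """Join corpus manifests against source deletions.

    manifests: corpus content hash -> set of record IDs in that corpus
    deletions: record ID -> date the record was deleted from the source
    """
    obligations = []
    for corpus_hash in sorted(manifests):
        # Records that were trained on and have since been deleted.
        affected = manifests[corpus_hash].intersection(deletions)
        for record_id in sorted(affected):
            obligations.append(
                Obligation(record_id, corpus_hash, deletions[record_id])
            )
    return obligations
```

The output is the answer to the audit question: one entry per deleted record per corpus that contained it, which can then be mapped to the checkpoints trained on that corpus.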

What about synthetic data and LLM-generated augmentation?

Synthetic data inherits the provenance of the generator. If you use GPT-5 or Claude 4 to augment a classification dataset, your corpus provenance now includes a model identifier, a model version, the system prompt, the sampling parameters, and ideally the inference timestamp. OpenAI's November 2025 data policy change means outputs generated via the API are not used for training, but it does not prevent those outputs from carrying copyrighted content that the generator memorized from its own training data. A customer of ours found that roughly 0.7% of GPT-4o-generated product descriptions contained verbatim strings from competitor sites, which was measurable only because they had hashed the synthetic corpus and compared it against a web snapshot. Without the provenance, the leak would have shipped into their fine-tuned model.
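The article does not say how the customer performed the comparison. One plausible approach, sketched here under that caveat, is to hash overlapping word n-grams (shingles) from each synthetic row and check them against the same hashes computed over the reference snapshot. The function names and the 8-word window are assumptions for illustration only.

```python
import hashlib


def shingle_hashes(text: str, n: int = 8) -> set:
    # Hash every run of n consecutive lowercase, whitespace-split tokens.
    # Texts shorter than n tokens produce an empty set.
    tokens = text.lower().split()
    return {
        hashlib.sha256(" ".join(tokens[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(tokens) - n + 1)
    }


def flag_verbatim_overlap(synthetic_rows, reference_docs, n: int = 8) -> list:
    # Build the reference index once, then return the indices of synthetic
    # rows that share at least one n-token run with any reference document.
    reference = set()
    for doc in reference_docs:
        reference |= shingle_hashes(doc, n)
    return [
        i for i, row in enumerate(synthetic_rows)
        if shingle_hashes(row, n) & reference
    ]
```

A shorter window catches more overlap but produces more false positives on common phrases; the right value depends on the corpus and would need tuning.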

How does reachability apply to training data?

Reachability here means: which rows in the corpus materially affected which model behaviors? Exact answers require expensive data-attribution work (TRAK, published by MIT researchers in 2023, remains the best open technique), but coarse answers are cheap. Tag every training row with a topic cluster at ingest; measure model behavior change on held-out probes before and after fine-tuning; correlate behavior shifts with cluster presence. When the model starts confidently answering questions about a product line you didn't train on, the provenance index tells you which cluster seeped in. We've used this pattern to catch a 3,100-row subset of a support corpus that carried internal pricing data the model then surfaced verbatim under specific prompts.
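The coarse version of this can be sketched as a simple report. The example assumes held-out probes are grouped under the same cluster labels used to tag training rows, and that each probe group yields a single score before and after fine-tuning; `cluster_shift_report` and its threshold are hypothetical, and this is a correlation aid, not an attribution method like TRAK.

```python
from collections import Counter


def cluster_shift_report(row_clusters, probe_before, probe_after, threshold=0.1):
    """List clusters whose probe score moved by at least `threshold`.

    row_clusters: iterable of cluster labels, one per training row
    probe_before / probe_after: cluster label -> probe score
    Returns (cluster, score_delta, training_row_count) tuples,
    largest absolute shift first.
    """
    counts = Counter(row_clusters)
    report = []
    for cluster in sorted(probe_before):
        delta = probe_after[cluster] - probe_before[cluster]
        if abs(delta) >= threshold:
            report.append((cluster, round(delta, 4), counts.get(cluster, 0)))
    report.sort(key=lambda entry: abs(entry[1]), reverse=True)
    return report
```

A large shift on a cluster with a nonzero row count points at rows worth inspecting; a large shift on a cluster with zero rows suggests the tagging missed something or the behavior came from elsewhere.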

Who signs the corpus and the resulting weights?

The same entities that sign your container images. Sigstore's cosign, extended with the model-signing attestation format published by the OpenSSF AI/ML Security SIG in June 2025, lets you produce an in-toto statement binding the corpus hash, the training script commit, the trainer identity, and the output weight hash. We've deployed this on Kubeflow and Vertex AI pipelines; it adds perhaps 45 seconds to a training job and gives you a cryptographic chain from SQL query to safetensors file. When incident response later asks "did this checkpoint come from an authorized pipeline," the attestation answers in seconds instead of days.
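To make the binding concrete, here is a minimal sketch of the unsigned statement. The outer envelope follows the in-toto Statement v1 layout (`_type`, `subject`, `predicateType`, `predicate`); the predicate type URI and every field inside `predicate` are hypothetical placeholders, not the OpenSSF model-signing format. Signing is left out and would be handled separately by a tool such as cosign.

```python
import json


def build_training_statement(weights_name, weights_sha256, corpus_sha256,
                             script_commit, trainer_identity):
    # The subject is the artifact being attested: the output weights file.
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": weights_name, "digest": {"sha256": weights_sha256}}
        ],
        # Hypothetical predicate type, used for illustration only.
        "predicateType": "https://example.com/fine-tuning-provenance/v0",
        "predicate": {
            "corpus": {"digest": {"sha256": corpus_sha256}},
            "trainingScript": {"gitCommit": script_commit},
            "trainer": {"identity": trainer_identity},
        },
    }


def to_canonical_json(statement: dict) -> str:
    # Stable serialization so the same statement always yields the same bytes.
    return json.dumps(statement, sort_keys=True, separators=(",", ":"))
```

Because the corpus digest sits inside the predicate and the weights digest is the subject, a verifier holding only the safetensors file can recover which corpus hash it was trained on.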

How Safeguard Helps

Safeguard treats fine-tuning corpora as first-class artifacts in the AI-BOM, capturing source extractions, transformation lineage, and post-transform content hashes alongside the resulting weight artifacts. Griffin AI correlates corpus clusters with model-behavior drift across evaluation runs, so regression in one topic surfaces against the data subset that likely caused it. Policy gates block training pipelines that produce weights without a valid provenance attestation, and the reachability view shows which downstream products consume which checkpoints — so when a deletion obligation lands, you know which systems need to be re-evaluated.
