The first time a regulator asks "what is in your model?" most engineering teams realise their SBOM does not answer it. A modern production AI feature is not a single artefact. It is a base model pulled from a registry, a tokenizer with its own version drift, a set of fine-tuning datasets each with its own licence and consent status, a quantisation pass, an inference runtime like vllm or tgi, a serving stack, and a handful of guardrail models layered on top. Traditional SBOM tooling captures the Python wheels in the container and stops. The interesting risk lives in the model artefacts and their provenance, and that surface is invisible to a CycloneDX 1.4 emitter that only walks pip freeze. AI-BOM is the answer: a structured, signed, machine-readable record of the model supply chain that sits next to your SBOM and answers the questions auditors, customers, and incident responders will ask in 2026. This post explains what an AI-BOM is, why a traditional SBOM is not enough, the minimum fields that hold up under scrutiny, and the operational patterns that keep the data fresh.
Why SBOM Alone Misses The Model Layer
A typical ML service container has 200-400 Python dependencies and weighs in at 4-8 GB. A standard SBOM captures all of those, and zero of the things that actually matter for AI governance. The base model weights, often 10-70 GB, are mounted from a volume or pulled at startup from a registry; the SBOM never sees them. The training data that shaped the model is upstream of the build entirely. The evaluation suite that demonstrates the model meets safety thresholds runs in a separate pipeline. The guardrail prompts that wrap the model live in a config repo.
When xz-utils 5.6 had its backdoor in 2024, SBOM consumers could answer "do we ship it?" in minutes. When a popular open-weights model was found to have been fine-tuned on a copyrighted corpus in late 2025, almost nobody could answer the equivalent question, "do we ship a derivative?", in less than a week. The data was not in the SBOM, and it did not exist in structured form anywhere else.
AI-BOM closes that gap by treating models, datasets, prompts, and evaluation suites as first-class components with their own identifiers, versions, suppliers, hashes, licences, and relationships.
The Minimum Viable AI-BOM
A useful AI-BOM does not need to capture every gradient. It needs to answer five questions for every model surface you ship:
- What is the model's identity and where did it come from?
- What was it trained or fine-tuned on, and under what licence?
- How was it evaluated, by whom, and when?
- What runtime serves it, and what is the integrity story?
- Who is responsible if it breaks?
CycloneDX 1.6 covers this with its `machine-learning-model` and `data` component types. Twelve fields per model is a reasonable floor (a minimal sketch follows the list):
- purl or model URI (for example `pkg:huggingface/meta-llama/Llama-3.1-8B-Instruct@<sha256>`)
- Base model identifier and version
- Fine-tune lineage (parent model and training dataset references)
- Tokenizer identifier and version
- Quantisation method and bit-width
- Training dataset list with licence, consent status, and PII assessment
- Evaluation suite identifier, version, and result hash
- Inference runtime (`vllm`, `tgi`, `triton`, `onnxruntime`) with version
- Hardware class (for example `H100-80GB`, `MI300X`)
- Weight hash (`sha256` of the safetensors or GGUF artefact)
- Supplier and licence
- Date of last revalidation
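To make that floor concrete, here is a minimal sketch of one model component serialised from Python. The top-level keys (`type`, `name`, `purl`, `hashes`, `licenses`) follow CycloneDX component conventions; the `properties` entries are illustrative names for the fields above, not an official taxonomy, and every value is a placeholder to be filled from your pipeline.

```python
import json

# A minimal sketch of one AI-BOM model component. The "properties" names
# (the "aibom:" prefix) are illustrative, not official CycloneDX taxonomy.
model_component = {
    "type": "machine-learning-model",
    "name": "Llama-3.1-8B-Instruct",
    "version": "<model revision>",
    "purl": "pkg:huggingface/meta-llama/Llama-3.1-8B-Instruct@<sha256>",
    "hashes": [{"alg": "SHA-256", "content": "<sha256 of the safetensors artefact>"}],
    "licenses": [{"license": {"name": "Llama 3.1 Community License"}}],
    "properties": [
        {"name": "aibom:base-model", "value": "meta-llama/Llama-3.1-8B-Instruct"},
        {"name": "aibom:fine-tune-parent", "value": "<parent purl or none>"},
        {"name": "aibom:tokenizer", "value": "<tokenizer id and version>"},
        {"name": "aibom:quantisation", "value": "AWQ, 4-bit"},
        {"name": "aibom:training-datasets", "value": "<dataset snapshot hashes>"},
        {"name": "aibom:evaluation-suite", "value": "<suite id, version, result hash>"},
        {"name": "aibom:inference-runtime", "value": "vllm 0.5.3"},
        {"name": "aibom:hardware-class", "value": "H100-80GB"},
        {"name": "aibom:supplier", "value": "<organisation responsible for the artefact>"},
        {"name": "aibom:last-revalidation", "value": "<date of last re-run>"},
    ],
}

print(json.dumps(
    {"bomFormat": "CycloneDX", "specVersion": "1.6", "components": [model_component]},
    indent=2,
))
```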
If you cannot fill a field, mark it unknown with a remediation date. Auditors react far worse to invented metadata than to honest gaps.
Datasets Are The Hardest Field
Training data provenance is where most AI-BOM implementations either succeed or quietly collapse. Modern instruction-tuned models often have 10-40 fine-tuning datasets. Each one needs licence (cc-by-4.0, mit, proprietary), consent status (whether subjects opted in), PII risk class (none, low, high), and a hash of the snapshot that was actually used. Datasets drift; "the Common Crawl May 2024 snapshot" and "the Common Crawl August 2024 snapshot" are different supply chain inputs.
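A minimal sketch of the per-dataset record those fields imply, with field names that are illustrative rather than prescriptive:

```python
from dataclasses import dataclass
from enum import Enum

class PIIRisk(Enum):
    NONE = "none"
    LOW = "low"
    HIGH = "high"

@dataclass(frozen=True)
class DatasetRecord:
    """One fine-tuning dataset treated as a supply chain input."""
    name: str              # e.g. "support-tickets-2025q3"
    snapshot_sha256: str   # hash of the exact snapshot trained on, not "the dataset"
    licence: str           # "cc-by-4.0", "mit", "proprietary", ...
    consent_status: str    # whether subjects opted in: "opt-in", "opt-out", "unknown"
    pii_risk: PIIRisk      # none / low / high
```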
For derivative models, lineage is recursive. If you fine-tuned Llama-3.1-8B on 30k internal customer support tickets, your AI-BOM has to reference both the upstream model's lineage and your own fine-tune dataset hash. EU AI Act Article 10 makes this explicit for high-risk systems, and US sectoral regulators are following. The Article 10 expectations are detailed in our companion post on AI-BOM and EU AI Act data governance.
A practical pattern: keep dataset records in a content-addressable store keyed by hash, and reference the hash from the AI-BOM rather than embedding the dataset metadata. This keeps the AI-BOM small and lets you answer "every model trained on dataset D" in a single index lookup.
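A sketch of that pattern, assuming hypothetical store and index structures maintained at AI-BOM emission time:

```python
from collections import defaultdict

# Content-addressable store: full dataset metadata keyed by snapshot hash.
# The AI-BOM itself carries only the hash.
dataset_store: dict[str, dict] = {}

# Reverse index: snapshot hash -> identifiers of models trained or fine-tuned
# on it, including lineage inherited from upstream base models.
models_by_dataset: defaultdict[str, set[str]] = defaultdict(set)

def register_model(model_id: str, dataset_hashes: list[str]) -> None:
    """Record a model's training inputs when its AI-BOM is emitted."""
    for h in dataset_hashes:
        models_by_dataset[h].add(model_id)

def models_trained_on(dataset_hash: str) -> set[str]:
    # "Every model trained on dataset D" becomes a single index lookup.
    return models_by_dataset.get(dataset_hash, set())
```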
Evaluations As Supply Chain Evidence
An AI-BOM without evaluation references is a parts list without a quality certificate. For each production model, capture the identifier of the evaluation suite (for example an internal red-team battery, plus a third-party benchmark like MLCommons AILuminate), the version of the suite at run time, the result hash, the date of the run, and the pass/fail status against your published thresholds.
Two practical rules. First, evaluations are time-bound. A model that passed in January is not certified for August unless you re-ran. Set a maximum staleness, typically 30-90 days for production-facing models, and treat anything older as expired in the AI-BOM. Second, evaluation results should be signed by the team that ran them, not by the team that ships the model. Signature separation is what makes the evidence credible to an external auditor.
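A sketch of how the two rules might be encoded, with the 90-day ceiling and the team-name comparison as illustrative inputs rather than a prescribed policy:

```python
from datetime import date, timedelta

MAX_EVAL_AGE = timedelta(days=90)  # staleness ceiling; tighten towards 30 for higher-risk surfaces

def evaluation_status(run_date: date, passed: bool, signer_team: str,
                      shipping_team: str, today: date | None = None) -> str:
    """Classify an evaluation record before it goes into the AI-BOM."""
    today = today or date.today()
    if signer_team == shipping_team:
        # Signature separation: evidence signed by the shipping team is not credible.
        return "invalid: evaluation must be signed by a team other than the shipping team"
    if today - run_date > MAX_EVAL_AGE:
        return "expired: re-run required before the next emission"
    return "pass" if passed else "fail"
```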
Runtime Integrity And The Inference Stack
The same model weights served by vllm 0.5.3 on H100 and by a custom CUDA kernel on consumer hardware are not the same supply chain. Quantisation choices, kernel implementations, and GPU driver versions all change behaviour. Your AI-BOM should record the inference runtime version, the quantisation pass, the GPU driver and CUDA version, and ideally the hash of the runtime container image.
Sign the model weights at registry-push time, and verify the signature at load time inside the inference server. If your serving fleet does not refuse to load an unsigned or mis-signed weights file, the AI-BOM is descriptive rather than enforcing, and adversaries know it.
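As a sketch of the load-time check, assuming a detached Ed25519 signature over the weight digest was produced at registry-push time; your actual signing chain, Sigstore-based or otherwise, will differ in detail:

```python
import hashlib
from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_weights_or_refuse(weights_path: Path, signature: bytes,
                             registry_public_key: bytes) -> str:
    """Hash the weights artefact, verify the detached signature made at
    registry-push time, and refuse to serve on any mismatch."""
    digest = hashlib.sha256()
    with weights_path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream: weights are tens of GB
            digest.update(chunk)
    try:
        Ed25519PublicKey.from_public_bytes(registry_public_key).verify(
            signature, digest.digest()
        )
    except InvalidSignature:
        raise RuntimeError(f"refusing to load {weights_path}: signature verification failed")
    return digest.hexdigest()  # record alongside the load event, and cross-check the AI-BOM
```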
Operational Hygiene
AI-BOM is only useful if it stays current. Three rules keep it honest. Generate AI-BOM at the same point in the pipeline that produces the model artefact, never after the fact. Sign and timestamp every emission. Re-emit on any change to weights, runtime, or evaluation status, not on a fixed schedule. Programmes that emit AI-BOM weekly drift quickly; programmes that emit on change stay accurate.
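One way to make "emit on change" concrete is to fingerprint the facts that force a re-emission and compare against the last emitted record; the function names here are illustrative:

```python
import hashlib
import json

def aibom_fingerprint(weights_sha256: str, runtime: str, eval_result_hash: str) -> str:
    """Fingerprint of the inputs whose change forces a new AI-BOM emission."""
    payload = json.dumps(
        {"weights": weights_sha256, "runtime": runtime, "eval": eval_result_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def should_reemit(current_fingerprint: str, last_emitted_fingerprint: str) -> bool:
    # Re-emit on any change to weights, runtime, or evaluation status,
    # not on a fixed schedule.
    return current_fingerprint != last_emitted_fingerprint
```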
How Safeguard Helps
Safeguard treats AI-BOM as a first-class peer to SBOM. The platform ingests CycloneDX 1.6 machine-learning-model and data components from CI, model registries, and serving stacks, normalises model identifiers across huggingface, ollama, and private registries, and indexes dataset lineage so a single query answers which products use a model derived from a flagged dataset. VEX statements extend to model-level vulnerabilities, letting teams suppress non-applicable findings on guardrail or sandboxed surfaces. Signed attestations cover weights, evaluation results, and runtime images using the same Sigstore-compatible chain Safeguard applies to traditional SBOM artefacts. The outcome is a single defensible record of the entire software-and-model supply chain that holds up to regulators, customers, and incident response in equal measure.