
AI-BOM

An SBOM extended to model weights, training data, and AI runtime components.

What is an AI-BOM?

An AI-BOM is a Software Bill of Materials extended to cover the parts of a system that a traditional SBOM has no language for: model weights, tokenizer files, training datasets, fine-tune recipes, embedding models, vector database contents, retrieval corpora, and the specific inference runtimes (vLLM, TGI, TensorRT-LLM) that serve them.

It is an emerging standard. CycloneDX 1.5 added first-class model and dataset components; NIST and the EU AI Act are converging on compatible disclosure requirements. The core idea: if a component influences the behaviour of a deployed AI feature, it belongs in the bill of materials.

How it works

An AI-BOM extends the SBOM schema with AI-specific component types and relationships:

  1. Model components. Each model is catalogued by its Hugging Face identifier or equivalent, pinned to a specific revision hash, and recorded with its license and provenance (who trained it, on what data, under what terms).
  2. Data and fine-tune lineage. Training corpora, embedding datasets, and fine-tune recipes are recorded with dataset hashes so a consumer can verify — or at minimum, cite — what the model was exposed to.
  3. Runtime and retrieval context. The inference server, tool-calling surface, guardrails, and vector-store contents are listed as components. When a RAG pipeline changes which documents are indexed, that is a supply-chain change — and it belongs in the AI-BOM.
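The three component types above can be sketched as a small builder. This is a minimal illustration, loosely modelled on CycloneDX's `machine-learning-model` and `data` component types; the function name, field layout, and all identifiers below are hypothetical, not a normative CycloneDX document.

```python
def make_ai_bom(model_ref: str, revision: str, license_id: str,
                dataset_hashes: dict[str, str], runtime: str) -> dict:
    """Assemble a minimal AI-BOM as a plain dict (CycloneDX-style sketch)."""
    components = [
        {
            # Model component: pinned to a revision hash, not a marketing name.
            "type": "machine-learning-model",
            "name": model_ref,
            "version": revision,
            "licenses": [{"license": {"id": license_id}}],
        },
        {
            # Inference runtime serving the model (e.g. vLLM, TGI).
            "type": "platform",
            "name": runtime,
        },
    ]
    # Each training/fine-tune dataset is recorded with a content hash so a
    # consumer can verify -- or at minimum cite -- what the model saw.
    for name, sha in dataset_hashes.items():
        components.append({
            "type": "data",
            "name": name,
            "hashes": [{"alg": "SHA-256", "content": sha}],
        })
    return {"bomFormat": "CycloneDX", "specVersion": "1.6",
            "components": components}
```

A call like `make_ai_bom("meta-llama/Llama-3-8B", "<revision-sha>", "Apache-2.0", {"corpus-v1": "<sha256>"}, "vLLM")` yields one document that captures all three layers: model, data lineage, and runtime.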

Why it matters

AI features inherit a new category of supply-chain risk that traditional SBOMs cannot see. A poisoned fine-tune dataset, a compromised model checkpoint on a public hub, a prompt-injection vector sitting in an indexed document — none of these show up in package.json, but all of them can change what your product does in production.

Regulators are catching up fast. The EU AI Act's transparency requirements and NIST's AI RMF both point at "document the components that shape model behaviour." An AI-BOM is how that documentation becomes a machine-readable artifact instead of a Word document.

What value it adds

  • AI supply-chain visibility

    Know every model, dataset, and fine-tune shipping in every AI feature — at the level of a pinned hash, not a marketing name.

  • Model-poisoning blast radius

    When a public checkpoint is revoked or identified as poisoned, a one-line query tells you which products used it and when.

  • RAG corpus drift detection

    Changes to indexed documents appear in the AI-BOM diff — so prompt-injection vectors hiding in content updates surface before they reach production.

  • EU AI Act and NIST AI RMF alignment

    Transparency documentation becomes a generated artifact rather than a quarterly scramble of spreadsheets.

  • License and data-rights traceability

    Was this model trained on licensed data? Can we ship it to customers in jurisdiction X? The AI-BOM makes that answerable instead of guessable.
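The blast-radius and drift queries described above are straightforward once AI-BOMs exist as structured data. The sketch below assumes each AI-BOM is a CycloneDX-style dict with a `components` list; the function names and BOM shape are illustrative, not a Safeguard API.

```python
def blast_radius(boms: dict[str, dict], revoked_revision: str) -> list[str]:
    """Return products whose AI-BOM pins the revoked model revision."""
    affected = []
    for product, bom in boms.items():
        for comp in bom.get("components", []):
            if (comp.get("type") == "machine-learning-model"
                    and comp.get("version") == revoked_revision):
                affected.append(product)
                break
    return sorted(affected)

def corpus_drift(old_bom: dict, new_bom: dict) -> tuple[set, set]:
    """Diff the 'data' components of two AI-BOM snapshots.

    Returns (added, removed) dataset content hashes -- the change set where a
    prompt-injection vector in new indexed content would first surface.
    """
    def content_hashes(bom: dict) -> set:
        return {h["content"]
                for c in bom.get("components", []) if c.get("type") == "data"
                for h in c.get("hashes", [])}
    old, new = content_hashes(old_bom), content_hashes(new_bom)
    return new - old, old - new
```

Running `blast_radius` across every product's latest AI-BOM answers "which products shipped this checkpoint?" in one pass; `corpus_drift` between two snapshots of the same product flags retrieval-corpus changes before they reach production.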

How Safeguard uses it

Safeguard inventories model and dataset components alongside software dependencies in SBOM Studio, powers AI governance workflows, and feeds evidence into SLSA provenance attestations for AI artifacts.

Inventory your AI supply chain.

Point Safeguard at a repo, registry, or deployment. Get an AI-BOM covering models, datasets, and retrieval corpora.