SPDX 3.0 was published in April 2024 by the Linux Foundation, introducing a profile-based architecture that separates concerns cleanly: a core SPDX document can carry a Software profile, a Security profile, a Build profile, an AI profile, a Dataset profile, or several at once. The AI profile is the part most security teams care about — it gives SPDX a defensible answer to the question "how do we describe an AI system as a bill of materials." For organizations already invested in SPDX for software-license compliance, the 3.0 release means you do not need to switch to CycloneDX to produce an AIBOM. This post walks through what the AI profile actually requires, how to generate one in practice, and where it differs from CycloneDX ML-BOM.
What is in the SPDX 3.0 AI profile?
The AI profile defines an AIPackage element extending the core SPDX Package. It captures the model's identity, the type of AI system (foundation model, fine-tune, agent, etc.), the model's intended domain (medical, legal, general-purpose), and a set of operational characteristics: training methodology, data handling, explainability properties, energy consumption, and safety risks. The element also references the Dataset profile for any training or evaluation datasets that are themselves first-class SPDX elements. The schema explicitly maps to Model Card concepts so that an existing Hugging Face model card can be transformed into an SPDX AIPackage with minimal information loss.
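That transformation is straightforward to prototype. The sketch below pulls the YAML metadata from a Hugging Face model card using the huggingface_hub library and reshapes it into AIPackage-style fields. The field mapping itself, including the dataset_refs key, is an illustrative assumption, not an official SPDX conversion.

# Sketch: reshape a Hugging Face model card into AIPackage-style fields.
# The mapping below is illustrative, not an official SPDX transformation.
from huggingface_hub import ModelCard

def model_card_to_aipackage(repo_id: str) -> dict:
    card = ModelCard.load(repo_id)      # reads the repo's README front matter
    data = card.data.to_dict()          # model-card YAML as a plain dict
    return {
        "spdxId": f"spdx:{repo_id.replace('/', '-')}",
        "type": "ai_AIPackage",
        "name": repo_id.split("/")[-1],
        "license": data.get("license"),
        "ai_domain": data.get("tags", []),
        # candidates for separate DatasetPackage elements (hypothetical key)
        "dataset_refs": data.get("datasets", []),
    }

# Gated repos (such as the Llama models) require an authenticated HF token.
print(model_card_to_aipackage("meta-llama/Llama-4-Scout-17B-16E"))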
How does the Dataset profile interact with the AI profile?
The Dataset profile is its own SPDX element, DatasetPackage, with fields for dataset identity, type (training, validation, test, evaluation), data collection process, known biases, sensitive content indicators, anonymization techniques applied, and intended use. The AI profile references one or more Dataset packages via SPDX relationship triples — typically usedBy and trainedOn. This separation matters: a single dataset (say, a curated corpus of legal documents) may train multiple models, and modeling it as an independent SPDX element rather than embedded inside each model package avoids duplication and inconsistency. For organizations that consume datasets from Common Crawl, LAION (post-Re-LAION-5B remediation), or domain-specific sources, the Dataset profile is the documentation surface auditors will inspect first.
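To make the reuse pattern concrete, here is a minimal sketch (plain Python dicts, the second model name hypothetical) of one DatasetPackage linked to two models through separate trainedOn relationships:

# Illustrative only: one dataset element, referenced by two models.
# SPDX 3.0 expresses each link as its own Relationship element rather
# than embedding the dataset inside each model package.
dataset = {"spdxId": "spdx:LegalCorpus-2025", "type": "dataset_DatasetPackage"}

relationships = [
    {"type": "Relationship", "relationshipType": "trainedOn",
     "from": "spdx:LegalAssistantFineTune-v1.2", "to": ["spdx:LegalCorpus-2025"]},
    {"type": "Relationship", "relationshipType": "trainedOn",
     "from": "spdx:ContractReview-v0.9", "to": ["spdx:LegalCorpus-2025"]},
]
# Correcting the dataset's bias or anonymization fields happens in one
# place; both models inherit the correction through the references.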
What does an SPDX 3.0 AIBOM look like?
Here is a minimal example showing a fine-tuned model, its base model, and a training dataset. Relationships are their own elements in the graph, and required boilerplate such as creationInfo is omitted for brevity:
{
  "@context": "https://spdx.org/rdf/3.0.1/spdx-context.jsonld",
  "@graph": [
    {
      "spdxId": "spdx:LegalAssistantFineTune-v1.2",
      "type": "ai_AIPackage",
      "name": "LegalAssistantFineTune",
      "ai_modelExplainability": ["chain-of-thought"],
      "ai_safetyRiskAssessment": "medium",
      "ai_domain": ["legal"],
      "ai_typeOfModel": ["fine-tune"]
    },
    {
      "spdxId": "spdx:LegalCorpus-2025",
      "type": "dataset_DatasetPackage",
      "name": "LegalCorpus-2025",
      "dataset_datasetType": ["training"],
      "dataset_hasSensitivePersonalInformation": "no",
      "dataset_anonymizationMethodUsed": ["named-entity-redaction"],
      "dataset_intendedUse": "Fine-tuning legal-domain assistant"
    },
    {
      "spdxId": "spdx:Llama-4-Scout-17B-16E",
      "type": "ai_AIPackage",
      "name": "Llama-4-Scout-17B-16E",
      "ai_typeOfModel": ["foundation-model"],
      "suppliedBy": "spdx:MetaPlatforms"
    },
    {
      "spdxId": "spdx:MetaPlatforms",
      "type": "Organization",
      "name": "Meta Platforms"
    },
    {
      "spdxId": "spdx:rel-trainedOn-1",
      "type": "Relationship",
      "relationshipType": "trainedOn",
      "from": "spdx:LegalAssistantFineTune-v1.2",
      "to": ["spdx:LegalCorpus-2025"]
    },
    {
      "spdxId": "spdx:rel-descendantOf-1",
      "type": "Relationship",
      "relationshipType": "descendantOf",
      "from": "spdx:LegalAssistantFineTune-v1.2",
      "to": ["spdx:Llama-4-Scout-17B-16E"]
    }
  ]
}
The relationship triples form the derivation graph: anyone reading the SPDX document can trace a production model back through every fine-tune, every dataset, and every base model — the same operational property as CycloneDX 1.7 ML-BOM, just expressed in a different schema.
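A sketch of that trace, treating the document above as plain JSON (assumes the file name aibom.spdx.json and the element shapes from the example; error handling omitted):

import json

# Walk the derivation graph: from one model, follow descendantOf to base
# models and trainedOn to datasets, recursively.
def lineage(doc: dict, spdx_id: str, depth: int = 0) -> None:
    elements = {e["spdxId"]: e for e in doc["@graph"]}
    rels = [e for e in doc["@graph"] if e.get("type") == "Relationship"]
    print("  " * depth + elements[spdx_id]["name"])
    for rel in rels:
        if rel["from"] == spdx_id and rel["relationshipType"] in ("descendantOf", "trainedOn"):
            for target in rel["to"]:
                lineage(doc, target, depth + 1)

with open("aibom.spdx.json") as f:
    lineage(json.load(f), "spdx:LegalAssistantFineTune-v1.2")
# Prints the fine-tune, then its training corpus and base model, indented.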
How does SPDX 3.0 compare to CycloneDX ML-BOM?
Both formats now cover the same conceptual ground: model identity, dataset identity, training pedigree, and model-card metadata. CycloneDX 1.7 has a tighter ML-specific schema with structured formula identification, while SPDX 3.0 uses its general-purpose Element model and lets the AI profile add domain-specific properties. CycloneDX serializes to JSON, XML, and protocol buffers but has no RDF form; SPDX 3.0 supports JSON-LD natively and produces a queryable RDF graph. Pragmatically: if your existing SBOM tooling is CycloneDX, stay there and use ML-BOM. If you are SPDX-heavy (most Linux distributions, many regulated industries), use the SPDX 3.0 AI profile. The Linux Foundation and OWASP teams maintain bridge tooling that can convert between the two — adopt-then-converge is a defensible posture.
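The RDF claim is easy to verify with rdflib, which parses JSON-LD natively. A minimal sketch, assuming the example document is saved locally: rdflib fetches the @context from spdx.org, so it needs network access, and the STRENDS filters sidestep committing to exact vocabulary IRIs, which you should confirm against the published ontology.

from rdflib import Graph

# Load the SPDX 3.0 JSON-LD document into an RDF graph.
g = Graph()
g.parse("aibom.spdx.json", format="json-ld")

# Find every (model, dataset) pair linked by a trainedOn relationship.
query = """
SELECT ?from ?to WHERE {
    ?rel ?pType ?rtype ;
         ?pFrom ?from ;
         ?pTo ?to .
    FILTER(STRENDS(STR(?pType), "relationshipType"))
    FILTER(STRENDS(STR(?pFrom), "from"))
    FILTER(STRENDS(STR(?pTo), "to"))
    FILTER(STRENDS(STR(?rtype), "trainedOn"))
}
"""
for frm, to in g.query(query):
    print(frm, "trainedOn", to)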
What tooling produces SPDX 3.0 AI profile documents?
The Linux Foundation maintains the canonical SPDX 3.0 model definition at github.com/spdx/spdx-3-model, with parsers, validators, and generation utilities available through the SPDX project's Python tooling. SPDX-tools provides CLI utilities including a 2.x-to-3.0 converter for organizations migrating from existing SPDX 2.x documents. Several commercial SBOM platforms (Anchore, FOSSA, Snyk, Black Duck) added SPDX 3.0 support in 2025, though the AI-profile generation features lag the core SPDX 3.0 support by a few releases. For organizations that already produce SPDX 2.x SBOMs for software, the migration path is: upgrade to SPDX 3.0 for software in Q1, add the AI profile to the same documents for models in Q2-Q3. The unified document model in SPDX 3.0 means you do not produce a separate AIBOM — you produce a single SPDX document that carries both software and AI profiles, which is operationally simpler than maintaining two artifact types.
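Until your platform's AI-profile support catches up, a self-serve completeness gate is a few lines. A sketch (plain json, type prefixes per the SPDX 3.0 JSON-LD context) that confirms one document carries AI and dataset elements; a combined document would also show a "software" prefix:

import json

# Minimal profile check: which profiles does this one SPDX document use?
# Prefixed types follow the context conventions ("software_", "ai_", "dataset_").
def profiles_present(path: str) -> set[str]:
    with open(path) as f:
        doc = json.load(f)
    return {e["type"].split("_")[0] for e in doc["@graph"] if "_" in e.get("type", "")}

profiles = profiles_present("sbom.spdx.json")
assert {"ai", "dataset"} <= profiles, f"AIBOM incomplete, found only: {profiles}"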
What is the adoption picture as of 2025-2026?
Adoption has been gradual. Huawei has contributed AIBOM standard proposals to the SPDX community and has promoted adoption across telecommunications enterprises. NVIDIA's NGC catalog supports model-signing under the OpenSSF specification but has not yet committed to a single AIBOM format. Hugging Face ships model cards in their own metadata format and provides export tooling. The honest assessment: most production AIBOMs as of 2026 are incomplete — they capture the base model but not the fine-tunes, or the model but not the datasets. The remediation is process, not just format: every model that touches production needs an owner who is accountable for keeping its AIBOM current, the same way every microservice needs a runbook owner.
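That process gate can still be enforced mechanically. A sketch of a CI lint (element shapes as in the earlier example) that flags models recorded without training-data provenance:

import json, sys

# Flag ai_AIPackage elements with no trainedOn relationship -- the most
# common AIBOM gap: the model is recorded but its datasets are not.
# Known upstream foundation models may warrant an explicit allowlist.
def untracked_models(doc: dict) -> list[str]:
    models = {e["spdxId"] for e in doc["@graph"] if e.get("type") == "ai_AIPackage"}
    trained = {e["from"] for e in doc["@graph"]
               if e.get("type") == "Relationship"
               and e.get("relationshipType") == "trainedOn"}
    return sorted(models - trained)

with open("aibom.spdx.json") as f:
    missing = untracked_models(json.load(f))
if missing:
    sys.exit(f"models without training-data provenance: {missing}")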
How do we handle multi-vendor AI stacks in one document?
A real production AI system often combines components from multiple vendors: an Anthropic Claude inference endpoint as the primary reasoning model, a Cohere embedding model for retrieval, an internal fine-tuned classifier for safety, and a vector store running on Pinecone. The SPDX 3.0 architecture handles this cleanly because the document is a graph: each vendor's contribution is its own AIPackage with its own supplier, license, and relationship triples to other components. The unified document gives auditors a single artifact that describes the full stack including the cross-vendor dependencies — a question that becomes important when one vendor's deprecation affects another vendor's behavior. The relationship vocabulary supports dependsOn, usedBy, descendantOf, and several others; pick the right triple type per relationship and the resulting graph is queryable for impact analysis.
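Impact analysis over that graph is a reverse traversal: start from the deprecated component and walk every relationship that points at it. A sketch, using the element shapes from the earlier example and a hypothetical component ID; for simplicity every relationship type is treated as an impact edge:

import json

# Reverse impact analysis: find everything that transitively references
# a given component through any relationship triple.
def impacted_by(doc: dict, target: str) -> set[str]:
    rels = [e for e in doc["@graph"] if e.get("type") == "Relationship"]
    impacted, frontier = set(), {target}
    while frontier:
        hit = frontier.pop()
        for rel in rels:
            if hit in rel["to"] and rel["from"] not in impacted:
                impacted.add(rel["from"])
                frontier.add(rel["from"])
    return impacted

with open("stack.spdx.json") as f:
    doc = json.load(f)
# "spdx:Claude-Endpoint" is a hypothetical ID for a vendor component.
print(impacted_by(doc, "spdx:Claude-Endpoint"))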
What about regulatory alignment?
The EU AI Act Article 10 (data governance) requires training-dataset documentation; the SPDX Dataset profile produces it. Annex IV (technical documentation) requires a system architecture description; the SPDX AI profile produces it. The NIST AI RMF GOVERN, MAP, MEASURE, and MANAGE functions all require artifacts that SPDX 3.0 directly maps to. The US Executive Order 14028 attestation requirements extend to AI components when interpreted through CISA's Secure by Design AI guidance. None of these regulations strictly requires SPDX or CycloneDX, but both formats produce the data those regulations demand. The risk of doing nothing is not just regulatory exposure — it is the operational debt of trying to answer "where did this model come from" six months after an incident, when the engineers who built it have moved on.
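A compliance report generator can start from a mapping table like the sketch below. The pairings restate this section's claims; the exact field lists are assumptions to refine with your auditors.

# Illustrative regulation-to-field mapping; refine per audit requirements.
REGULATORY_MAP = {
    "EU AI Act Art. 10 (data governance)": [
        "dataset_datasetType", "dataset_knownBias",
        "dataset_anonymizationMethodUsed", "dataset_intendedUse",
    ],
    "EU AI Act Annex IV (technical documentation)": [
        "ai_typeOfModel", "ai_domain", "ai_safetyRiskAssessment",
    ],
    "NIST AI RMF MAP/MEASURE": [
        "ai_safetyRiskAssessment", "ai_modelExplainability",
    ],
}

def missing_fields(element: dict, regulation: str) -> list[str]:
    # Report which mapped fields are absent from an SPDX element.
    return [f for f in REGULATORY_MAP[regulation] if f not in element]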
How Safeguard Helps
Safeguard generates SPDX 3.0 AI-profile AIBOMs directly from your model registry, ML platform metadata, and training pipeline runs, and converts between SPDX and CycloneDX ML-BOM so downstream consumers can use whichever format their tooling supports. The Dataset profile is populated from your dataset catalog with provenance, sensitivity labels, and anonymization methods automatically attached. Safeguard parses upstream model cards from Hugging Face, NGC, and Kaggle and fills in missing fields with citation pointers, so manual review focuses on the actual ambiguous cases rather than copy-paste work. Policy gates block deployments that ship without a complete AIBOM in either format, and EU AI Act Article 10 / NIST AI RMF compliance reports are generated directly from the AIBOM graph. The result: regulatory readiness and operational AI security are produced by the same artifact, maintained by the same workflow.