CycloneDX 1.7 was published in October 2025 and ratified by the ECMA General Assembly in December 2025 as the second edition of ECMA-424. The release consolidates prior work on Machine Learning Bill of Materials (ML-BOM, first introduced in CycloneDX 1.5 in 2023) and adds enough schema discipline that producing a defensible AIBOM is now a tractable engineering project rather than a research exercise. This post walks through what an ML-BOM is, what 1.7 changed, and how to operationalize it for an enterprise with a non-trivial number of models, datasets, and fine-tunes in production.
What does an ML-BOM actually describe?
An ML-BOM extends the CycloneDX component model with new component types: machine-learning-model and data (for datasets). A model component records identity (name, version, vendor, license, supplier, hash), provenance (training source, training pipeline references), characteristics (architecture, parameter count, modality), and the model card metadata (intended use, limitations, ethical considerations, safety evaluations). A dataset component records identity, sourcing, contents description, sensitive-content indicators, and licensing. Critically, ML-BOM is not a separate document type — it sits inside the same CycloneDX document as your software SBOM, so the resulting artifact describes the entire stack: model + inference runtime + dependencies + datasets used for fine-tuning. That unified view is the operational point.
What did CycloneDX 1.7 specifically change for ML-BOM?
Three concrete improvements. First, model-card metadata is structured rather than free-text, with required fields for intended use, out-of-scope use, performance metrics, and ethical considerations, closely matching the Google and Meta model-card conventions. Second, dataset components support compositional descriptions, so a fine-tune trained on a mixture (60% filtered Common Crawl, 30% internal documents, 10% synthetic) can be represented without losing the proportion information. Third, the formula identification fields (cdx:learning_type, cdx:hyperparameter, etc.) are properly typed and validatable. 1.7 is the first version of the spec where automated tooling can produce a compliant ML-BOM from a Hugging Face model card and a training config without manual annotation.
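The compositional-description idea can be sketched outside the spec. The helper below builds a dataset component whose properties record the mixture proportions; the property-name encoding and the generic properties list are illustrative assumptions, not verbatim CycloneDX 1.7 schema paths.

```python
import json

# Hypothetical helper: build a CycloneDX-style dataset component that
# records a training mixture with proportions. The "properties" entries
# are an illustrative encoding, not official 1.7 field names.
def mixture_component(name, version, mixture):
    """mixture: list of (source_name, fraction) pairs that sum to 1.0."""
    assert abs(sum(frac for _, frac in mixture) - 1.0) < 1e-9
    return {
        "type": "data",
        "name": name,
        "version": version,
        "properties": [
            {"name": f"mixture:{source}", "value": f"{frac:.2f}"}
            for source, frac in mixture
        ],
    }

component = mixture_component(
    "finetune-mix-v2", "2026.01",
    [("common-crawl-filtered", 0.60),
     ("internal-documents", 0.30),
     ("synthetic", 0.10)],
)
print(json.dumps(component, indent=2))
```

The point is only that the proportions survive as data rather than prose, so downstream tooling can query them.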
How do we generate an ML-BOM from a Hugging Face model?
The minimum-viable pipeline: extract metadata from the model card (the README.md at the model root), from config.json, and from model_index.json if it exists. Map the model's identity to a component, attach a SHA-256 hash of each weight file, and populate the modelCard block. Both cyclonedx-cli 0.27+ and cdxgen 11.0+ support the 1.7 schema. The output looks like this:
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.7",
  "components": [
    {
      "type": "machine-learning-model",
      "bom-ref": "pkg:huggingface/meta-llama/Llama-4-Scout-17B-16E@a3f1c2e",
      "name": "Llama-4-Scout-17B-16E",
      "version": "a3f1c2e",
      "supplier": { "name": "Meta Platforms" },
      "licenses": [{ "license": { "name": "Llama 4 Community License" }}],
      "hashes": [{ "alg": "SHA-256", "content": "..." }],
      "modelCard": {
        "modelParameters": {
          "task": "text-generation",
          "architectureFamily": "transformer",
          "modelArchitecture": "Llama4ForCausalLM"
        },
        "considerations": {
          "useCases": ["assistive chat", "code generation"],
          "outOfScopeUses": ["high-stakes medical decisions"],
          "ethicalConsiderations": [{ "name": "training-data-bias", "description": "..." }]
        }
      }
    }
  ]
}
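The extraction and hashing steps can be sketched in a few lines. The snippet assumes a local snapshot directory containing config.json and .safetensors weight files at the root, and mirrors the output shape of the example; it does not depend on any official CycloneDX tool.

```python
import hashlib
import json
import pathlib

# Sketch of the metadata-extraction step: hash each weight file and read
# the architecture from config.json. The snapshot layout (config.json
# plus *.safetensors at the root) is an assumption about the model repo.
def hash_file(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def model_component(snapshot_dir, repo_id, revision):
    root = pathlib.Path(snapshot_dir)
    config = json.loads((root / "config.json").read_text())
    return {
        "type": "machine-learning-model",
        "bom-ref": f"pkg:huggingface/{repo_id}@{revision}",
        "name": repo_id.split("/")[-1],
        "version": revision,
        "hashes": [
            {"alg": "SHA-256", "content": hash_file(p)}
            for p in sorted(root.glob("*.safetensors"))
        ],
        "modelCard": {
            "modelParameters": {
                "modelArchitecture": config.get("architectures", [None])[0],
            },
        },
    }
```

Feeding the returned dict into a components list and adding supplier and license fields from the model card gets you most of the way to the document above.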
How do we represent fine-tunes and adapters?
A LoRA adapter or a full fine-tune is its own component with a pedigree block that references the base model. The pedigree captures ancestors (the base model bom-ref) and commits (the training run identifier, ideally a Weights and Biases or MLflow run URL). Datasets used in training are listed as compositions or as separate data components with purpose: "training-data". The result is a derivation graph: anyone reading the BOM can trace your production model back through every fine-tune, every dataset, and every base model. For regulatory contexts (EU AI Act Article 10 data governance), that derivation graph is the artifact auditors will ask for first.
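A minimal sketch of that derivation link, with invented model names and an invented run URL. The ancestors/commits shape follows the CycloneDX pedigree block, but the exact sub-fields should be validated against the 1.7 schema rather than taken from this example.

```python
import json

# Hypothetical fine-tune component whose pedigree references the base
# model and the training run. bom-refs and the run URL are invented.
def finetune_component(name, version, base_ref, run_url):
    return {
        "type": "machine-learning-model",
        "bom-ref": f"urn:model:{name}@{version}",
        "name": name,
        "version": version,
        "pedigree": {
            # Ancestor entry points back at the base model's bom-ref.
            "ancestors": [{"type": "machine-learning-model",
                           "bom-ref": base_ref}],
            # Commit uid carries the training-run identifier.
            "commits": [{"uid": run_url}],
        },
    }

adapter = finetune_component(
    "llama4-scout-support-lora", "1.2.0",
    "pkg:huggingface/meta-llama/Llama-4-Scout-17B-16E@a3f1c2e",
    "https://wandb.example.com/runs/abc123",   # hypothetical run URL
)
print(json.dumps(adapter, indent=2))
```

Chaining these components (adapter references fine-tune, fine-tune references base) is what produces the derivation graph described above.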
What about vulnerable models — does ML-BOM tie into VEX?
Yes, and 1.7 makes this practical. CycloneDX 1.7 includes the vulnerabilities block at the document level, and ML-specific vulnerabilities (model poisoning fingerprints, known-bad checkpoints from the nullifAI campaign, vLLM CVE-2025-66448 affecting specific inference setups) can be referenced. Pair this with OpenVEX or the CycloneDX VEX extension to suppress non-exploitable findings — for example, a known vulnerability in an inference runtime that does not affect your deployment configuration because you disabled the affected feature. The AIBOM thus becomes a living document, not a static inventory: as vulnerabilities and VEX statements arrive, your AI risk surface updates automatically.
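A sketch of such an entry, with a placeholder CVE id and an illustrative bom-ref. The state and justification strings follow the CycloneDX impact-analysis vocabulary as best understood here and should be checked against the schema before use.

```python
import json

# Document-level vulnerability entry with a VEX-style analysis marking a
# runtime finding non-exploitable. CVE id and ref are placeholders.
vulnerability = {
    "id": "CVE-2025-0000",                        # placeholder identifier
    "source": {"name": "NVD"},
    "affects": [{"ref": "pkg:pypi/vllm@0.8.0"}],  # illustrative bom-ref
    "analysis": {
        "state": "not_affected",
        "justification": "requires_configuration",
        "detail": "The affected feature is disabled in our deployment.",
    },
}
print(json.dumps({"vulnerabilities": [vulnerability]}, indent=2))
```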
What tooling exists to produce ML-BOMs at scale?
Three tools matter operationally as of 2026. cdxgen (from the OWASP CycloneDX project) supports model-component generation from Python and Java ML projects, reading the model registry and producing a 1.7-compatible JSON. The cyclonedx-python-lib provides programmatic generation for organizations with custom ML pipelines that do not match a standard registry pattern. Anchore Syft added experimental ML-BOM support in 2025 and can detect Hugging Face models in container images, producing a unified SBOM-plus-ML-BOM artifact. For most enterprises the right approach is a small wrapper around cyclonedx-python-lib that reads your ML platform's metadata (MLflow, Weights and Biases, Vertex Model Registry, Bedrock model registry) and produces the ML-BOM as part of the model promotion pipeline — the same way SBOMs are produced as part of container build pipelines today.
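The wrapper pattern can be as small as this. The model_meta dict stands in for whatever your registry's API returns; its field names are invented for the sketch, not an MLflow or Weights and Biases schema.

```python
import json
import uuid
from datetime import datetime, timezone

# Skeleton of a promotion-pipeline wrapper: turn registry metadata into a
# CycloneDX 1.7 document. The model_meta keys are invented stand-ins for
# your platform's actual fields, not any vendor API.
def build_mlbom(model_meta):
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.7",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        "components": [{
            "type": "machine-learning-model",
            "name": model_meta["name"],
            "version": model_meta["version"],
            "hashes": [{"alg": "SHA-256", "content": h}
                       for h in model_meta.get("weight_hashes", [])],
        }],
    }

bom = build_mlbom({"name": "support-classifier",
                   "version": "3.1.0",
                   "weight_hashes": ["0" * 64]})  # fake hash for the demo
print(json.dumps(bom, indent=2))
```

Hook this into the promotion step that already tags the model version, and the ML-BOM is produced exactly when the artifact it describes is created.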
What is the right granularity for the AIBOM?
A common adoption mistake is producing one giant ML-BOM that covers every model your organization has ever touched. The right granularity is one ML-BOM per deployed inference endpoint, scoped to the model and its direct dependencies. If you have ten production endpoints, you have ten ML-BOMs, each refreshing when its endpoint deploys a new model version. The per-endpoint scope matches how SBOMs work for container images: one image, one SBOM, with the image's running configuration. A cross-cutting "model registry inventory" is a useful operational artifact but it is not an AIBOM in the regulatory sense — auditors will ask for the artifact that describes a specific deployed system, not the global registry. Plan your AIBOM tooling to produce per-deployment artifacts and aggregate them at query time.
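Aggregating at query time can be a trivial script over the per-endpoint files. The one-BOM-per-endpoint directory layout below (mlboms/<endpoint>.json) is an assumed storage convention, not part of the spec.

```python
import json
import pathlib

# Query-time aggregation over per-endpoint ML-BOM files: answer "which
# endpoints deploy model X?" without maintaining a global BOM. The
# mlboms/<endpoint>.json layout is an assumed storage convention.
def endpoints_using(model_name, bom_dir="mlboms"):
    hits = []
    for path in sorted(pathlib.Path(bom_dir).glob("*.json")):
        bom = json.loads(path.read_text())
        for comp in bom.get("components", []):
            if (comp.get("type") == "machine-learning-model"
                    and comp.get("name") == model_name):
                hits.append(path.stem)   # endpoint id from the filename
    return hits
```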
How does this map to the EU AI Act and SSDF?
The EU AI Act Article 10 (data governance) and Annex IV (technical documentation) effectively require what an ML-BOM produces: a record of training datasets, their sourcing, bias-mitigation steps, and a description of the model's intended purpose. The NIST SSDF v1.1 practice PS.3 (archive and protect each software release), whose task PS.3.2 covers collecting and safeguarding provenance data for each component, extends naturally to ML artifacts when interpreted through the lens of the CISA Secure by Design guidance for AI. CycloneDX 1.7 ML-BOM is not strictly required by any regulation as of this writing, but it is the only artifact format that produces the data those regulations demand at the granularity they demand. Adopting it now is significantly cheaper than retrofitting after enforcement actions begin.
How Safeguard Helps
Safeguard generates CycloneDX 1.7 ML-BOMs directly from your model registry, your Hugging Face pulls, and your training pipeline metadata, with no manual annotation required for standard cases. The platform attaches model-card metadata, training-data summaries, and SHA-256 hashes automatically, and merges the ML-BOM with the software SBOM for the inference runtime — so the artifact you ship downstream describes the whole stack, not just the code half. Griffin AI parses upstream model cards (Hugging Face, Kaggle, NGC) and surfaces gaps in your ML-BOM against the EU AI Act Annex IV requirements. Policy gates block deployment of models that lack a complete ML-BOM or that reference datasets without provenance records, and VEX integration suppresses non-exploitable findings so the AIBOM remains a useful security artifact rather than a wall of noise. The result: regulatory documentation and operational security converge on the same machine-readable file.