The EU AI Act enters its high-risk-system enforcement window in 2026, and Article 10 is where most engineering teams will first feel it. The article requires providers of high-risk AI systems to demonstrate that training, validation, and testing data sets are subject to "appropriate data governance and management practices", with explicit attention to relevance, representativeness, error checking, bias examination, and provenance. The compliance teams I work with read this and reach for a policy document. The data and ML teams read it and realise nothing in their existing tooling answers the questions Article 10 asks. The bridge between legal expectation and technical reality is the AI Bill of Materials. Done well, an AI-BOM operationalises Article 10 by capturing the exact fields the regulation cares about as machine-readable, signed evidence tied to the model artefact that ships. Done badly, it produces a glossy document that survives nothing harder than a desk review. This post walks through the Article 10 obligations and the AI-BOM fields that satisfy them, with thresholds drawn from real high-risk-system rollouts in finance, healthcare, and HR through 2025-2026.
What Article 10 Actually Requires
Article 10 imposes seven concrete data governance expectations on high-risk AI systems. The relevant ones for AI-BOM design are:
- Data sets must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose.
- Data must be subject to appropriate governance practices covering design choices, data collection processes, and data preparation.
- Providers must examine data for biases that may produce prohibited discrimination.
- Personal data processing must comply with GDPR with documented legal basis.
- Provenance of training data must be documented.
This is not a checklist of platitudes. National regulators, including the French CNIL and German BfDI, have published technical guidance through 2025 indicating that Article 10 evidence will be evaluated on the basis of structured, auditable artefacts rather than narrative documents. The artefact regulators expect to see is some form of AI-BOM, even if that exact term is not used in the regulation.
Mapping Article 10 To AI-BOM Fields
Five AI-BOM fields carry most of the Article 10 weight.
The first is dataset identity and hash. Every training, validation, and testing dataset referenced by the model must have a unique identifier and a content hash. CycloneDX 1.6 data components and SPDX 3.0 Dataset profile both support this. Without hashes, "we used version X of dataset Y" is unverifiable; with hashes, it is. Aim for 100% hash coverage on all training and validation datasets, and at least 95% on testing datasets.
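As a minimal sketch, hash capture and coverage checking can be scripted directly against the dataset manifest; the manifest shape and function names below are our own illustration, not part of either BOM schema.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a dataset file through SHA-256 so large snapshots never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Illustrative coverage floors from the text: 100% for training/validation, 95% for testing.
COVERAGE_FLOORS = {"training": 1.0, "validation": 1.0, "testing": 0.95}

def hash_coverage(entries: list[dict]) -> float:
    """Fraction of dataset entries in a split that carry a content hash."""
    return sum(1 for e in entries if e.get("sha256")) / len(entries) if entries else 0.0

def coverage_ok(manifest: dict[str, list[dict]]) -> dict[str, bool]:
    """Per-split pass/fail against the coverage floors."""
    return {split: hash_coverage(manifest.get(split, [])) >= floor
            for split, floor in COVERAGE_FLOORS.items()}
```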
The second is provenance lineage. For each dataset, the AI-BOM should capture origin (internal_collection, licensed_third_party, public_open, synthetic), licence, consent basis, and a chain of derivation if the dataset was produced from upstream sources. Article 10's provenance requirement is not satisfied by a free-text description; it expects structured lineage that an auditor can walk.
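Sketched as a plain record (field names are illustrative, not CycloneDX or SPDX schema terms), a lineage entry looks roughly like this:

```python
from dataclasses import dataclass, field
from enum import Enum

class Origin(str, Enum):
    INTERNAL_COLLECTION = "internal_collection"
    LICENSED_THIRD_PARTY = "licensed_third_party"
    PUBLIC_OPEN = "public_open"
    SYNTHETIC = "synthetic"

@dataclass
class DatasetLineage:
    dataset_id: str                     # purl or URI
    sha256: str                         # content hash of the snapshot
    origin: Origin
    licence: str | None = None          # SPDX identifier where applicable
    consent_basis: str | None = None    # GDPR Article 6 reference if PII is present
    derived_from: list[str] = field(default_factory=list)  # upstream dataset ids an auditor can walk
```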
The third is representativeness metadata. The AI-BOM should record the demographic and geographic distribution of training data where the intended purpose makes those dimensions relevant. For an HR screening model, that means age, gender, and protected-characteristic distributions in the training set, captured as structured fields rather than narrative claims.
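A minimal sketch of turning those dimensions into a structured object at snapshot time, assuming the raw records are still accessible when the AI-BOM is emitted:

```python
from collections import Counter

def demographic_distribution(records: list[dict], dimensions: list[str]) -> dict[str, dict[str, float]]:
    """Relative frequency per value for each relevant dimension, captured at snapshot time."""
    if not records:
        return {dim: {} for dim in dimensions}
    total = len(records)
    return {
        dim: {value: count / total
              for value, count in Counter(r.get(dim, "unknown") for r in records).items()}
        for dim in dimensions
    }

# For an HR screening training set this might be called as
#   demographic_distribution(rows, ["age_band", "gender"])
# and stored in the AI-BOM as a structured object, not a prose claim.
```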
The fourth is bias and error evaluation results. Article 10's bias examination obligation maps to a structured evaluation suite identifier, version, and result hash. The evaluation must be re-run on a defined cadence (we recommend 90 days for high-risk systems) and the AI-BOM must reference the latest run.
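As a sketch, the evaluation reference reduces to a small structured record; the field names are our own convention rather than part of either BOM standard:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EvaluationReference:
    suite_id: str        # identifier of the bias or error evaluation suite
    suite_version: str
    result_sha256: str   # hash of the stored result artefact
    run_date: date       # drives the re-run cadence described above
```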
The fifth is GDPR legal basis and data minimisation evidence. Each dataset containing personal data must reference its legal basis under Article 6 GDPR (consent, contract, legitimate_interest, etc.) and a data minimisation assertion explaining why the included fields are necessary for the model's purpose.
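A minimal check along these lines, assuming the dataset entry carries the PII risk class and legal basis fields already described:

```python
# The six lawful bases enumerated in GDPR Article 6(1).
ARTICLE_6_BASES = {"consent", "contract", "legal_obligation",
                   "vital_interests", "public_task", "legitimate_interest"}

def pii_governance_gaps(entry: dict) -> list[str]:
    """Flag a dataset entry whose personal data lacks a legal basis or minimisation assertion."""
    gaps = []
    if entry.get("pii_risk_class", "none") != "none":
        if entry.get("gdpr_legal_basis") not in ARTICLE_6_BASES:
            gaps.append("missing or unrecognised Article 6 legal basis")
        if not entry.get("minimisation_assertion"):
            gaps.append("missing data minimisation assertion")
    return gaps
```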
The Twelve-Field AI-BOM For High-Risk Systems
Programmes shipping high-risk AI systems in 2026 should treat the following twelve fields per dataset as the floor:
- Dataset `purl` or URI and `sha256` hash
- Source classification (`internal`, `licensed`, `public`, `synthetic`)
- Licence (SPDX identifier where applicable)
- Consent basis (GDPR Article 6 reference if PII)
- PII risk class (`none`, `low`, `medium`, `high`)
- Snapshot date
- Size in records
- Demographic distribution structured object (where relevant)
- Bias evaluation reference (suite ID, version, result hash, run date)
- Error rate evaluation reference (same structure)
- Upstream lineage (parent dataset identifier where derivative)
- Steward identity (the person or team accountable for the dataset)
A realistic 2026 implementation captures these as CycloneDX 1.6 data components with custom property extensions for the fields not in the core schema. SPDX 3.0 captures them through the Dataset profile with similar property extensions.
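To make that concrete, here is one dataset entry sketched in the CycloneDX style, using the generic name/value properties mechanism for the fields outside the core schema. Every `aibom:` property name is our own convention, not part of the specification, and the values are illustrative.

```python
import json

# A single dataset entry rendered in the CycloneDX style.  The outer keys
# (type, name, purl, hashes, properties) follow CycloneDX conventions; the
# "aibom:" property names are our own convention for the twelve fields.
# The demographic distribution, being structured, would typically live in a
# separate referenced artefact rather than a flat property value.
dataset_component = {
    "type": "data",
    "name": "hr-screening-training-set",
    "purl": "pkg:generic/hr-screening-training-set@2026-01-15",
    "hashes": [{"alg": "SHA-256", "content": "<sha256-of-snapshot>"}],
    "properties": [
        {"name": "aibom:source_classification", "value": "internal"},
        {"name": "aibom:licence", "value": "LicenseRef-proprietary"},
        {"name": "aibom:consent_basis", "value": "gdpr-art6-1f-legitimate-interest"},
        {"name": "aibom:pii_risk_class", "value": "high"},
        {"name": "aibom:snapshot_date", "value": "2026-01-15"},
        {"name": "aibom:size_records", "value": "1240000"},
        {"name": "aibom:bias_eval_ref", "value": "bias-suite-v4@2026-02-01:<result-hash>"},
        {"name": "aibom:error_eval_ref", "value": "error-suite-v2@2026-02-01:<result-hash>"},
        {"name": "aibom:upstream_lineage", "value": "pkg:generic/hr-raw-applications@2025-12-01"},
        {"name": "aibom:steward", "value": "people-analytics-data-team"},
    ],
}

print(json.dumps(dataset_component, indent=2))
```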
Synthetic Data And The Provenance Trap
Synthetic data is increasingly common in 2026 model training, and Article 10 does not exempt it. A synthetic dataset has its own provenance chain: the generator model used, the seed data that prompted it, the filtering rules applied, and any human review. An AI-BOM treats a synthetic dataset as a first-class data component with a synthesised-from relationship to its generator, recursively if the generator's seed data was itself synthetic.
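A sketch of that recursion, assuming each registry entry records its origin and the seed datasets that prompted its generator:

```python
def provenance_roots(dataset_id: str, registry: dict[str, dict]) -> set[str]:
    """Resolve a (possibly synthetic) dataset to the non-synthetic datasets it
    ultimately derives from, following generator seed data recursively.
    Assumes the lineage graph is acyclic."""
    entry = registry[dataset_id]
    if entry.get("origin") != "synthetic":
        return {dataset_id}
    roots: set[str] = set()
    for seed_id in entry.get("seed_datasets", []):  # data that prompted the generator
        roots |= provenance_roots(seed_id, registry)
    return roots
```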
The trap is treating synthetic data as provenance-free because it does not contain real PII. Article 10's bias examination obligation applies regardless. If the generator model produced synthetic data that under-represents a protected characteristic in the original distribution, the downstream model will inherit that bias. The AI-BOM should record the bias evaluation result for synthetic datasets with the same rigour as for real datasets.
Continuous Re-Evaluation, Not One-Time Approval
Article 10 obligations are continuous. A high-risk system that satisfied the bias examination requirement in March 2026 does not remain compliant in September unless the evaluation has been re-run. The AI-BOM operationalises this by treating evaluation references as time-bound: every reference includes a run date, and a configurable staleness window (we recommend 90 days for high-risk systems) drives an automated expired state that surfaces in dashboards.
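Mechanically the check is small; this sketch assumes each evaluation reference carries its run date:

```python
from datetime import date, timedelta

STALENESS_WINDOW = timedelta(days=90)  # recommended window for high-risk systems

def evaluation_state(run_date: date, today: date | None = None) -> str:
    """'current' while the run is inside the window, 'expired' once it ages out."""
    return "current" if ((today or date.today()) - run_date) <= STALENESS_WINDOW else "expired"

def expired_references(references: list[dict], today: date | None = None) -> list[dict]:
    """The subset of evaluation references a dashboard should surface as expired."""
    return [r for r in references if evaluation_state(r["run_date"], today) == "expired"]
```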
A common failure mode in 2025 deployments: teams emitted a beautiful initial AI-BOM, satisfied the launch review, and let evaluation references age silently for six months. By the time a regulator asked, the underlying datasets had drifted, the evaluations were stale, and the AI-BOM described a state of the world that no longer matched production. Time-bounding evaluation references closes this gap mechanically.
The Provider-Deployer Split
The AI Act distinguishes providers (who place the AI system on the market) from deployers (who use it). Providers carry the Article 10 data governance obligation. Deployers carry obligations around context-of-use monitoring and incident reporting. AI-BOM matters to both.
Providers emit AI-BOM as the primary evidence artefact. Deployers ingest provider AI-BOMs alongside their own usage telemetry to maintain the deployment-side record Article 26 requires. A deployer who cannot link a production incident to the provider's AI-BOM at the time of deployment has a defensibility gap. Treat AI-BOM ingest as a procurement obligation for any high-risk system you operate, even when you did not train the model yourself.
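One way to keep that linkage on the deployer side is to index ingested provider AI-BOMs by deployment window and resolve incidents against them; the record shapes below are assumptions, not a description of any particular platform:

```python
from datetime import datetime

def bom_for_incident(incident_time: datetime, deployments: list[dict]) -> dict | None:
    """Return the provider AI-BOM reference that was live when an incident occurred.

    Each deployment record is assumed to carry 'deployed_at', 'retired_at'
    (None while still live) and 'bom_digest', the digest of the provider
    AI-BOM ingested at procurement time.
    """
    for d in sorted(deployments, key=lambda d: d["deployed_at"], reverse=True):
        retired = d.get("retired_at")
        if d["deployed_at"] <= incident_time and (retired is None or incident_time < retired):
            return {"bom_digest": d["bom_digest"], "deployed_at": d["deployed_at"]}
    return None
```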
How Safeguard Helps
Safeguard maps AI-BOM ingest directly to Article 10 obligations. The platform accepts CycloneDX 1.6 mlModel and data components and SPDX 3.0 AI/Dataset profiles, normalises dataset identifiers, and tracks lineage recursively across upstream and synthetic data sources. Bias and error evaluation references are time-bounded with configurable staleness windows that mark stale evidence automatically. VEX statements extend to model-level findings so Article 10 representativeness and bias claims can be paired with structured exceptions. Signed attestations cover both provider-emitted AI-BOMs and deployer-ingested artefacts using a Sigstore-compatible chain, producing the auditable, machine-readable evidence Article 10 enforcement is converging on through 2026 and beyond.