
SBOM Quality Across Ecosystems: 2026 Report

The Safeguard Research team measured SBOM quality across ecosystems and generators. The gaps between formats, tools, and languages are larger than most teams assume.

Shadab Khan
Security Engineer

A software bill of materials is only as useful as it is accurate. Procurement teams treat SBOMs as authoritative. Auditors treat them as evidence. Incident response teams try to use them to answer "are we affected" in the first hour of a new advisory. If the document is wrong, or quietly incomplete, all three of those uses fail silently.

The Safeguard Research team wanted to measure, in a rigorous way, how accurate SBOMs actually are in 2026. How many components does a typical generator miss? How different are outputs between tools? How do the gaps differ across ecosystems and artifact types? This is our report.

How did the team design the evaluation?

We built ground-truth inventories for a corpus of real applications across Node.js, Python, Java, Go, and Rust, along with container images for each. Ground truth came from instrumented builds that recorded every resolved artifact at compile or install time, cross-checked against manual review of lockfiles and vendored code.

For each application, we ran several widely used SBOM generators and compared their outputs against ground truth on three dimensions: completeness (the share of true components listed), accuracy (the share of listed components that are correct and well-identified), and resolvability (the share of listed components whose identifiers, such as PURLs or CPEs, resolve back to the right upstream record).
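To make the three dimensions concrete, here is a minimal sketch of how they can be computed, assuming each SBOM and the ground-truth inventory have been reduced to sets of canonical identifier strings. The function names and the resolves_upstream callback are illustrative, not our actual harness:

```python
# Minimal sketch of the three evaluation metrics. Assumes components have
# already been normalised to canonical identifier strings (e.g. PURLs);
# resolves_upstream stands in for a real registry lookup.
from typing import Callable

def completeness(sbom: set[str], ground_truth: set[str]) -> float:
    """Share of true components that the SBOM lists."""
    return len(sbom & ground_truth) / len(ground_truth)

def accuracy(sbom: set[str], ground_truth: set[str]) -> float:
    """Share of listed components that are correct and well-identified."""
    return len(sbom & ground_truth) / len(sbom)

def resolvability(sbom: set[str],
                  resolves_upstream: Callable[[str], bool]) -> float:
    """Share of listed components whose identifiers resolve upstream."""
    return sum(1 for c in sbom if resolves_upstream(c)) / len(sbom)
```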

We graded CycloneDX and SPDX outputs separately where both were available, and we kept container-native generation (scanning the image) distinct from source-native generation (scanning the build).

How complete are typical SBOMs?

Completeness varied widely by ecosystem and generator, ranging from roughly the mid-70s to the mid-90s in percentage terms against ground truth, with a long tail of edge cases that no generator caught cleanly.

Node.js and Python ecosystems were generally the most complete, because lockfiles are standardised and widely used. Java and Rust were close behind. Go was the most uneven, depending heavily on whether the generator used the Go module graph, inspected the binary, or relied on build-time plugins.

The gap that most people do not expect is between source-native and container-native generation. Container scanners often missed artifacts installed outside the package manager, statically linked binaries, vendored libraries, and copies of libraries installed under atypical paths. Source-native generation missed anything that the build pulled in but did not record in the manifest. Both approaches had blind spots. Neither was reliably complete on its own.

How consistent are generators with each other?

Running two generators against the same input produced different SBOMs a disconcerting fraction of the time. In our corpus, pairwise component overlap between generators was typically in the 80% to 95% range, with higher disagreement on Java's complex classpath layouts and on any container image containing a mix of system and language packages.

Naming is a large part of the disagreement. Generators name the same component differently depending on whether they key off package-manager metadata, binary signatures, or file-path heuristics. A component listed in one SBOM as org.apache.commons:commons-lang3 and in another as apache-commons-lang3, with a different version because the scanner read the wrong file, is the same component to a human reader and a different component to every downstream tool.
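Reconciling those spellings is mostly a normalisation problem. A minimal sketch, assuming a hand-maintained alias table; the table below is a hypothetical example, and real reconciliation needs per-ecosystem rules rather than a flat lookup:

```python
# Sketch of normalising component names across generators. The alias
# table is a hypothetical example; real reconciliation needs
# per-ecosystem rules, not a flat lookup.
ALIASES = {
    # distro-style name      -> canonical Maven coordinates
    "apache-commons-lang3": "org.apache.commons:commons-lang3",
}

def canonical_name(raw: str) -> str:
    """Collapse generator-specific spellings onto one canonical key."""
    key = raw.strip().lower()
    return ALIASES.get(key, key)

# Both spellings now land on the same key for downstream matching.
assert (canonical_name("apache-commons-lang3")
        == canonical_name("org.apache.commons:commons-lang3"))
```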

This naming instability is the source of an enormous amount of real-world pain. Matching CVEs to components relies on stable identifiers, and identifier drift across generators is the main reason teams see vulnerability counts jump by orders of magnitude when they switch tools.

How resolvable are the identifiers in modern SBOMs?

Identifier resolvability has improved over the last two years, but a meaningful share of PURLs and CPEs in modern SBOMs still fail to resolve cleanly to the upstream record they reference.

In our evaluation, PURL resolvability was generally strong, typically in the 90 percent range for package-manager-native components, dropping for binaries and OS packages. CPE resolvability was substantially worse, because CPE naming is historical, ambiguous for many components, and not maintained in lockstep with real-world library releases.
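As an illustration of what a resolvability check involves, here is a minimal sketch for npm PURLs only, using the packageurl-python library and the public npm registry. Every other ecosystem needs its own endpoint, and the error handling here is deliberately simplistic:

```python
# Sketch of a PURL resolvability check, for npm packages only. Uses the
# packageurl-python library; other ecosystems need their own registry
# endpoints, and production code needs retries and rate limiting.
import urllib.error
import urllib.request

from packageurl import PackageURL

def purl_resolves(purl: str) -> bool:
    """True if the npm registry knows the exact name and version."""
    p = PackageURL.from_string(purl)
    if p.type != "npm":
        raise NotImplementedError(f"no resolver for pkg:{p.type}")
    name = f"{p.namespace}/{p.name}" if p.namespace else p.name
    url = f"https://registry.npmjs.org/{name}/{p.version}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

# purl_resolves("pkg:npm/lodash@4.17.21")  -> True (at time of writing)
```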

The practical consequence is that a tool that matches advisories by CPE only will under-match in ways that are hard to detect. We strongly recommend using PURL as the primary identifier wherever possible, with CPE as a supplementary field rather than the authoritative one.

How does ecosystem type influence SBOM quality?

The cleanest SBOMs we saw were for ecosystems with deterministic lockfiles, reproducible builds, and a single canonical package identifier. The messiest were for environments that combine language-level packages, OS packages, and vendored or statically linked code.

Container images are the hardest case by a wide margin. A realistic production image mixes a language runtime, OS packages, installed language packages, installed binaries, and configuration files, any of which can bring security-relevant code along for the ride. Generators built for any single layer will miss what the other layers contain.

This is not a criticism of the tools. It is a structural feature of how we ship software today. The right response is to combine multiple generators, compare outputs, and treat discrepancies as a data-quality signal rather than a tool failure.

What fields beyond component identity matter most?

The fields that most separated high-quality SBOMs from low-quality ones, in our sample, were relationship fields, hash fields, and supplier or upstream provenance fields.

Relationships describe whether a component is a dependency, a runtime component, a dev-only component, or something else. Without this, an SBOM is a list of things that were touched at some point in the build, not a model of what actually runs in production. Many generators emit all-or-nothing relationship data, which makes triage harder than it needs to be.
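For CycloneDX documents, the scope field is the minimum viable version of this. A short sketch that separates runtime components from everything else, treating a missing scope as required, which is the specification's default:

```python
# Sketch: keep only runtime components from a CycloneDX JSON SBOM.
# CycloneDX scope is "required", "optional", or "excluded"; a missing
# scope defaults to "required" per the specification.
import json

def runtime_components(sbom_path: str) -> list[dict]:
    with open(sbom_path) as f:
        bom = json.load(f)
    return [c for c in bom.get("components", [])
            if c.get("scope", "required") == "required"]
```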

Hashes, when present, turned out to be the single most useful field for cross-referencing components between SBOMs and for detecting tampering. Generators that omitted hashes, or emitted only partial hashes, produced SBOMs that were harder to use in any downstream verification workflow.
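A sketch of why hashes help with cross-referencing, assuming CycloneDX-style components carrying SHA-256 entries in their hashes lists:

```python
# Sketch: cross-reference two SBOMs by content hash rather than by name.
# Assumes CycloneDX-style components with {"alg": ..., "content": ...}
# entries in their "hashes" lists; SHA-256 only.
def by_sha256(components: list[dict]) -> dict[str, dict]:
    return {h["content"]: c
            for c in components
            for h in c.get("hashes", [])
            if h.get("alg") == "SHA-256"}

def hash_overlap(a: list[dict], b: list[dict]) -> set[str]:
    """Hashes in both SBOMs, however each one names the component."""
    return by_sha256(a).keys() & by_sha256(b).keys()
```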

Supplier and provenance fields remain inconsistently populated. Where they were present and accurate, they were enormously useful for procurement review and incident response. Where they were absent or filled with placeholder values, they were actively misleading.

What should teams do with these findings?

The most important operational change is to stop treating an SBOM as a single-tool deliverable, and start treating it as a data product with its own quality metrics.

First, generate from multiple sources. Run a source-native generator and a container-native generator against every release candidate, reconcile the outputs, and treat unresolved discrepancies as bugs in your build or your scanner configuration.
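A minimal sketch of that reconciliation step, assuming both outputs have already been normalised to sets of canonical identifiers; the example components are illustrative:

```python
# Sketch of reconciling source-native and container-native outputs after
# normalisation to canonical identifiers. The components are illustrative.
source_components = {"pkg:npm/lodash@4.17.21", "pkg:npm/left-pad@1.3.0"}
container_components = {"pkg:npm/lodash@4.17.21",
                        "pkg:deb/debian/zlib1g@1.2.13"}

inventory = source_components | container_components
discrepancies = source_components ^ container_components  # in one, not both

for d in sorted(discrepancies):
    print(f"DISCREPANCY (investigate before release): {d}")
```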

Second, choose PURL as your primary identifier. Configure downstream tooling to prefer PURL resolution, and fall back to CPE only when PURL is unavailable.
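The fallback rule itself is small. A sketch, assuming CycloneDX-style component dictionaries with optional purl and cpe fields; the example values are illustrative:

```python
# Prefer PURL; fall back to CPE only when no PURL exists.
def primary_identifier(component: dict) -> str | None:
    return component.get("purl") or component.get("cpe")

c = {"name": "commons-lang3",
     "purl": "pkg:maven/org.apache.commons/commons-lang3@3.14.0",
     "cpe": "cpe:2.3:a:apache:commons_lang3:3.14.0:*:*:*:*:*:*:*"}
assert primary_identifier(c).startswith("pkg:maven/")
```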

Third, measure SBOM quality directly. Track completeness, accuracy, and resolvability as metrics on every release, the same way you track test coverage. A quarterly quality review of your SBOM pipeline catches drift that nobody notices otherwise.
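One way to enforce that is a release gate, sketched below; the thresholds are illustrative and should be set from your own baseline, then tightened over time:

```python
# Sketch of a release gate on SBOM quality. Thresholds are illustrative;
# set them from your current baseline and ratchet them like test coverage.
THRESHOLDS = {"completeness": 0.90, "accuracy": 0.95, "resolvability": 0.90}

def quality_gate(metrics: dict[str, float]) -> None:
    failures = [f"{name}={metrics[name]:.2f} < {floor:.2f}"
                for name, floor in THRESHOLDS.items()
                if metrics[name] < floor]
    if failures:
        raise SystemExit("SBOM quality gate failed: " + "; ".join(failures))
```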

Fourth, use the SBOM for something, not just as a compliance artifact. An SBOM that is generated and never consumed by your own incident response or vulnerability management workflow will rot. An SBOM that is consumed daily will stay accurate because accuracy bugs become operational bugs.

What this means

SBOM quality in 2026 is better than it was two years ago, but not yet good enough that any single generator, any single format, or any single scanning approach can be trusted on its own. Teams that want accurate inventories have to combine tools, reconcile outputs, and invest in quality as a first-class engineering concern. Teams that treat SBOMs as a checkbox produce documents that fail silently under real incident pressure.

The procurement and regulatory environment is moving toward requiring SBOMs with teeth. The organisations that already treat them as production data are the ones that will pass the next round of audits without a scramble.

How Safeguard.sh Helps

Safeguard.sh runs multiple SBOM generators across source and container layers, reconciles the outputs into a single high-fidelity inventory, and reports the completeness, accuracy, and resolvability metrics described in this post on every build. We prefer PURL identifiers, normalise across generator quirks, and flag discrepancies so teams can see where their inventory is unreliable before a real incident forces them to care. Customers use Safeguard.sh to produce SBOMs their auditors accept, their incident responders trust, and their procurement teams can defend line by line.
