A new category of SBOM generation tool emerged in 2025: AI-powered generators that use large language models to analyze source code, build files, and documentation to produce SBOMs. The pitch is compelling -- instead of relying on rigid manifest parsing, AI can understand the intent behind dependency declarations, detect vendored code, and even identify undeclared dependencies.
We tested this claim. Over four weeks, we evaluated five AI-powered SBOM generators against three traditional tools across 50 open-source projects of varying size and complexity. The results are nuanced.
Methodology
We selected 50 projects spanning:
- 10 Node.js projects (ranging from small CLI tools to large web applications)
- 10 Python projects (ML pipelines, web services, utility libraries)
- 10 Java/Maven projects (microservices, enterprise applications)
- 10 Go projects (CLI tools, servers, libraries)
- 10 multi-language/polyglot projects (monorepos, projects with mixed stacks)
For each project, we generated SBOMs using:
AI-powered tools: Five commercial and open-source tools that use LLMs for SBOM generation (anonymized as AI-1 through AI-5 per vendor request).
Traditional tools: Syft, cdxgen, and Trivy.
Ground truth: A manually verified SBOM for each project, created by analyzing manifests, lock files, binary contents, and build outputs. This took approximately 15 hours per project.
We measured:
- Completeness -- percentage of ground truth components identified
- Accuracy -- percentage of generated entries that match ground truth (correct name, version, hash)
- False positive rate -- percentage of generated entries not present in ground truth
- Metadata quality -- correctness of PURL, license, supplier, and hash fields
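Two of these metrics (completeness and false positive rate) reduce to simple set arithmetic once each SBOM entry is keyed by name and version. A minimal sketch, assuming components are reduced to `(name, version)` pairs; the package names here are illustrative, not drawn from any particular tool's output:

```python
# Evaluation metrics sketched as set operations over (name, version) pairs.

def completeness(ground_truth: set, generated: set) -> float:
    """Percentage of ground-truth components the tool identified."""
    return 100 * len(ground_truth & generated) / len(ground_truth)

def false_positive_rate(ground_truth: set, generated: set) -> float:
    """Percentage of generated entries not present in ground truth."""
    return 100 * len(generated - ground_truth) / len(generated)

truth = {("lodash", "4.17.21"), ("express", "4.18.2"), ("debug", "4.3.4")}
output = {("lodash", "4.17.21"), ("express", "4.18.2"), ("left-pad", "1.3.0")}

print(completeness(truth, output))         # 2 of 3 ground-truth components found
print(false_positive_rate(truth, output))  # 1 of 3 generated entries is phantom
```

Accuracy and metadata quality required field-by-field comparison against the ground truth rather than simple set membership.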
Results: Completeness
| Tool Category | Direct Deps | Transitive Deps | OS Packages | Vendored Code |
|---------------|-------------|-----------------|-------------|---------------|
| Traditional (best) | 96% | 94% | 89% | 12% |
| Traditional (average) | 93% | 88% | 78% | 8% |
| AI-powered (best) | 94% | 82% | 71% | 47% |
| AI-powered (average) | 87% | 71% | 58% | 34% |
Traditional tools significantly outperform AI tools for direct and transitive dependency detection. This makes sense -- parsing a lock file is deterministic, and these tools have had years to get it right.
The interesting result is vendored code detection. AI tools detected vendored libraries (code copied into the project rather than declared as a dependency) at 3-4x the rate of traditional tools. This is where the LLM's ability to recognize code patterns shines -- it can identify that a file is a copy of lodash's debounce function even when it has been modified and renamed.
OS package detection was weaker for AI tools, primarily because analyzing container images requires different capabilities than analyzing source code.
Results: Accuracy
| Tool Category | Name Accuracy | Version Accuracy | PURL Accuracy | License Accuracy |
|---------------|---------------|------------------|---------------|------------------|
| Traditional (best) | 99.2% | 98.7% | 97.1% | 91.3% |
| Traditional (average) | 98.1% | 96.4% | 93.8% | 84.6% |
| AI-powered (best) | 95.8% | 89.3% | 82.7% | 78.4% |
| AI-powered (average) | 91.2% | 82.1% | 74.3% | 71.8% |
This is where AI tools fall short. When a traditional tool says a component is lodash@4.17.21, it is almost certainly correct because it parsed that exact string from a manifest file. When an AI tool says the same thing, it may have inferred the version from context, import patterns, or API usage -- and the inference can be wrong.
Version accuracy is the most concerning gap. An SBOM that lists the wrong version for a component will either miss real vulnerabilities (if the listed version is not affected but the actual version is) or generate false alerts (if the listed version is affected but the actual version is not). Either outcome undermines trust in the SBOM.
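To make the failure mode concrete: vulnerability matching is essentially a range check against the recorded version, so a wrong version silently flips the result. A hypothetical sketch (naive dot-split version parsing, not full semver, and an invented advisory):

```python
# Why version accuracy matters: a scanner compares the SBOM's recorded
# version against an advisory's fix range. If the recorded version is
# wrong, the check answers correctly -- for the wrong component.

def parse(v: str) -> tuple:
    # Naive numeric version parse; real scanners use full semver rules.
    return tuple(int(x) for x in v.split("."))

def is_affected(version: str, fixed_in: str) -> bool:
    """Affected if the component version predates the fix (hypothetical advisory)."""
    return parse(version) < parse(fixed_in)

actual = "4.17.20"    # version actually installed
recorded = "4.17.21"  # version the AI tool inferred

print(is_affected(actual, "4.17.21"))    # True: a real vulnerability exists
print(is_affected(recorded, "4.17.21"))  # False: the SBOM hides it
```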
PURL accuracy follows a similar pattern. AI tools sometimes generate malformed PURLs, use incorrect namespaces, or confuse packages with similar names across ecosystems.
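Many of the malformed PURLs we saw would fail even a coarse structural check. A simplified sketch using only a regex; a production pipeline would use a proper PURL parser such as the packageurl-python library, which this does not replicate:

```python
import re

# Coarse sanity check for PURL strings of the shape
# pkg:type/namespace/name@version (namespace and version optional).
# This only catches gross malformations, not spec-level violations.

PURL_RE = re.compile(
    r"^pkg:"              # required scheme
    r"[a-z][a-z0-9.+-]*"  # ecosystem type, e.g. npm, pypi, maven
    r"(/[^/@\s]+)+"       # namespace segment(s) and name
    r"(@[^@\s]+)?$"       # optional version
)

def looks_like_valid_purl(purl: str) -> bool:
    return PURL_RE.match(purl) is not None

print(looks_like_valid_purl("pkg:npm/lodash@4.17.21"))        # True
print(looks_like_valid_purl("pkg:maven/org.apache/commons"))  # True (no version)
print(looks_like_valid_purl("npm/lodash@4.17.21"))            # False: missing scheme
```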
Results: False Positives
| Tool Category | False Positive Rate |
|---------------|---------------------|
| Traditional (best) | 1.2% |
| Traditional (average) | 3.4% |
| AI-powered (best) | 7.8% |
| AI-powered (average) | 14.1% |
AI tools generate significantly more false positives -- components that appear in the SBOM but are not actually present in the project. Common causes:
- Confusion between similar packages. The AI identifies code patterns similar to a library and incorrectly adds it to the SBOM.
- Hallucinated dependencies. The LLM "remembers" that projects like this typically use certain libraries and adds them without evidence. This is the classic LLM hallucination problem applied to SBOM generation.
- Development vs. production confusion. The AI includes devDependencies, test fixtures, or documentation examples as production components.
- Stale information. The AI detects references to a previously-used dependency (in comments, old documentation, migration scripts) and incorrectly includes it.
A 14% false positive rate means that roughly 1 in 7 entries in an AI-generated SBOM may be wrong. For automated vulnerability scanning, this translates directly into wasted time investigating phantom vulnerabilities.
Where AI SBOM Generation Works Best
Despite the accuracy gaps, AI-powered SBOM generation has genuine advantages in specific scenarios:
Legacy projects without manifests. Older projects that pre-date modern package managers may have dependencies installed manually, copied into vendor directories, or compiled from source. Traditional tools that rely on manifest parsing find almost nothing. AI tools can analyze the source code and identify these components.
Vendored and forked code. Code copied from another project and modified is nearly invisible to traditional tools. AI tools can identify the origin library even when the code has been modified.
Documentation-based analysis. AI tools can extract dependency information from README files, build documentation, and deployment guides. This is not a substitute for manifest analysis, but it catches components that are documented but not declared.
Initial SBOM bootstrap. For organizations creating their first SBOM for a large, complex codebase, an AI-generated SBOM can be a useful starting point for manual refinement. It is faster to verify and correct an AI-generated SBOM than to build one from scratch.
Recommendations
Do not replace traditional SBOM generation with AI generation. Traditional tools are more accurate for the components they can detect, and accuracy matters more than completeness for most security use cases.
Use AI generation as a supplement. Run AI tools alongside traditional tools to catch vendored code, undeclared dependencies, and other components that manifest-based tools miss. Merge the results, preferring traditional tool output for components both detect.
Validate AI-generated entries. Any component identified only by an AI tool should be flagged for validation. Do not treat AI-generated entries with the same confidence as manifest-parsed entries.
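The merge-and-flag workflow in the two recommendations above can be sketched in a few lines. All field names (`source`, `needs_review`) are hypothetical, not part of any SBOM standard; components are keyed by name and version:

```python
# Hypothetical merge of traditional and AI-generated component lists.
# Traditional entries win on conflict; entries only the AI tool
# produced are kept but flagged for manual validation.

def merge_sboms(traditional: list, ai_generated: list) -> list:
    merged = {}
    for comp in traditional:
        key = (comp["name"], comp["version"])
        merged[key] = {**comp, "source": "manifest", "needs_review": False}
    for comp in ai_generated:
        key = (comp["name"], comp["version"])
        if key not in merged:  # AI-only finding: keep it, but flag it
            merged[key] = {**comp, "source": "ai", "needs_review": True}
    return list(merged.values())

trad = [{"name": "lodash", "version": "4.17.21"}]
ai = [{"name": "lodash", "version": "4.17.21"},
      {"name": "debounce-fork", "version": "unknown"}]  # vendored code the AI spotted

for comp in merge_sboms(trad, ai):
    print(comp["name"], comp["source"], comp["needs_review"])
```

In a real pipeline the flagged entries would feed a review queue rather than being silently shipped in the final SBOM.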
Watch this space. AI SBOM generation is improving rapidly. The accuracy gaps we measured in 2025 may narrow significantly in 2026. But for now, treat AI-generated SBOMs as draft documents that require verification.
How Safeguard.sh Helps
Safeguard.sh uses a hybrid approach to SBOM generation. Our primary analysis uses traditional, deterministic parsing of manifests and lock files for high accuracy. We supplement with AI-assisted detection for vendored code, undeclared dependencies, and legacy projects. Every AI-detected component is clearly flagged with a confidence score, so you know which entries are verified and which are inferred. Our SBOM quality scoring accounts for detection method, giving you a clear picture of how much of your SBOM is verified versus estimated. The result: comprehensive SBOMs you can actually trust.