A raw SBOM tells you what is in your software. It lists component names, versions, relationships, and maybe license information. This is valuable for inventory purposes, but it does not answer the question security teams actually care about: what risk does this software composition represent?
Answering that question requires enrichment — the process of augmenting raw SBOM data with vulnerability information, exploit intelligence, maintainer health metrics, and license risk assessments. Enrichment transforms a parts list into a risk assessment.
Most organizations generate SBOMs but do not enrich them. This post covers the enrichment pipeline, the data sources that matter, and the practical challenges of doing this at scale.
The Enrichment Pipeline
A complete SBOM enrichment pipeline has five stages:
Stage 1: Component Identification
Raw SBOM data often contains imprecise component identifiers. A component might be listed as "openssl" without a Package URL (PURL), CPE, or other standardized identifier. Before enrichment can begin, each component needs to be mapped to a canonical identity.
This mapping is harder than it sounds. The same library might appear as:
- openssl (common name)
- pkg:npm/openssl@3.0.0 (PURL)
- cpe:2.3:a:openssl:openssl:3.0.0:*:*:*:*:*:*:* (CPE)
- OpenSSL/3.0.0 (custom format)
The enrichment engine needs to normalize these identifiers, handle aliases, and resolve ambiguities. False matches here cascade through the entire pipeline — if a component is misidentified, every vulnerability correlation will be wrong.
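A minimal normalization step might parse the formats above into a common (ecosystem, name, version) shape. This is an illustrative sketch, not a full PURL or CPE parser — real identifiers have escaping rules and optional fields this ignores, and bare names deliberately return nothing rather than risk a false match:

```python
import re

def normalize_identifier(raw):
    """Map a raw component identifier to (ecosystem, name, version).

    Handles PURLs, CPE 2.3 strings, and "Name/version" custom formats.
    Returns None when the identifier cannot be resolved confidently.
    """
    # PURL: pkg:npm/openssl@3.0.0
    m = re.fullmatch(r"pkg:([^/]+)/([^@]+)@(.+)", raw)
    if m:
        return (m.group(1), m.group(2).lower(), m.group(3))
    # CPE 2.3: cpe:2.3:a:<vendor>:<product>:<version>:...
    if raw.startswith("cpe:2.3:"):
        parts = raw.split(":")
        if len(parts) >= 6:
            return ("cpe", parts[4].lower(), parts[5])
    # Custom "Name/version" format, e.g. OpenSSL/3.0.0
    m = re.fullmatch(r"([A-Za-z0-9_.-]+)/(\d[\w.]*)", raw)
    if m:
        return ("unknown", m.group(1).lower(), m.group(2))
    # Bare names like "openssl" need an alias lookup table, not a parse
    return None
```

Returning None for ambiguous inputs is the important design choice here: refusing to guess keeps false matches from cascading into the vulnerability correlation stage.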
Stage 2: Vulnerability Correlation
Once components are identified, match them against vulnerability databases:
NVD (National Vulnerability Database) — the authoritative source for CVEs. NVD provides CVE descriptions, CVSS scores, affected version ranges, and references. Coverage is comprehensive but updates can lag days or weeks behind initial disclosure.
OSV (Open Source Vulnerabilities) — a distributed, open-source vulnerability database with better coverage for open-source ecosystems. OSV aggregates data from multiple sources (GitHub Advisories, PyPI Advisories, RustSec, the Go Vulnerability Database) and uses PURL-native matching, sidestepping the identifier ambiguity that NVD's CPE-based system struggles with.
GitHub Advisory Database — curated advisories with reviewed version ranges and CVSS scores. Strong coverage for npm, PyPI, Maven, RubyGems, and Go modules.
CISA KEV (Known Exploited Vulnerabilities) — a curated list of vulnerabilities actively exploited in the wild. Any CVE on this list warrants immediate attention.
Using multiple data sources provides better coverage and catches vulnerabilities that any single source might miss. The trade-off is managing inconsistencies — different sources may report different affected version ranges or severity scores for the same CVE.
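One way to manage those inconsistencies is to merge per-source findings by CVE ID, take the worst-case severity, and flag disagreements for review. A sketch under assumed record shapes (the `source`/`cve`/`severity` keys are illustrative, not any particular tool's format):

```python
from collections import defaultdict

def merge_findings(findings):
    """Merge per-source vulnerability findings keyed by CVE ID.

    `findings` is a list of dicts like
    {"source": "nvd", "cve": "CVE-2024-1111", "severity": 9.8}.
    Returns one record per CVE, flagging severity disagreements so a
    human (or a documented precedence rule) can resolve them.
    """
    by_cve = defaultdict(list)
    for f in findings:
        by_cve[f["cve"]].append(f)
    merged = []
    for cve, records in by_cve.items():
        severities = {r["severity"] for r in records}
        merged.append({
            "cve": cve,
            "sources": sorted(r["source"] for r in records),
            "severity": max(severities),   # conservative: take worst case
            "disputed": len(severities) > 1,
        })
    return merged
```

Taking the maximum severity is a deliberately conservative default; the `disputed` flag preserves the fact that sources disagreed instead of silently discarding it.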
Stage 3: Exploit Intelligence
Not all vulnerabilities are created equal. A CVE with a publicly available exploit is more dangerous than one with no known exploitation technique.
Enrich vulnerability data with exploit intelligence:
EPSS (Exploit Prediction Scoring System) — a machine learning model that predicts the probability of a vulnerability being exploited in the next 30 days. EPSS scores range from 0 to 1, with higher scores indicating higher exploitation likelihood.
Exploit databases — Exploit-DB, Metasploit modules, and nuclei templates indicate that working exploits exist and are accessible to attackers.
Threat intelligence feeds — commercial and open-source feeds that track active exploitation campaigns, attacker tooling, and vulnerability weaponization.
Exploit intelligence transforms a CVSS score (theoretical severity) into an exploitation likelihood (practical risk). A CVE with CVSS 9.8 but EPSS 0.01 is very different from a CVE with CVSS 7.5 and EPSS 0.85.
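The difference shows up directly if you blend the two signals. The formula below is purely illustrative (it is not a standard scoring method): normalize CVSS to 0-1 and weight it by EPSS, so that likely-exploited CVEs sort ahead of severe-but-dormant ones:

```python
def priority(cvss, epss):
    """Blend theoretical severity (CVSS, 0-10) with exploitation
    likelihood (EPSS, 0-1) into a single sort key.

    Illustrative formula only: a likely-exploited medium-severity CVE
    outranks a critical CVE that nobody is exploiting.
    """
    return (cvss / 10.0) * epss

# The two CVEs from the text: the CVSS 7.5 / EPSS 0.85 finding
# outranks the CVSS 9.8 / EPSS 0.01 one.
assert priority(7.5, 0.85) > priority(9.8, 0.01)
```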
Stage 4: Component Health Assessment
Beyond vulnerabilities, component health provides a risk signal:
Maintainer activity — when was the last commit? How many active contributors? A component with no commits in 18 months is more likely to have unpatched vulnerabilities.
OpenSSF Scorecard — an automated assessment of security practices including branch protection, dependency updates, fuzzing, and SAST.
License risk — some licenses (AGPL, SSPL) create compliance risks. License identification from SBOM data should be cross-referenced with the actual license file in the component's repository.
Dependency freshness — how far behind the latest version is the version in use? Components multiple major versions behind may have known issues that are not formally tracked as CVEs.
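Dependency freshness can be computed mechanically from semver strings. A minimal sketch, assuming clean `major.minor.patch` versions and illustrative risk tiers (real version schemes are messier — epochs, pre-releases, non-semver ecosystems all need handling):

```python
def freshness_risk(current, latest):
    """Classify how far a dependency lags its latest release.

    Compares semver major versions only; the "high"/"medium"/"low"
    tiers are illustrative thresholds, not an industry standard.
    """
    cur_major = int(current.split(".")[0])
    latest_major = int(latest.split(".")[0])
    gap = latest_major - cur_major
    if gap >= 2:
        return "high"    # multiple major versions behind
    if gap == 1:
        return "medium"  # one major version behind
    return "low"         # same major line
```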
Stage 5: Risk Scoring
The final stage combines all enrichment data into a risk score that security teams can use for prioritization:
- Vulnerability severity (CVSS)
- Exploit availability (EPSS, exploit databases)
- Active exploitation (CISA KEV)
- Component health (Scorecard, maintainer activity)
- Reachability (is the vulnerable code actually called?)
- Deployment context (internet-facing, internal, data sensitivity)
The specific scoring formula matters less than its consistency and interpretability. Security teams need to trust the score and understand what drives it.
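As a concrete example of a documented, consistent formula, here is one way to combine the signals above into a 0-100 score. The weights and the deployment-context dampener are illustrative assumptions, not recommendations — the point is that every input and weight is visible:

```python
def risk_score(cvss, epss, on_kev, scorecard, internet_facing):
    """Combine enrichment signals into a 0-100 risk score.

    Inputs: CVSS (0-10), EPSS (0-1), CISA KEV membership (bool),
    OpenSSF Scorecard (0-10), and deployment context (bool).
    All weights are illustrative; consistency and interpretability
    matter more than the specific numbers.
    """
    score = (cvss / 10.0) * 40                 # base severity
    score += epss * 30                         # exploitation likelihood
    score += 20 if on_kev else 0               # known active exploitation
    score += (10 - scorecard) / 10 * 5         # weak project hygiene
    score *= 1.0 if internet_facing else 0.6   # deployment context dampener
    return round(min(score, 100), 1)
```

A KEV-listed, internet-facing finding with a poor Scorecard lands near the top of the range, while a low-EPSS internal finding on a healthy project scores far lower — which is exactly the separation the prioritization queue needs.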
Scaling Challenges
Update Frequency
Vulnerability databases update continuously. New CVEs are published daily. EPSS scores are recalculated weekly. CISA KEV is updated irregularly. Keeping enrichment data current requires continuous re-correlation, not periodic batch processing.
An SBOM that was clean yesterday may have three new high-severity findings today because of newly published CVEs. The enrichment pipeline needs to detect these changes and notify affected teams promptly.
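Detecting those changes reduces to diffing two enrichment runs and routing only the delta to notification channels. A sketch assuming each run is summarized as a dict of CVE ID to severity (a simplified shape for illustration):

```python
def diff_findings(previous, current):
    """Compare two enrichment runs for the same SBOM.

    `previous` and `current` map CVE IDs to severities. Returns the
    newly introduced findings and the resolved ones, so only the
    delta is sent to affected teams instead of the full report.
    """
    added = {cve: sev for cve, sev in current.items() if cve not in previous}
    resolved = sorted(set(previous) - set(current))
    return added, resolved
```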
Volume
An organization with 1,000 services, each with 300 dependencies, has 300,000 components to enrich. Each component needs to be checked against multiple vulnerability databases, exploit intelligence sources, and health metrics. The data volume is significant.
Caching, incremental updates, and efficient matching algorithms are essential for keeping enrichment latency manageable. Full re-enrichment of the entire portfolio on every NVD update is not feasible for large organizations.
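The core data structure for incremental updates is an inverted index from component to the SBOMs that contain it: when a new CVE lands for one package, only the SBOMs in that package's index entry need re-scoring. A minimal sketch, assuming SBOMs are keyed by an ID and list component PURLs:

```python
from collections import defaultdict

def build_component_index(sboms):
    """Invert SBOMs (id -> list of component PURLs) into
    PURL -> set of SBOM ids.

    A new advisory for one PURL then touches only the SBOMs in its
    index entry, not the entire portfolio.
    """
    index = defaultdict(set)
    for sbom_id, components in sboms.items():
        for purl in components:
            index[purl].add(sbom_id)
    return index
```

At 300,000 components this index is small enough to hold in memory, and it turns each NVD update into a handful of targeted re-scores rather than a portfolio-wide batch job.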
Data Quality
Vulnerability databases contain errors. CPE entries are sometimes wrong. Affected version ranges are sometimes too broad or too narrow. CVSS scores are sometimes disputed.
A robust enrichment pipeline needs data quality controls: cross-referencing multiple sources, flagging inconsistencies, and allowing manual overrides when automated matching produces incorrect results.
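Manual overrides can be modeled as a small table keyed by (component, CVE) that is applied after automated matching. The record shapes and the "suppress"/re-pin actions below are illustrative assumptions about how such a table might look:

```python
def apply_overrides(findings, overrides):
    """Apply analyst overrides to automated matches.

    `overrides` maps (component PURL, CVE ID) to an action:
    the string "suppress" drops a known false positive, and a float
    re-pins a disputed severity. Overridden records are marked so the
    manual intervention stays auditable.
    """
    result = []
    for f in findings:
        action = overrides.get((f["component"], f["cve"]))
        if action == "suppress":
            continue  # analyst confirmed this match is wrong
        if isinstance(action, float):
            f = {**f, "severity": action, "overridden": True}
        result.append(f)
    return result
```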
How Safeguard.sh Helps
Safeguard's enrichment engine handles the entire pipeline — from component identification through risk scoring — as an automated, continuous process. We correlate against NVD, OSV, GitHub Advisories, and CISA KEV simultaneously, enrich with EPSS scores and exploit intelligence, assess component health through OpenSSF Scorecard integration, and produce a unified risk score that reflects your organization's specific context. The enrichment runs continuously, so when a new CVE is published, affected SBOMs are re-scored and teams are notified within minutes. For organizations that have SBOMs but are not enriching them, Safeguard turns your inventory data into the actionable intelligence your security program needs.