Every SCA vendor now claims reachability analysis on its product page, and most of those claims are misleading. The category has matured to the point where basic CVE-against-SBOM matching is a commodity, and reachability is the new battleground. The trouble is that reachability means very different things to different vendors, and the gap between a real implementation and a marketing checkbox is wide enough to drive a budget through.
This guide walks through the questions to ask in a buyer evaluation, the patterns that distinguish real reachability from theater, and the operational considerations that show up six months after deployment. We have been through ten of these evaluations in the past eighteen months, and the pattern is consistent.
What does reachability actually mean?
Reachability is the determination of whether a vulnerable function in a dependency is invoked by code paths that originate from your application's entry points. The naive form is static call graph analysis: parse your code, build a graph of function calls including into transitive dependencies, and check if any path reaches the vulnerable function. The sophisticated form layers in framework-aware entry point detection, dynamic dispatch resolution, and runtime instrumentation to confirm what static analysis suggests.
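At its core, the naive static form is a graph search. The sketch below is a minimal Python version. It assumes the call graph has already been extracted, which is the hard part, and it identifies functions by fully qualified names. The package `yamlish` and every function name are hypothetical. A breadth-first search starts from all entry points at once and returns one call path to the vulnerable function, or None.

```python
from __future__ import annotations

from collections import deque

# Hypothetical call graph: caller -> set of callees, with names qualified
# as "package.module.function". Extracting this graph is the hard part.
CallGraph = dict[str, set[str]]

EXAMPLE_GRAPH: CallGraph = {
    "app.api.handle_upload": {"app.io.read_config"},
    "app.io.read_config": {"yamlish.loader.safe_load"},
    "yamlish.loader.safe_load": {"yamlish.loader.construct"},
    "app.admin.reindex": {"yamlish.loader.unsafe_load"},
    "yamlish.loader.unsafe_load": {"yamlish.loader.construct_object"},
}


def find_path(graph: CallGraph, entry_points: list[str], target: str) -> list[str] | None:
    """Breadth-first search from all entry points at once.

    Returns one shortest call path from some entry point to `target`,
    or None if the static graph contains no such path.
    """
    parent: dict[str, str | None] = {entry: None for entry in entry_points}
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn == target:
            # Follow parent links back to the entry point, then reverse.
            path: list[str] = []
            node: str | None = fn
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for callee in graph.get(fn, ()):
            if callee not in parent:
                parent[callee] = fn
                queue.append(callee)
    return None


# Only the upload handler is an entry point, so the vulnerable function is unreachable.
print(find_path(EXAMPLE_GRAPH, ["app.api.handle_upload"], "yamlish.loader.construct_object"))
# Treating the admin route as an entry point as well exposes a path through unsafe_load.
print(find_path(EXAMPLE_GRAPH, ["app.api.handle_upload", "app.admin.reindex"],
                "yamlish.loader.construct_object"))
```

The two calls use the same graph and get different answers, which is why framework-aware entry point detection matters. The sketch also returns the path instead of a yes/no verdict. That path is the evidence a reviewer needs to confirm a finding.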
Vendors differ in how seriously they take this. Some implement only file-level reachability, which is barely better than CVE-against-SBOM matching: if a file from a vulnerable package is imported anywhere, the CVE counts as reachable. Real function-level reachability requires a call graph, and building one requires genuinely language-aware parsing, which is hard. Ask the vendor to show you the call graph for a sample CVE, and watch how quickly they pivot to a different conversation if they cannot.
Which languages and frameworks are supported?
Language support is where the marketing falls apart fastest. A reachability engine that works well for Java may be barely usable for JavaScript, because JavaScript's dynamic dispatch and runtime monkey-patching make static call graph construction much harder. Python sits in the middle. Go is relatively tractable because of its more rigid type system. Rust is well suited to static analysis, though trait resolution makes it harder than Go.
When evaluating a tool, run it against a representative sample of your codebase and measure how many known reachable CVEs it misses. We typically construct a test set of 50 CVEs across our top three languages, with ground truth from manual analysis or runtime instrumentation. A good tool flags 85% or more of the known reachable CVEs. A marketing tool flags 40% and then claims the rest are 'configuration issues.' The test set takes a week to build and is the single most valuable artifact in the evaluation.
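Once the test set exists, scoring the trial is simple. The sketch below is a minimal version with two assumptions. Ground truth is recorded as one hypothetical `GroundTruthCase` per CVE, and the tool's output can be reduced to a set of flagged case IDs. It reports recall per language rather than a single aggregate, so that a strong language cannot hide a weak one. The case IDs are placeholders, not real CVEs.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthCase:
    case_id: str     # hypothetical ID, e.g. a CVE plus the affected package
    language: str
    reachable: bool  # confirmed by manual analysis or runtime instrumentation


def recall_by_language(cases: list[GroundTruthCase], flagged: set[str]) -> dict[str, float]:
    """Share of known-reachable cases the tool flagged, per language.

    The false negative rate for a language is 1 minus this value.
    """
    found: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        if not case.reachable:
            continue  # recall only looks at cases known to be reachable
        total[case.language] += 1
        if case.case_id in flagged:
            found[case.language] += 1
    return {lang: found[lang] / total[lang] for lang in total}


cases = [
    GroundTruthCase("CASE-001", "java", True),
    GroundTruthCase("CASE-002", "java", True),
    GroundTruthCase("CASE-003", "python", True),
    GroundTruthCase("CASE-004", "python", False),
]
flagged_by_tool = {"CASE-001", "CASE-003", "CASE-004"}
recall = recall_by_language(cases, flagged_by_tool)
print(recall)
print(all(r >= 0.85 for r in recall.values()))  # the 85% bar from the text
```

Flagging CASE-004, a known-unreachable case, does not affect recall. That cost shows up as a false positive, which is the other half of the tradeoff.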
How does it handle transitive dependencies?
Transitive reachability is where most tools struggle. The vulnerability is three levels deep in your dependency tree, and the question is whether your code path actually traverses through the intermediate packages to reach it. Tools that handle this correctly need to build call graphs across package boundaries, which means parsing all the transitive dependencies at the function level. That is expensive, both in compute and in storage, and many tools quietly skip it.
Ask the vendor specifically how they handle reachability through a transitively imported package. The good answer involves a multi-layer call graph that traverses package boundaries with proper symbol resolution. The bad answer is some variation of 'if it is imported, we assume it is reachable.' That assumption inflates the alert volume and erodes confidence in the prioritization, which defeats the point of buying a reachability tool.
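The difference between the good and bad answers fits in a few lines. The sketch below uses invented packages (`http_client`, `url_parse`) and runs the same traversal twice. The first pass walks the package dependency graph, which is all the 'imported means reachable' shortcut amounts to. The second pass walks a function-level call graph whose nodes are qualified by package, so same-named symbols in different packages stay distinct. Both vulnerable functions live in a transitive dependency, but only one lies on a call path from the application.

```python
from __future__ import annotations

# Hypothetical data. Functions are qualified as "package:function" so that
# symbols with the same name in different packages resolve to distinct nodes.
CALLS: dict[str, set[str]] = {
    "app:main": {"http_client:get"},
    "http_client:get": {"url_parse:split"},
    "url_parse:split": {"url_parse:normalize"},
    "url_parse:decode_idna": set(),  # ships in the package, never called
}
DEPENDS_ON: dict[str, set[str]] = {
    "app": {"http_client"},
    "http_client": {"url_parse"},  # url_parse is a transitive dependency of app
    "url_parse": set(),
}
VULNERABLE = ["url_parse:normalize", "url_parse:decode_idna"]


def closure(graph: dict[str, set[str]], roots: list[str]) -> set[str]:
    """Every node reachable from `roots` by following edges (iterative DFS)."""
    seen, stack = set(roots), list(roots)
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def package_of(symbol: str) -> str:
    return symbol.split(":", 1)[0]


# "If it is imported, we assume it is reachable": any CVE in the dependency tree counts.
imported = closure(DEPENDS_ON, ["app"])
import_based = [v for v in VULNERABLE if package_of(v) in imported]

# Function-level: follow calls across package boundaries from the entry point.
called = closure(CALLS, ["app:main"])
call_graph_based = [v for v in VULNERABLE if v in called]

print(import_based)      # both findings
print(call_graph_based)  # only the one the app actually reaches
```

In a real lockfile, the call graph is the expensive input. It needs function-level parsing of every transitive package, which is the cost many tools skip.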
What is the false positive and false negative rate?
The honest answer is that no reachability tool achieves zero of either. The interesting question is the tradeoff curve. A tool tuned to minimize false negatives (missed reachable CVEs) will produce more false positives (unreachable CVEs flagged as reachable). Most teams should bias toward fewer false negatives, because a missed reachable CVE is a real security gap while a false positive is just additional triage work.
Real numbers from our evaluation set: the top tier of vendors hits roughly an 88% true positive rate with 12% false positives. The middle tier hits a 75% true positive rate with 20% false positives. The bottom tier is closer to a coin flip and should not be considered a reachability tool at all. These numbers are workload-dependent, so build your own test set and demand that the vendor run against it during the trial.
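One way to make the tradeoff curve concrete is to sweep a decision threshold over your labeled test set. The sweep keeps only thresholds that meet a recall target and picks the one with the lowest false positive rate. This sketch rests on two assumptions. First, the tool exposes a per-finding confidence score (the hypothetical `score` field below). Many tools emit only a yes/no verdict. Second, the false positive rate here means the share of known-unreachable CVEs that get flagged. That is an assumption about how such figures are defined, not a statement about how the numbers above were computed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCase:
    case_id: str           # placeholder ID
    truly_reachable: bool  # ground truth from manual analysis or instrumentation
    score: float           # hypothetical tool confidence that the CVE is reachable


def rates(cases: list[ScoredCase], threshold: float) -> tuple[float, float]:
    """(true positive rate, false positive rate) when flagging score >= threshold."""
    tp = sum(c.truly_reachable and c.score >= threshold for c in cases)
    fn = sum(c.truly_reachable and c.score < threshold for c in cases)
    fp = sum(not c.truly_reachable and c.score >= threshold for c in cases)
    tn = sum(not c.truly_reachable and c.score < threshold for c in cases)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def pick_threshold(cases: list[ScoredCase], min_tpr: float) -> tuple[float, float, float] | None:
    """Bias toward fewer false negatives: require TPR >= min_tpr, then minimize FPR."""
    best: tuple[float, float, float] | None = None
    for t in sorted({c.score for c in cases}, reverse=True):
        tpr, fpr = rates(cases, t)
        if tpr >= min_tpr and (best is None or fpr < best[2]):
            best = (t, tpr, fpr)
    return best


labeled = [
    ScoredCase("R1", True, 0.9), ScoredCase("R2", True, 0.8), ScoredCase("R3", True, 0.4),
    ScoredCase("U1", False, 0.7), ScoredCase("U2", False, 0.3),
    ScoredCase("U3", False, 0.2), ScoredCase("U4", False, 0.1),
]
print(pick_threshold(labeled, min_tpr=0.85))  # (threshold, TPR, FPR)
```

With these made-up scores, demanding at least 85% recall forces the threshold down to 0.4. That threshold catches every reachable case but also flags U1. Relaxing the target to about 60% removes that false positive and misses R3 instead.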
How does it integrate with developer workflows?
Reachability findings have to land in developer workflows or they will be ignored. The integration patterns that work are PR comments with specific reachable findings, IDE plugins that show reachability inline, and CI gates that block merges only on reachable critical CVEs. The patterns that fail are dashboards no one looks at and Slack channels that get muted within two weeks.
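A gate that blocks only on reachable critical CVEs reduces to a filter. Below is a minimal sketch with a hypothetical `Finding` record. The field names and the severity scale are assumptions, not any vendor's schema. An unreachable critical finding passes, and a reachable one fails the check.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    case_id: str  # hypothetical identifier
    severity: Severity
    reachable: bool


def merge_gate(findings: list[Finding],
               block_at: Severity = Severity.CRITICAL) -> tuple[bool, list[Finding]]:
    """Allow the merge unless a reachable finding is at or above `block_at`.

    Unreachable findings never block, whatever their severity.
    """
    blocking = [f for f in findings if f.reachable and f.severity >= block_at]
    return (not blocking, blocking)


pr_findings = [
    Finding("CASE-101", Severity.CRITICAL, reachable=False),
    Finding("CASE-102", Severity.HIGH, reachable=True),
    Finding("CASE-103", Severity.CRITICAL, reachable=True),
]
allowed, blocking = merge_gate(pr_findings)
for f in blocking:
    # The same lines could be posted as a PR comment.
    print(f"Blocking merge: {f.case_id} is reachable and {f.severity.name}")
exit_code = 0 if allowed else 1  # a CI step would exit with this status
```

The list of blocking findings does double duty. It can be posted as a PR comment, so developers see exactly which reachable CVEs failed the check.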
Pay attention to the latency. A reachability analysis that takes 30 minutes per scan will not run on every PR. Tools that cache call graphs intelligently can return per-PR results in under two minutes. We refused to consider any tool that could not analyze a representative PR in under five minutes, because anything slower will not be used.
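Most per-PR speed comes from not rebuilding call graphs for dependencies that did not change. The sketch below assumes third-party call graphs can be cached by package name and the version pinned in the lockfile. `fake_build` stands in for the expensive parsing step, and the package names are invented. Under that assumption, a PR that bumps one dependency pays for one rebuild instead of the whole tree.

```python
from __future__ import annotations

from typing import Callable

CallGraph = dict[str, set[str]]


class CallGraphCache:
    """Memoizes per-package call graphs keyed by (package, version).

    Assumes a published package version never changes, so its call graph can
    be reused across scans. The application's own code is not cached here.
    """

    def __init__(self, build: Callable[[str, str], CallGraph]) -> None:
        self._build = build
        self._graphs: dict[tuple[str, str], CallGraph] = {}
        self.builds = 0  # how many expensive graph constructions actually ran

    def get(self, package: str, version: str) -> CallGraph:
        key = (package, version)
        if key not in self._graphs:
            self.builds += 1
            self._graphs[key] = self._build(package, version)
        return self._graphs[key]


def graphs_for_lockfile(cache: CallGraphCache, lockfile: dict[str, str]) -> dict[str, CallGraph]:
    """Collect call graphs for every pinned dependency, building only cache misses."""
    return {pkg: cache.get(pkg, ver) for pkg, ver in lockfile.items()}


def fake_build(package: str, version: str) -> CallGraph:
    # Stand-in for real parsing: one placeholder node per package.
    return {f"{package}:entry": set()}


cache = CallGraphCache(fake_build)
graphs_for_lockfile(cache, {"http_client": "2.1.0", "url_parse": "1.4.2", "yamlish": "0.9.0"})
print(cache.builds)  # first scan builds all three
# The next PR bumps a single dependency, so only one graph is rebuilt.
graphs_for_lockfile(cache, {"http_client": "2.1.0", "url_parse": "1.4.3", "yamlish": "0.9.0"})
print(cache.builds)
```

The application's own call graph still has to be rebuilt or patched for each PR. Caching the dependency graphs keeps that rebuild a small fraction of the total work.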
How Safeguard Helps
Safeguard implements function-level reachability across Java, JavaScript, Python, Go, and Rust, with call graphs that traverse transitive dependencies. Griffin AI ranks reachable CVEs against exploitation signal from CISA KEV and commercial feeds, so the prioritization reflects exploitability, not just severity. SBOM ingestion is incremental, so reachability runs on every PR in under two minutes for typical repositories. Policy gates can block merges that introduce new reachable critical CVEs without blocking unreachable ones, eliminating most of the noise that drives developer pushback. TPRM scoring incorporates a supplier's historical reachability profile, and zero-CVE base images cut the underlying dependency surface so reachability has less to analyze in the first place.