Reachability is the difference between a vulnerability report that a team will action and one they will silently ignore. If a finding says "your dependency has CVE-2024-XXXX," a senior engineer looks at it, asks whether any code path in their application actually invokes the vulnerable function, and — without a good answer — bins the alert. Safeguard's reachability engine exists to answer that question automatically for every finding we surface, across seven languages, at ingestion speed. This post walks through how it is put together.
What does reachability actually mean in our model?
Reachability in Safeguard is a four-valued predicate, not a boolean. A finding is classified as runtime-reachable, statically-reachable, transitively-present, or declared-only. The progression matters because teams triage differently at each level. Runtime-reachable means we have evidence the code executed in production. Statically-reachable means a call edge exists from an entry point to the vulnerable sink. Transitively-present means the vulnerable symbol is linked into the build but no edge reaches it from any entry. Declared-only means the package is in the manifest but not on the classpath.
We pick this four-valued model because triage policy varies. A FedRAMP HIGH tenant may want to patch everything that is transitively-present, while a commercial tenant may only care about runtime-reachable. Reducing this to a boolean hides the nuance. Every Safeguard policy rule can condition on the level, and every finding in the UI shows the evidence that led to the classification.
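Because the levels form an ordered scale, a policy rule reduces to a threshold comparison. A minimal sketch in Python: the level names come from the model above, while the enum ordering and the `should_ticket` helper are illustrative assumptions, not Safeguard's API.

```python
from enum import IntEnum

# Sketch only: level names mirror the post; the ordering and the policy
# helper below are illustrative assumptions, not Safeguard's API.
class Reachability(IntEnum):
    DECLARED_ONLY = 0         # in the manifest, not on the classpath
    TRANSITIVELY_PRESENT = 1  # linked into the build, unreached from entries
    STATICALLY_REACHABLE = 2  # a call edge exists from an entry point
    RUNTIME_REACHABLE = 3     # observed executing in production

def should_ticket(verdict: Reachability, threshold: Reachability) -> bool:
    """A policy rule conditions on the level: act on anything at or
    above the tenant's configured threshold."""
    return verdict >= threshold

# A FedRAMP HIGH tenant patches everything transitively-present or worse;
# a commercial tenant may only act on runtime-reachable findings.
assert should_ticket(Reachability.STATICALLY_REACHABLE, Reachability.TRANSITIVELY_PRESENT)
assert not should_ticket(Reachability.STATICALLY_REACHABLE, Reachability.RUNTIME_REACHABLE)
```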
What is the architecture of the reachability engine?
The engine has four layers: ingest, graph construction, propagation, and evidence join. Each layer is independently scalable because the intermediate representation between layers is a content-addressed Protobuf blob.
```
Source / Build  →  Facts Extractor  →  Call Graph Builder
  (git, OCI)        (per-language)     (merged, cross-lang)
                                                │
                                                ▼
Runtime Agents  →  Evidence Stream  →  Reachability Propagator
  (eBPF, APM)      (trace, exec logs)   (fixed-point solver)
                                                │
                                                ▼
                                        Finding Annotator
                                     (writes verdicts to DB)
```
The Facts Extractor runs per-language and emits a normalized fact stream: symbol definitions, call sites, inheritance edges, dynamic dispatch hints, reflection targets, and entry point markers. Python uses Jedi and our own import resolver, Java uses a customized Soot frontend, JavaScript uses a TypeScript-compiler-based extractor, and Go uses the go/ssa package directly. Rust, C#, and Ruby each have their own extractors. All seven emit into the same schema so downstream code is language-agnostic.
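The load-bearing property is that every extractor, regardless of source language, emits records of the same shape. A sketch of what such a normalized fact record might look like; the fact kinds come from the list above, but the field names and types are assumptions, not Safeguard's actual schema.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Illustrative only: the fact kinds come from the post, but the field
# names and types are assumptions, not Safeguard's actual schema.
FactKind = Literal["symbol_def", "call_site", "inheritance",
                   "dispatch_hint", "reflection_target", "entry_point"]

@dataclass(frozen=True)
class Fact:
    kind: FactKind
    language: str                 # "python", "java", "go", ...
    symbol: str                   # fully-qualified symbol the fact is about
    target: Optional[str] = None  # callee / supertype / reflected name

# Extractors for different languages emit into the same shape, so the
# call graph builder downstream never branches on language:
py_fact = Fact("call_site", "python", "app.views.index", "yaml.load")
java_fact = Fact("entry_point", "java", "com.acme.Api#get")
```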
The Call Graph Builder merges facts into a whole-program graph. For polyglot repos this is non-trivial because a Python service can call a native library, a Java service can shell out to a Node process, and both may be described in the same SBOM. We do cross-language binding resolution using JNI signature matching, FFI attribute parsing, and protocol-buffer-declared service boundaries.
How is the call graph actually constructed?
We use a variant of RTA (Rapid Type Analysis) with points-to refinement on virtual dispatches. For each application we start with the set of instantiated types observed in the code, then iteratively resolve virtual calls using that set rather than the declared type hierarchy. This gives us a graph that is precise enough to be useful — typical precision is 15-30x better than a naive CHA (Class Hierarchy Analysis) graph — without the cost of full Andersen-style points-to analysis, which does not finish in reasonable wall-clock time on applications over about 500k lines.
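The refinement over CHA fits in a few lines: a virtual call resolves only to implementations whose receiver type was instantiated somewhere in reachable code, computed as a fixed point. This toy worklist version uses assumed data shapes and skips the points-to refinement entirely; it is a sketch of the idea, not the production builder.

```python
# Toy RTA-style resolver, illustrative shapes only: `calls` maps a method
# to its declared call targets, `overrides` maps a declared virtual target
# to {receiver_type: implementation}, and `instantiates` maps a method to
# the types it constructs. A virtual call resolves only to implementations
# whose receiver type was instantiated in reachable code (unlike CHA,
# which would take the whole declared hierarchy).
def rta(entry, calls, overrides, instantiates):
    instantiated, reachable, work = set(), set(), [entry]
    while work:
        m = work.pop()
        if m in reachable:
            continue
        reachable.add(m)
        instantiated |= instantiates.get(m, set())
        # Re-scan: newly instantiated types can enable more dispatch targets.
        for caller in list(reachable):
            for declared in calls.get(caller, []):
                impls = overrides.get(declared)
                if impls is None:
                    if declared not in reachable:   # direct (non-virtual) call
                        work.append(declared)
                else:
                    for rtype, impl in impls.items():
                        if rtype in instantiated and impl not in reachable:
                            work.append(impl)
    return reachable

# main instantiates only Dog, so Cat.speak never becomes reachable:
g = rta("main",
        {"main": ["Animal.speak", "log"]},
        {"Animal.speak": {"Dog": "Dog.speak", "Cat": "Cat.speak"}},
        {"main": {"Dog"}})
assert "Dog.speak" in g and "Cat.speak" not in g
```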
Framework hints are essential. Spring, Django, Express, FastAPI, gRPC, and about forty other frameworks inject entry points that are invisible to a standard static analysis. We ship framework descriptors as first-class data and the graph builder consumes them as fact streams:
```yaml
# framework-descriptors/spring-web.yaml
framework: spring-web
version_range: ">=5.0.0 <7.0.0"
entry_points:
  - annotation: "org.springframework.web.bind.annotation.RequestMapping"
    kind: http_handler
    argument_source: http_request
  - annotation: "org.springframework.web.bind.annotation.GetMapping"
    kind: http_handler
    argument_source: http_request
dispatch_hints:
  - pattern: "org.springframework.context.ApplicationContext#getBean"
    resolution: registered_beans
sink_rewrites:
  - from: "javax.sql.DataSource#getConnection"
    to: "sql-query-boundary"
```
These descriptors are maintained by a dedicated team and versioned alongside the framework releases. When a new Spring major ships, we publish a new descriptor before the first CVE in that version lands.
How do you make this fast enough to run per commit?
Three design decisions keep the hot path under a minute for most applications. The first is incremental call graph updates. Commits typically touch a small fraction of an application, and we cache the per-function call graph slab keyed by the function's source hash. On a recompute we invalidate only the slabs whose source changed, plus the transitive set of slabs that had edges to those functions. On a typical Java monorepo we rebuild fewer than two percent of slabs per commit.
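The invalidation step described above is a reverse-edge closure: a changed function dirties its own slab plus, transitively, every slab with call edges into it. A hypothetical sketch with assumed shapes:

```python
# Hypothetical sketch of slab invalidation: `callers` maps a function to
# the set of functions with call edges into it. A change dirties the
# changed slabs plus the transitive closure of their callers.
def invalidate(changed_fns, callers):
    dirty, work = set(), list(changed_fns)
    while work:
        fn = work.pop()
        if fn in dirty:
            continue
        dirty.add(fn)
        work.extend(callers.get(fn, ()))
    return dirty

# a calls b, b calls c: editing c forces a rebuild of all three slabs,
# while unrelated functions keep their cached slabs.
assert invalidate({"c"}, {"b": {"a"}, "c": {"b"}}) == {"a", "b", "c"}
```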
The second is sink-directed propagation. We do not compute full reachability for every node in the graph. Instead, we take the set of known vulnerable sinks from our vulnerability database (keyed on fully-qualified symbol and version range) and run backwards reachability from each sink toward the set of entry points. This is a k-CFA-bounded backwards traversal with a cutoff at fifteen edges, which empirically catches over 97 percent of real exploitable paths in our benchmark corpus.
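Stripped of context sensitivity, sink-directed propagation is a depth-bounded breadth-first walk over caller edges. This toy version uses assumed shapes and omits the calling-context tracking; it only shows the bounded backwards search.

```python
from collections import deque

# Toy depth-bounded backwards walk, shapes assumed: `callers` maps a
# function to its callers. Start at a vulnerable sink and search toward
# entry points, giving up past the fifteen-edge cutoff. (The production
# traversal is context-sensitive; this sketch is not.)
def backwards_reachable(sink, callers, entry_points, max_edges=15):
    seen, frontier = {sink}, deque([(sink, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node in entry_points:
            return True   # an entry-to-sink path exists
        if depth == max_edges:
            continue      # cutoff: stop expanding this branch
        for caller in callers.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                frontier.append((caller, depth + 1))
    return False

assert backwards_reachable("sink", {"sink": {"mid"}, "mid": {"handler"}}, {"handler"})
assert not backwards_reachable("sink", {}, {"handler"})
```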
The third is deterministic sharding. The graph is partitioned by module or package, and the propagator runs as a set of workers each responsible for one shard. Cross-shard edges are handled by a gossip protocol so workers do not need global synchronization. This lets us scale horizontally — large customers run the engine on a dozen nodes in parallel and still get whole-program results.
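Deterministic shard assignment is what makes the gossip workable: because the owner of a node is a pure function of its name, every worker computes the same assignment with no coordination. A sketch under the assumption that nodes are partitioned by dotted module prefix and hashed with CRC32 (both details are illustrative, not the production scheme):

```python
import zlib

# Illustrative scheme: the graph is partitioned by module or package;
# the dotted-prefix split and CRC32 hash here are assumptions. The owner
# is a pure function of the node name, so every worker agrees on shard
# assignment without global synchronization.
def shard_for(symbol: str, num_shards: int) -> int:
    module = symbol.rsplit(".", 1)[0]   # strip the member name
    return zlib.crc32(module.encode()) % num_shards

# Symbols in the same module always land on the same worker:
assert shard_for("com.acme.orders.Handler", 12) == shard_for("com.acme.orders.Mapper", 12)
```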
How does runtime evidence feed into reachability?
Static reachability is a powerful prior but it still overestimates exploitability. A code path can exist and never actually execute in production. To sharpen the verdict, Safeguard's runtime agents emit an evidence stream of executed function signatures, sampled stack traces, and inbound HTTP route hits. The evidence joiner correlates each observation against our call graph and promotes any statically-reachable finding to runtime-reachable when we have direct evidence the vulnerable function was called.
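Once the correlation is done, the promotion rule itself is a one-liner per finding. A sketch with assumed finding and evidence shapes:

```python
# Sketch of the promotion rule with assumed shapes: `executed` is the set
# of fully-qualified symbols observed in the runtime evidence stream.
def promote(findings, executed):
    for f in findings:
        if f["verdict"] == "statically-reachable" and f["symbol"] in executed:
            f["verdict"] = "runtime-reachable"
    return findings

findings = promote(
    [{"symbol": "yaml.load", "verdict": "statically-reachable"},
     {"symbol": "pickle.loads", "verdict": "statically-reachable"}],
    executed={"yaml.load"})
assert findings[0]["verdict"] == "runtime-reachable"
assert findings[1]["verdict"] == "statically-reachable"  # no runtime evidence
```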
Agents are intentionally thin. The eBPF-based Linux agent hooks ELF symbols and language-specific probes (libpython, libjvm function entries, Node V8 internals). It does not do any correlation on host — it ships a sampled stream to a regional collector that handles the join against the customer's call graph. This split means agents add negligible CPU overhead (we measure under 0.3 percent p95 on production workloads) and can be deployed in environments where heavy on-host processing is not acceptable, including air-gapped federal installs.
For serverless, where you cannot install an agent, we consume AWS X-Ray, Google Cloud Trace, and Azure Application Insights traces directly. The joiner treats each span name as a function entry and fuzzy-matches it against the call graph. This is less precise than eBPF, but it works without requiring customers to change their deployment.
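A fuzzy span-to-symbol match can be sketched with the standard library. The matching strategy and similarity cutoff here are assumptions for illustration, not the production joiner:

```python
import difflib

# Illustrative only: the production joiner's matching strategy is not
# described in detail, so this uses stdlib difflib with an assumed cutoff.
def match_span(span_name, graph_symbols, cutoff=0.8):
    """Return the call graph symbol a trace span most plausibly names,
    or None when nothing is similar enough."""
    hits = difflib.get_close_matches(span_name, graph_symbols, n=1, cutoff=cutoff)
    return hits[0] if hits else None

symbols = ["orders.handler_fn", "billing.charge_card"]
assert match_span("orders.handler", symbols) == "orders.handler_fn"
assert match_span("GET /health", symbols) is None
```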
How do you prevent the engine from lying at the edges?
Every reachability verdict in Safeguard carries an evidence bundle. Click a finding and you can see: the entry point chosen, the exact call path through the graph, the framework descriptors applied, and the runtime trace evidence if any. If a verdict is wrong, an engineer can file a correction that updates the underlying descriptor or adds a graph-level override, and the change takes effect on the next scan. We treat reachability verdicts as falsifiable claims, and we expose the falsification surface deliberately.
We also maintain a benchmark corpus of over 1,200 real-world vulnerable applications with ground-truth labels — each labeled "actually reachable in this deployment" or "not reachable here" by a human reviewer. Every engine change runs against the corpus before it ships. We publish the precision and recall numbers internally every release and gate deploys on regression thresholds. A change that improves recall by three percent but drops precision by five percent does not ship.
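The deploy gate reduces to a threshold check against the baseline corpus numbers. A sketch with assumed metric shapes; the zero-regression budgets are illustrative defaults:

```python
# Sketch of the release gate with assumed metric shapes; the
# zero-regression budgets are defaults for illustration.
def gate(baseline, candidate, precision_budget=0.0, recall_budget=0.0):
    ok_precision = candidate["precision"] >= baseline["precision"] - precision_budget
    ok_recall = candidate["recall"] >= baseline["recall"] - recall_budget
    return ok_precision and ok_recall

# Recall up three points but precision down five: does not ship.
assert not gate({"precision": 0.90, "recall": 0.80},
                {"precision": 0.85, "recall": 0.83})
assert gate({"precision": 0.90, "recall": 0.80},
            {"precision": 0.91, "recall": 0.81})
```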
How Safeguard.sh Helps
Safeguard's reachability engine turns a noisy vulnerability feed into a ranked list of findings that an engineering team can actually work through. By combining static call graph construction with runtime evidence from lightweight agents, the platform answers "does this CVE matter to me right now" for every finding, not just the critical ones. Customers who enable reachability routinely report an 85-95 percent reduction in actionable ticket volume without lowering real coverage. The engine is built to be falsifiable and extensible — descriptors, overrides, and the evidence trail are all first-class so teams can trust and improve the signal. If you want to see it against your own codebase, the reachability module is available to every Safeguard tenant.