AI Security

Reachability Analysis: Griffin AI vs Mythos

Reachability-grounded reasoning produces actionable findings. Ungrounded LLM reasoning produces speculation. We explain the methodology gap.

Shadab Khan
Security Engineer
6 min read

Every vulnerability scanner eventually runs into the same wall. A CVE exists in a dependency, but does your application actually reach the vulnerable code? That question determines whether a finding is a real exposure or a distraction that eats engineering time. It is also the question that separates two very different philosophies of AI-powered code analysis: reachability-grounded reasoning, the approach Griffin AI takes, and ungrounded LLM reasoning, the approach of Mythos-class pure-LLM tools.

This post walks through how the two approaches handle reachability, why the methodology matters more than the model, and what a security team can expect in practice when those tools meet a real codebase.

The reachability question

Reachability analysis asks a specific question. Given a vulnerable function V inside a dependency, is there a chain of calls from my application's entry points that ultimately invokes V with attacker-controllable input? The academic framing is straightforward; the implementation is not. Modern Node, Python, Java, and Go applications are full of dynamic dispatch, reflection, framework routing, plugin systems, and cross-package indirection. A naive call graph misses most of what matters, and a hand-written call graph for a 300-package project is not a serious proposition.
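To make the question concrete, here is the core query as a minimal breadth-first search over a call graph. The types and names are illustrative, not any tool's API, and a real implementation has to resolve the dynamic edges described above before this search means anything.

```typescript
// Minimal sketch: is any entry point connected to the vulnerable function?
// This answers only the call-graph half of the question; attacker control
// of the input is the taint question covered next.
type FunctionId = string; // e.g. "deep-util/lib/parse.js#parse"

function isReachable(
  callGraph: Map<FunctionId, FunctionId[]>, // caller -> callees
  entryPoints: FunctionId[],
  vulnerableFn: FunctionId,
): boolean {
  const seen = new Set(entryPoints);
  const queue = [...entryPoints];
  while (queue.length > 0) {
    const fn = queue.shift()!;
    if (fn === vulnerableFn) return true;
    for (const callee of callGraph.get(fn) ?? []) {
      if (!seen.has(callee)) {
        seen.add(callee);
        queue.push(callee);
      }
    }
  }
  return false;
}
```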

The field has converged on a hybrid approach. Build a call graph using static analysis with some tolerance for framework conventions, then propagate taint from sources (HTTP request bodies, message queue payloads, deserialization inputs) to sinks (database calls, SSRF-prone fetches, shell exec, unsafe deserializers). If the vulnerable function sits on a taint-reaching path, the finding is reachable. If it does not, the finding can be deprioritized or suppressed with a VEX not_affected statement.
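To make the suppression step concrete, here is what such a statement can look like. The object below follows the OpenVEX statement shape, rendered as a TypeScript literal for consistency with the other sketches in this post; the product identifier and timestamp are placeholders, and the CVE is the Log4Shell advisory discussed next.

```typescript
// Illustrative OpenVEX-style statement recording that the vulnerable code
// exists in the dependency tree but is never reached from our entry points.
// Product purl and timestamp are placeholders.
const vexStatement = {
  vulnerability: { name: "CVE-2021-44228" },
  products: [{ "@id": "pkg:maven/com.example/orders-service@2.3.1" }],
  status: "not_affected",
  // One of the standard VEX justifications: the code is present, but no
  // execution path from the application's entry points invokes it.
  justification: "vulnerable_code_not_in_execute_path",
  timestamp: "2026-01-15T00:00:00Z",
};
```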

CVE-2021-44228, the Log4Shell vulnerability, became the industry's reachability lesson. Applications that imported log4j-core but never logged attacker-controlled strings were still flagged by list-based scanners. Teams spent weeks triaging non-exploitable findings while the truly reachable applications sat at the bottom of the pile.

What Griffin AI actually does

Griffin AI starts with the codebase, not with the LLM. The first stage of its pipeline builds a reachability model: a call graph across every package the application loads, a taint graph that tracks how untrusted values flow through the call graph, and a framework model that understands routing tables, middleware chains, dependency injection containers, and event handlers. This model is not a rough sketch. It is a concrete data structure with edges the LLM can inspect, follow, and reason about.
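As an illustration of what such a model can contain, here is a sketch of the shapes involved. Griffin's internal schema is not public, so treat these types as a conceptual outline of the ideas, not its API.

```typescript
// Conceptual outline of a reachability model: call edges, taint flows, and
// framework facts. Types and field names here are illustrative.
interface CallEdge {
  caller: string; // e.g. "app/api/search/route.ts#POST"
  callee: string; // e.g. "services/search.ts#find"
  kind: "direct" | "framework-route" | "middleware" | "di-container" | "event";
}

interface TaintFlow {
  source: string;   // e.g. "http.request.body"
  path: CallEdge[]; // the ordered edges the untrusted value crosses
  sink: string;     // e.g. "deep-util/lib/parse.js#parse"
}

interface ReachabilityModel {
  callGraph: CallEdge[];    // edges across every package the app loads
  taintGraph: TaintFlow[];  // untrusted flows through the call graph
  frameworkFacts: string[]; // routing tables, middleware chains, DI bindings
}
```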

When Griffin evaluates a CVE, it has three things in hand: the CVE advisory and the vulnerable function signature, the reachability model for your codebase, and the specific edges that connect your entry points to the vulnerable function or fail to do so. The LLM's job is to interpret the evidence, not to imagine a plausible path. Its output is constrained by what the reachability graph can actually prove.
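In sketch form, the evidence bundle and the constraint it imposes might look like this. The field names are hypothetical, and a real pipeline would also weigh the taint evidence rather than edge existence alone.

```typescript
// Sketch of the evidence a grounded pipeline can hand the model for one CVE.
// The classification follows from proven edges; the LLM's role is to
// interpret and explain them, not to invent a path.
interface CveEvidence {
  advisory: { id: string; vulnerableFunction: string };
  model: ReachabilityModel;    // from the sketch above
  connectingEdges: CallEdge[]; // empty when no proven path exists
}

function classify(evidence: CveEvidence): "reachable" | "unreachable" {
  // Collapses the idea to its simplest form: no proven edges, no
  // reachable verdict.
  return evidence.connectingEdges.length > 0 ? "reachable" : "unreachable";
}
```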

This grounding discipline is what Griffin's published benchmarks measure. In the 2026 Q1 benchmark set, Griffin reduced triage time by 71 percent compared to list-based SCA across 412 CVEs, and correctly classified 94 percent of advisories as reachable or unreachable against hand-validated ground truth. Those numbers are a direct consequence of the grounding. Without the call graph, the LLM has nothing to be accurate about.

What Mythos-class pure-LLM tools do

Mythos-class tools take a different path. They present source code to an LLM, sometimes with a retrieval step that pulls nearby files into context, and ask the model to reason about exploitability. There is no explicit call graph, no taint propagation pass, no framework model. The LLM infers structure from the text it sees and produces a verdict.
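Reduced to a sketch, that whole pipeline is a prompt. The llm parameter below is a stand-in for whatever model API the tool wraps, not any vendor's actual interface.

```typescript
// Sketch of an ungrounded pipeline: concatenate retrieved file text, ask for
// a verdict. There is no graph to constrain the answer, so any "path" the
// model describes is inferred from raw text. The llm interface is a stand-in.
async function ungroundedVerdict(
  llm: { complete(prompt: string): Promise<string> },
  cveId: string,
  files: { path: string; text: string }[],
): Promise<string> {
  const context = files.map((f) => `// ${f.path}\n${f.text}`).join("\n\n");
  return llm.complete(
    `Is ${cveId} exploitable in this application?\n\n${context}`,
  );
}
```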

For small, self-contained code, this approach can look impressive. The LLM picks up patterns, recognizes dangerous sinks, and produces articulate explanations. The cracks appear when the application is large, the vulnerable function sits three packages deep, and the attacker-reachable path traverses a framework's routing layer or a dependency injection container. The LLM does not have those edges; it has text. It will guess at the call pattern, sometimes correctly and often not, and it will produce confident prose either way.

The practical consequence is that Mythos-class outputs tend to cluster around two failure modes. The first is false reassurance, where the LLM declares a vulnerable function unreachable because it cannot see the path through the framework router. The second is false alarm, where the LLM invents a plausible-sounding chain of calls that does not actually exist in the code. Both failures degrade the signal that security teams rely on to prioritize work.

The methodology gap, concretely

Consider a Next.js application that uses a vulnerable JSON parser in a utility package three dependencies deep. The route handler in app/api/search/route.ts reads request.body, passes it through a validation layer, and invokes a search service that calls the utility. The vulnerable function is in node_modules/deep-util/lib/parse.js.
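In sketch form, with hypothetical stand-ins for the validation and service modules, the handler looks like this:

```typescript
// app/api/search/route.ts -- sketch of the handler described above. The
// validate and search imports are hypothetical stand-ins; the vulnerable
// parse() in deep-util is three hops away from this file.
import { NextResponse } from "next/server";
import { validateSearchInput } from "@/lib/validate"; // hypothetical
import { searchService } from "@/services/search";    // hypothetical

export async function POST(request: Request) {
  const body = await request.json(); // attacker-controlled input
  const query = validateSearchInput(body);
  // searchService.find eventually reaches deep-util/lib/parse.js
  const results = await searchService.find(query);
  return NextResponse.json(results);
}
```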

Griffin's call graph contains the edge from the Next.js app router entry point through middleware to the route handler, through validation, into the service, and finally into the utility's vulnerable function. The taint graph confirms that request.body reaches the vulnerable function. The LLM sees that path and writes a finding that names each edge, cites CVE-2024-XXXXX, and recommends a specific remediation.
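Rendered as data, such a finding could look like the object below. The edge labels and structure are illustrative, not Griffin's actual output format.

```typescript
// Illustrative reachable finding with a named, auditable path. Structure and
// labels are placeholders, not a real tool's output schema.
const finding = {
  cve: "CVE-2024-XXXXX",
  status: "reachable",
  taintSource: "request.body",
  path: [
    "next/app-router#dispatch -> middleware.ts#middleware",
    "middleware.ts#middleware -> app/api/search/route.ts#POST",
    "app/api/search/route.ts#POST -> lib/validate.ts#validateSearchInput",
    "app/api/search/route.ts#POST -> services/search.ts#find",
    "services/search.ts#find -> deep-util/lib/parse.js#parse",
  ],
  remediation: "Upgrade deep-util to a patched release.",
};
```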

A Mythos-class tool, reading the route handler file, sees a request, sees a call to searchService.find, and stops. It may recognize that a search function could be dangerous, but it cannot follow the call into the utility package because the utility lives outside the file it is analyzing. It will either produce a generic warning or miss the finding entirely. Neither outcome is useful to an on-call engineer at midnight.

Where reachability-grounded reasoning earns its keep

The benefit of grounding is most visible in the noise floor. An unreachable CVE, correctly classified, does not wake anyone up. A reachable CVE, correctly classified, gets a ticket with a named path and a named fix. Teams using Griffin on real monorepos report that 60 to 80 percent of CVE advisories turn out to be unreachable once the call graph and taint graph are constructed; the remaining 20 to 40 percent receive the focused attention they deserve.

Mythos-class tools struggle to match this ratio because their classification is not anchored to a verifiable graph. When the LLM says "unreachable," nobody can audit the claim; when it says "reachable," nobody can see the edges. Security teams end up treating every finding as potentially wrong, which defeats the purpose of automation.

How Safeguard helps

Safeguard ships Griffin AI as the default reasoning engine for its software supply chain security (SSCS) platform. Every finding on the Safeguard console is backed by a reachability model the team can inspect: call graph edges, taint propagation steps, framework routing evidence, and the specific CVE advisories that map to the affected function. When Griffin declares a finding unreachable, the console shows the missing edge that would be required to reach it, so engineers can verify the claim rather than trust it blindly. That auditability is what makes reachability-grounded reasoning defensible in front of auditors, regulators, and leadership. If your current scanner is producing a pile of advisories without a verifiable path, Safeguard's reachability-first workflow is the upgrade path.
