A call graph is only as useful as the layers it reaches. Stop at the entry point and you miss the vulnerable sink. Stop at the first framework boundary and you miss cross-package flows. Stop inside the current file and you miss almost everything that matters in a modern dependency-heavy application. This post unpacks how call graph depth separates reachability-grounded reasoning, which Griffin AI performs, from ungrounded LLM reasoning, which Mythos-class tools offer.
Why depth is the hinge
In a typical Node.js or Python web application, the attacker-reachable code path is rarely within a single file. An HTTP request lands at a router, passes through three or four middleware layers, is dispatched to a controller, calls a service, calls a data-access helper, and only then touches a vulnerable function in a deep transitive dependency. If your call graph only models two or three of those hops, the reachability verdict is fiction.
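To make the shape of the problem concrete, here is a minimal sketch of such a chain. Every name is invented; the point is only how quickly frames accumulate between the entry point and the sink.

```python
# Hypothetical sketch: the frames between an HTTP entry point and a deep sink.
# None of these names come from a real framework; they only illustrate depth.

def route_request(path, body):      # frame 1: router / entry point
    return auth_middleware(body)

def auth_middleware(body):          # frame 2: one of several middleware layers
    return orders_controller(body)

def orders_controller(body):        # frame 3: controller
    return order_service_create(body)

def order_service_create(body):     # frame 4: service layer
    return order_repository_save(body)

def order_repository_save(body):    # frame 5: data-access helper
    return parse_payload(body)

def parse_payload(raw):             # frame 6: utility in a transitive dependency,
    # a shallow graph that stops at frame 2 or 3 never sees this function
    return raw
```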
The CWE catalog makes this concrete. CWE-918 (Server-Side Request Forgery) typically involves four to seven call frames between the HTTP entry and the outbound fetch. CWE-502 (Deserialization of Untrusted Data) is usually five to eight frames deep, because the untrusted bytes pass through parsing, validation, and normalization before hitting the unsafe deserializer. CWE-94 (Code Injection) via a templating engine crosses multiple packages in any real framework. A shallow graph misses all of these.
How Griffin AI approaches depth
Griffin AI builds a call graph that spans the full dependency closure of the application, not just the first-party code. For a Node project, that means walking package.json resolutions, indexing every require and import, modeling ESM dynamic imports, and stitching the result into a single graph. The graph goes as deep as the transitive dependency tree; for a typical production application, that is eight to twelve levels on the common paths and thirty or more on exceptional paths. The graph is not built on demand each time the LLM asks for it; it is pre-computed and durable, so reasoning can traverse it freely.
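As a rough mental model of what "pre-computed and durable" means in practice, a call graph can be as simple as an adjacency map persisted to disk once per commit. The sketch below uses invented names and structures and is not Griffin's actual format.

```python
import json
from dataclasses import dataclass

# Illustrative only: a durable, pre-computed call graph persisted to disk so a
# later reasoning step can traverse it without re-parsing the codebase.

@dataclass
class CallEdge:
    caller: str   # e.g. "app/routes.js:createOrder"   (hypothetical)
    callee: str   # e.g. "node_modules/yaml-lib/parse.js:parse"

def build_graph(edges: list[CallEdge]) -> dict[str, list[str]]:
    graph: dict[str, list[str]] = {}
    for e in edges:
        graph.setdefault(e.caller, []).append(e.callee)
    return graph

def save_graph(graph: dict[str, list[str]], path: str) -> None:
    # Persisting the graph is what makes it "durable": every later query
    # traverses this artifact instead of rebuilding it on demand.
    with open(path, "w") as fh:
        json.dump(graph, fh)
```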
Depth alone is not enough. Griffin also records the semantics of each edge: is it a direct call, a method dispatch through a class hierarchy, a callback passed to a higher-order function, a promise resolution, an event handler registered at module load? Each edge type carries different reachability implications, and the LLM needs that metadata to decide whether a path is truly reachable or merely theoretically reachable. Griffin's 2026 Q1 benchmark recorded 94 percent classification accuracy against hand-validated ground truth on a 412-CVE panel, and depth was the single biggest contributor to recall.
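A sketch of what per-edge metadata could look like. The edge kinds mirror the categories just listed; the idea that only direct calls are unconditionally taken is an illustrative assumption, not a description of Griffin's internals.

```python
from dataclasses import dataclass
from enum import Enum, auto

class EdgeKind(Enum):
    DIRECT_CALL = auto()         # f() calls g(), statically certain
    METHOD_DISPATCH = auto()     # resolved through a class hierarchy
    CALLBACK = auto()            # passed to a higher-order function
    PROMISE_RESOLUTION = auto()  # .then()/await continuation
    EVENT_HANDLER = auto()       # registered at module load, fired later

@dataclass
class Edge:
    caller: str
    callee: str
    kind: EdgeKind

def is_statically_certain(edge: Edge) -> bool:
    # Only direct calls are unconditionally taken; the other kinds are
    # reachable under runtime conditions the reasoner has to weigh.
    return edge.kind is EdgeKind.DIRECT_CALL
```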
How Mythos-class tools approach depth
Mythos-class pure-LLM tools approach depth as a function of context window and retrieval quality. The LLM sees as much code as the retrieval layer hands it, and it reasons within that window. Retrieval is usually keyword-based or embedding-based, which means the files in context tend to be topically related but not necessarily call-graph-connected. The model may have the route handler and a service file in context, but not the two intermediate packages that actually connect them.
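A toy example makes the gap visible. Assume a five-file call path and a retriever that ranks files by topical similarity; the files it hands the model look relevant, but the glue packages on the actual path are missing. All names below are made up.

```python
# Toy illustration: topical retrieval returns files that "look related", while
# only graph traversal returns the files that connect the entry to the sink.

call_graph = {
    "routes.py": ["middleware_pkg/auth.py"],
    "middleware_pkg/auth.py": ["glue_pkg/dispatch.py"],
    "glue_pkg/dispatch.py": ["orders/service.py"],
    "orders/service.py": ["yaml_util/load.py"],   # the sink
}

# A keyword or embedding retriever might hand the LLM the route file and the
# service file, skipping the two glue packages that actually connect them.
retrieved = ["routes.py", "orders/service.py"]

def path_files(graph, start):
    # Depth-first walk: every file on the actual call path.
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(graph.get(node, []))
    return seen

missing = [f for f in path_files(call_graph, "routes.py") if f not in retrieved]
print(missing)  # ['middleware_pkg/auth.py', 'glue_pkg/dispatch.py', 'yaml_util/load.py']
```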
When the intermediate packages are missing, the LLM faces a choice. It can admit it does not know, which makes its output less useful to the user. Or it can guess at what the intermediate code probably looks like, which is exactly the kind of speculation that reachability analysis is supposed to prevent. Most Mythos-class tools lean toward the second behavior because it produces more confident-looking answers.
The deeper the true call path, the worse this problem gets. A three-frame path might fit in context; a ten-frame path almost certainly will not. Mythos-class tools tend to perform well on shallow synthetic benchmarks and degrade sharply on real codebases where paths are deep.
A practical comparison
Imagine a Django application with a custom middleware that decodes a JWT, attaches the decoded payload to the request, and passes control to a view. The view calls a service that calls a repository that calls a serializer that calls a vulnerable YAML loader in a utility package. That is six frames, counting the middleware.
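Sketched as code, the chain might look like this. Every file, class, and function name is invented for illustration; only the sink pattern, PyYAML's FullLoader covered by CVE-2020-1747, reflects the real advisory.

```python
import yaml  # PyYAML; versions before 5.3.1 are affected by CVE-2020-1747

class JwtMiddleware:                              # frame 1: middleware
    def __init__(self, get_response):
        self.get_response = get_response
    def __call__(self, request):
        request.claims = decode_jwt(request.headers.get("Authorization"))
        return self.get_response(request)

def profile_view(request):                        # frame 2: view
    return ProfileService().load(request.claims)

class ProfileService:                             # frame 3: service
    def load(self, claims):
        return ProfileRepository().fetch(claims["profile_blob"])

class ProfileRepository:                          # frame 4: repository
    def fetch(self, blob):
        return ProfileSerializer().deserialize(blob)

class ProfileSerializer:                          # frame 5: serializer
    def deserialize(self, blob):
        return load_yaml(blob)

def load_yaml(blob):                              # frame 6: utility package, the sink
    # Unsafe on PyYAML < 5.3.1: FullLoader could instantiate arbitrary objects.
    return yaml.load(blob, Loader=yaml.FullLoader)

def decode_jwt(header):
    ...  # placeholder: real middleware would verify and decode the token
```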
Griffin's call graph contains every frame. The taint analysis confirms that the JWT payload reaches the YAML loader. The finding cites CVE-2020-1747 for the PyYAML unsafe loader and names each frame in the path. An engineer can walk the path in the console, verify it, and apply the fix.
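Because every frame is in the graph, the finding can carry the whole chain as data rather than prose. A minimal sketch of such a record, with hypothetical field names rather than Griffin's actual schema:

```python
# Illustrative shape of a path-grounded finding; field names are invented.
finding = {
    "advisory": "CVE-2020-1747",
    "sink": "yaml.load with FullLoader in the utility package",
    "path": [
        "JwtMiddleware.__call__",
        "profile_view",
        "ProfileService.load",
        "ProfileRepository.fetch",
        "ProfileSerializer.deserialize",
        "load_yaml",
    ],
    "taint": "JWT claims -> profile_blob -> YAML input",
}
```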
A Mythos-class tool, given the view file and perhaps one adjacent file, sees the call to service.fetch and stops. If the LLM guesses correctly, the output is useful but unverifiable. If it guesses incorrectly, the output is wrong and still unverifiable. Neither is a foundation for a security program.
Depth versus cost
Building a deep call graph is not free. Indexing a mid-size monorepo with 600 packages takes several minutes on modest hardware, and the resulting graph can occupy a few hundred megabytes. Griffin absorbs this cost once per commit and caches aggressively, so the per-finding latency is low. Mythos-class tools avoid the cost by skipping the graph entirely, which is why they feel snappier in demos; the cost reappears as false positives and false negatives in production.
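The once-per-commit amortization is the key design choice. A minimal sketch, assuming a hypothetical .graph-cache directory keyed by commit hash:

```python
import json
import os

def load_or_build_graph(commit_sha: str, build_fn):
    # Commit-keyed cache: the indexing cost is paid only on a cache miss.
    cache_path = os.path.join(".graph-cache", f"{commit_sha}.json")
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            return json.load(fh)      # cache hit: no rebuild for this commit
    graph = build_fn()                # cache miss: pay the indexing cost once
    os.makedirs(".graph-cache", exist_ok=True)
    with open(cache_path, "w") as fh:
        json.dump(graph, fh)
    return graph
```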
The economics favor depth. A security team that spends an hour of engineering time per false positive pays far more in wages than it costs an analysis pipeline to spend ten minutes building a durable graph. Griffin's benchmark reported a 71 percent reduction in triage time precisely because the deeper graph eliminated most of the findings that would otherwise require human investigation.
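A back-of-the-envelope calculation shows the shape of the trade-off. Every number below is an assumed input for illustration, not a benchmark figure; only the 71 percent reduction comes from the text above.

```python
# Illustrative assumptions, not measured data.
engineer_cost_per_hour = 100       # assumed fully loaded cost, USD
false_positives_per_week = 20      # assumed finding volume before filtering
hours_per_false_positive = 1.0     # one engineering hour per false positive
reduction = 0.71                   # triage-time reduction cited above

weekly_triage_cost = false_positives_per_week * hours_per_false_positive * engineer_cost_per_hour
weekly_savings = weekly_triage_cost * reduction
print(weekly_savings)              # 1420.0 under these assumptions
```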
Where depth interacts with other axes
Depth is not a standalone property. A deep graph with weak taint propagation still produces noise, because you know the path exists but you cannot confirm that attacker-controlled data flows through it. A deep graph without framework routing awareness misses the first hop from the HTTP entry into the application code. A deep graph without dynamic dispatch handling misses the call from an interface to its implementation. Griffin combines depth with those companion features; Mythos-class tools, lacking the graph, cannot build any of them.
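The dynamic dispatch point is worth a sketch of its own. A call through an interface-like base class has no concrete target until class-hierarchy information adds the edge; the class names below are hypothetical.

```python
class Loader:                        # interface-like base class
    def load(self, blob): ...

class YamlLoader(Loader):            # concrete implementation, possibly
    def load(self, blob):            # living in another package entirely
        return ("parsed", blob)

def handle(loader: Loader, blob):
    return loader.load(blob)         # static target is Loader.load; the
                                     # runtime target is YamlLoader.load

# Without dispatch handling the graph stops at the interface method; with it,
# an edge to each known implementation is added and the path continues.
edges_without_dispatch = [("handle", "Loader.load")]
edges_with_dispatch = edges_without_dispatch + [("Loader.load", "YamlLoader.load")]
```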
The industry is converging on this understanding. Reachability benchmarks published over the last year, including OpenSSF Reach and the CNCF Static Analysis Working Group's 2025 comparisons, have repeatedly shown that depth-capable tools outperform pure-LLM tools by wide margins on reachability recall.
How Safeguard helps
Safeguard uses Griffin AI's deep call graph as the backbone of every reachability finding on the console. When a CVE advisory arrives, Safeguard traces the path from your application's entry points, through every intermediate frame, into the vulnerable function, and surfaces the full chain with each edge annotated by call type and taint status. Engineers can click into any frame to see the source, confirm the call, and understand why the finding is or is not reachable. The depth is not a marketing claim; it is a visible artifact in the UI. If you currently rely on a pure-LLM triage tool and your engineers are spending time verifying claims by hand, Safeguard's graph-grounded workflow will give that time back.