Token cost is the line item that surprises security leaders six months after they sign a pure-LLM vulnerability scanning contract. The demo scan cost a few cents. Then engineering wired the scanner into CI, the SBOM grew, and the monthly bill arrived with an extra digit. The question is never whether AI helps triage findings. The question is what fraction of a scan actually needs a large language model at all.
Griffin AI answers that question with a tiered architecture: a deterministic engine handles the mechanical parts of a scan, and the model layer is split across Opus for hard reasoning, Sonnet for drafting, and Haiku for high-volume classification. Mythos-class pure-LLM tools answer the question differently. They send almost everything through a single frontier model and absorb the bill. This post walks through where the tokens actually go during a typical scan, and why the architectural choice compounds across a year of scans.
What happens inside a single scan
A vulnerability scan is not one task. It is about a dozen sub-tasks, and they vary wildly in difficulty. Parsing a lockfile is deterministic. Matching a package version against a CVE range is deterministic. Resolving a transitive dependency graph is deterministic. None of these tasks benefit from a language model. A pure-LLM tool still pays for them because its architecture has no other way to process input.
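To make the deterministic claim concrete, here is a minimal sketch of the kind of range arithmetic the matching step comes down to. It assumes the third-party packaging library, and the advisory range is illustrative; this is not Griffin AI's actual matcher.

```python
# Minimal sketch of deterministic CVE range matching. Assumes the third-party
# `packaging` library; the advisory range below is illustrative, not a real advisory.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

AFFECTED_RANGE = SpecifierSet(">=2.0.0,<2.17.1")  # hypothetical "fixed in 2.17.1" advisory

def is_affected(installed_version: str) -> bool:
    # Pure range arithmetic: no prompt, no model call, zero tokens.
    return AFFECTED_RANGE.contains(Version(installed_version))

print(is_affected("2.14.0"))  # True: inside the affected range
print(is_affected("2.17.1"))  # False: the patched version
```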
Griffin AI treats the deterministic work as engine work. The lockfile parser is a parser. The version matcher is a range solver. The dependency graph walker is a graph algorithm. The engine emits a structured intermediate representation that describes what is installed, what CVEs intersect the installed versions, which of those CVEs are reachable from application entry points, and what the project's VEX statements already say. That representation is small and precise, and producing it requires zero model tokens.
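As a sketch of what such an intermediate representation might contain, something like the following; the field names are illustrative, not Griffin AI's actual schema.

```python
# Sketch of a structured scan IR. Field names are illustrative, not Griffin AI's schema.
from dataclasses import dataclass, field

@dataclass
class Finding:
    package: str              # installed package name from the lockfile
    installed_version: str    # resolved version
    cve_id: str               # CVE whose affected range intersects the installed version
    reachable: bool | None    # True/False when the call graph is conclusive, None when it is not
    vex_status: str | None    # existing VEX claim such as "not_affected", or None

@dataclass
class ScanIR:
    repo: str
    direct_deps: int
    transitive_deps: int
    findings: list[Finding] = field(default_factory=list)
```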
The model layer is invoked only when the question actually demands reasoning. Does this finding contradict the VEX statement from last sprint? Does the call graph evidence suggest the CVE is exploitable in this codebase? Should the suggested remediation be a version bump, a patch, a configuration change, or a suppression with justification? These are the questions where a model earns its cost. Everything else is engine work.
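A rough illustration of where that engine/model boundary sits. The heuristics below are hypothetical, chosen only to mirror the questions above; they are not Griffin AI's actual rules.

```python
# Sketch of the engine/model boundary (illustrative heuristics, not Griffin AI's rules).
def needs_model(reachable: bool | None, vex_status: str | None) -> bool:
    """Only findings that pose a genuine reasoning question leave the engine.

    reachable:  True/False when the call graph evidence is conclusive, None when it is not.
    vex_status: an existing VEX claim such as "not_affected", or None.
    """
    contradicts_vex = vex_status == "not_affected" and reachable is True
    ambiguous_evidence = reachable is None
    return contradicts_vex or ambiguous_evidence

print(needs_model(reachable=False, vex_status=None))            # False: the engine closes it
print(needs_model(reachable=True,  vex_status="not_affected"))  # True: a contradiction needs reasoning
```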
Where the tokens go at each tier
Once the engine has narrowed the question, Griffin AI routes it to the smallest model that can answer it correctly. Haiku classifies findings by exploitability tier, normalises advisory language across ecosystems, and handles the high-volume formatting work that a batch of several thousand CVE matches produces. Sonnet drafts remediation plans, summarises dependency graph changes for pull request comments, and writes the human-readable explanations that show up in the dashboard. Opus is reserved for the residual hard problems: novel CVEs without clean advisories, conflicting evidence from multiple sources, complex cross-language taint questions, and the occasional architectural review where the engine cannot decide on its own.
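The routing itself can be very small once the engine has done its job. The rules below are a hypothetical sketch that mirrors the tier descriptions above, not the production router.

```python
# Illustrative tier routing. The tier names follow the post; the rules are hypothetical.
def route_tier(novel_cve: bool, conflicting_evidence: bool, needs_remediation_plan: bool) -> str:
    if novel_cve or conflicting_evidence:
        return "opus"    # residual hard reasoning
    if needs_remediation_plan:
        return "sonnet"  # drafting and summarisation
    return "haiku"       # high-volume classification and formatting

print(route_tier(novel_cve=False, conflicting_evidence=False, needs_remediation_plan=False))  # haiku
```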
A representative scan for a mid-sized service, perhaps 400 direct dependencies and a few thousand transitive ones, passes maybe fifteen findings to the model layer. Of those, twelve go to Haiku for classification, two go to Sonnet for remediation drafting, and one goes to Opus because it involves a CVE with partial exploitation evidence and an ambiguous call graph. Most of the scan's model calls are Haiku classifications, each costing about one-tenth of a comparable Sonnet call and roughly one-fortieth of an Opus call, so the total spend stays small.
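Using that call mix and the stated cost ratios, the back-of-envelope spend looks like this. The unit is "one Sonnet call"; real calls vary in token count, so treat it as a shape, not a quote.

```python
# Back-of-envelope spend for the scan above, in "Sonnet-call equivalents".
# Inputs taken from the text: 12 Haiku + 2 Sonnet + 1 Opus calls, with
# Haiku ~= 0.1x a Sonnet call and Opus ~= 4x (so Haiku ~= 1/40 of Opus).
HAIKU, SONNET, OPUS = 0.1, 1.0, 4.0

total = 12 * HAIKU + 2 * SONNET + 1 * OPUS
print(total)  # 7.2 -- the entire scan's model work costs roughly seven Sonnet calls
```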
A Mythos-class pure-LLM tool, on the other hand, passes everything through a single model. The lockfile text, the advisory text, the project context, the call graph summary (if the tool builds one at all), and every candidate finding all arrive as prompt tokens. Every finding, exploitable or not, gets a model-generated explanation even when a template would do. The model is almost always the frontier tier, because mixing tiers requires the kind of routing logic that pure-LLM architectures deliberately avoid.
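In rough code terms, that pattern looks like the sketch below: the whole scan context becomes prompt tokens on every scan. The prompt construction and the four-characters-per-token figure are illustrative only, not any specific vendor's implementation.

```python
# Rough shape of the pure-LLM pattern described above. The four-characters-per-token
# figure is a crude rule of thumb, used here only for illustration.
def pure_llm_prompt(project_context: str, lockfile_text: str, advisories: list[str]) -> str:
    return "\n\n".join([project_context, lockfile_text, *advisories,
                        "Identify, classify, and explain every vulnerability above."])

prompt = pure_llm_prompt(
    project_context="...repo summary...",
    lockfile_text="x" * 400_000,           # a large lockfile, resent verbatim on every scan
    advisories=["...advisory text..."] * 200,
)
print(len(prompt) // 4, "input tokens, paid at the frontier rate on every scan")
```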
The arithmetic at enterprise scale
Consider a 200-repository estate with nightly CI scans, pull-request scans on every merge, and weekly deep scans for supply chain drift. That is roughly 30,000 scans per month. In Griffin AI's tiered model, most of those scans resolve entirely in the engine because nothing has changed since the last scan. When the model layer is invoked, the median call is a Haiku classification measured in hundreds of output tokens, and Opus is touched on only a few percent of scans.
In the pure-LLM model, every scan hits the frontier model for essentially the full scan context. The per-scan cost difference we measure is consistently between 15x and 40x, depending on repository size and how much reasoning the tool redoes on findings that have not changed since the previous scan. Over a year, that difference is not a line item; it is a strategic problem.
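The annual arithmetic, using only the 30,000 scans-per-month figure and the 15x to 40x gap; the per-scan unit cost is a placeholder you can replace with your own measured number.

```python
# Annual back-of-envelope comparison. Inputs from the post: ~30,000 scans per month
# and a 15x-40x per-scan cost gap. The per-scan unit cost is a placeholder.
SCANS_PER_YEAR = 30_000 * 12
TIERED_COST_PER_SCAN = 1.0  # placeholder unit: one "tiered scan" of model spend

for multiplier in (15, 40):
    tiered = SCANS_PER_YEAR * TIERED_COST_PER_SCAN
    pure_llm = tiered * multiplier
    print(f"{multiplier}x gap: {pure_llm - tiered:,.0f} extra cost units per year")
```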
Why tiering is an architectural decision, not a configuration
A pure-LLM vendor cannot fix this by flipping a switch to use a smaller model. The reason Mythos-class tools use frontier models everywhere is that they rely on the model to understand the structure of the input. Without an engine, the model is the parser, the matcher, and the reasoner. Downgrading to a smaller model breaks the parsing step, which breaks the matching step, which breaks the finding. Tiering only works when the engine has already extracted the structure, leaving the model with a narrow, clearly-scoped question.
This is why Griffin AI's cost profile holds as scan volume grows. The engine handles volume. The model handles nuance. The bill grows with the interesting questions, not with the size of the lockfile.
What security leaders should measure
If you are evaluating a pure-LLM tool against Griffin AI, the demo scan cost is misleading because the demo repository is small and clean. Ask the vendor for per-scan token cost on a realistic target: a 50,000-line service with 400 dependencies, nightly scans, and a year of accumulated VEX statements. Ask for the input token count and the output token count separately. Ask how many model calls a single scan produces, and what the tier distribution looks like.
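If the vendor can produce a per-scan call log, pricing it is a few lines of arithmetic. The prices below are placeholders, not any model's actual list price; the point is simply to separate input from output tokens and weight by tier.

```python
# Price a per-scan call log: separate input and output tokens, weight by tier.
# All prices are placeholders for illustration, not real list prices.
PRICE_PER_1K = {  # (input, output) placeholder prices per 1,000 tokens
    "haiku":  (0.001, 0.005),
    "sonnet": (0.003, 0.015),
    "opus":   (0.015, 0.075),
}

def scan_cost(calls: list[dict]) -> float:
    total = 0.0
    for call in calls:
        in_price, out_price = PRICE_PER_1K[call["tier"]]
        total += call["input_tokens"] / 1000 * in_price
        total += call["output_tokens"] / 1000 * out_price
    return total

example = [
    {"tier": "haiku",  "input_tokens": 6_000, "output_tokens": 900},
    {"tier": "sonnet", "input_tokens": 2_500, "output_tokens": 600},
    {"tier": "opus",   "input_tokens": 4_000, "output_tokens": 1_200},
]
print(f"${scan_cost(example):.4f}")  # ~$0.18 for this illustrative call log
```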
Griffin AI publishes those numbers directly in the tenant usage dashboard, broken down by scan, by model tier, and by repository. The visibility itself is part of the argument. If a vendor cannot show you where the tokens are going, the tokens are almost certainly going somewhere expensive, and the bill at twelve months will prove it.
Token cost is not the only reason to choose an engine-plus-LLM architecture, but it is the one that shows up unambiguously on the invoice. Matching model capability to task difficulty, and keeping the model out of the work that does not need it, is the design choice that makes the arithmetic work.