AI Security

Cache Hit Optimisation: Griffin AI vs Mythos

Prompt caching and engine memoisation combine to make Griffin AI scans repeat-cheap. Pure-LLM tools recompute the same reasoning on every run.

Shadab Khan
Security Engineer
6 min read

A vulnerability scanner that runs nightly should get cheaper over time, not more expensive. The packages do not change overnight. The CVE database rarely shifts underneath a stable release train. The codebase itself evolves slowly for most production services. If the scanner is paying full cost on every run, it is because the architecture is not recognising what it already knows. Caching is the mechanism that translates stability into savings, and it is one of the largest cost differences between Griffin AI and Mythos-class pure-LLM tools.

Two layers of caching that work together

Griffin AI operates two caching layers. The first is engine-level memoisation. When the engine processes a lockfile, it fingerprints the manifest, the resolved transitive tree, and the advisory database snapshot. If the fingerprint matches the previous scan, the engine short-circuits: no recomputation of the dependency graph, no re-matching of CVE ranges, no call-graph rebuild. The scan completes by replaying the prior intermediate representation and checking for deltas. In steady state, the engine layer produces a cache hit on most scans because most scans run against unchanged code and an unchanged advisory set.
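A minimal sketch of what that fingerprint-and-replay path could look like. The names are illustrative rather than Griffin AI's actual internals, and full_analysis is a placeholder for the cold-path work:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class CachedScan:
    fingerprint: str
    findings: list   # the prior intermediate representation, replayed on a hit

def full_analysis(manifest, resolved_tree, advisory_snapshot_id):
    # Placeholder for the cold path: dependency graph build, CVE range
    # matching, call-graph construction. Only runs on a fingerprint miss.
    return []

def scan_fingerprint(manifest: bytes, resolved_tree: dict, advisory_snapshot_id: str) -> str:
    # Hash everything that determines the scan result. If none of it
    # changed, the previous result can be replayed verbatim.
    h = hashlib.sha256()
    h.update(manifest)
    h.update(json.dumps(resolved_tree, sort_keys=True).encode())  # canonical form
    h.update(advisory_snapshot_id.encode())
    return h.hexdigest()

def run_scan(manifest, resolved_tree, advisory_snapshot_id, cache):
    fp = scan_fingerprint(manifest, resolved_tree, advisory_snapshot_id)
    if fp in cache:
        return cache[fp].findings            # engine-level hit: no recomputation
    findings = full_analysis(manifest, resolved_tree, advisory_snapshot_id)
    cache[fp] = CachedScan(fp, findings)
    return findings
```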

The second layer is prompt caching at the model provider. When the engine has to invoke the model layer, for example to draft a remediation plan or to classify a newly appeared finding, the prompt is structured so that the long, stable portion of the context comes first. Project metadata, accumulated VEX statements, policy definitions, and advisory excerpts are all cache-prefix material. The novel portion of the prompt, which is the specific finding the model is being asked to reason about, comes at the end. The cache key on the stable prefix hits repeatedly across scans of the same project, and the pricing advantage of cached tokens over fresh tokens compounds.
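A sketch of that prompt layout, assuming a plain string prefix. The field names are illustrative; real provider caches key on the prefix tokens rather than on this exact format:

```python
def build_prompt(project_context: str, vex_statements: str, policy: str,
                 advisory_excerpts: str, new_finding: str) -> str:
    # Everything before the finding is identical across scans of the same
    # project, so a provider-side prompt cache can reuse the prefix; only
    # the tail is novel and billed at fresh-token rates.
    stable_prefix = "\n\n".join([
        project_context,    # rarely changes between runs
        vex_statements,     # accumulated suppressions and justifications
        policy,             # severity thresholds and gating rules
        advisory_excerpts,  # advisory text for already-known findings
    ])
    return stable_prefix + "\n\n--- NEW FINDING ---\n" + new_finding
```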

The two layers are not independent. The engine-level cache is what makes the prompt prefix stable enough to cache well. If the engine recomputed its intermediate representation on every run, the serialisation of that representation would change subtly each time, breaking the prompt cache hash. The engine is the thing that gives the prompt cache something consistent to bite on.
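In practice that means serialising the intermediate representation deterministically: sorted keys, no per-run fields. A rough illustration, with hypothetical field names:

```python
import json

VOLATILE_FIELDS = {"scanned_at", "run_id"}   # hypothetical per-run noise

def serialise_ir(ir: dict) -> str:
    # Drop per-run fields and sort keys so byte-identical inputs produce
    # byte-identical output, which is what keeps the prompt-cache hash stable.
    stable = {k: v for k, v in ir.items() if k not in VOLATILE_FIELDS}
    return json.dumps(stable, sort_keys=True, separators=(",", ":"))
```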

Why Mythos-class tools struggle with cache hits

A pure-LLM vulnerability tool has no engine, so it has no engine-level cache. The tool either sends the full scan context to the model on every run, which is expensive, or it tries to cache the model's output directly, which is brittle: the model's output format tends to vary across runs even when the input is functionally identical, and the cache key itself is fragile. A small change in how the tool formats the lockfile, the inclusion of a new timestamp, or a whitespace difference in the manifest all break it. Pure-LLM tools end up with cache hit rates in the single digits on repeated scans of unchanged code.
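The failure mode is easy to reproduce. A toy comparison, not any vendor's actual code: hashing the raw context with a volatile field guarantees a miss, while keying on a normalised view of the input survives formatting churn:

```python
import hashlib
import json
import time

def naive_cache_key(lockfile_text: str) -> str:
    # Hashing the raw context a pure-LLM tool sends: the timestamp alone
    # produces a different key on every run, so the cache never hits.
    context = {"lockfile": lockfile_text, "scanned_at": time.time()}
    return hashlib.sha256(json.dumps(context).encode()).hexdigest()

def normalised_cache_key(lockfile_text: str) -> str:
    # Keying on a normalised view of the input survives whitespace and
    # formatting churn; only a real content change invalidates the entry.
    lines = [line.strip() for line in lockfile_text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()
```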

Some Mythos-class vendors attempt to improve the situation by introducing their own prefix structure, typically by prepending a "system context" that rarely changes. This helps marginally with the model provider's prompt cache, but it does not address the underlying issue: the rest of the prompt, which contains the scan-specific content, is still regenerated from scratch on every run because there is no stable intermediate representation upstream of the prompt. The model provider sees a cache miss on the variable portion, and the cached prefix is a small fraction of the total token count.

Griffin AI achieves prompt cache hit rates above 70 percent on repeat scans because the engine has already reduced the variable portion to a minimal delta. When the scan is a pure repeat, the prompt contains a stable project context, a stable VEX state, and a small delta describing what changed since the last scan. If nothing changed, there is no model call at all. The engine short-circuits and returns the previous result with a freshness stamp.

What a cached scan actually looks like

A typical repeat scan of a stable service in Griffin AI completes in under a second. The engine fingerprint matches, the cached intermediate representation is replayed, the dashboard is updated with the latest "scan completed" timestamp, and no model tokens are consumed. The cost of that scan is essentially the cost of reading and writing a few database rows.

A scan where the lockfile has changed but the change is minor, for example a single patch-version bump, also stays mostly inside the engine. The engine recomputes the affected portion of the dependency graph, notices that the patched version addresses a previously known CVE, updates the finding status, and only invokes the model layer if a new finding has appeared or if the suppression logic needs a fresh justification. In practice, a patch-bump scan costs roughly ten percent of a cold scan.

A scan where the advisory database has updated but the code has not changed is the case where prompt caching matters most. The engine re-runs CVE matching against the updated advisory set and produces a small set of newly matched findings. The model layer is invoked to classify each new finding, but the prompt prefix hits the cache because the project context is unchanged. The model provider charges cached-token rates for the bulk of the prompt, and the tool absorbs only the cost of the novel output tokens.
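Putting the three cases together, the routing decision can be sketched like this; the snapshot fields are hypothetical stand-ins for the engine's real state:

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    code_fp: str                 # fingerprint of manifest + resolved tree
    advisory_id: str             # advisory database snapshot identifier
    findings: list = field(default_factory=list)

def route_repeat_scan(prev: Snapshot, current: Snapshot) -> str:
    if current.code_fp == prev.code_fp and current.advisory_id == prev.advisory_id:
        # Pure repeat: replay prior findings, no model call, sub-second.
        return "replay"
    if current.code_fp != prev.code_fp:
        # Code changed: recompute the affected subgraph in the engine and
        # call the model only if a genuinely new finding appeared.
        return "engine-delta"
    # Advisories changed, code did not: re-match CVEs and classify the new
    # matches with a prompt whose prefix is served from cache.
    return "advisory-delta"

# A pure repeat scan never leaves the engine:
assert route_repeat_scan(Snapshot("abc", "adv-41"), Snapshot("abc", "adv-41")) == "replay"
```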

Measuring cache hit rate honestly

Cache hit rate is a vendor-favourable metric if measured carelessly. A vendor can claim high cache hit rates by counting only the prompt prefix tokens and ignoring the uncached tail. Griffin AI reports cache hit rate at the scan level: what percentage of scans were fully short-circuited by the engine, what percentage hit the model layer but with cached prefixes, and what percentage were full cold runs. The three numbers sum to one hundred, and the distribution tells you honestly how much of the total spend is cache-advantaged.
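A scan-level breakdown is straightforward to compute if each scan records which path it took. A small illustrative sketch:

```python
from collections import Counter

def cache_hit_breakdown(scans):
    # Each scan records its outcome: fully short-circuited by the engine,
    # model call with a cached prefix, or a full cold run.
    counts = Counter(s["outcome"] for s in scans)
    total = sum(counts.values()) or 1
    return {k: round(100 * counts.get(k, 0) / total, 1)
            for k in ("engine_hit", "prefix_hit", "cold")}

scans = ([{"outcome": "engine_hit"}] * 21
         + [{"outcome": "prefix_hit"}] * 6
         + [{"outcome": "cold"}] * 3)
print(cache_hit_breakdown(scans))
# {'engine_hit': 70.0, 'prefix_hit': 20.0, 'cold': 10.0}
```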

When evaluating a Mythos-class tool, ask for the same three numbers. Most pure-LLM vendors cannot report the first number because they have no engine to short-circuit. That absence is the signal.

Compounding over a year

A single cached scan is a rounding error. Ten thousand cached scans a month is a budget line. Over a year, the difference between Griffin AI's cache profile and a pure-LLM tool's cache profile compounds into a large fraction of total cost of ownership. It also compounds into latency: cached scans complete fast, and fast scans keep the CI pipeline quick, which keeps developers from bypassing the gate when they are in a hurry.
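To make the compounding concrete, here is a deliberately rough cost model. The per-scan figures are hypothetical placeholders, not Griffin AI or Mythos pricing; only the ratio matters:

```python
# Hypothetical per-scan costs, not vendor pricing.
SCANS_PER_MONTH = 10_000
COLD_COST = 0.50         # every token fresh, every run
CACHED_MIX_COST = 0.05   # blended cost when most scans short-circuit or hit the prefix cache

pure_llm_annual = 12 * SCANS_PER_MONTH * COLD_COST
engine_cached_annual = 12 * SCANS_PER_MONTH * CACHED_MIX_COST
print(f"pure-LLM:     ${pure_llm_annual:>9,.0f} / year")        # $60,000
print(f"engine+cache: ${engine_cached_annual:>9,.0f} / year")   # $6,000
```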

Cache hit optimisation is not an incremental tweak. It is a consequence of having an engine in front of the model. The engine produces a stable representation, the stable representation caches well at every layer, and the combined effect is a scanner that gets cheaper and faster the longer you run it. Mythos-class tools run at full price forever because they have nothing to cache on.
