AI Security

Elastic Scale Behaviour: Griffin AI vs Mythos

Scanning bursts when a monorepo merges. We explain why Griffin AI absorbs the spike gracefully while Mythos-class tools degrade into rate-limit queues.

Nayan Dey
Security Engineer
6 min read

A vulnerability scanner lives on the hot path of continuous integration. The load it sees is not flat. Morning pushes, sprint-end merges, dependency-update campaigns, and emergency security patches all create bursts that can be five or ten times the steady-state rate. A scanner that handles the average but stalls on bursts is operationally unusable, because the bursts are when the scanner matters most. Griffin AI's engine-plus-LLM architecture absorbs bursts gracefully. Mythos-class pure-LLM tools degrade under them for architectural reasons this post unpacks.

What a burst looks like in practice

A monorepo with a hundred services sees an everyday pattern where individual merges trigger targeted scans, each touching the services affected by the change. Occasionally a shared library is updated, and the merge fans out to dozens of services simultaneously. An even larger spike happens when a critical CVE prompts a fleet-wide dependency bump: every service gets a scan, every scan has to clear within the deployment window, and the scanner's tail latency becomes the bottleneck on the organisation's ability to respond.

The steady-state scan rate for a mid-sized org might be a few hundred per hour. The burst rate during a fleet-wide campaign can be a few thousand in fifteen minutes. A scanner that holds its per-scan latency under burst conditions is elastic. A scanner whose latency explodes is not.
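
To make the multiplier concrete, here is the back-of-the-envelope arithmetic with illustrative figures; the exact rates will vary by organisation:

```python
# Burst math with illustrative numbers (assumptions, not measurements).
steady_scans_per_hour = 300        # "a few hundred per hour"
burst_scans = 3_000                # fleet-wide dependency campaign
burst_window_minutes = 15

steady_rate = steady_scans_per_hour / 60          # scans per minute
burst_rate = burst_scans / burst_window_minutes   # scans per minute

print(f"steady state: {steady_rate:.0f} scans/min")      # 5
print(f"burst:        {burst_rate:.0f} scans/min")       # 200
print(f"multiplier:   {burst_rate / steady_rate:.0f}x")  # 40
```

Everyday merge bursts sit at the five-to-ten-times mark; a fleet-wide campaign like the one above can push the multiplier considerably higher.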

Why Griffin AI scales elastically

The Griffin AI engine is stateless per scan and runs on a horizontally-scaled worker pool. When the ingestion queue fills, the worker pool scales out, and the engine portion of each scan runs at the same per-scan cost as under steady-state load. The engine does not care about the size of the burst because the bottleneck at the engine layer is CPU, and CPU is cheap to add.
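
A minimal sketch of queue-depth-driven scale-out; the throughput and pool-size numbers are illustrative assumptions, not Griffin AI internals:

```python
import math

def desired_workers(queue_depth: int,
                    scans_per_worker_per_min: float = 4.0,
                    target_drain_minutes: float = 5.0,
                    min_workers: int = 4,
                    max_workers: int = 200) -> int:
    """Size the stateless pool so the current backlog drains within
    the target window; any worker can take any queued scan."""
    needed = math.ceil(queue_depth /
                       (scans_per_worker_per_min * target_drain_minutes))
    return max(min_workers, min(max_workers, needed))

print(desired_workers(queue_depth=40))     # steady state -> 4 workers
print(desired_workers(queue_depth=3_000))  # fleet-wide burst -> 150 workers
```

The specific rule matters less than its inputs and outputs: the input is queue depth and the output is CPU capacity, and both are elastic.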

The model layer is called sparingly because of the tiered architecture. A burst of three thousand scans in fifteen minutes does not produce three thousand model calls. Most of those scans are repeat scans of services whose lockfiles did not change, which short-circuit in the engine via the caching layer. The scans that do hit the model layer are usually low-tier Haiku calls for classification. Even in aggregate, the model-call rate during the burst stays well below the provider's rate limit.
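
A simplified sketch of the tiered short-circuit, assuming a content-hash cache keyed on the lockfile; the function names and tier labels are placeholders for illustration:

```python
import hashlib

verdict_cache: dict[str, str] = {}   # lockfile digest -> cached verdict

def engine_scan(lockfile: bytes) -> list[str]:
    # Placeholder for the deterministic engine pass.
    return ["finding-1"] if b"vulnerable-lib" in lockfile else []

def classify_with_cheap_model(finding: str) -> str:
    # Placeholder for the low-tier classification call.
    return "clear-cut"

def scan(lockfile: bytes) -> str:
    digest = hashlib.sha256(lockfile).hexdigest()
    if digest in verdict_cache:
        return verdict_cache[digest]     # unchanged lockfile: no model call
    findings = engine_scan(lockfile)     # CPU-bound, deterministic
    for finding in findings:
        if classify_with_cheap_model(finding) == "ambiguous":
            pass                         # rare: escalate to a higher tier
    verdict = "flagged" if findings else "clean"
    verdict_cache[digest] = verdict
    return verdict

print(scan(b"lockfile-v1"))  # first scan runs the engine
print(scan(b"lockfile-v1"))  # repeat scan served from cache
```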

The operational signature during a burst is that the engine worker pool ramps up, ingestion throughput rises to match the incoming rate, and the model-call rate rises only modestly. Latency per scan stays within the same band as steady state. The burst drains at a rate proportional to the worker capacity we scale out to, and the cost scales linearly with CPU-seconds rather than with model-call volume.
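
The drain-time arithmetic follows directly, again with an assumed per-worker throughput:

```python
backlog = 3_000                    # queued scans at burst peak
scans_per_worker_per_min = 4.0     # assumed engine throughput

for workers in (25, 50, 100, 150):
    drain_minutes = backlog / (workers * scans_per_worker_per_min)
    print(f"{workers:3d} workers -> backlog drains in {drain_minutes:4.1f} min")
```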

Why Mythos-class tools degrade under bursts

A pure-LLM architecture routes essentially every scan through the model, which means the burst hits the model provider directly. Model provider rate limits are expressed in tokens per minute and requests per minute, and at enterprise tiers the limits are high but finite. A burst of three thousand scans that each require several model calls can hit the provider limit within seconds.
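
The arithmetic is unforgiving. With assumed figures for calls per scan and the provider cap:

```python
burst_scans = 3_000
model_calls_per_scan = 3           # "several model calls" per scan
rpm_limit = 1_000                  # assumed enterprise requests/min cap

total_requests = burst_scans * model_calls_per_scan
minutes_at_cap = total_requests / rpm_limit

print(f"requests needed: {total_requests}")                   # 9000
print(f"minutes to serve at the cap: {minutes_at_cap:.0f}")   # 9
# If CI submits most of the burst in the first minute or two, the
# cap binds almost immediately and everything behind it queues.
```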

Once the limit binds, the provider returns rate-limit errors, and the tool has to retry. Retries compete with pending requests, and the effective queue depth grows. The tool's internal scheduler eventually catches up, but the tail latency on individual scans during the burst stretches into minutes or tens of minutes. In CI terms, that is far beyond the patience of any reasonable deploy gate.
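
A toy fluid model makes the compounding visible; it treats every unserved call as a retry that re-enters the queue the following minute:

```python
def backlog_after(minutes: int, arrivals_per_min: float,
                  cap_per_min: float) -> float:
    """Unserved calls become retries that re-enter the queue and
    compete with fresh requests each minute."""
    backlog = 0.0
    for _ in range(minutes):
        offered = backlog + arrivals_per_min   # retries + new work
        backlog = offered - min(offered, cap_per_min)
    return backlog

# Below the cap the backlog stays at zero; above it, it compounds.
print(backlog_after(15, arrivals_per_min=600, cap_per_min=1_000))    # 0.0
print(backlog_after(15, arrivals_per_min=3_000, cap_per_min=1_000))  # 30000.0
```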

Some Mythos-class vendors attempt to mitigate this by queueing scans on their own side and processing them in batches. The batching reduces the rate-limit pressure but pushes the latency even further because the queue latency compounds the processing latency. The end-user experience is a scanner that feels slow whenever it is needed most.
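
A rough sketch of how the two latencies stack, with assumed batch parameters:

```python
batch_size = 200
batch_process_minutes = 2.0     # time to push one batch through the model
queued_scans = 3_000

batch_index = (queued_scans - 1) // batch_size     # the last scan's batch
queue_wait = batch_index * batch_process_minutes   # batches served first
end_to_end = queue_wait + batch_process_minutes

print(f"last scan waits {queue_wait:.0f} min, "
      f"finishes at {end_to_end:.0f} min")         # waits 28, finishes at 30
```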

Others attempt to buy more capacity from the provider, but enterprise rate limits are priced aggressively, and the cost of holding headroom large enough to absorb the worst burst is substantial. That cost gets passed through, and the customer ends up paying for peak-sized capacity to serve average load.
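
The economics in miniature, with invented rates:

```python
steady_calls_per_min = 100      # average model-call rate
peak_calls_per_min = 4_000      # worst burst the headroom must absorb

utilisation = steady_calls_per_min / peak_calls_per_min
print(f"headroom utilised outside bursts: {utilisation:.1%}")  # 2.5%
# The customer funds 100% of peak capacity but uses a small
# fraction of it almost all of the time.
```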

Elastic behaviour matters for incident response

The case where elastic scale matters most is incident response. When a severe CVE lands and the team needs to verify exposure across the fleet within the hour, the scanner has to produce verdicts on every service quickly. A scanner that takes forty-five minutes to clear the burst has effectively added forty-five minutes to the incident response timeline, and in a fast-moving incident that is the difference between containing the blast radius and reading a breach writeup after the fact.

Griffin AI clears fleet-wide burst scans in well under ten minutes for most mid-sized estates. The engine does the CVE match against updated advisory data deterministically, the reachability analysis is run per service in parallel, and the model layer is invoked only for the small subset of findings where the advisory is ambiguous or the call graph is contested. The AppSec team gets a full fleet-wide exposure picture while the incident is still active, which is the point of having the tool in the first place.
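
Structurally, the sweep looks something like the sketch below, using Python's standard process pool; the verdict function is a stand-in for the engine's deterministic pass, not Griffin AI's actual code:

```python
from concurrent.futures import ProcessPoolExecutor

def engine_verdict(service: str) -> tuple[str, str]:
    # Placeholder for the deterministic pass: advisory match plus
    # reachability. Pretend one service in ten is contested.
    contested = int(service.rsplit("-", 1)[1]) % 10 == 0
    return service, ("escalate" if contested else "resolved")

def fleet_sweep(services: list[str]) -> list[str]:
    with ProcessPoolExecutor() as pool:   # CPU-bound work scales out
        results = list(pool.map(engine_verdict, services))
    # Only the contested subset ever reaches the model layer.
    return [svc for svc, verdict in results if verdict == "escalate"]

if __name__ == "__main__":
    fleet = [f"service-{i}" for i in range(100)]
    contested = fleet_sweep(fleet)
    print(f"{len(fleet)} services swept, "
          f"{len(contested)} escalate to the model")
```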

The predictability of elastic cost

Elastic scale also affects cost predictability. Griffin AI's cost scales with the CPU-seconds the engine consumes, which is close to linear in workload. Bursts cost proportionally more than average minutes, but not disproportionately more, because the engine's per-scan cost is stable. Budgeting is straightforward: if the fleet is expected to scale thirty percent next year, cost scales roughly thirty percent.

Mythos-class tools have a different cost profile under bursts. The token cost per scan does not change, but the tool's overall throughput is limited by the provider rate limit, which means either the bursts get slower or the tool buys more headroom. Either way, the customer's cost under burst is a step function rather than a linear curve. Budgeting becomes harder, and the finance team tends to discover the issue only when the first real incident triggers a bill spike.
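
The contrast between the two cost shapes, sketched with invented prices and rate-limit tiers:

```python
def engine_cost(scans: int, cpu_seconds_per_scan: float = 2.0,
                price_per_cpu_second: float = 0.0001) -> float:
    """Linear: pay for the CPU-seconds actually consumed."""
    return scans * cpu_seconds_per_scan * price_per_cpu_second

def llm_headroom_cost(peak_calls_per_min: int) -> float:
    """Step function: headroom is bought in rate-limit tiers."""
    tiers = [(500, 1_000.0), (2_000, 5_000.0), (10_000, 25_000.0)]
    for cap, monthly_price in tiers:
        if peak_calls_per_min <= cap:
            return monthly_price
    raise ValueError("burst exceeds the largest tier")

for scans in (1_000, 2_000, 4_000):
    print(f"{scans} scans -> engine ${engine_cost(scans):.2f}")
for peak in (400, 600, 2_500):
    print(f"{peak} calls/min peak -> headroom "
          f"${llm_headroom_cost(peak):,.0f}/mo")
```

Double the scans and the engine bill doubles; nudge the peak call rate past a tier boundary and the headroom bill jumps.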

A quick reference for evaluation

When evaluating tools for a production estate, run a burst experiment. Pick a realistic burst size, something like five or ten times the steady-state scan rate, and have the tool process the burst during business hours when the provider's load is also high. Measure the median and ninety-fifth percentile latency during the burst, the time to fully drain the queue, and the token or CPU cost during the burst window.
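
A minimal harness for scoring the experiment once you have per-scan submit and finish timestamps; the synthetic data at the bottom just demonstrates the shape of the input:

```python
import random
import statistics

def score_burst(timings: list[tuple[float, float]]) -> dict[str, float]:
    """timings: (submitted_at, finished_at) per scan, in seconds."""
    latencies = sorted(end - start for start, end in timings)
    first_arrival = min(start for start, _ in timings)
    return {
        "p50_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "arrival_window_s": max(start for start, _ in timings) - first_arrival,
        "drain_s": max(end for _, end in timings) - first_arrival,
    }

# Synthetic stand-in data: 3,000 scans arriving over a 15-minute window.
random.seed(0)
timings = [(t := random.uniform(0, 900), t + random.uniform(30, 90))
           for _ in range(3_000)]
print(score_burst(timings))
```

An elastic tool keeps drain_s close to arrival_window_s; a rate-limited one does not.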

Griffin AI's numbers on a burst look close to its steady-state numbers, with the worker pool absorbing the load. Pure-LLM tools show a visible degradation: latency distributions widen, tail latency balloons, and the drain takes significantly longer than the arrival window. The test is not complicated, but it reveals the architectural floor that determines whether the tool can live on the hot path during the moments that matter.

Elastic scale is not a marketing feature. It is the consequence of putting an engine in front of the model and keeping the model off the hot path. The engine takes the burst, the model takes the nuance, and the combined system holds its behaviour under the load spikes that define real operational use.
