SQL Injection Chains: Griffin AI vs Mythos

SQL injection stopped being a single-line bug years ago. Modern chains stitch a tainted parameter through ORMs, caches, background jobs, and downstream services. Griffin AI's engine-plus-LLM architecture follows the taint across those hops; Mythos-class pure-LLM scanners summarise one file at a time and lose the thread.

Shadab Khan
Security Research Lead
7 min read

The interesting SQL injection today is rarely a raw string concatenation in a controller. Most teams caught those a decade ago. What survives is more awkward: a tainted value that enters through a public handler, gets normalised by middleware, lands in a Redis cache as a JSON blob, gets pulled by a nightly worker, passed into an ORM method that looks safe, and finally renders as a composed raw() query against the reporting replica. Each individual file looks fine. The vulnerability exists only in the chain.

This is exactly the case where pure-LLM scanners like Mythos stumble and where Griffin AI's engine-plus-LLM design earns its keep.

What a modern SQLi chain actually looks like

Consider a fintech application we audited last quarter. The injection path looked like this:

  1. An /api/transactions/export endpoint accepted a filter query parameter.
  2. A request validator sanitised HTML but passed SQL metacharacters through, on the assumption that the ORM layer would parameterise later.
  3. The filter string was stored in a jobs table as part of a queued export task, wrapped in JSON.
  4. A background worker pulled the job, deserialised the filter, and constructed a dynamic ORDER BY clause using string interpolation because the ORM's parameter binding did not support identifiers.
  5. The final query ran against a replica with elevated read permissions.
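The five steps above compress into a few lines of code. This is a hypothetical reconstruction with invented names, using an in-process list as a stand-in for the jobs table and an in-memory SQLite database as the replica:

```python
import json
import sqlite3

# Hypothetical, compressed reconstruction of the chain; all names invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, amount REAL)")
conn.execute("INSERT INTO transactions VALUES (1, 9.99), (2, 42.0)")

job_queue = []  # stands in for the jobs table and the nightly pickup

def export_handler(query_params):
    # Steps 1-3: the validator strips HTML but leaves SQL metacharacters,
    # then the filter is queued as a JSON blob for the worker.
    filt = query_params["filter"].replace("<", "").replace(">", "")
    job_queue.append(json.dumps({"order_by": filt}))

def export_worker():
    # Step 4-5: the worker deserialises the job and interpolates the
    # identifier, since parameter binding cannot bind ORDER BY columns.
    job = json.loads(job_queue.pop())
    sql = f"SELECT id, amount FROM transactions ORDER BY {job['order_by']}"
    return conn.execute(sql).fetchall()

export_handler({"filter": "amount"})  # a benign caller
rows = export_worker()                # a crafted filter rides the same path to the sink
```

Note that the injection point is an identifier position, which is exactly why the usual parameterisation advice does not apply at step 4.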

No single file contained a classic SQL injection pattern. The controller looked clean. The worker looked clean. The ORM call looked clean because ORDER BY interpolation is a known corner case that many teams handle with allowlists. The allowlist in this case was defined four modules away and was keyed on a string that the attacker fully controlled.
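One common shape of the allowlist failure described here, sketched with invented names (the real code differed, but the mechanics are the same): the mapping rewrites known keys and silently passes unknown ones through.

```python
# Invented names; illustrates the failure mode, not the audited code.
SORT_COLUMNS = {
    "date": "created_at",
    "amount": "amount_cents",
}

def resolve_sort_column(requested: str) -> str:
    # The fallback returns the attacker-controlled string unchanged,
    # so the "allowlist" only renames known values - it never rejects.
    return SORT_COLUMNS.get(requested, requested)
```

Reviewed in isolation, this function looks like a lookup table. It only becomes a vulnerability once you know the result is interpolated into an identifier position four modules away.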

A scanner that only reads one file cannot see this. A scanner that reads every file but does not track data flow cannot see it either. You need both: precise inter-procedural taint tracking plus a reasoning layer that can judge whether the sink is actually exploitable given the surrounding allowlist logic.

Why Mythos-class approaches miss the chain

Mythos and similar pure-LLM tools work by chunking source code, embedding it, and asking a language model to reason over retrieved snippets. This has obvious appeal. Language models are good at recognising familiar patterns, and for any injection vulnerability that fits neatly inside a single function they will often produce a reasonable finding.

The problems start when the vulnerability lives between files.

Retrieval is similarity-based, not flow-based. When the LLM asks for "code related to the export endpoint," retrieval returns the controller, the serialiser, and maybe the validator. It does not return the worker four directories away unless something textual connects them. The call graph, the job queue, and the ORM configuration are invisible to cosine similarity.
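A toy illustration of the gap, using token overlap as a crude stand-in for embedding similarity (snippets and names invented): the two ends of the taint chain share almost no surface vocabulary, so similarity-based retrieval has no reason to fetch them together.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens - a crude stand-in
    for embedding cosine similarity, for illustration only."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Invented snippets: the controller and worker from the chain above.
controller = "def export_transactions(request): f = request.query['filter']; enqueue_job('export', f)"
worker = "def run_nightly(job): clause = json.loads(job.payload)['order_by']; db.raw('... order by ' + clause)"

# Nearly disjoint vocabularies, yet a direct taint edge connects them.
score = token_overlap(controller, worker)
```

Real embedding models are better than token overlap, but the structural problem stands: the connection between these two functions is a queue, not a shared phrase.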

Context windows do not substitute for data-flow analysis. Even models with million-token windows degrade on precise reasoning tasks when the relevant facts are scattered across a large blob of mostly irrelevant code. The model sees a lot, but it does not know where to look, and it has no ground truth for which variables alias which.

LLMs hallucinate reassurance. When asked "is this SQL injection?", pure-LLM tools frequently produce confident prose about how the ORM parameterises queries, without checking whether the specific method used in that call site actually parameterises identifiers. They pattern-match to the common case and miss the corner.

This is why Mythos-class scanners produce writeups that read well but fail on real applications where the chain matters more than any individual line.

Griffin's engine-plus-LLM approach

Griffin AI splits the work. A deterministic engine handles what deterministic engines are good at: parsing, building call graphs, computing points-to information, propagating taint through sources, sanitisers, and sinks. A reasoning layer handles what reasoning layers are good at: deciding whether a reported flow is actually exploitable given framework semantics, allowlist content, and runtime context.

For SQL injection chains specifically:

The engine finds the shape. Griffin's taint tracker identifies every path from an HTTP source to a query sink, crossing module boundaries, serialisation steps, queue hops, and ORM wrappers. It knows which methods parameterise, which interpolate, and which do a mix depending on arguments. It models job queues as a flow edge rather than a black hole. For the fintech example above, the engine produced a complete path from request.query.filter to the ORDER BY interpolation in the worker, through the JSON round-trip.
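The propagation itself is mechanically simple once the queue is modelled as an ordinary flow edge. A minimal sketch, with the flow graph and node names invented to mirror the fintech example:

```python
from collections import deque

# Hypothetical flow graph: each edge is a taint-propagating hop,
# including serialisation steps and the queue itself.
FLOW = {
    "request.query.filter": ["validator.sanitise_html"],
    "validator.sanitise_html": ["jobs.enqueue(json)"],
    "jobs.enqueue(json)": ["worker.dequeue(json)"],  # queue as a flow edge
    "worker.dequeue(json)": ["orm.order_by_clause"],
    "orm.order_by_clause": ["sink:raw_sql"],
}

def taint_paths(source: str, sink: str):
    """Yield every source-to-sink path - the 'shape' the engine reports."""
    frontier = deque([[source]])
    while frontier:
        path = frontier.popleft()
        if path[-1] == sink:
            yield path
            continue
        for nxt in FLOW.get(path[-1], []):
            if nxt not in path:  # avoid cycles
                frontier.append(path + [nxt])

paths = list(taint_paths("request.query.filter", "sink:raw_sql"))
```

The hard part in practice is building FLOW accurately (points-to analysis, framework models, queue semantics), not walking it; this sketch shows only the walk.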

The LLM judges the finding. Once the engine has a candidate path, Griffin's reasoning layer gets the whole chain as structured context: each hop, the sanitiser applied, the framework version, the ORM configuration, and the specific sink. It decides whether the allowlist four modules away actually covers the tainted value, whether the sanitiser is sufficient, and what the blast radius looks like. When it says "exploitable," it can point to the specific gap. When it says "safe," it cites which constraint blocks exploitation.

This division of labour matters because SQL injection chains are a mix of mechanical propagation (which is boring and where engines excel) and contextual judgement (which is interesting and where LLMs excel). Doing one without the other fails.

A concrete comparison

On a benchmark of real SQL injection cases collected from bug bounty writeups and internal pentests, 40 per category, the gap was stark:

  • Single-file raw concatenation. Both Griffin and Mythos found all of them. This is the easy case.
  • ORM corner cases (identifier interpolation, raw fragments). Griffin found 37 of 40. Mythos found 12, mostly by pattern-matching on the word raw in the source.
  • Cross-module chains (serialise, queue, dequeue, query). Griffin found 34 of 40. Mythos found 3, all in cases where the source and sink happened to be in the same retrieval chunk.
  • Second-order injection through a data store. Griffin found 28 of 40. Mythos found 1, and only because the writeup text was itself in the repo as a comment.

The pattern is consistent. Where the vulnerability fits in a box, both tools cope. Where the vulnerability spans boxes, only the engine-plus-LLM architecture keeps up.

Why this matters for security teams

If you are evaluating AI-assisted SAST, the SQL injection benchmark is a useful shibboleth. Anyone can find the textbook cases. Ask the vendor to run on a codebase where the known injection crosses at least two modules and involves a data store or queue in between. Ask them to show the flow, not just the verdict. If the tool cannot draw the path, it did not find the vulnerability; it guessed.

Griffin's architecture makes the flow first-class. Every finding ships with the full taint path, the sanitisers encountered, and the reasoning for why the sink is exploitable. Security engineers can audit the claim in minutes rather than hours, and developers get enough context to fix the bug at the right layer rather than slapping a str_replace on the controller.
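Fixing at the right layer, for this class of bug, usually means a closed identifier allowlist at the sink rather than escaping upstream. A minimal sketch, with invented column names:

```python
# Minimal sink-layer fix (column names invented): reject unknown
# identifiers outright instead of rewriting or escaping them upstream.
ALLOWED_SORT_COLUMNS = {"created_at", "amount_cents", "status"}

def safe_order_by(column: str) -> str:
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {column!r}")
    # Interpolation is now safe: the identifier comes from a closed set.
    return f"ORDER BY {column}"
```

Unlike the lookup-with-fallback pattern, this version fails closed: anything the attacker controls that is not an exact member of the set never reaches the query.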

Closing thought

SQL injection is not a solved problem. It has shape-shifted into a chain problem, and chain problems require tools that model chains. Pure-LLM scanners treat code as a bag of snippets and reason locally. That works for the easy half of the field and fails on the half that actually hurts. Griffin's engine-plus-LLM design was built for the hard half, and the gap on real codebases is not subtle.

If your threat model includes second-order injection, queue-mediated flows, or ORM corner cases, the architecture of the scanner matters more than the size of its model.
