AI Security

Deserialization Vulnerabilities: Griffin AI vs Mythos

Unsafe deserialization looks obvious on a slide and impossible on a real codebase. Sinks are language-specific, gadgets live in third-party libraries, and the tainted byte can arrive wrapped in six layers of framework ceremony. Griffin's engine-plus-LLM design handles each of those concerns separately; Mythos-style pure-LLM scanners blur them into pattern-matching.

Nayan Dey
Principal Engineer
7 min read

Deserialization bugs are the ones that convince security teams that SAST is a scam. The finding reads well ("untrusted input flows into readObject"), the code looks roughly right, and the model explains the exploit in confident prose. Then the team tries to reproduce it and discovers the tainted byte never reaches the sink, the class loader rejects the gadget chain, or the framework already filtered the payload two layers upstream. In our testing, roughly half of the deserialization findings from pure-LLM scanners are like this. The other half are real and happen to be catastrophic.

Telling the two apart is the entire job. Griffin AI's engine-plus-LLM architecture is designed for that job. Mythos-class pure-LLM approaches are not.

Why deserialization is uniquely hard to analyze

Three properties of deserialization make it different from other sinks:

Sinks are language and library specific. Java has readObject, readExternal, XMLDecoder, ObjectInputStream subclasses, Jackson polymorphic deserialization, SnakeYAML tags, Kryo with default registration, and a long tail of serialization libraries with their own rules. Python has pickle.loads, yaml.load, marshal, shelve, and framework-specific caches. .NET has BinaryFormatter, NetDataContractSerializer, LosFormatter, ObjectStateFormatter, and Json.NET with TypeNameHandling. PHP has unserialize and Laravel's encrypted cookies. A scanner that looks for "readObject" in Java code catches one-tenth of the actual sinks.
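
To make the core danger concrete, here is a minimal Python sketch of why pickle.loads is always a sink on untrusted bytes. The Gadget class and record helper are invented for the demo; a real payload would invoke os.system or similar instead of a benign recorder:

```python
import pickle

executed = []

def record(msg):
    # Stand-in for the attacker's payload; a real gadget would call
    # os.system or similar instead of appending to a list.
    executed.append(msg)

class Gadget:
    def __reduce__(self):
        # pickle honors __reduce__: whatever callable it returns is
        # invoked during loads, before any type checking can happen.
        return (record, ("ran during deserialization",))

blob = pickle.dumps(Gadget())  # payload is built offline by the attacker
pickle.loads(blob)             # merely parsing the bytes executes record()
assert executed == ["ran during deserialization"]
```

The same shape recurs in every ecosystem the list above names: the serializer faithfully reconstructs whatever the bytes describe, including behavior.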

Exploitability depends on the gadget chain. A readObject on attacker-controlled bytes is only exploitable if a gadget chain exists on the classpath. Commons-Collections, Spring, Hibernate, and Rome are the classic suspects, but the list changes with every release and every dependency bump. Whether a specific call is exploitable today is a function of your pom.xml, your shaded jars, and your JVM version.

Tainted input often arrives via indirect paths. The serialized blob rarely comes straight from a request body. It is base64-decoded, pulled from a cookie, fetched from a cache, loaded from a message queue, or assembled across multiple frames of a multipart upload. Many of these paths involve framework code the developer did not write.
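
A hypothetical handler shows how short even a two-hop indirection can be. The function and cookie name are invented; the point is that the blob never appears "raw", so a scanner matching request bodies against deserialization calls misses the flow:

```python
import base64
import pickle

def load_session(cookies: dict):
    raw = cookies["session"]      # source: attacker-controlled cookie
    blob = base64.b64decode(raw)  # transform: decoding preserves taint
    return pickle.loads(blob)     # sink: the dangerous call, two hops away

# A benign round trip; an attacker would substitute a gadget payload.
cookie = base64.b64encode(pickle.dumps({"user": "alice"})).decode()
session = load_session({"session": cookie})
assert session == {"user": "alice"}
```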

Any tool that wants to produce accurate deserialization findings has to model all three dimensions. Pure-LLM scanners model roughly none.

How Mythos-class tools fail here

Mythos retrieves code by similarity and reasons with a language model. For deserialization this produces three characteristic failure modes.

Sink blindness. The model knows readObject is dangerous because the training data says so, but it does not know that a custom MessageSerializer.decode() in your infrastructure module calls ObjectInputStream.readObject inside a helper. Without a call graph it cannot follow the wrapper. Worse, it sometimes recognizes Json.NET as "safe JSON" without noticing TypeNameHandling.All in the project's JsonSerializerSettings.
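
A Python analog of the wrapper problem, with names mirroring the Java example in the text (MessageSerializer and _read_frame are invented for illustration). Nothing in decode()'s signature or its immediate body reveals the sink:

```python
import io
import pickle

class MessageSerializer:
    @staticmethod
    def decode(payload: bytes):
        # Looks like innocent framing logic to a similarity-based retriever.
        return _read_frame(payload)

def _read_frame(payload: bytes):
    # The real sink is buried one helper deeper; a chunk-retrieval scanner
    # sees decode() and _read_frame() in separate chunks and connects neither.
    return pickle.Unpickler(io.BytesIO(payload)).load()

msg = MessageSerializer.decode(pickle.dumps(["hello"]))
assert msg == ["hello"]
```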

Gadget hallucination. Ask a pure-LLM scanner whether a finding is exploitable and it will tell you about Commons-Collections 3.1. It does not check whether Commons-Collections is actually on your classpath, what version, or whether the specific chain still works on current JVMs. The prose sounds authoritative; the reasoning is detached from your dependency graph.

Source confusion. The scanner sees a deserialization call and a request handler in the same chunk and assumes they connect. Or it sees them in different chunks and assumes they do not. Neither conclusion survives contact with a dispatcher, a queue, or a cache. The false positives are loud and the false negatives are silent, which is the worst combination.

The result is a tool that is either spammed with false alarms or quietly misses the one real bug that matters.

Griffin's architecture for deserialization

Griffin treats deserialization as three intertwined problems and solves each with the right mechanism.

The engine enumerates real sinks. Griffin ships with per-language sink catalogs that cover the actual API surface: not just readObject but the ecosystem of serialization libraries, their configuration options, and the wrappers that teams build on top of them. Jackson with default typing, SnakeYAML without SafeConstructor, Kryo with setRegistrationRequired(false), and dozens more are first-class sinks with configuration-aware rules. Custom wrappers are followed through call graphs, so MessageSerializer.decode is resolved to the underlying dangerous call and flagged accordingly.
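
The mechanics can be sketched in a few lines. The catalog entries, rule strings, and call-graph format below are invented for illustration (Griffin's actual rule format is not public); the point is that wrapper resolution is a graph walk, not pattern matching:

```python
# Configuration-aware sink catalog: (module, function) -> rule note.
SINK_CATALOG = {
    ("pickle", "loads"): "always dangerous on untrusted bytes",
    ("yaml", "load"): "dangerous unless Loader=SafeLoader",
}

# Toy call graph: caller -> callees, as discovered by the engine.
CALL_GRAPH = {
    "MessageSerializer.decode": ["_read_frame"],
    "_read_frame": ["pickle.loads"],
}

def resolve_to_sinks(func: str, graph: dict, catalog: dict) -> list:
    """Follow wrappers until a cataloged sink is reached."""
    found, stack, seen = [], [func], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        module, _, name = current.rpartition(".")
        if (module, name) in catalog:
            found.append((current, catalog[(module, name)]))
        stack.extend(graph.get(current, []))
    return found

sinks = resolve_to_sinks("MessageSerializer.decode", CALL_GRAPH, SINK_CATALOG)
assert sinks == [("pickle.loads", "always dangerous on untrusted bytes")]
```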

The engine tracks taint through framework plumbing. Request bodies, cookies, headers, queue messages, and cache entries are modelled as sources. Base64 decoding, compression, and encryption wrappers are modelled as non-sanitising transforms that preserve taint. When the engine reports a flow, it can show every hop from the HTTP boundary to the sink, including framework code.

The LLM reasons about exploitability. Once the engine has a candidate flow, Griffin's reasoning layer evaluates whether a gadget chain is plausible. It inspects the resolved dependency graph, checks library versions against known gadget databases, and considers JVM or runtime version constraints. It distinguishes between "sink reachable but no known gadget on classpath" and "sink reachable and Commons-Collections 3.2.1 is present." The verdict cites specific libraries, specific versions, and specific chain names where applicable.
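
The shape of the verdict logic can be sketched as a lookup against the resolved manifest. Library names, the version predicate, and the chain label below are illustrative stand-ins, not a real advisory feed:

```python
# Toy known-gadget database: library -> (chain name, vulnerable-version test).
KNOWN_GADGETS = {
    "commons-collections": {
        "chain": "CommonsCollections5",
        "vulnerable": lambda v: v.startswith("3."),
    },
}

def exploitability_verdict(manifest: dict) -> str:
    """Cross-reference the dependency manifest against known gadget chains."""
    for lib, version in manifest.items():
        entry = KNOWN_GADGETS.get(lib)
        if entry and entry["vulnerable"](version):
            return (f"sink reachable and {lib} {version} is present "
                    f"(chain: {entry['chain']})")
    return "sink reachable but no known gadget on classpath"

hit = exploitability_verdict({"commons-collections": "3.2.1",
                              "spring-core": "6.1.0"})
miss = exploitability_verdict({"guava": "33.0"})
assert "CommonsCollections5" in hit
assert miss.startswith("sink reachable but")
```

The two return strings mirror the two verdicts in the text; the value is that each one is grounded in the manifest rather than in training-data folklore.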

This separation is what makes Griffin's deserialization findings actionable. Developers do not have to argue with the tool about whether the sink is real; the flow is explicit. Security engineers do not have to re-verify exploitability; the gadget reasoning is tied to the actual dependency manifest.

A benchmark that reflects reality

We maintain an internal corpus of 60 deserialization cases drawn from CVEs, bug bounty reports, and consented customer codebases. The cases are graded by difficulty:

  • Trivial. Direct untrusted input to a textbook sink in a single function.
  • Moderate. Input wrapped in framework decoding, sink behind a helper, obvious gadget on classpath.
  • Hard. Multi-hop flow through a cache or queue, non-obvious sink configuration, gadget availability that depends on build profile.

On this corpus:

  • Griffin found 58 of 60 cases overall; both misses were in the "hard" tier and involved a Kotlin-specific serialization quirk we have since patched.
  • Mythos found all of the trivial cases, most of the moderate ones, and fewer than a quarter of the hard ones. More concerning, Mythos produced 4x Griffin's false-positive volume, mostly on moderate-tier code where the sink was configured safely but the model did not notice.

False positives matter because they train teams to ignore the tool. A deserialization finding that turns out to be wrong three times in a row will get triaged as noise the fourth time, which is the time it is real.

What to ask your vendor

If deserialization is on your threat model, the useful questions for any AI-assisted scanner are not about model size or benchmark scores. They are:

  • Which specific sinks does the tool recognize for my language stack, and can it show me the catalog?
  • Does the finding include a full taint path from the request boundary to the sink, or only a sink location?
  • Does exploitability reasoning reference my actual dependency graph, or does it cite generic gadget chains?
  • How does the tool handle custom serialisation wrappers?

Griffin answers all four concretely. Pure-LLM tools tend to answer in prose that sounds good and says little.

Closing thought

Deserialization is the canonical case where pattern-matching is not enough. The sinks are too varied, the exploit surface is too configuration-dependent, and the flows are too indirect. Griffin's engine-plus-LLM design is not a marketing distinction; it is the architecture the problem demands. Mythos-class scanners approach the same problem with the wrong tool and produce the familiar mix of confident prose, noisy false positives, and occasional quiet misses.

If a scanner cannot show you the byte path, it is not finding your deserialization bugs. It is guessing, and with this class of vulnerability, guesses are expensive.
