Deserialization Chains: Griffin AI vs Mythos

CWE-502 deserialisation chains are the canonical stress test for AI bug hunters. Here is why Griffin AI's grounded synthesis finds real chains while Mythos-class scanners hallucinate them.

Deserialisation vulnerabilities, CWE-502, are the bug class that most cruelly exposes the difference between AI scanners that reason about programs and AI scanners that pattern-match against training data. The canonical exploit requires chaining gadgets: classes or types in the application or its dependencies whose deserialisation triggers code execution either directly or through a sequence of method dispatches. Finding the source of untrusted deserialisation is easy. Proving that a viable gadget chain exists is not. It is the archetype of a vulnerability where the exploit path matters as much as the source of taint.

I spent an entire summer in 2024 triaging deserialisation findings from three AI scanners. The distributions of quality were so different that I still use that experience as a reference point when people ask whether an AI scanner is "any good."

Why deserialisation breaks pure-LLM scanners

A Mythos-class tool sees a call to ObjectInputStream.readObject, pickle.loads, yaml.unsafe_load, or Json.NET's JsonSerializer.Deserialize with TypeNameHandling.All, and confidently emits a CWE-502 finding. It is usually right that the call is unsafe. It is almost always wrong about whether an exploitable chain exists.

The chain matters. In a well-maintained Java application that uses Jackson with a strict polymorphic type resolver, a deserialisation call that looks dangerous might in practice be constrained to a known set of safe types. In a .NET application that has migrated off BinaryFormatter to System.Text.Json with its default settings, the surface-level call pattern still looks alarming, but the serialiser does not instantiate arbitrary types named in the payload, so an exploit would need gadget-compatible types among the declared target types, which are not usually present. The bug is in the gadget availability, not the call site.
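The same point can be made concrete in Python, where a restricted unpickler keeps the pickle.loads call shape while constraining which types a payload may name. This is a minimal sketch based on the find_class override described in the Python pickle documentation; SAFE_GLOBALS, AllowlistUnpickler, restricted_loads, and Gadget are illustrative names chosen here, not part of any scanner.

```python
import io
import pickle

# Hypothetical allowlist for illustration: (module, name) pairs the service expects.
SAFE_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowlisted")


def restricted_loads(data: bytes):
    """Same call shape as pickle.loads, but with a constrained type set."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Gadget:
    """Stand-in for a gadget: __reduce__ asks the unpickler to call a function."""

    def __reduce__(self):
        # Harmless callable here; a real payload would name something dangerous.
        return (print, ("side effect during unpickling",))
```

Under these assumptions, restricted_loads(pickle.dumps(Gadget())) raises pickle.UnpicklingError because builtins.print is not allowlisted, while an OrderedDict round-trips normally. A scanner that matches only on the call site would report plain pickle.loads and restricted_loads identically.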

Pure-LLM scanners have no model of gadget availability. They cannot enumerate the dependencies, walk their public type surface, or identify candidate gadget classes whose deserialisation side effects might chain to code execution. They can narrate the presence of Apache Commons Collections or SnakeYAML in the pom.xml or requirements.txt and assert that "a gadget chain is likely present." Whether the chain actually exists in this version, with these method signatures, is a question the model is not equipped to answer.

What Griffin's engine can see

Griffin's engine has an SBOM-level view of the application's dependencies. It knows which versions of which libraries are loaded and can cross-reference them against the catalogue of known gadget classes from sources like ysoserial, marshalsec, and the corresponding .NET and Python gadget libraries. When it flags a CWE-502 finding, it has already checked whether at least one dependency on the classpath contains known gadget-compatible types.

More importantly, the engine has reachability information. Even if a known gadget class is on the classpath, the gadget only matters if the deserialiser will actually attempt to instantiate it. For Jackson with enableDefaultTyping (deprecated since 2.10 in favour of activateDefaultTyping with a PolymorphicTypeValidator), effectively any type that is not on Jackson's built-in blocklist is reachable; the flag is lethal. For Jackson with a strict resolver, only the allowlisted types are reachable, and most gadget chains die. Griffin's hypothesis states which configuration is in effect, based on the actual serialiser initialisation in the code, not on a generic guess.
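One way to picture the reachability question is as a set intersection between the gadget classes present on the classpath and the types the serialiser configuration is willing to instantiate. The sketch below is an illustration only, not Griffin's implementation; SerialiserConfig and reachable_gadgets are hypothetical names, and modelling "open" default typing as an absent allowlist is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SerialiserConfig:
    """Hypothetical summary of how a deserialiser was initialised in the code."""

    name: str
    # None models "any class named in the payload may be instantiated".
    allowlist: Optional[FrozenSet[str]]


def reachable_gadgets(
    config: SerialiserConfig, gadgets_on_classpath: FrozenSet[str]
) -> FrozenSet[str]:
    """Return the known gadget classes this deserialiser could actually instantiate."""
    if config.allowlist is None:
        return gadgets_on_classpath
    return gadgets_on_classpath & config.allowlist


# Illustrative inputs; the DTO class names are made up for the example.
gadgets = frozenset({"org.apache.commons.collections.functors.InvokerTransformer"})
open_typing = SerialiserConfig("default typing enabled", allowlist=None)
strict = SerialiserConfig(
    "strict resolver",
    allowlist=frozenset({"com.example.dto.Order", "com.example.dto.Invoice"}),
)

print(sorted(reachable_gadgets(open_typing, gadgets)))
print(sorted(reachable_gadgets(strict, gadgets)))
```

With the open configuration the gadget stays in the reachable set; with the strict one the set is empty, which is the situation in which most chains die.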

The disproof pass on a CWE-502 finding then attacks the chain. It checks that the candidate gadget's trigger method (typically the no-arg constructor or a side-effecting setter for JSON-style serialisers, or a readObject-style hook for native Java serialisation) is actually called during deserialisation for this serialiser configuration. It then checks that the path from the gadget to a sink such as Runtime.exec, ProcessBuilder.start, Method.invoke, or Class.forName, with attacker-controllable arguments, is satisfiable given the type constraints of the chain. If any link fails, the finding is downgraded to unsafe-deserialisation-without-known-chain, which is a real but lower-severity concern.
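The shape of such a pass can be sketched as a link-by-link check that stops at the first link it cannot confirm. This is a hypothetical illustration of the idea, not the engine's actual code; Link, disproof_pass, and the CONFIRMED label are assumptions made here, and each predicate stands in for what would really be a program-analysis query.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

CONFIRMED = "rce-chain-confirmed"  # hypothetical severity label
DOWNGRADED = "unsafe-deserialisation-without-known-chain"


@dataclass(frozen=True)
class Link:
    """One step of a hypothesised chain plus a predicate that tries to confirm it."""

    description: str
    holds: Callable[[], bool]


def disproof_pass(chain: List[Link]) -> Tuple[str, Optional[str]]:
    """Return (severity, first_failed_link); an empty chain counts as unproven."""
    if not chain:
        return DOWNGRADED, "no candidate chain"
    for link in chain:
        if not link.holds():
            return DOWNGRADED, link.description
    return CONFIRMED, None


# Illustrative chain: the lambdas are placeholders for real analysis results.
chain = [
    Link("trigger method is invoked for this serialiser configuration", lambda: True),
    Link("gadget reaches a sink under the chain's type constraints", lambda: True),
    Link("sink arguments are attacker-controllable", lambda: False),
]
print(disproof_pass(chain))
```

Returning the description of the first failed link is a deliberate choice in this sketch: it gives the reviewer the specific reason for the downgrade instead of a bare severity change.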

A worked comparison

On a recent Java microservice, Griffin flagged three CWE-502 findings. One was a Kryo deserialiser with default configuration, where a chain through java.util.PriorityQueue and a custom Comparator in the codebase survived the disproof pass; it was a genuine remote code execution primitive and was patched the same day. The other two were Jackson deserialisers with strict type resolution; both survived first-pass hypothesis generation but were downgraded by the disproof pass because no reachable gadget class was available in the effective type set. The reviewer could read the disproof reasoning and verify it in minutes.

A Mythos-class tool running on the same service produced nineteen CWE-502 findings. Three overlapped with Griffin's three. The other sixteen were variations on the theme of "this code deserialises data; therefore it is vulnerable." Most of the sinks were either not reachable from attacker-controlled input, or configured with constraints that the tool did not acknowledge. The triage cost to disprove the sixteen was several engineering days.

Chain synthesis is the hard part

Synthesising the actual exploit payload for a deserialisation chain is a research-grade problem. The state of the art in 2025, exemplified by tools like Gadget Inspector and the follow-ups at USENIX Security 2024 and ACM CCS 2025 on property-graph-based gadget discovery, relies on explicit program analysis over the dependency graph. An LLM without that analysis cannot reconstruct a gadget chain reliably; it will produce fragments that look plausible individually but do not connect into a working chain. Griffin's role in chain synthesis is to combine the engine's gadget-graph output with the LLM's ability to describe the payload, not to substitute for the analysis.

When Griffin emits a full witness for a CWE-502 finding, the witness includes the payload structure, the serialiser configuration required, and the specific gadget classes involved. When it cannot produce a full witness, it says so and explains which link in the chain is missing.

Version sensitivity

Deserialisation findings are extremely version-sensitive. A chain that works against Apache Commons Collections 3.2.1 does not work against a default 3.2.2 install, because 3.2.2 disables deserialisation of the unsafe functor classes such as InvokerTransformer unless the application explicitly re-enables it. A default-typing chain that works against an early Jackson 2.9 release may fail against 2.12, because later releases extended the built-in gadget blocklist and 2.10 introduced PolymorphicTypeValidator-based allowlisting. Pure-LLM scanners rarely track these version boundaries. They will flag findings against versions where the chain is known to be patched. Griffin cross-references the SBOM version pins against the gadget catalogue's patch-level metadata, which eliminates a large class of false positives in the CWE-502 population.
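A version-boundary check of this kind can be sketched in a few lines. The catalogue below is a hand-written, single-entry illustration, not real Griffin or ysoserial metadata; MITIGATED_FROM, parse_version, and chain_still_viable are hypothetical names, and the parser assumes purely numeric dotted versions with no qualifiers such as -SNAPSHOT.

```python
from typing import Dict, Optional, Tuple

# Illustrative catalogue: package -> first version that mitigates the chain by default.
MITIGATED_FROM: Dict[str, str] = {
    "commons-collections:commons-collections": "3.2.2",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse '3.2.1' into (3, 2, 1) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.split("."))


def chain_still_viable(package: str, pinned_version: str) -> Optional[bool]:
    """True if the pin predates the mitigation, False if not, None if unknown."""
    boundary = MITIGATED_FROM.get(package)
    if boundary is None:
        return None
    return parse_version(pinned_version) < parse_version(boundary)


# SBOM-style pins, invented for the example.
for version in ("3.2.1", "3.2.2"):
    print(version, chain_still_viable("commons-collections:commons-collections", version))
```

Comparing parsed tuples instead of raw strings matters at boundaries such as 3.10 versus 3.9, where a lexical comparison would give the wrong order. Returning None for packages outside the catalogue keeps "no known chain" distinct from "known to be mitigated".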

How Safeguard Helps

Safeguard combines Griffin AI's deserialisation analysis with the platform's SBOM ingestion to produce CWE-502 findings that are version-pinned, gadget-chain-aware, and triageable in minutes. Each finding carries the hypothesised chain, the disproof reasoning, and, where possible, a reproducible payload. The result is that deserialisation findings stop being the nightmare category of your triage queue and start behaving like any other grounded bug class.
