AI Security

The Disproof Step: Griffin AI vs Mythos

Most AI bug hunters skip the hardest step: trying to kill their own findings. Here is why Griffin AI's disproof pass is the single biggest lever on false-positive rate.

Shadab Khan
Security Engineer
5 min read

The worst thing an AI bug hunter can do is fall in love with its own hypothesis. Once a model has committed to a narrative where a function is exploitable, it develops a curious blindness to the evidence that the function is fine. It will re-read the sanitiser and describe it as insufficient. It will look at a validator and argue that the validator is bypassable in some hypothetical deployment. It will find reasons. A model asked to defend a finding will defend the finding, which is why a single-pass scanner is essentially a motivated storyteller.

Griffin AI's answer to this is structural rather than prompt-based: a separate disproof stage, with a separate context, whose explicit job is to kill the hypothesis. The model entering the disproof stage is not told to confirm. It is told to find any reachable configuration in which the hypothesis fails, and the hypothesis only survives if no such configuration is found. This is not a small prompt change. It is a different loop.
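
To make the loop concrete, here is a minimal sketch of the two-stage shape. This is not Griffin's implementation and the names are invented; the property that matters is that the second stage's only success condition is producing a counterexample, and a hypothesis survives only by outlasting every attempt to kill it.

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Hypothesis:
        sink: str      # e.g. "cursor.execute(query)"
        source: str    # e.g. "request.GET['q']"
        cwe: str       # e.g. "CWE-89"

    # A disproof check returns a counterexample description, or None.
    DisproofCheck = Callable[[Hypothesis, dict], Optional[str]]

    def disprove(hyp: Hypothesis, artefacts: dict,
                 checks: list[DisproofCheck]) -> Optional[str]:
        # Adversarial by construction: this function succeeds by refuting.
        for check in checks:
            counterexample = check(hyp, artefacts)
            if counterexample is not None:
                return counterexample
        return None  # no reachable failing configuration found

    def scan(hypotheses: list[Hypothesis], artefacts: dict,
             checks: list[DisproofCheck]) -> list[Hypothesis]:
        # A hypothesis survives only if nothing managed to kill it.
        return [h for h in hypotheses if disprove(h, artefacts, checks) is None]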

Why single-pass scanners cannot disprove

Mythos-class tools generally produce a finding and stop. If they run a second model call at all, it is usually a "self-critique" pass where the same model, with broadly the same prompt, is asked to reconsider. The empirical literature on LLM self-correction (work coming out of the reasoning groups through 2024 and 2025, including the self-reflection studies published at NeurIPS 2024) is clear that self-critique reduces errors at the margin but does not catch the errors the model is most confident about. Hallucinated vulnerabilities are, by nature, the errors the model is most confident about. They pattern-match cleanly to textbook bugs. That confidence survives a self-critique pass.

The disproof step only works if it is adversarial and grounded. Adversarial, meaning the evaluator's success criterion is to invalidate the claim, not to confirm it. Grounded, meaning the evaluator has access to the same program artefacts the original engine used: the taint path, the call graph, the sanitiser set, the framework model. Without grounding, the disproof collapses back into narrative argument, and the failure mode of narrative argument is that both sides "win."
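
One way to picture what "grounded" means in practice: the evaluator receives the same artefact bundle the first-pass engine used, something like the sketch below. The field names are mine, not Griffin's; the point is that both stages argue over the same program facts rather than competing narratives.

    from dataclasses import dataclass

    @dataclass
    class Grounding:
        taint_path: list[str]             # ordered steps from source to sink
        call_graph: dict[str, list[str]]  # caller -> callees
        sanitisers: set[str]              # escaping/encoding functions in scope
        framework_model: dict[str, str]   # e.g. {"res.render": "escapes via engine"}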

What Griffin's disproof pass actually checks

From reading the explanations Griffin attaches to downgraded findings, the disproof pass appears to systematically attack four things.

First, sanitiser coverage. Many frameworks interpose escaping or encoding at layers the model's pattern-matching will miss. Django's ORM parameterises queries by default. Spring's JdbcTemplate uses prepared statements unless you deliberately drop to raw SQL. Express's res.render produces HTML-escaped output under the common template engines (Pug escapes by default, as does EJS's <%= %>). Whenever Griffin flags an injection, the disproof pass asks whether any framework layer between the taint source and the sink is actually doing the sanitisation the hypothesis assumes is absent. Roughly a third of the CWE-79 XSS hypotheses I have watched it generate die at this step.
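
A concrete instance of the coverage question, using standard Django behaviour (the model and module names are hypothetical): both queries below consume the same tainted value, but only the second reaches the database unparameterised, so only the second should survive disproof.

    from myapp.models import Article   # hypothetical model in a Django project

    def search(request):
        q = request.GET["q"]   # tainted: attacker-controlled query parameter

        # Dies at the sanitiser-coverage check: the ORM parameterises this.
        hits = Article.objects.filter(title__icontains=q)

        # Survives it: raw SQL built by string interpolation, a real CWE-89.
        raw_hits = Article.objects.raw(
            f"SELECT * FROM myapp_article WHERE title = '{q}'"
        )
        return hits, raw_hits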

Second, reachability refinement. The first-pass engine proves a path exists in the control-flow graph. The disproof pass asks whether the path is feasible under realistic branch conditions. If the injection sink is only reached when user.role == 'admin' and the taint source is only populated for unauthenticated requests, the path is unreachable in practice. Static tools like CodeQL have long had partial support for this kind of reasoning through their data-flow libraries, but combining it with an LLM that can read naming conventions and developer comments is genuinely new.
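
The shape of an infeasible path, in a made-up handler: the route from source to sink exists in the control-flow graph, but the branch conditions guarding each end are mutually exclusive, so no single request can take it.

    def run_report(q):    # hypothetical sink, e.g. builds a query from q
        ...

    def handle(request):
        q = None
        if request.user is None:                 # taint source: populated only
            q = request.GET.get("q", "")         # for unauthenticated requests
        if request.user is not None and request.user.role == "admin":
            run_report(q)   # sink: reached only by admins, where q is always None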

Third, precondition tractability. Even if a bug exists in the abstract, some preconditions are effectively unreachable. A heap overflow that requires the attacker to first authenticate as a specific internal service account, inside a VPC, with a mutual TLS client cert, is technically a vulnerability and practically not. Griffin's disproof pass articulates the precondition chain and flags hypotheses whose preconditions assume an already-compromised attacker.
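
One plausible way to mechanise that judgement, with invented labels: articulate the chain link by link, then flag any link that presumes the attacker already holds insider access.

    # Each precondition is tagged with what it presumes about the attacker.
    chain = [
        ("network path into the VPC",               "assumes-compromised"),
        ("valid mutual-TLS client certificate",     "assumes-compromised"),
        ("session as the internal service account", "assumes-compromised"),
        ("oversized length field in the request",   "attacker-controllable"),
    ]

    def practically_exploitable(chain):
        # Any link presuming an already-compromised attacker caps severity.
        return all(tag == "attacker-controllable" for _, tag in chain)

    print(practically_exploitable(chain))   # False: a bug, not a finding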

Fourth, cross-file escape routes. Many bug hypotheses die because a middleware or interceptor, defined in another file, intervenes. The disproof pass explicitly widens its reading window to the call chain and the request pipeline, which is the only way to catch framework-level mitigation that the local function does not hint at.
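
Here is the pattern in miniature, as hypothetical Django-style middleware: the handler pattern-matches reflected XSS on its own, and nothing in its file says otherwise; the mitigation only appears once the reading window widens to the request pipeline.

    # middleware.py -- installed once, runs before every view
    import html

    class EscapeParamsMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            # Replace the query dict with an HTML-escaped copy for all views.
            request.GET = {k: html.escape(v) for k, v in request.GET.items()}
            return self.get_response(request)

    # views.py -- the flagged function; locally indistinguishable from CWE-79
    def greet(request):
        return f"<h1>Hello {request.GET['name']}</h1>"   # already escaped upstream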

The numbers that matter

I keep a private spreadsheet of triage outcomes across the last year. On the targets where I have enough data to compute confidence intervals, Griffin's first-pass hypotheses have a true-positive rate of roughly 35 to 45 percent. After the disproof pass, the rate on surviving findings jumps to 85 to 92 percent. The disproof pass is killing roughly half to two-thirds of the first-pass hypotheses, and the ones it kills are overwhelmingly the ones that would have wasted triage time.
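
A quick consistency check on those ranges, taking midpoints: if the first pass emits 100 hypotheses at a 40 percent true-positive rate and the disproof pass kills 58 of them, the quoted survivor precision implies roughly three real bugs lost in exchange for culling about 55 false positives.

    hyps      = 100     # first-pass hypotheses
    tp_rate   = 0.40    # midpoint of the 35-45% first-pass range
    kill_rate = 0.58    # midpoint of "half to two-thirds" culled
    precision = 0.88    # midpoint of the 85-92% survivor range

    survivors    = round(hyps * (1 - kill_rate))          # 42 reach triage
    tp_survivors = round(survivors * precision)           # ~37 of them are real
    tp_lost      = round(hyps * tp_rate) - tp_survivors   # ~3 real bugs culled
    print(survivors, tp_survivors, tp_lost)               # 42 37 3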

For comparison, Mythos-class tools without a disproof stage sit at 5 to 40 percent true-positive rates depending on the CWE class, with no stage that systematically culls. The variance in outcomes between teams using Mythos-class tools is largely determined by how much human disproof work the team is willing to absorb.

The philosophical point

The disproof step encodes an epistemological stance that most AI tooling skips: a claim is only worth making if you have tried, and failed, to break it. Scientific communities have operated this way since the mid-20th century, and Popper's framing of falsifiability as the demarcation criterion is not a bad lens for AI security findings either. A finding that cannot, in principle, be falsified by an automated pass is not a finding; it is a hunch. A tool that emits hunches as findings is optimising for the wrong variable.

How Safeguard Helps

Safeguard ships Griffin AI's disproof reasoning alongside every surviving finding. Reviewers can see what the disproof pass tried, which configurations it explored, and which constraints had to hold for the hypothesis to survive. Downgraded hypotheses are also visible, with the reason they were killed, so triagers can spot-check the disproof logic and flag cases where they disagree. The result is a review experience where the tool is explaining its own doubts rather than pretending to be certain.
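
For a sense of shape (this is my mock-up, not Safeguard's actual schema), the attached reasoning reads roughly like a record of attempted kills and the constraints that had to hold:

    finding = {
        "id": "GRF-0142",                  # invented identifier
        "cwe": "CWE-89",
        "status": "survived-disproof",
        "disproof": {
            "checks": ["sanitiser-coverage", "reachability",
                       "precondition-tractability", "cross-file-escape"],
            "counterexample": None,        # nothing managed to kill it
            "constraints": [
                "no parameterisation layer between source and sink",
                "sink reachable without an authenticated role",
            ],
        },
    }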
