AI Security

Pattern Scanners Can't Find Zero-Days. This Can.

Signature-based scanners only know what other people have already named. Here is the architectural reason they cannot find zero-days, and what actually does.

Shadab Khan
Security Engineer
7 min read

The pitch I have heard most often from a SAST vendor in the last decade goes something like this: we have ten thousand rules, our coverage matches the OWASP Top 10, and our newest detection rule landed yesterday. Each of those facts is true. None of them addresses the problem the customer was actually asking about. The customer wanted to know whether the tool would surface a vulnerability that nobody had named yet. The honest answer, and the one nobody likes to give in a sales call, is that signature-based scanners cannot do that. Not because the engineers building them are unserious, but because the architecture is structurally unable to surface what nobody has already taught it to look for.

That gap is the entire reason teams burn out on vulnerability tooling. It is also why the vulnerabilities behind most of the headline supply chain incidents of the last three years were sitting in dependency graphs that had been "scanned clean" by signature tools the week before. If we want to talk about zero-day discovery in 2026, we have to be honest about why the dominant paradigm misses them.

What a pattern scanner actually does

Strip away the marketing, and a pattern scanner is doing one of three things. It is matching identifiers against a database of vulnerable package versions (the SCA case). It is matching syntactic patterns against a corpus of known-bad code shapes (the rule-based SAST case). Or it is matching binary fingerprints against a database of known-bad artifacts (the antivirus lineage). All three are descendants of a single idea, which is that you can detect a thing you have already seen before. They are extraordinarily good at this. The CVE matching engines I trust to flag a vulnerable left-pad transitively pulled into a backend service are doing exactly that work, and they do it in milliseconds.
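
To make the first of those concrete, here is a minimal sketch of the SCA lookup in Python. Everything in it is an illustrative stand-in: real matchers key on CPE or PURL identifiers and handle full semver ranges, not a single "fixed in" bound.

    # Minimal sketch of the SCA case: flag any dependency whose
    # (name, version) pair falls below a published advisory's fix.
    # Advisory data, package names, and version logic are illustrative.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Advisory:
        package: str
        fixed_in: tuple[int, ...]  # simplified "vulnerable below" bound
        cve: str

    ADVISORIES = [
        Advisory("left-pad", (1, 3, 0), "CVE-XXXX-0001"),  # placeholder ID
    ]

    def parse(version: str) -> tuple[int, ...]:
        return tuple(int(part) for part in version.split("."))

    def scan(dependencies: dict[str, str]) -> list[str]:
        findings = []
        for advisory in ADVISORIES:
            installed = dependencies.get(advisory.package)
            if installed and parse(installed) < advisory.fixed_in:
                findings.append(f"{advisory.package}=={installed}: {advisory.cve}")
        return findings

    # A zero-day in a package with no advisory entry never enters this
    # loop at all: the lookup has nothing to fire on.
    print(scan({"left-pad": "1.2.9", "internal-lib": "0.4.1"}))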

The trouble is that "things I have already seen before" is a bounded set. The vulnerabilities that are dangerous to your organisation right now, this quarter, are precisely the ones that fall outside that set. By definition, a zero-day is a flaw without a signature.

Why this is not a tuning problem

A common rebuttal is that the rule corpus simply needs to grow. If we wrote enough rules, eventually the rule set would catch the next bug. This is wrong in a way that is worth being precise about.

A rule encodes a pattern: a particular shape of source-to-sink flow, a particular API misuse, a particular constant or string. To write that rule you need to know in advance what the shape is. The space of bug shapes is not bounded by your rule corpus; it is bounded by the language semantics, the framework conventions, and the attacker's creativity. Each of those exceeds the rule writer's imagination by orders of magnitude. CodeQL queries, Semgrep rules, and proprietary SAST patterns are all useful for catching the bug class you have already characterised. They cannot catch the bug class that has not been characterised yet, because there is nothing to write down.
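
A toy version of this makes the boundary visible. The rule below is a deliberately crude stand-in for a real Semgrep or CodeQL pattern; it encodes one characterised command-injection shape and is blind to a semantically identical variant that nobody has written down yet.

    # One characterised bug shape, encoded as a pattern. Both snippets
    # below have the same command-injection flaw; only the shape the
    # rule author anticipated gets flagged.
    import re

    RULE = re.compile(r"os\.system\(\s*[\"'].*[\"']\s*\+")  # known-bad shape

    caught = 'os.system("ping " + host)'                    # matches the rule
    missed = 'subprocess.run(f"ping {host}", shell=True)'   # same bug, no rule

    for snippet in (caught, missed):
        verdict = "FINDING" if RULE.search(snippet) else "clean"
        print(f"{verdict}: {snippet}")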

This is the same reason intrusion detection systems based on attack signatures get bypassed by the next variant. The architecture rewards what is already known and ignores everything else.

What zero-day discovery requires

A system that can surface a previously unknown bug has to do three things that a pattern scanner does not do. It has to reason about reachability rather than syntactic similarity. It has to hypothesise an exploit and check whether the hypothesis survives contact with the actual code. And it has to ground its claims in evidence the developer can verify, because a finding without grounded evidence is indistinguishable from a hallucination.

The first requirement rules out grep-style tools, regardless of how clever the regular expressions are. Reachability is a whole-program property; you cannot recover it from a single file in isolation. The second requirement rules out pure-LLM bug hunters, which are excellent at narrating plausible vulnerabilities but cannot, on their own, distinguish a narration from a real exploit primitive. The third requirement rules out anything that returns a verdict without showing its work, because the cost of a wrong verdict is borne by the engineer who has to disprove it.

Putting these three together describes an architecture, not a product. The architecture has a static engine at the bottom that does the reachability analysis and surfaces grounded taint paths, an LLM in the middle that hypothesises bug classes and exploit conditions over those paths, and a disproof pass at the top whose only job is to kill weak hypotheses before they reach a human.
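
In sketch form, with every interface stubbed out and every name illustrative rather than taken from any real product, the control flow looks something like this:

    # Skeleton of the three-layer architecture: static engine at the
    # bottom, hypothesis generation in the middle, disproof at the top.
    # A hypothesis becomes a finding only if the disproof pass fails.
    from dataclasses import dataclass

    @dataclass
    class TaintPath:
        source: str        # e.g. an HTTP handler parameter
        sink: str          # e.g. a SQL builder call
        hops: list[str]    # the inter-procedural call chain between them

    @dataclass
    class Hypothesis:
        path: TaintPath
        cwe: str           # e.g. "CWE-89"
        exploit_conditions: str

    def static_engine(codebase: str) -> list[TaintPath]:
        """Whole-program reachability: surface grounded source-to-sink flows."""
        raise NotImplementedError

    def hypothesise(path: TaintPath) -> Hypothesis:
        """LLM layer: propose a bug class and exploit conditions for a flow."""
        raise NotImplementedError

    def disprove(hypothesis: Hypothesis) -> bool:
        """Disproof pass: return True if the hypothesis can be falsified."""
        raise NotImplementedError

    def pipeline(codebase: str) -> list[Hypothesis]:
        findings = []
        for path in static_engine(codebase):
            hypothesis = hypothesise(path)
            if not disprove(hypothesis):   # only survivors reach a human
                findings.append(hypothesis)
        return findings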

What that looks like in practice

I have spent most of the last year working with the engine-plus-Griffin AI pipeline that follows that shape. The engine layer surfaces inter-procedural taint flows in the codebase, including transitive ones through dependencies. The Griffin layer reasons over each flow and asks what CWE class it would correspond to if the obvious sanitisers were absent or insufficient. The disproof layer takes the resulting hypotheses and tries to falsify them: it checks framework-level escaping, type narrowing, runtime invariants, and whether the preconditions for the exploit could plausibly hold under any input. Anything that survives this gauntlet is a finding. Anything that does not is silently dropped, and the pipeline does not nag the developer about it.
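
To make one of those falsification checks concrete, here is a minimal, runnable illustration of the type-narrowing case. The hop representation is an assumption invented for the sketch; a real implementation would consult the engine's type information rather than inspect strings in the call chain.

    # Type narrowing as a disproof: if the tainted value is coerced to
    # an int before the SQL sink, a CWE-89 string-injection hypothesis
    # cannot hold and is silently dropped.
    def narrows_to_int(hops: list[str]) -> bool:
        # Stand-in for real type analysis: treat a hop as narrowing if
        # it records an int() coercion on the tainted value.
        return any("int(" in hop for hop in hops)

    hops = ['handler.get("id")', 'validate: int(raw_id)', 'query.where(id)']
    if narrows_to_int(hops):
        print("disproved: value is an int at the sink, CWE-89 cannot hold")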

The findings that come out of this process are qualitatively different from the output of a pattern scanner. They are not "this regex matched in this file." They are "data from this HTTP handler reaches this SQL builder through these three function calls without passing through any sanitiser the engine recognises, here is the CWE-89 hypothesis, here is the disproof attempt that failed, and here is the line where the developer should look first." The triager is not being asked to invent an explanation. The explanation is the finding.
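
As a rough illustration, and emphatically not any tool's actual report schema, a finding of that shape might carry fields like these:

    # Illustrative finding payload: the explanation travels with the
    # finding, so the triager verifies it rather than reconstructs it.
    finding = {
        "cwe": "CWE-89",
        "taint_path": [
            "orders_handler(request)",     # HTTP source
            "build_filter(request.args)",  # hop 1
            "append_clause(raw_filter)",   # hop 2
            "execute_query(sql)",          # SQL sink
        ],
        "hypothesis": "request.args reaches the SQL builder unsanitised",
        "failed_disproof": "no framework escaping or type narrowing on any "
                           "hop; preconditions hold for string-typed input",
        "look_first": "the append_clause call site",
    }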

Why precision matters more than volume

The most expensive thing in a security programme is not licence cost. It is engineer attention. A scanner that emits a thousand findings of which fifty are real consumes more engineer time than a scanner that emits eighty findings of which seventy-five are real, even if the second tool misses a few. Pattern scanners are biased towards volume because their rules are tuned for recall on historical bug shapes. Engine-plus-Griffin pipelines are biased towards precision because they will not emit a finding the disproof pass cannot defend.
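
Worked through with an assumed twenty minutes of triage per finding (the figure is an assumption for illustration, not a measurement), the arithmetic is stark:

    # Triage cost under an assumed 20 minutes per finding.
    MINUTES_PER_TRIAGE = 20

    def triage_cost(total: int, real: int) -> dict[str, float]:
        hours = total * MINUTES_PER_TRIAGE / 60
        return {
            "precision": round(real / total, 2),
            "triage_hours": round(hours, 1),
            "hours_per_real_bug": round(hours / real, 2),
        }

    print("pattern scanner:", triage_cost(1000, 50))  # ~333 h, ~6.7 h per real bug
    print("precision tool: ", triage_cost(80, 75))    # ~27 h, ~0.36 h per real bug

At twenty minutes a finding, the noisy tool costs roughly twelve times the review effort per real bug, before counting the trust it burns along the way.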

I would rather have a quiet tool that is correct than a loud tool that is wrong. After enough quarters of triage debt, most security teams arrive at the same view, usually painfully.

How Safeguard Helps

Safeguard runs the engine-plus-Griffin AI pipeline across your monorepo and your transitive dependency graph on every merge. It does not rely on a rule corpus to define what a zero-day looks like, because the corpus would not contain it. Instead, it surfaces reachable taint paths, hypothesises bug classes from a CWE-grounded model, runs a disproof pass that drops the speculative hypotheses, and reports only the findings that survived. Each report ships with the taint path, the hypothesised exploit conditions, and the disproof attempt that failed. Teams using Safeguard see zero-day candidates in code their pattern scanners called clean, and they see them with enough evidence that the triage decision is genuinely a decision rather than a guess.
