AI Security

Zero-Day Triage Without Drowning Engineers

A zero-day discovery pipeline is only as useful as the triage process around it. Here is what triage looks like when the pipeline gives engineers something they can defend.

Nayan Dey
Senior Security Engineer
7 min read

The triage queue is where vulnerability programmes either succeed or quietly collapse. I have watched both outcomes more than once. The pattern is consistent. A team turns on a new discovery tool, the queue swells, the engineers assigned to triage burn out within a quarter, the queue stops being read, and six months later somebody asks why nobody is closing tickets and the answer is that the tool is producing more noise than the team can process. The tool is not removed, because removing it would feel like a regression, but it stops mattering. Findings sit unread until they age out.

The interesting question is what happens when the discovery pipeline stops producing the kind of noise that drives this collapse. The engine-plus-Griffin AI architecture I have written about elsewhere produces findings at a rate and a precision that an actual engineering team can absorb. That changes triage. It does not eliminate it. The artefacts and workflows that fit the new shape are different from the ones that grew up around CVE-driven scanners.

What triage looks like when findings are grounded

A typical zero-day finding from the engine-plus-Griffin pipeline arrives in the queue with five things attached. The first is a reachable taint path with line numbers across the relevant packages. The second is a CWE class identifying the bug pattern. The third is the set of exploit conditions the hypothesis depends on. The fourth is the disproof attempt the pipeline ran. The fifth is a confidence score derived from how cleanly the disproof attempt failed.
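
To make that concrete, here is a minimal sketch of the artefact bundle as a structured record. The field and class names are illustrative, not the pipeline's actual schema.

```python
from dataclasses import dataclass

@dataclass
class TaintStep:
    """One hop in the reachable taint path, pinned to a package, file, and line."""
    package: str
    file: str
    line: int
    symbol: str

@dataclass
class Finding:
    """Illustrative shape of a finding as it lands in the triage queue."""
    finding_id: str
    cwe_id: str                     # the bug-pattern class, e.g. "CWE-502"
    taint_path: list[TaintStep]     # reachable path with line numbers
    exploit_conditions: list[str]   # what the exploit hypothesis depends on
    disproof_attempt: str           # what the pipeline tried and why it failed
    confidence: float               # derived from how cleanly the disproof failed
```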

When an engineer opens the ticket, they are not being asked to construct an exploit hypothesis from scratch. They are being asked to verify a hypothesis the pipeline has already constructed and partially defended. That is a different cognitive task. It is bounded, it has a clear stopping condition, and it produces a defensible decision either way.

The triage protocol that fits this artefact has three steps. The engineer reads the taint path, confirms or refutes the reachability claim by walking the path in the editor, and decides whether the disproof attempt that failed is convincing in their own reading of the code. If the path is real and the disproof attempt is convincing, the finding is accepted and moves to remediation or disclosure. If the path is real but the disproof attempt is unconvincing on closer inspection, the finding is reclassified or dropped with notes. If the path is not real, the engineer marks the finding as a pipeline error and the metadata is fed back to improve the engine.
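
Reduced to its two judgement calls, the protocol is small enough to express as pseudocode. The enum values and function names below are mine, not a real API; the inputs are the engineer's own reading of the path and the disproof.

```python
from enum import Enum

class Disposition(Enum):
    ACCEPTED = "accept"          # path real, disproof convincing: remediation or disclosure
    RECLASSIFIED = "reclassify"  # path real, disproof unconvincing: downgrade or drop with notes
    PIPELINE_ERROR = "error"     # path not real: feed metadata back to the engine

def triage(path_is_real: bool, disproof_is_convincing: bool) -> Disposition:
    """The three-step protocol, after the engineer has walked the path in the editor."""
    if not path_is_real:
        return Disposition.PIPELINE_ERROR
    if disproof_is_convincing:
        return Disposition.ACCEPTED
    return Disposition.RECLASSIFIED
```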

This protocol fits into 15 to 30 minutes per finding for a competent engineer. That is a workable rate. It is not the 5 minutes some marketing decks claim for AI-assisted triage, but it is well short of the 60 to 90 minutes that pure-LLM tools demand when the engineer has to construct the entire vulnerability story themselves.

What "drowning" actually looks like

Drowning, in this context, has a specific shape. The engineer opens a ticket, reads a vulnerability narrative that the model produced, looks at the code, finds that the narrative is wrong in one of several ways (the function does not exist, the data flow is blocked by a sanitiser the model missed, the bug class is not applicable in this framework), spends 45 minutes writing up the disproof, and closes the ticket. They open the next ticket. The next ticket is the same shape. By the third ticket of the day, they are skimming. By the fifth, they are closing as "not reproducible" without reading carefully. By the end of the week, they hate the queue and avoid it.

This pattern is not a personality flaw. It is the predictable response to a queue dominated by hallucinated findings. The engineer's time is being consumed by work that produces no value, which is the most demoralising kind of work there is. The fix is not to push the engineer harder. The fix is to stop putting hallucinated findings in the queue.

The triage workflow that scales

In the deployments I have observed, three practices have made the difference between a queue that steadily drains and a queue that drowns the team.

The first is to enforce that every finding ships with its artefact bundle attached. If a finding shows up in the queue without a taint path, without exploit conditions, or without a disproof attempt, it is rejected at intake and sent back for the pipeline to reprocess. This sounds harsh, but it is the discipline that prevents the queue from accumulating noise. A finding without grounding is not a finding. It is a vibe.
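
A sketch of that intake gate, assuming the Finding record sketched earlier; how rejected findings are routed back to the pipeline will depend on your ticketing system.

```python
REQUIRED_ARTEFACTS = ("taint_path", "exploit_conditions", "disproof_attempt")

def route_at_intake(finding: Finding, queue: list, reprocess: list) -> None:
    """A finding missing any part of its artefact bundle never reaches the queue."""
    missing = [name for name in REQUIRED_ARTEFACTS if not getattr(finding, name, None)]
    if missing:
        # Sent back for the pipeline to reprocess, not parked in the queue as noise.
        reprocess.append((finding, f"rejected at intake, missing: {', '.join(missing)}"))
    else:
        queue.append(finding)
```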

The second is to time-box triage at the start. Each finding gets an initial 30-minute slot. If the engineer cannot resolve it in 30 minutes, they leave a note, escalate to the owner of the affected component, and move on. The 30-minute time-box does two things: it prevents one stubborn finding from consuming a day, and it forces the team to be honest about which findings need cross-team coordination versus which ones a single engineer can resolve in isolation.

The third is to maintain a feedback loop with the pipeline. Every finding that the engineer rejects, modifies, or escalates produces a structured signal that the pipeline ingests. Over time, the disproof pass learns from the rejections and the precision improves. This loop is a feature, not an optional integration. A pipeline that does not learn from its triage outcomes is one that cannot improve.
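
A hedged sketch of what that structured signal might look like. The field names and the serialisation are assumptions for illustration, not the pipeline's actual ingest interface.

```python
from dataclasses import dataclass, asdict
import json
from typing import Optional

@dataclass
class TriageOutcome:
    """Structured signal emitted for every accepted, rejected, modified, or escalated finding."""
    finding_id: str
    disposition: str                      # "accept" | "reclassify" | "error" | "escalate"
    false_positive_shape: Optional[str]   # e.g. "missed sanitiser on path step 3"
    notes: str
    minutes_spent: int

def emit_feedback(outcome: TriageOutcome) -> str:
    """Serialise the outcome so the pipeline's disproof pass can learn from it."""
    return json.dumps(asdict(outcome))
```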

Cross-team coordination

Some findings are too big for a single engineer. A zero-day in a transitive dependency that affects three different services, owned by three different teams, requires a coordinator. The coordinator is usually someone in the security team who has the relationships to convene the right engineers, agree on a remediation approach, and track the disclosure timeline.

The artefact bundle from the pipeline becomes the working document. The taint path tells the consuming services which entry points are involved. The CWE class anchors the conversation in shared vocabulary. The disproof attempt becomes the basis for evaluating proposed mitigations. By the time the meeting ends, the team has a plan that can be executed in parallel across services, with each team owning the slice of the path that runs through their code.
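
One way to turn the path into that per-team plan, assuming each path step carries the package it runs through and an ownership lookup exists (the lookup here is a placeholder):

```python
from collections import defaultdict

def slice_path_by_owner(taint_path: list[TaintStep],
                        owner_of: dict[str, str]) -> dict[str, list[TaintStep]]:
    """Group the taint path by the team that owns each package, so every team
    remediates the slice of the path that runs through its code."""
    slices: dict[str, list[TaintStep]] = defaultdict(list)
    for step in taint_path:
        slices[owner_of.get(step.package, "unassigned")].append(step)
    return dict(slices)
```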

This is the kind of coordination that the conventional CVE-driven workflow rarely produces, because a CVE finding does not include the path. The path is what makes coordination tractable.

Metrics that signal a healthy queue

A healthy zero-day triage queue looks like this. The median time-to-first-touch on a finding is under one business day. The median time-to-resolution is under two weeks. The acceptance rate (findings the team agrees are real) is between 75 and 95 percent. The repeat-error rate (the same false positive shape appearing again from the pipeline) is dropping over time. The team's stated comfort with the queue, which I solicit in retros, is positive or neutral.
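
A rough sketch of how those signals might be computed from ticket records; the ticket fields are assumptions, and the thresholds to compare against are the ones in the paragraph above.

```python
from statistics import median

def queue_health(tickets: list[dict]) -> dict:
    """Compute queue-health signals from per-ticket records.

    Each ticket dict is assumed to carry datetime fields opened_at,
    first_touch_at, resolved_at (None if not yet reached) and a bool accepted.
    """
    touched = [t for t in tickets if t["first_touch_at"] is not None]
    resolved = [t for t in tickets if t["resolved_at"] is not None]
    return {
        "median_time_to_first_touch": (
            median(t["first_touch_at"] - t["opened_at"] for t in touched) if touched else None
        ),
        "median_time_to_resolution": (
            median(t["resolved_at"] - t["opened_at"] for t in resolved) if resolved else None
        ),
        # Findings the team agreed were real, as a share of everything resolved.
        "acceptance_rate": (
            sum(t["accepted"] for t in resolved) / len(resolved) if resolved else None
        ),
    }
```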

A drowning queue looks like the inverse. Findings sit untouched for weeks. Resolution times are erratic. Acceptance rates are low. The same kinds of false positives recur. The team complains about the queue and avoids the rotation. If you see those signals, the discovery pipeline is the problem, not the team.

How Safeguard Helps

Safeguard ships the engine-plus-Griffin AI pipeline together with the triage workflow that fits it. Every finding lands in the queue with the full artefact bundle: taint path with line numbers, CWE class, exploit conditions, and the disproof attempt that failed. The triage UI is built around the protocol of verify, decide, and feed back, with time-boxing and cross-team coordination tooling included. The platform tracks queue health, surfaces the metrics that distinguish a healthy queue from a drowning one, and feeds triage outcomes back into the pipeline so the false positive rate continues to fall over time.
