Griffin AI drafts remediation PRs for reachable vulnerabilities, including zero-day candidates, with the taint path, exploit hypothesis, and disproof attempt attached. A human approves before merge. Frontier models are matched to the tasks they are actually good at, with eval-gated fallbacks and a published benchmark.
AI remediation without structure produces unreliable PRs. With structure, it ships.
Using the same frontier model for every task wastes budget on trivial work and hits reasoning limits on complex work. Different security tasks want different model families.
If your team can't tell you the current refusal rate, citation accuracy, and regression delta against the last release, you don't have an AI workflow; you have a demo.
A patch for a vulnerability that isn't actually reachable from user input is wasted work. Fixes for reachable vulnerabilities, with the taint path shown, are triage-ready.
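As a rough illustration of what "reachable, with the taint path shown" can mean, here is a minimal sketch that searches a call graph breadth-first from a user-input source to a vulnerable sink and returns the path it finds. The graph shape, the node names, and the `taint_path` helper are hypothetical; nothing here describes Griffin's actual analysis.

```python
from collections import deque
from typing import Optional

def taint_path(graph: dict[str, list[str]], source: str, sink: str) -> Optional[list[str]]:
    """Return the shortest call path from source to sink, or None if unreachable."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical call graph: an HTTP handler eventually reaches a vulnerable parser.
calls = {
    "http_handler": ["validate", "parse_upload"],
    "parse_upload": ["vulnerable_yaml_load"],
    "cron_job": ["cleanup"],
}
print(taint_path(calls, "http_handler", "vulnerable_yaml_load"))
print(taint_path(calls, "cron_job", "vulnerable_yaml_load"))
```

A finding with no path from any user-controlled source would be deprioritized under this model; one with a path carries that path into triage.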
Automatically merging AI-generated fixes is where programs explode publicly. Every remediation needs a human checkpoint before reaching main.
Reasoning-heavy triage routes to Opus-class models; high-throughput scanning uses Haiku-class models for cost; multi-step remediation uses Sonnet-class models for a balance of the two. Each task family has eval-gated fallbacks.
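The routing above could be expressed as a small table plus a gate check. This is a sketch under stated assumptions: the primary pairings come from the text, while the task-family keys, the fallback choices, the score format, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    primary: str   # model class used by default for the task family
    fallback: str  # model class used when the primary fails its eval gate

# Primary pairings follow the text; fallbacks are illustrative assumptions.
ROUTES = {
    "triage": Route(primary="opus-class", fallback="sonnet-class"),
    "scanning": Route(primary="haiku-class", fallback="sonnet-class"),
    "remediation": Route(primary="sonnet-class", fallback="opus-class"),
}

def pick_model(
    task_family: str,
    eval_scores: dict[tuple[str, str], float],
    threshold: float = 0.9,  # hypothetical gate value
) -> str:
    """Return the primary model if its eval score for this family clears the gate."""
    route = ROUTES[task_family]
    # A missing score counts as a failed gate, so the fallback is used.
    score = eval_scores.get((task_family, route.primary), 0.0)
    return route.primary if score >= threshold else route.fallback

scores = {("triage", "opus-class"): 0.95, ("scanning", "haiku-class"): 0.72}
print(pick_model("triage", scores))
print(pick_model("scanning", scores))
```

Keying scores by task family and model keeps the gate per family, so a model can pass for scanning and still be swapped out for remediation.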
Every LLM gets exactly the tools it needs — via scoped service identities. Irreversible actions require out-of-band confirmation. Tool-call distribution shifts trigger alerts.
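One way to picture scoped tools, out-of-band confirmation, and shift alerts is the sketch below. The identities, tool names, and the choice of total variation distance as the shift measure are assumptions made for illustration, not a description of the platform's internals.

```python
from collections import Counter

class ToolDenied(Exception):
    """Raised when an identity may not call a tool."""

# Hypothetical service identities and tool names.
SCOPES = {
    "triage-bot": {"read_finding", "read_code"},
    "remediation-bot": {"read_code", "open_pr", "delete_branch"},
}
IRREVERSIBLE = {"delete_branch"}

def authorize(identity: str, tool: str, confirmed_out_of_band: bool = False) -> None:
    """Allow a call only if the tool is in scope and, if irreversible, confirmed."""
    if tool not in SCOPES.get(identity, set()):
        raise ToolDenied(f"{identity} has no scope for {tool}")
    if tool in IRREVERSIBLE and not confirmed_out_of_band:
        raise ToolDenied(f"{tool} needs out-of-band confirmation")

def distribution_shift(baseline: list[str], current: list[str]) -> float:
    """Total variation distance between two tool-call samples (0 identical, 1 disjoint)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    p, q = Counter(baseline), Counter(current)
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p[t] / len(baseline) - q[t] / len(current)) for t in tools)
```

An alert would then fire when `distribution_shift` exceeds a threshold chosen from historical variation; that threshold is left out here because the text does not specify one.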
Griffin AI drafts the fix PR carrying the taint path, exploit hypothesis, disproof attempt, and ranked evidence. A human reviewer approves before merge. The workflow ships; it doesn't just demo.
Five eval families run on every release: exploit-hypothesis accuracy, remediation-PR correctness, advisory summarization, cross-finding correlation, and adversarial resistance. Regressions of more than one standard deviation block the release. The published benchmark numbers are the same numbers our internal pipeline gates against — no marketing gloss, no cherry-picking.
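The one-standard-deviation rule can be sketched as a simple gate. The text does not say which baseline or which deviation is used, so this example assumes the mean and sample standard deviation of prior release scores per eval family; the family keys and the numbers are made up for illustration.

```python
from statistics import mean, stdev

def blocks_release(history: list[float], candidate: float) -> bool:
    """True if the candidate score drops more than one standard deviation below the baseline.

    Assumes `history` holds at least two prior release scores (stdev needs two points).
    """
    baseline = mean(history)
    sigma = stdev(history)
    return (baseline - candidate) > sigma

def failing_families(histories: dict[str, list[float]], candidates: dict[str, float]) -> list[str]:
    """Return the eval families whose regression would block the release."""
    return [name for name, hist in histories.items() if blocks_release(hist, candidates[name])]

# Illustrative scores only; not real benchmark numbers.
histories = {
    "exploit_hypothesis": [0.90, 0.92, 0.91, 0.93],
    "remediation_pr": [0.80, 0.82, 0.81, 0.79],
}
candidates = {"exploit_hypothesis": 0.89, "remediation_pr": 0.80}
print(failing_families(histories, candidates))
```

In this made-up data the first family drops 0.025 against a standard deviation of about 0.013, so it blocks; the second drops only 0.005 and passes.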
Concrete shapes the remediation workflow takes once it actually ships.
Around 90% of low-risk fixes — patch bumps, lockfile churn, transitive resolves — get applied automatically when the project's test suite passes.
A single approval applies the same upgrade across 200 services. Griffin generates a per-service patch and orchestrates the rollout in safe waves.
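"Safe waves" can take many shapes; one common pattern is a small canary wave followed by waves that grow geometrically, halting on the first failure. The sketch below assumes that pattern. The wave sizes, the `apply_patch` callback, and the service names are hypothetical.

```python
from typing import Callable

def plan_waves(services: list[str], first_wave: int = 1, growth: int = 2) -> list[list[str]]:
    """Split services into waves that start small and grow by `growth` each step."""
    waves, size, i = [], first_wave, 0
    while i < len(services):
        waves.append(services[i:i + size])
        i += size
        size *= growth
    return waves

def rollout(waves: list[list[str]], apply_patch: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Apply wave by wave; stop before the next wave if any service in a wave fails."""
    done, failed = [], []
    for wave in waves:
        for service in wave:
            (done if apply_patch(service) else failed).append(service)
        if failed:
            break
    return done, failed

services = [f"svc-{n:03d}" for n in range(200)]
waves = plan_waves(services)
print([len(w) for w in waves])
```

With 200 services and doubling waves, a failure in the first few waves touches at most a handful of services before the rollout stops.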
Griffin proposes a fix to an upstream OSS dependency. The platform opens a clean upstream PR with the hypothesis, the patch, and the reproduction.
The same remediation quality whether you route to Griffin, an on-prem model, or a customer-chosen LLM. Eval gates the swap; nothing else changes.
Griffin attaches the exploit hypothesis, the taint path, and the ranked evidence to the finding before it ever reaches a queue.
A candidate patch is drafted with a cited reasoning trace — which docs, which prior fixes, which test cases informed the change.
The project's own test suite executes inside an ephemeral sandbox. No infrastructure load, no flaky environments.
A clean PR opens with the diff, the rationale, and the test-run artifact attached. The reviewer reads the why before reading the what.
Human gate, every time. The reviewer can request a regeneration with constraints, accept as-is, or take it as a starting point.
Auto-fix rate, time-to-fix, reviewer-edit distance, and rollback rate stream into the console. The program is measured, not vibed.
Failing tests feed back into the loop. Griffin retries with the next-best patch up to N times before escalating to a human-driven fix.
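The retry-then-escalate loop can be sketched as follows. As a simplification, it walks a pre-ranked list of candidate patches rather than regenerating from test feedback; `run_tests`, the patch identifiers, and the use of `None` to signal escalation are assumptions for illustration.

```python
from itertools import islice
from typing import Callable, Iterable, Optional

def remediate(
    ranked_patches: Iterable[str],
    run_tests: Callable[[str], bool],
    max_attempts: int,
) -> Optional[str]:
    """Try ranked patches in order, at most `max_attempts` times.

    Returns the first patch whose test run passes, or None to signal
    that the finding should be escalated to a human-driven fix.
    """
    for patch in islice(ranked_patches, max_attempts):
        if run_tests(patch):
            return patch
    return None

# Hypothetical candidates: only the third patch passes the test suite.
patches = ["patch-a", "patch-b", "patch-c"]
passes = lambda p: p == "patch-c"
print(remediate(patches, passes, max_attempts=2))
print(remediate(patches, passes, max_attempts=3))
```

Bounding the attempts keeps a stubborn finding from looping indefinitely, so a human sees it after a predictable amount of automated effort.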
The editor surfaces an inline 'apply suggested fix' lens on a finding. Diff preview, reasoning trace, test outcome — all without context switching.
A scoped bot comment lays out the diff, the rationale, the test results, and any caveats. Reviewers approve from the same surface they use for human PRs.
Percent auto-remediated, time-to-fix histogram, and Griffin variant routing distribution. The board sees which model is doing what work, and what it costs.
See the current benchmark numbers. Run the eval harness on your workloads. Ship fixes you can defend in a review.