Use Case · AI Remediation & LLM Selection

Right Model. Right Task. Reviewable Fix.

Griffin AI drafts remediation PRs for reachable vulnerabilities — including zero-day candidates — with the taint path, exploit hypothesis, and disproof attempt attached. A human approves before merge. Frontier models matched to the tasks they are actually good at, with eval-gated fallbacks and a published benchmark.

73%
Auto-PRs Compile Unchanged
87%
Pass With Minor Edits
5
Eval Families, Every Release
Human
Reviewer Required

Why "Just Use ChatGPT" Isn't A Strategy

AI remediation without structure produces unreliable PRs. With structure, it ships.

01

One-Size-Fits-All Model Choice

Using the same frontier model for every task wastes budget on trivial work and hits reasoning limits on complex work. Different security tasks want different model families.

02

No Eval Harness, No Confidence

If your team can't tell you the current refusal rate, citation accuracy, and regression delta against last release — you don't have an AI workflow, you have a demo.

03

Fixes Without Reachability Context

A patch for a vulnerability that isn't actually reachable from user input is wasted work. Fixes for reachable vulnerabilities, with the taint path shown, are triage-ready.

04

Auto-Merge Without Human Review

Auto-merging AI-generated fixes is where programs explode publicly. Every remediation needs a human checkpoint before it reaches main.

Structure Over Hype

Model Selection. Tool Orchestration. Human Gate.

LLM Selection By Task Type

Reasoning-heavy triage routes to Opus-class models; high-throughput scanning uses Haiku-class for cost; multi-step remediation uses Sonnet-class for the balance of cost and reasoning depth. Eval-gated fallbacks per task family, sketched after the list below.

Opus · triage + hypothesis
Sonnet · remediation drafting
Haiku · bulk scanning
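
A minimal sketch of what eval-gated, per-task routing can look like. The task families, model tier names, and score thresholds are illustrative assumptions for this example, not Griffin AI's production configuration.

```python
# Illustrative sketch: per-task model routing with eval-gated fallbacks.
# Task families, model tiers, and thresholds are assumptions, not real config.
from dataclasses import dataclass

@dataclass
class Route:
    primary: str           # preferred model tier for this task family
    fallback: str          # used when the primary fails its eval gate
    min_eval_score: float  # latest eval score the primary must clear

ROUTES: dict[str, Route] = {
    "triage":      Route("opus-class",   "sonnet-class", 0.80),
    "remediation": Route("sonnet-class", "opus-class",   0.75),
    "bulk_scan":   Route("haiku-class",  "sonnet-class", 0.70),
}

def select_model(task_family: str, eval_scores: dict[str, float]) -> str:
    """Pick a model for the task; fall back when the primary's latest
    eval score for this family drops below the gate."""
    route = ROUTES[task_family]
    if eval_scores.get(route.primary, 0.0) >= route.min_eval_score:
        return route.primary
    return route.fallback
```

The point of the pattern: falling back is a data-driven decision made per task family on every release, not a manual model swap.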

Tool Orchestration With Bounded Privilege

Every LLM gets exactly the tools it needs, via scoped service identities. Irreversible actions require out-of-band confirmation. Tool-call distribution shifts trigger alerts. The gating pattern is sketched after the list below.

Capability-scoped tools
Out-of-band confirmation
Tool-call drift alerts
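
A minimal sketch of the bounded-privilege pattern, assuming hypothetical agent identities, tool names, and a confirmation hook. Each identity carries an explicit tool allowlist, irreversible actions route through an out-of-band confirmation, and every allowed call feeds a counter that drift detection can watch.

```python
# Illustrative sketch: capability-scoped tool gating with a human checkpoint.
# Agent names, tool names, and the confirmation hook are assumptions.
from collections import Counter
from typing import Callable

TOOL_SCOPES: dict[str, set[str]] = {
    "triage-agent":      {"read_code", "query_scanner"},
    "remediation-agent": {"read_code", "open_pr", "merge_pr"},
}

IRREVERSIBLE = {"merge_pr"}  # the human merge gate lives here

call_counts: Counter = Counter()  # tool-call distribution, watched for drift

def gate_tool_call(agent: str, tool: str,
                   confirm_out_of_band: Callable[[str, str], bool]) -> bool:
    """Allow a call only if the tool is in the agent's scope; irreversible
    actions additionally require out-of-band human confirmation."""
    if tool not in TOOL_SCOPES.get(agent, set()):
        return False  # outside this service identity's capabilities
    if tool in IRREVERSIBLE and not confirm_out_of_band(agent, tool):
        return False  # human declined, or no confirmation channel reachable
    call_counts[(agent, tool)] += 1  # a shifted distribution here raises an alert
    return True
```

Keeping merge behind the confirmation hook is what makes the human merge gate enforceable in code rather than by policy.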

Remediation PR With Full Context

Griffin AI drafts the fix PR carrying the taint path, exploit hypothesis, disproof attempt, and ranked evidence. A human reviewer approves before merge. The workflow ships, not just demos. The evidence bundle is sketched after the list below.

Reachability-gated fixes
Disproof-attached evidence
Human merge gate
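
One way to picture the context a remediation PR carries, as a hypothetical payload shape. The field names and types are illustrative assumptions, not Griffin AI's actual schema.

```python
# Illustrative sketch: the evidence bundle attached to a remediation PR.
# Field names and types are assumptions about shape, not an actual schema.
from dataclasses import dataclass

@dataclass
class RemediationPR:
    finding_id: str
    taint_path: list[str]        # source -> ... -> sink, proving reachability
    exploit_hypothesis: str      # how an attacker would trigger the flaw
    disproof_attempt: str        # what was tried to falsify the hypothesis
    ranked_evidence: list[str]   # supporting artifacts, strongest first
    diff: str                    # the proposed fix
    human_approved: bool = False # must be True before merge

    def mergeable(self) -> bool:
        """Mergeable only with reachability evidence and human sign-off."""
        return bool(self.taint_path) and self.human_approved
```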

Published Benchmarks

Griffin AI Eval Harness — Measured, Not Marketed

Five eval families run on every release: exploit-hypothesis accuracy, remediation-PR correctness, advisory summarization, cross-finding correlation, and adversarial resistance. Regressions of more than one standard deviation block the release. The published benchmark numbers are the same numbers our internal pipeline gates against — no marketing gloss, no cherry-picking.

81%
Hypothesis Accuracy
0.89
Summary Semantic Similarity
100%
Canary Leak Pass Rate
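
A minimal sketch of the release gate described above, under the assumption that "more than one standard deviation" is measured against each family's historical score distribution. Family names and numbers are made up for illustration.

```python
# Illustrative sketch: block a release if any eval family regresses by more
# than one standard deviation below its historical mean. Data is made up.
import statistics

def blocked_families(history: dict[str, list[float]],
                     current: dict[str, float]) -> list[str]:
    """Return eval families whose current score fell more than one stdev
    below the historical mean; any hit blocks the release."""
    failures = []
    for family, scores in history.items():
        threshold = statistics.mean(scores) - statistics.stdev(scores)
        if current[family] < threshold:
            failures.append(family)
    return failures

history = {
    "exploit_hypothesis": [0.79, 0.81, 0.82],
    "remediation_pr":     [0.71, 0.74, 0.73],
}
current = {"exploit_hypothesis": 0.81, "remediation_pr": 0.64}
print(blocked_families(history, current))  # ['remediation_pr'] -> blocked
```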

AI Remediation, Without The Hand-Waving.

See the current benchmark numbers. Run the eval harness on your workloads. Ship fixes you can defend in a review.