Use Case · AI Remediation & LLM Selection

Right Model. Right Task. Reviewable Fix.

Griffin AI drafts remediation PRs for reachable vulnerabilities, including zero-day candidates, and attaches the taint path, exploit hypothesis, and disproof attempt to each one. A human approves before merge. Frontier models are matched to the tasks they are actually good at, with eval-gated fallbacks and a published benchmark.

73%
Auto-PRs Compile Unchanged
87%
Pass With Minor Edits
5
Eval Families, Every Release
Human
Reviewer Required

Why "Just Use ChatGPT" Isn't A Strategy

AI remediation without structure produces unreliable PRs. With structure, it ships.

01

One-Size-Fits-All Model Choice

Using the same frontier model for every task wastes budget on trivial work and hits reasoning limits on complex work. Different security tasks want different model families.

02

No Eval Harness, No Confidence

If your team can't tell you the current refusal rate, citation accuracy, and regression delta against the last release, you don't have an AI workflow. You have a demo.

03

Fixes Without Reachability Context

A patch for a vulnerability that isn't actually reachable from user input is wasted work. Fixes for reachable vulnerabilities, with the taint path shown, are triage-ready.

04

Auto-Merge Without Human Review

Auto-merging AI-generated fixes is where programs explode publicly. Every remediation needs a human checkpoint before it reaches main.

Structure Over Hype

Model Selection. Tool Orchestration. Human Gate.

LLM Selection By Task Type

Reasoning-heavy triage routes to Opus-class models; high-throughput scanning uses Haiku-class models to keep cost down; multi-step remediation uses Sonnet-class models to balance reasoning depth against cost. Each task family has eval-gated fallbacks.

Opus · triage + hypothesis
Sonnet · remediation drafting
Haiku · bulk scanning
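
A minimal sketch of what task-based routing with an eval-gated fallback could look like. The fallback order, the `eval_scores` structure, and the 0.8 threshold are assumptions made for illustration; the source names only the three model classes and their primary tasks.

```python
# Hypothetical routing table: task family -> model tiers, primary first.
# The primaries follow the text above; the fallback order is an assumption.
ROUTES = {
    "triage": ["opus", "sonnet"],
    "remediation": ["sonnet", "opus"],
    "bulk_scan": ["haiku", "sonnet"],
}


def pick_model(task, eval_scores, threshold=0.8):
    """Return the first tier for `task` whose latest eval score clears the gate.

    `eval_scores` maps (task, model) -> score in [0, 1]. A missing score is
    treated as a failed gate, so an unevaluated model is never selected.
    """
    for model in ROUTES[task]:
        if eval_scores.get((task, model), 0.0) >= threshold:
            return model
    raise LookupError(f"no eval-passing model for task family {task!r}")


scores = {
    ("triage", "opus"): 0.91,
    ("remediation", "sonnet"): 0.55,  # primary fails its gate
    ("remediation", "opus"): 0.86,    # fallback passes
}
print(pick_model("triage", scores))       # opus
print(pick_model("remediation", scores))  # opus
```

Treating a missing score as a failure is the conservative choice here: a model that has not been evaluated for a task family cannot silently take over that family.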

Tool Orchestration With Bounded Privilege

Each model gets only the tools its task needs, granted through scoped service identities. Irreversible actions require out-of-band confirmation. Shifts in the tool-call distribution trigger alerts.

Capability-scoped tools
Out-of-band confirmation
Tool-call drift alerts
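
The three controls above can be sketched in a few lines. Everything named here is hypothetical: the service identities, the tool names, the choice of `merge_pr` as the irreversible action, and the use of total variation distance as the drift measure are illustrative assumptions, not Griffin's actual configuration.

```python
from collections import Counter

# Hypothetical allowlists keyed by service identity.
ALLOWED_TOOLS = {
    "svc-triage": {"read_finding", "read_source"},
    "svc-remediation": {"read_source", "open_pr", "merge_pr"},
}
# Assumption: merging is the action that needs out-of-band confirmation.
IRREVERSIBLE = {"merge_pr"}


def authorize(identity, tool, confirmed_out_of_band=False):
    """Allow a tool call only if it is in scope and, when irreversible, confirmed."""
    if tool not in ALLOWED_TOOLS.get(identity, set()):
        return False
    if tool in IRREVERSIBLE and not confirmed_out_of_band:
        return False
    return True


def tool_call_drift(baseline, current):
    """Total variation distance between two tool-call distributions.

    0.0 means identical usage, 1.0 means no overlap at all.
    """
    b_total, c_total = sum(baseline.values()), sum(current.values())
    if not b_total or not c_total:
        raise ValueError("both distributions need at least one call")
    tools = set(baseline) | set(current)
    return 0.5 * sum(
        abs(baseline[t] / b_total - current[t] / c_total) for t in tools
    )


baseline = Counter(read_source=8, open_pr=2)
print(authorize("svc-triage", "open_pr"))             # False: out of scope
print(authorize("svc-remediation", "merge_pr"))       # False: unconfirmed
print(authorize("svc-remediation", "merge_pr", True)) # True
```

An alert rule would then compare `tool_call_drift(baseline, current)` against a threshold chosen per deployment.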

Remediation PR With Full Context

Griffin AI drafts the fix PR carrying the taint path, exploit hypothesis, disproof attempt, and ranked evidence. A human reviewer approves before merge. The workflow ships; it doesn't just demo.

Reachability-gated fixes
Disproof-attached evidence
Human merge gate
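
As a rough illustration of reachability gating, the sketch below searches a call graph for a path from a user-input entry point to a vulnerable function. It is a deliberate simplification: real taint analysis tracks data flow, not just call edges, and the graph shape and function names are invented for the example.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def taint_path(
    call_graph: Dict[str, List[str]], sources: Set[str], sink: str
) -> Optional[List[str]]:
    """Breadth-first search from any input source to the vulnerable sink.

    Returns one shortest call path, or None when the sink is unreachable,
    in which case no fix PR would be drafted under a reachability gate.
    """
    queue = deque([s] for s in sorted(sources))
    seen = set(sources)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for callee in call_graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


# Hypothetical call graph: only `yaml_load` is reachable from user input.
graph = {
    "http_handler": ["parse_config"],
    "parse_config": ["yaml_load"],
    "cron_job": ["cleanup_tmp"],
}
print(taint_path(graph, {"http_handler"}, "yaml_load"))
# ['http_handler', 'parse_config', 'yaml_load']
print(taint_path(graph, {"http_handler"}, "cleanup_tmp"))
# None
```

The returned path is the kind of artifact a reviewer could read alongside the diff to confirm that the fix addresses something an attacker can actually reach.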

Published Benchmarks

Griffin AI Eval Harness — Measured, Not Marketed

Five eval families run on every release: exploit-hypothesis accuracy, remediation-PR correctness, advisory summarization, cross-finding correlation, and adversarial resistance. Regressions of more than one standard deviation block the release. The published benchmark numbers are the same numbers our internal pipeline gates against — no marketing gloss, no cherry-picking.

81%
Hypothesis Accuracy
0.89
Summary Semantic Similarity
100%
Canary Leak Pass Rate
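
One way to read the "more than one standard deviation" rule is as a check against recent release history. The sketch below makes that assumption explicit: the source does not say what the deviation is measured over, so the baseline (the mean and sample standard deviation of prior release scores) and the example numbers are illustrative only.

```python
from statistics import mean, stdev
from typing import List


def blocks_release(history: List[float], candidate: float) -> bool:
    """True when the candidate regresses by more than one standard deviation.

    `history` holds the same metric from prior releases. Only drops count
    as regressions; an improvement never blocks.
    """
    if len(history) < 2:
        raise ValueError("need at least two prior scores to estimate spread")
    return candidate < mean(history) - stdev(history)


# Hypothetical hypothesis-accuracy scores from five earlier releases.
history = [0.80, 0.82, 0.81, 0.79, 0.83]
print(blocks_release(history, 0.81))  # False: within normal variation
print(blocks_release(history, 0.75))  # True: well below the band
```

Running this once per eval family gives a single pass-or-block decision per release.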

Scenarios

Where This Bites In Real Life

Concrete shapes the remediation workflow takes once it actually ships.

01

PR-Time Auto-Fix

Around 90% of low-risk fixes — patch bumps, lockfile churn, transitive resolves — get applied automatically when the project's test suite passes.

02

Backlog Burn-Down

A single approval applies the same upgrade across 200 services. Griffin generates a per-service patch and orchestrates the rollout in safe waves.

03

Coordinated Disclosure

Griffin proposes a fix to an upstream OSS dependency. The platform opens a clean upstream PR with the hypothesis, the patch, and the reproduction.

04

LLM-Vendor Portability

Remediation quality stays the same whether you route to Griffin, an on-prem model, or a customer-chosen LLM. The eval harness gates the swap; nothing else changes.
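
The "safe waves" in the Backlog Burn-Down scenario could be planned along these lines. The wave sizes and the geometric growth factor are assumptions; the source states only that one approval covers 200 services and that the rollout proceeds in waves.

```python
from typing import List


def plan_waves(
    services: List[str], first_wave: int = 5, growth: int = 2
) -> List[List[str]]:
    """Split services into waves that grow geometrically.

    Small early waves limit the blast radius if a per-service patch
    turns out to be bad; later waves grow once earlier ones succeed.
    """
    if first_wave < 1 or growth < 1:
        raise ValueError("first_wave and growth must be at least 1")
    waves, start, size = [], 0, first_wave
    while start < len(services):
        waves.append(services[start:start + size])
        start += size
        size *= growth
    return waves


services = [f"svc-{i:03d}" for i in range(200)]
print([len(w) for w in plan_waves(services)])
# [5, 10, 20, 40, 80, 45]
```

An orchestrator would apply one wave at a time and halt on the first failure, leaving the remaining waves untouched.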

Step By Step

How Safeguard Handles It

01

Finding Lands With Hypothesis

Griffin attaches the exploit hypothesis, the taint path, and the ranked evidence to the finding before it ever reaches a queue.

02

Patch Generated With Reasoning

A candidate patch is drafted with a cited reasoning trace — which docs, which prior fixes, which test cases informed the change.

03

Tests Run In Sandbox

The project's own test suite executes inside an ephemeral sandbox. No infrastructure load, no flaky environments.

04

If Green, PR Opens

A clean PR opens with the diff, the rationale, and the test-run artifact attached. The reviewer reads the why before reading the what.

05

Reviewer Approves & Merges

Human gate, every time. The reviewer can request a regeneration with constraints, accept as-is, or take it as a starting point.

06

Metrics Roll Up To The Console

Auto-fix rate, time-to-fix, reviewer-edit distance, and rollback rate stream into the console. The program is measured, not vibed.

07

If Red, Retry With Next-Best Patch

Failing tests feed back into the loop. Griffin retries with the next-best patch up to N times before escalating to a human-driven fix.
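
The green and red branches of the steps above can be sketched as a bounded retry loop. This is a simplification under stated assumptions: candidates arrive already ranked, `run_tests` stands in for the sandboxed test run, and `max_attempts` plays the role of N. In the described workflow the test failures also feed back into patch generation, which this sketch does not model.

```python
from typing import Callable, Iterable, Optional


def first_green_patch(
    candidates: Iterable[str],
    run_tests: Callable[[str], bool],
    max_attempts: int,
) -> Optional[str]:
    """Try ranked candidate patches in order, up to `max_attempts`.

    Returns the first patch whose test run is green, or None to signal
    that the finding should escalate to a human-driven fix.
    """
    for attempt, patch in enumerate(candidates):
        if attempt >= max_attempts:
            break
        if run_tests(patch):
            return patch
    return None


# Hypothetical stand-in for the sandbox: only the third patch passes.
def fake_sandbox(patch: str) -> bool:
    return patch == "patch-c"


ranked = ["patch-a", "patch-b", "patch-c"]
print(first_green_patch(ranked, fake_sandbox, max_attempts=3))  # patch-c
print(first_green_patch(ranked, fake_sandbox, max_attempts=2))  # None
```

A returned patch only opens a PR; the merge decision still belongs to the reviewer.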

Surfaces

What You See, Ship, And Report

IDE / CLI

Inline Apply Suggested Fix

The editor surfaces an inline 'apply suggested fix' lens on a finding. Diff preview, reasoning trace, test outcome — all without context switching.

CI / PR

PR Comment With Diff & Tests

A scoped bot comment lays out the diff, the rationale, the test results, and any caveats. Reviewers approve from the same surface they use for human PRs.

Exec Console

Auto-Remediation Metrics

Percent auto-remediated, time-to-fix histogram, and Griffin variant routing distribution. The board sees which model is doing what work, and what it costs.
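
Two of the console figures are simple aggregates over remediation records. The record shape below (`auto_fixed`, `model`) is a hypothetical schema chosen for the example, not the console's actual data model.

```python
from collections import Counter
from typing import Dict, List


def percent_auto_remediated(records: List[dict]) -> float:
    """Share of findings closed by an auto-generated fix, as a percentage."""
    if not records:
        raise ValueError("no remediation records")
    auto = sum(1 for r in records if r["auto_fixed"])
    return 100.0 * auto / len(records)


def routing_distribution(records: List[dict]) -> Dict[str, float]:
    """Fraction of remediation work handled by each model tier."""
    if not records:
        raise ValueError("no remediation records")
    counts = Counter(r["model"] for r in records)
    return {model: n / len(records) for model, n in counts.items()}


# Hypothetical records for four findings.
records = [
    {"auto_fixed": True, "model": "sonnet"},
    {"auto_fixed": True, "model": "haiku"},
    {"auto_fixed": False, "model": "sonnet"},
    {"auto_fixed": True, "model": "haiku"},
]
print(percent_auto_remediated(records))  # 75.0
print(routing_distribution(records))     # {'sonnet': 0.5, 'haiku': 0.5}
```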

AI Remediation, Without The Hand-Waving.

See the current benchmark numbers. Run the eval harness on your workloads. Ship fixes you can defend in a review.