Eagle. The surface scanner.
Eagle is the 13B mid-tier triage model. It does wide-angle sweeps across the cross-package call graph, ranks taint paths, and feeds the queue Griffin draws from — so the heavyweight reasoner spends its budget on candidates that already survived a real prioritisation pass.
Sweep, rank, dedupe.
Three jobs. All cheap, all batched, all biased toward the dataflow that actually matters.
Surface sweep
Runs across the full cross-package call graph in seconds and clusters near-duplicate taint flows so the queue isn't full of the same finding wearing different jackets.
Path ranking
Emits a ranked queue of candidate exploit paths so Griffin doesn't spend its reasoning budget on noise. Top-of-queue is signal-dense by design.
Cluster dedup
Collapses variants of the same root cause into one finding with N affected sinks. Reviewers see the cause once, not the same fix copy-pasted across packages.
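As a rough illustration of what "one finding with N affected sinks" means, the sketch below groups flows by a shared key. Everything in it is an assumption made for illustration: the `TaintFlow` and `Finding` types, the `root_cause` label, and the exact-match grouping rule. Eagle's clustering head is a learned model, so treat this as the shape of the output, not the method.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaintFlow:
    source: str      # where untrusted data enters (hypothetical naming)
    root_cause: str  # hypothetical label for the shared defect
    sink: str        # where the tainted data lands


@dataclass
class Finding:
    root_cause: str
    source: str
    sinks: list[str] = field(default_factory=list)


def collapse(flows: list[TaintFlow]) -> list[Finding]:
    """Group flows that share a source and root cause into one finding."""
    grouped: dict[tuple[str, str], Finding] = {}
    for flow in flows:
        key = (flow.source, flow.root_cause)
        finding = grouped.setdefault(key, Finding(flow.root_cause, flow.source))
        if flow.sink not in finding.sinks:  # skip exact repeats of a sink
            finding.sinks.append(flow.sink)
    return list(grouped.values())
```

Two flows that reach different packages from the same source and root cause come back as a single `Finding` whose `sinks` list has two entries, which is the view a reviewer would see.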
Sized for fleet sweeps.
Architectural commitments
- ~13B dense transformer, distilled from Griffin.
- Ranking and clustering head fine-tuned on labelled taint-path datasets.
- Attention biased toward dataflow tokens (sources, sinks, sanitiser ops).
- Batched inference optimised for full-repo sweeps.
- INT8 quantisation for cost-per-scan at fleet scale.
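For readers unfamiliar with INT8 quantisation, here is a minimal sketch of the general idea using symmetric per-tensor scaling. The page does not say which quantisation scheme Eagle uses, so the scheme, the function names, and the plain-list representation are all assumptions chosen for clarity.

```python
def quantise_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantisation: map floats onto [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantised = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantised, scale


def dequantise(quantised: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the INT8 values."""
    return [q * scale for q in quantised]
```

Each weight is stored in one byte instead of two or four, and the round-trip error per weight is bounded by half the scale. That trade of a little precision for smaller, cheaper inference is what makes the approach attractive for per-scan pricing.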
Between the commit and the reasoning budget.
Inline catches a suspicious sink at commit time.
Eagle sweeps the repo and gathers every taint flow that could feed that sink.
Eagle clusters duplicates, scores each path, and hands a short ranked queue to Griffin.
Griffin hypothesises the exploit, attempts a disproof, and writes the patch.
Eagle exists so Griffin only ever sees ranked, deduped, reachable candidates.
From ranking head to standalone triage.
How Eagle was built, milestone by milestone, and what is on the bench right now.
- Q1 2024
First Eagle prototype as a ranking head.
Eagle started as a ranking head bolted on top of an early Griffin checkpoint. The motivation was simple: Griffin produced high-quality reasoning per candidate path, but production scanners surfaced tens of thousands of taint paths per repo. Spending Griffin budget on every candidate was infeasible. Eagle's job was to triage that queue before Griffin saw it.
- Q3 2024
Eagle as a standalone 13B dense model.
The ranking head was split out into its own dense 13B model trained specifically on labelled taint-path data — source/sink pairs annotated by senior security engineers across roughly 800k open-source paths. Attention was biased toward dataflow tokens (taint operators, package coordinates, sink categories), which produced a measurable lift on cross-package taint path recall over generic code models.
- Q1 2025
Clustering head + dedup.
Production scans were surfacing near-duplicate taint flows, with the same root cause showing up across many sinks. Eagle's clustering head was added to collapse those into a single finding with N affected sinks. Median finding count per repo dropped roughly 40% with no loss of recall, which materially reduced the reasoning budget Griffin spent on the queue Eagle feeds it.
- Q3 2025
INT8 quantisation pipeline.
Eagle was quantised to INT8 weights for shared-cloud-tier deployments. p95 sweep latency on a 5,000-package monorepo dropped to roughly 510ms. The quantisation pipeline was designed so that the dense weights and the quantised weights ship from the same training checkpoint, with the quantised path used in any tier that pays per scan.
- Q1 2026
Ranking head v2.
Retrained the ranking head on a larger labelled dataset, raising top-5 candidate-path recall to 94%. Eagle now emits a confidence score per candidate path so Griffin routes only above-threshold candidates by default. p95 sweep latency on the same monorepo benchmark dropped to roughly 420ms.
- Now
Current research direction.
Two research tracks: (1) cross-language taint awareness — improving Eagle's behaviour on polyglot repos where a JavaScript front-end calls into a Go backend via an inter-process boundary; (2) feedback-loop training from the disproof pass — when Griffin disproves an Eagle-ranked candidate, that signal is folded back into the ranker's next training run.
Input, sweep, rank, output.
Eagle's job, end to end, in four boxes.
Input
Full call graph from the scanner fusion (11 scanners deduped) plus taint paths surfaced from the deterministic engine.
Sweep
Eagle walks the candidate paths in parallel, batched, with INT8 inference for cost-per-scan.
Rank + cluster
Top-N candidates emerge with a confidence score and a cluster ID; near-duplicates collapse.
Output
Ranked queue feeds Griffin's reasoning pass. Below-threshold candidates remain available in the console for human triage.
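The output step can be pictured as a simple split on the confidence score. The sketch below is illustrative only: the `Candidate` fields, the `route` function, and the `0.5` default threshold are hypothetical, and the real threshold and score range are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path_id: str
    confidence: float  # assumed to lie in [0, 1]
    cluster_id: int


def route(
    candidates: list[Candidate], threshold: float = 0.5
) -> tuple[list[Candidate], list[Candidate]]:
    """Split a candidate list into a Griffin queue and a console backlog."""
    # Highest confidence first, so the top of each list is the strongest signal.
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    to_griffin = [c for c in ranked if c.confidence >= threshold]
    to_console = [c for c in ranked if c.confidence < threshold]
    return to_griffin, to_console
```

Nothing is discarded in this split: candidates below the threshold stay in the second list, mirroring the way below-threshold paths remain available for human triage.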
Sweep your repo, rank the surface.
Run Eagle over your call graph and see a ranked, deduped queue land in front of Griffin.