Model Family · Eagle

Eagle. The surface scanner.

Eagle is the 13B mid-tier triage model. It does wide-angle sweeps across the cross-package call graph, ranks taint paths, and feeds the queue Griffin draws from — so the heavyweight reasoner spends its budget on candidates that already survived a real prioritisation pass.

  • ~13B parameters, dense
  • <500ms p95 per package
  • 94% top-5 taint-path recall
  • 87% triage precision

What Eagle does

Sweep, rank, dedupe.

Three jobs. All cheap, all batched, all biased toward the dataflow that actually matters.

Surface sweep

Runs across the full cross-package call graph in seconds and clusters near-duplicate taint flows so the queue isn't full of the same finding wearing different jackets.

Path ranking

Emits a ranked queue of candidate exploit paths so Griffin doesn't spend its reasoning budget on noise. Top-of-queue is signal-dense by design.

Cluster dedup

Collapses variants of the same root cause into one finding with N affected sinks. Reviewers see the cause once, not the same fix copy-pasted across packages.
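As a rough illustration of what "one finding with N affected sinks" means, here is a minimal sketch. Eagle's clustering is a learned head; the exact-match grouping key below is a deliberately simple stand-in, and `TaintPath`, `Finding`, `sanitiser_gap` and `cluster_by_root_cause` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaintPath:
    source: str         # where untrusted data enters
    sanitiser_gap: str  # hypothetical root-cause label (the missing check)
    sink: str           # where the tainted data ends up


@dataclass
class Finding:
    root_cause: tuple
    sinks: list = field(default_factory=list)


def cluster_by_root_cause(paths):
    """Collapse paths that share (source, sanitiser_gap) into one finding."""
    clusters = {}
    for p in paths:
        key = (p.source, p.sanitiser_gap)
        finding = clusters.setdefault(key, Finding(root_cause=key))
        if p.sink not in finding.sinks:  # drop exact duplicates
            finding.sinks.append(p.sink)
    return list(clusters.values())


paths = [
    TaintPath("http.query", "unescaped_sql", "pkg_a.db.exec"),
    TaintPath("http.query", "unescaped_sql", "pkg_b.db.exec"),
    TaintPath("http.query", "unescaped_sql", "pkg_a.db.exec"),  # same flow again
    TaintPath("env.var", "shell_injection", "pkg_c.run"),
]
findings = cluster_by_root_cause(paths)
for f in findings:
    print(f.root_cause, "affected sinks:", len(f.sinks))
```

Four raw paths become two findings here: the reviewer sees the SQL root cause once, with both affected sinks attached.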

Architecture

Sized for fleet sweeps.

Architectural commitments

  • ~13B dense transformer, distilled from Griffin.
  • Ranking and clustering head fine-tuned on labelled taint-path datasets.
  • Attention biased toward dataflow tokens (sources, sinks, sanitiser ops).
  • Batched inference optimised for full-repo sweeps.
  • INT8 quantisation for cost-per-scan at fleet scale.
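
The page does not say how the dataflow bias is implemented. One common way to bias attention is to add a fixed bonus to the pre-softmax scores of the tokens of interest, and the sketch below shows only that idea. The token list, scores, and the `bias` value are made up for illustration.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(scores, is_dataflow, bias=1.0):
    """Add a fixed bonus to the logits of dataflow tokens before softmax."""
    adjusted = [s + (bias if flag else 0.0) for s, flag in zip(scores, is_dataflow)]
    return softmax(adjusted)


tokens = ["def", "handler", "request.args", "sanitize", "cursor.execute"]
is_dataflow = [False, False, True, True, True]  # source, sanitiser op, sink
scores = [0.2, 0.1, 0.3, 0.0, 0.4]              # made-up raw attention logits

plain = softmax(scores)
biased = biased_attention(scores, is_dataflow, bias=1.0)

for tok, p, b in zip(tokens, plain, biased):
    print(f"{tok:16s} plain={p:.3f} biased={b:.3f}")
```

With the bonus applied, the source, sanitiser, and sink tokens take a larger share of the attention mass while the weights still sum to one.
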
Where Eagle fits

Between the commit and the reasoning budget.

01 Lion
Lion flag

Lion catches a suspicious sink inline at commit time.

02 Eagle
Eagle sweep

Sweeps the repo and gathers every taint flow that could feed that sink.

03 Eagle
Eagle ranks

Clusters duplicates, scores each path, hands a short ranked queue to Griffin.

04 Griffin
Griffin reasons

Hypothesises the exploit, attempts a disproof, writes the patch.

Eagle exists so Griffin only ever sees ranked, deduped, reachable candidates.

Development history

From ranking head to standalone triage.

How Eagle was built, milestone by milestone, and what is on the bench right now.

  1. Q1 2024

    First Eagle prototype as a ranking head.

    Eagle started as a ranking head bolted on top of an early Griffin checkpoint. The motivation was simple: Griffin produced high-quality reasoning per candidate path, but production scanners surfaced tens of thousands of taint paths per repo. Spending Griffin budget on every candidate was infeasible. Eagle's job was to triage that queue before Griffin saw it.

  2. Q3 2024

    Eagle as a standalone 13B dense model.

    The ranking head was split out into its own dense 13B model trained specifically on labelled taint-path data: source/sink pairs annotated by senior security engineers across roughly 800k open-source paths. Attention was biased toward dataflow tokens (taint operators, package coordinates, sink categories), which produced a measurable lift in cross-package taint-path recall over generic code models.

  3. Q1 2025

    Clustering head + dedup.

    Production scans were surfacing near-duplicate taint flows: the same root cause showing up across many sinks. Eagle's clustering head was added to collapse those into a single finding with N affected sinks. Median finding count per repo dropped roughly 40% with no loss of recall, which materially reduced the reasoning budget Griffin spent on Eagle's queue.

  4. Q3 2025

    INT8 quantisation pipeline.

    Eagle was quantised to INT8 weights for shared-cloud-tier deployments. p95 sweep latency on a 5,000-package monorepo dropped to roughly 510ms. The quantisation pipeline was designed so that the dense and quantised weights ship from the same training checkpoint, with the quantised path used in any tier that pays per scan.
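
To make "dense and quantised weights from the same checkpoint" concrete, here is a minimal sketch of symmetric per-tensor INT8 quantisation. This is the textbook scheme, not a description of Eagle's actual pipeline, and the weight values are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantisation: floats -> ints in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]


dense = [0.82, -1.27, 0.003, 0.45, -0.61]  # made-up weights from one checkpoint
q, scale = quantize_int8(dense)            # quantised path, derived from the same weights
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(dense, restored))

print("int8:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Both representations come from the same `dense` list, and the round-trip error per weight is bounded by half the scale. That bound is the accuracy cost traded for cheaper inference.
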

  5. Q1 2026

    Ranking head v2.

    Retrained the ranking head on a larger labelled dataset, raising top-5 candidate-path recall to 94%. Eagle now emits a confidence score per candidate path so Griffin routes only above-threshold candidates by default. p95 sweep latency on the same monorepo benchmark dropped to roughly 420ms.
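
The page does not define how top-5 recall is measured. One plausible reading, assumed here, is the share of labelled cases whose true path appears in the first five entries of Eagle's ranked queue. The case IDs, path IDs, and the `top_k_recall` helper are hypothetical.

```python
def top_k_recall(ranked_queues, true_paths, k=5):
    """Fraction of labelled cases whose true path is in the top k of its queue.

    ranked_queues: {case_id: [path_id, ...]} ordered best-first
    true_paths:    {case_id: path_id}, one labelled true path per case (assumption)
    """
    hits = sum(
        1
        for case_id, truth in true_paths.items()
        if truth in ranked_queues.get(case_id, [])[:k]
    )
    return hits / len(true_paths)


ranked_queues = {
    "case-1": ["p3", "p1", "p9", "p4", "p7", "p2"],
    "case-2": ["p8", "p5", "p6", "p0", "p1", "p2"],
    "case-3": ["p2", "p4"],
    "case-4": ["p1", "p2", "p3", "p4", "p5", "p6"],
}
true_paths = {"case-1": "p9", "case-2": "p2", "case-3": "p4", "case-4": "p6"}

recall = top_k_recall(ranked_queues, true_paths, k=5)
print(f"top-5 recall: {recall:.0%}")
```

In this toy set the true path lands in the top five for two of four cases; a true path ranked sixth counts as a miss.
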

  6. Now

    Current research direction.

    Two research tracks: (1) cross-language taint awareness — improving Eagle's behaviour on polyglot repos where a JavaScript front-end calls into a Go backend via an inter-process boundary; (2) feedback-loop training from the disproof pass — when Griffin disproves an Eagle-ranked candidate, that signal is folded back into the ranker's next training run.
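
The second track can be pictured as a labelling step, sketched below. The page only says that disproofs are folded back into training. Treating surviving candidates as positives, and singling out high-confidence disproofs as hard negatives, are assumptions added for illustration, as are the names `Verdict`, `to_training_examples` and `hard_negatives`.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    path_id: str
    eagle_confidence: float  # score Eagle gave the candidate
    disproved: bool          # True when Griffin's disproof pass rejected it


def to_training_examples(verdicts):
    """Turn Griffin verdicts into (path_id, label) pairs for the next ranker run."""
    return [(v.path_id, 0 if v.disproved else 1) for v in verdicts]


def hard_negatives(verdicts, min_confidence=0.8):
    """Disproved candidates Eagle was confident about: the ranker's costliest mistakes."""
    return [
        v.path_id
        for v in verdicts
        if v.disproved and v.eagle_confidence >= min_confidence
    ]


verdicts = [
    Verdict("p1", 0.93, True),
    Verdict("p2", 0.88, False),
    Verdict("p3", 0.41, True),
]
examples = to_training_examples(verdicts)
hard = hard_negatives(verdicts)
print(examples)
print(hard)
```
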

Where Eagle fits in the pipeline

Input, sweep, rank, output.

Eagle's job, end to end, in four boxes.

Step 01

Input

Full call graph from the scanner fusion (11 scanners, deduped), plus taint paths surfaced by the deterministic engine.

Step 02

Sweep

Eagle walks the candidate paths in parallel, batched, with INT8 inference for cost-per-scan.

Step 03

Rank + cluster

Top-N candidates emerge with a confidence score and a cluster ID; near-duplicates collapse.

Step 04

Output

Ranked queue feeds Griffin's reasoning pass. Below-threshold candidates remain available in the console for human triage.
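
The four boxes above can be sketched end to end as follows. This is an illustration under stated assumptions, not Eagle's interface: `score_batch` is a stand-in for a batched inference call, the confidence scores and cluster IDs are hard-coded, and the `0.5` threshold is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path_id: str
    confidence: float
    cluster_id: int


def batches(items, size):
    """Yield fixed-size slices so candidates are scored in batches."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def score_batch(batch):
    """Stand-in for a batched Eagle inference call (hypothetical)."""
    return [Candidate(pid, conf, cid) for pid, conf, cid in batch]


def route(candidates, threshold=0.5):
    """Keep the best candidate per cluster, rank, then split on the threshold."""
    best = {}
    for c in candidates:
        if c.cluster_id not in best or c.confidence > best[c.cluster_id].confidence:
            best[c.cluster_id] = c
    ranked = sorted(best.values(), key=lambda c: c.confidence, reverse=True)
    to_griffin = [c for c in ranked if c.confidence >= threshold]
    to_console = [c for c in ranked if c.confidence < threshold]
    return to_griffin, to_console


# Input: (path_id, confidence, cluster_id) triples with made-up values.
raw = [("p1", 0.91, 0), ("p2", 0.88, 0), ("p3", 0.35, 1), ("p4", 0.72, 2), ("p5", 0.10, 3)]
scored = [c for batch in batches(raw, size=2) for c in score_batch(batch)]
to_griffin, to_console = route(scored, threshold=0.5)

print("Griffin queue:", [c.path_id for c in to_griffin])
print("Console only: ", [c.path_id for c in to_console])
```

Here `p2` collapses into `p1`'s cluster, `p1` and `p4` go to Griffin in ranked order, and `p3` and `p5` stay in the console for human triage.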

Sweep your repo, rank the surface.

Run Eagle over your call graph and see a ranked, deduped queue land in front of Griffin.