Model Family · Griffin

Griffin. The hypothesis engine.

Griffin is the heavyweight reasoning family: five size variants spanning 8B to a 671B-MoE flagship, all trained exclusively on a cybersecurity corpus. For each finding it hypothesises an exploit chain, cites the call-graph path, attempts a disproof against the project's sanitiser config, and proposes a patch.

Size variants

One brain, five reasoning budgets.

Every variant shares the corpus, tokeniser and reasoning trace format. They differ in parameter count, context window, latency and where they run.

| Variant | Parameters | Context window | Latency p95 | Deployment shape | Typical use |
|---|---|---|---|---|---|
| Griffin Lite | 8B | 32k | ~1.2s | IDE-side cloud burst / CLI deep-scan | Fast single-finding reasoning. |
| Griffin S | 14B | 64k | ~2.8s | Cloud | Mid-depth call-graph reasoning, PR-level reviews. |
| Griffin M | 32B | 128k | ~5.5s | Cloud | Repo-wide reasoning, transitive taint chains. |
| Griffin L | 70B | 128k | ~8s | Dedicated GPU | Multi-hop cross-package exploit hypothesis. Default production tier. |
| Griffin Zero | 671B-MoE (~37B active) | 256k | ~12s | Multi-GPU cluster / sovereign | Deepest reasoning, supply-chain-scale audits. |
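The size table can be read as a set of hard constraints. The sketch below is one illustrative way to encode it and pick the smallest variant that fits a given input size and latency budget. The `Variant` class and `smallest_variant_for` helper are hypothetical names, not part of any Griffin API, and "k" is assumed to mean 1,000 tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    params_b: float        # total parameters, in billions
    context_tokens: int    # context window ("32k" read as 32_000 tokens, an assumption)
    latency_p95_s: float   # approximate p95 latency, in seconds
    deployment: str


# Values copied from the size table, ordered smallest to largest.
VARIANTS = [
    Variant("Griffin Lite", 8, 32_000, 1.2, "IDE-side cloud burst / CLI deep-scan"),
    Variant("Griffin S", 14, 64_000, 2.8, "Cloud"),
    Variant("Griffin M", 32, 128_000, 5.5, "Cloud"),
    Variant("Griffin L", 70, 128_000, 8.0, "Dedicated GPU"),
    Variant("Griffin Zero", 671, 256_000, 12.0, "Multi-GPU cluster / sovereign"),
]


def smallest_variant_for(context_needed: int, latency_budget_s: float) -> Optional[Variant]:
    """Return the smallest variant whose window holds the input within the latency budget."""
    for variant in VARIANTS:
        fits_window = variant.context_tokens >= context_needed
        fits_budget = variant.latency_p95_s <= latency_budget_s
        if fits_window and fits_budget:
            return variant
    return None  # no variant satisfies both constraints
```

A 50,000-token input with a 10-second budget would land on Griffin S under these assumptions. A 200,000-token input only fits Griffin Zero, so it needs a budget of at least ~12s.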
Architecture

The internals that earn the verdict.

Architectural commitments

  • Mixture-of-experts (Zero: 8 experts, top-2 routing, ~5.5% activated params per token).
  • Security-augmented tokeniser with ~28k extra tokens covering CWE / CVE IDs, taint operators, package coordinates, and attack-pattern shorthand.
  • Sliding-window plus landmark attention for long-context call-graph reasoning at 256k.
  • Structured reasoning trace: hypothesise the exploit, cite the path, propose a disproof, propose a patch.
  • Security-domain RLHF using preference data labelled by senior offensive-security engineers, not generic annotation vendors.
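For readers unfamiliar with top-2 routing, here is a minimal sketch of the general technique: a gate scores every expert, the two highest-scoring experts run, and their outputs are mixed with renormalised weights. This is the textbook formulation on scalar inputs. Griffin's actual router is not described here, and `top2_route` and `moe_forward` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top2_route(gate_logits: Sequence[float]) -> List[Tuple[int, float]]:
    """Pick the two highest-scoring experts and renormalise their softmax weights."""
    if len(gate_logits) < 2:
        raise ValueError("need at least two experts")
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def moe_forward(x: float,
                gate_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]]) -> float:
    """Run only the two selected experts and mix their outputs."""
    return sum(weight * experts[i](x) for i, weight in top2_route(gate_logits))


# Eight toy experts, matching the expert count quoted for Zero.
toy_experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
toy_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
routes = top2_route(toy_logits)
output = moe_forward(1.0, toy_logits, toy_experts)
```

Only the selected experts are evaluated for each token, which is why the activated parameter count is much smaller than the total.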
Eval highlights

Measured against known ground truth.

  • 81%: Exploit-hypothesis accuracy
  • 98%: Adversarial prompt resistance
  • 0.6%: Hallucination rate on security Q&A
  • 94%: Top-5 candidate path retention vs CVE ground truth
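The page does not define how top-5 path retention is computed. One plausible reading is recall at k: the share of ground-truth cases whose known exploit path appears among the first five candidate paths. The sketch below implements that assumed definition. `top_k_retention` and the sample data are hypothetical and are not Griffin's evaluation harness.

```python
from typing import Dict, List, Sequence


def top_k_retention(candidates: Dict[str, List[Sequence[str]]],
                    ground_truth: Dict[str, Sequence[str]],
                    k: int = 5) -> float:
    """Fraction of ground-truth cases whose known path is among the first k candidates."""
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    hits = 0
    for case_id, true_path in ground_truth.items():
        ranked = candidates.get(case_id, [])[:k]  # a missing case counts as a miss
        if any(tuple(path) == tuple(true_path) for path in ranked):
            hits += 1
    return hits / len(ground_truth)


# Made-up illustration data: two cases, one retained and one missed.
truth = {
    "case-a": ["handler", "service", "codec"],
    "case-b": ["router", "loader"],
}
ranked_candidates = {
    "case-a": [["handler", "codec"], ["handler", "service", "codec"]],
    "case-b": [["router", "parser"]],
}
retention = top_k_retention(ranked_candidates, truth, k=5)
```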
Reasoning trace

The trace is the finding.

Every Griffin call emits a four-stage trace. Reviewers see the chain, not a single label, and can reject at any stage.

griffin-L · finding #4129
[01] HYPOTHESIS
     class: CWE-502 (unsafe deserialization)
     entry: HTTP POST /api/import-config
     gadget: pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.9.10

[02] CITED PATH
     handler.parseRequest()  -> service.importConfig()
       -> codec.decode(bytes)  -> ObjectMapper.readValue(InputStream, Object.class)
     6 hops, 3 package boundaries, 1 sanitiser bypassed (allow-list mismatch).

[03] DISPROOF ATTEMPT
     - polymorphic typing disabled?  no (DefaultTyping.NON_FINAL active)
     - allow-list enforced?           partial; missing on nested key 'plugins'
     - sandbox or seccomp profile?    none on this code path
     refutation failed; finding stands.

[04] PROPOSED PATCH
     - replace ObjectMapper.readValue with constrained reader
       using ALLOWED_TYPES allow-list
     - bump jackson-databind to >= 2.15.2 (advisory-aligned)
     - add SecurityManager-equivalent unit test covering nested 'plugins'.
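A trace like the one above maps naturally onto a small data structure with one entry per stage and a reviewer verdict that can attach to any stage. The sketch below is illustrative only: the `Stage` and `Trace` types and their field names are assumptions, not Griffin's actual output schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Stage(Enum):
    HYPOTHESIS = 1
    CITED_PATH = 2
    DISPROOF_ATTEMPT = 3
    PROPOSED_PATCH = 4


@dataclass
class Trace:
    variant: str
    finding_id: int
    stages: Dict[Stage, List[str]] = field(default_factory=dict)
    rejected_at: Optional[Stage] = None
    reject_reason: str = ""

    def is_complete(self) -> bool:
        """True when all four stages are present."""
        return all(stage in self.stages for stage in Stage)

    def reject(self, stage: Stage, reason: str) -> None:
        """Record a reviewer rejection at a specific stage."""
        if stage not in self.stages:
            raise KeyError(f"trace has no {stage.name} stage")
        self.rejected_at = stage
        self.reject_reason = reason

    def stands(self) -> bool:
        """A finding stands only if the trace is complete and nothing was rejected."""
        return self.is_complete() and self.rejected_at is None


# Abbreviated version of finding #4129 from the sample trace.
trace = Trace(
    variant="griffin-L",
    finding_id=4129,
    stages={
        Stage.HYPOTHESIS: ["class: CWE-502 (unsafe deserialization)"],
        Stage.CITED_PATH: ["handler.parseRequest() -> service.importConfig()"],
        Stage.DISPROOF_ATTEMPT: ["refutation failed; finding stands."],
        Stage.PROPOSED_PATCH: ["bump jackson-databind to >= 2.15.2"],
    },
)
```

Storing the rejection against a stage rather than the whole finding preserves what the reviewer disagreed with, for example a sound hypothesis paired with an unacceptable patch.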
Auto-routing

Each finding goes to the cheapest variant that can handle it.

A triage score decides which Griffin size handles each candidate. You don't pay Zero-tier compute for an in-package call.

01 · Triage score

Eagle assigns a complexity score from the call graph: depth, sanitiser ambiguity, cross-package edges, sink severity.

02 · Variant selection

Cheap, in-package candidates route to Lite. Mid-depth PR work routes to S or M. Multi-hop cross-package paths route to L. Sovereign or long-budget audits route to Zero.

03 · Reasoning pass

The chosen variant runs the hypothesise / cite / disprove / patch trace. The trace ships with the finding so reviewers can audit which variant produced what.
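The three steps above can be sketched as a scoring function followed by a threshold lookup. Everything numeric here is an assumption made for illustration: the weights, the caps, the thresholds, and the `Candidate` fields are hypothetical, and the real Eagle scoring model is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    depth: int                  # call-graph hops from entry point to sink
    sanitiser_ambiguity: float  # 0.0 (clear) to 1.0 (unclear whether sanitised)
    cross_package_edges: int    # package boundaries crossed on the path
    sink_severity: float        # 0.0 (benign sink) to 1.0 (critical sink)
    sovereign: bool = False     # must run as a sovereign or long-budget audit


def triage_score(c: Candidate) -> float:
    """Combine the four signals into a 0..1 complexity score (hypothetical weights)."""
    depth_term = min(c.depth, 10) / 10
    edge_term = min(c.cross_package_edges, 5) / 5
    return (0.3 * depth_term
            + 0.2 * c.sanitiser_ambiguity
            + 0.3 * edge_term
            + 0.2 * c.sink_severity)


def select_variant(c: Candidate) -> str:
    """Route to the cheapest variant judged able to handle the candidate."""
    if c.sovereign:
        return "Griffin Zero"
    score = triage_score(c)
    if c.cross_package_edges == 0 and score < 0.30:
        return "Griffin Lite"   # cheap, in-package
    if score < 0.45:
        return "Griffin S"      # mid-depth PR work
    if score < 0.60:
        return "Griffin M"
    return "Griffin L"          # multi-hop, cross-package


in_package = Candidate(depth=2, sanitiser_ambiguity=0.1, cross_package_edges=0, sink_severity=0.4)
cross_package = Candidate(depth=6, sanitiser_ambiguity=0.6, cross_package_edges=3, sink_severity=0.9)
```

With these made-up weights, a shallow in-package candidate routes to Lite, while a six-hop path across three package boundaries routes to L.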

Put Griffin on your hardest path.

Pick the variant that fits your budget and watch it reason through your real call graph, not a benchmark.