This is the unabridged technical pipeline from raw corpus material through to a shipped model — for buyers who want to verify the work rather than read marketing. Eight stages, each gated, each versioned, each documented in a per-release provenance bundle that customers can audit in-house.
Each stage has a hard gate. A regression at any stage blocks release, regardless of capability gains elsewhere.
The lifecycle begins with a curated set of document sources, not a general web crawl. Crawlers are restricted to defender-frame material: CVE feeds, exploit research write-ups, advisory text, taint graphs extracted from public source, MITRE ATT&CK technique descriptions, OSV / NVD / GHSA records, vendor PSIRT bulletins, security-relevant RFCs, OWASP guidance, and NIST publications. Every source has its content licence verified before it enters the pipeline; the licence and the SHA of each ingested document are recorded in a per-release manifest. General-web pages, social media, marketing copy, and chat logs are excluded outright: ingest is allowlist-only, so anything outside the curated source list never enters the pipeline. The active corpus holds roughly 11 million documents and grows on a rolling weekly cadence as new disclosures land.
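In sketch form, a per-document manifest record looks like the example below. The field names and the JSON-lines layout are illustrative assumptions, not the shipped schema.

```python
import hashlib
import json

def manifest_entry(doc_bytes: bytes, source: str, licence: str) -> dict:
    """Illustrative per-document manifest record: verified licence plus content SHA.
    Field names are placeholders; the shipped manifest schema may differ."""
    return {
        "source": source,              # e.g. "nvd", "ghsa", "vendor-psirt"
        "licence": licence,            # licence identifier verified at ingest
        "sha256": hashlib.sha256(doc_bytes).hexdigest(),
    }

# One JSON-lines manifest per release, pinned alongside the dataset SHA.
with open("release-manifest.jsonl", "a", encoding="utf-8") as fh:
    fh.write(json.dumps(manifest_entry(b"<advisory text>", "nvd", "CC0-1.0")) + "\n")
```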
Documents are pushed through cross-source near-duplicate detection using MinHash signatures and locality-sensitive hashing. A second pass decontaminates against the held-out evaluation sets — we do not want to train on what we evaluate against, and the held-out splits are pinned per release. A third pass removes machine-generated text: self-training on prior model outputs is explicitly prohibited, since it amplifies hallucinations on CWE classification. A fourth pass scans for any document containing patterns characteristic of customer code and drops it. The whole de-dup + decontamination stage typically removes 18-22% of raw ingest before training even begins.
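The near-duplicate pass can be sketched with the open-source `datasketch` library. Shingle size, permutation count, and the Jaccard threshold below are illustrative values, not the production configuration.

```python
# Minimal MinHash + LSH near-duplicate sketch; parameters are assumptions.
from datasketch import MinHash, MinHashLSH

def signature(text: str, num_perm: int = 128, shingle: int = 5) -> MinHash:
    m = MinHash(num_perm=num_perm)
    tokens = text.split()
    for i in range(max(1, len(tokens) - shingle + 1)):
        m.update(" ".join(tokens[i:i + shingle]).encode("utf-8"))
    return m

lsh = MinHashLSH(threshold=0.85, num_perm=128)   # Jaccard threshold is illustrative

def is_near_duplicate(doc_id: str, text: str) -> bool:
    sig = signature(text)
    if lsh.query(sig):          # any previously ingested document above the threshold
        return True
    lsh.insert(doc_id, sig)
    return False
```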
Senior offensive-security engineers label a curated subset of the surviving corpus for ground-truth supervision. The annotation rubric is owned by the security team, not delegated to crowdworkers. Labels include CWE class per finding, source-and-sink taint paths per repository, exploit class per public write-up, and sanitiser-quality grades per code path. The labelled subset is what the reward model and the trace-quality auditors see — most of the corpus stays unlabelled for unsupervised pretraining, but the labelled head is what teaches the family to reason about security as opposed to memorising security text. Inter-annotator agreement is tracked per CWE class and reviewed quarterly.
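A per-class agreement check of the kind described above can be sketched with Cohen's kappa. The label values here are placeholders; the real rubric and label schema are internal.

```python
# Illustrative per-CWE inter-annotator agreement; labels below are placeholders.
from sklearn.metrics import cohen_kappa_score

def per_cwe_kappa(annotator_a: list[str], annotator_b: list[str],
                  cwe_classes: set[str]) -> dict:
    scores = {}
    for cwe in cwe_classes:
        a = [int(label == cwe) for label in annotator_a]
        b = [int(label == cwe) for label in annotator_b]
        scores[cwe] = cohen_kappa_score(a, b)   # one-vs-rest agreement per class
    return scores

print(per_cwe_kappa(["CWE-79", "CWE-89"], ["CWE-79", "CWE-22"],
                    {"CWE-79", "CWE-89", "CWE-22"}))
```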
Base pretraining runs against the curated corpus with the security-augmented tokeniser, which adds roughly 28k tokens covering CWE / CVE identifiers, taint operators, sink names, sanitiser primitives, and purl coordinates. The Aegis attention architecture handles long context through sliding-window plus landmark attention with retrieval gates that page in the relevant call-graph slice rather than feeding the whole repo through dense attention. Griffin Zero adds an 8-expert mixture-of-experts head with top-2 routing per token, biased toward sink-handling experts at the embedding layer. Pretraining compute scales by tier — from an 11x H100 footprint at the Growth tier through to a 22x H100 multi-AZ shape at the Mature tier — with full SHA-pinned weight attestation per checkpoint.
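Extending a tokeniser with domain tokens follows a standard pattern; a minimal sketch using the Hugging Face transformers API is shown below. The checkpoint name and token entries are placeholders, not the internal 28k-token list.

```python
# Minimal tokeniser-extension sketch; model name and token entries are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("base-model")        # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("base-model")

security_tokens = ["CWE-79", "CVE-2021-44228", "<taint:source>", "<taint:sink>",
                   "pkg:npm/lodash"]                            # illustrative entries only
added = tokenizer.add_tokens(security_tokens)
model.resize_token_embeddings(len(tokenizer))                   # grow the embedding matrix
print(f"added {added} new tokens")
```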
Reinforcement learning from human feedback is run against preference data labelled exclusively by senior offensive-security engineers. No crowdworkers. The reward model targets three failure modes that general-purpose RLHF rubrics tend to miss: hallucinated CVE numbers (the model inventing identifiers that look plausible but do not exist), refusal on legitimate security research (defensive flinching at the word 'exploit'), and unstructured reasoning that drifts away from the trace contract. A response that reaches the right verdict but skips a stage of the trace is rated below a response that reaches the same verdict with the full chain shown. The rubric is versioned per release and shipped with the attestation bundle.
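In sketch form, the preference objective is a standard pairwise ranking loss over a scalar reward head; the loss form below is an assumption about the implementation. The security-specific work lives in the rubric, not the loss.

```python
# Minimal pairwise preference loss sketch (Bradley-Terry style); assumes a scalar
# reward head. The production reward model and rubric are not public.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # "Chosen" is the response with the full four-stage trace; a correct verdict
    # with a skipped stage lands on the "rejected" side of the pair.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

loss = preference_loss(torch.tensor([1.8, 0.4]), torch.tensor([0.9, -0.2]))
```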
Every checkpoint that clears RLHF is handed to the internal red team. The red team runs a 1,800-item adversarial suite covering known jailbreak families, prompt-injection payloads, role-play coercion, refusal-rate probes against legitimate security questions, and internal variations that have not been published. The suite is owned by the red team, not by the modelling team, and is refreshed each release with new families from the public literature. A checkpoint that regresses on any part of the suite — including the refusal-rate probes — does not ship, regardless of capability gains elsewhere. This is the hard gate that holds the rest of the pipeline honest.
Lino is not trained from scratch on a smaller corpus — it is distilled from a Griffin L teacher through a trace distillation pipeline. The pipeline samples security-relevant prompts from the labelled subset, runs the Griffin L teacher to capture both the input-to-final-label pair and the input-to-intermediate-trace pair, and trains the 1B student against both objectives simultaneously. The student learns the verdict and the reasoning shape at the same time, which is what keeps Lino's behaviour consistent with Griffin on the same finding rather than drifting toward that of a different model. After distillation the student is INT8 quantised for shipping inside the IDE extension and the pre-commit hook.
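A minimal sketch of the joint objective, assuming soft-label distillation on the verdict and token-level supervision on the teacher's trace; the loss weighting and temperature are illustrative, not the production values.

```python
# Joint trace-distillation loss sketch; weighting and temperature are assumptions.
import torch
import torch.nn.functional as F

def trace_distillation_loss(student_label_logits, teacher_label_logits,
                            student_trace_logits, teacher_trace_tokens,
                            alpha: float = 0.5, temperature: float = 2.0):
    # Soft-label KL on the final verdict (e.g. the CWE class).
    label_kl = F.kl_div(
        F.log_softmax(student_label_logits / temperature, dim=-1),
        F.softmax(teacher_label_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Token-level cross-entropy on the teacher's intermediate trace.
    trace_ce = F.cross_entropy(
        student_trace_logits.view(-1, student_trace_logits.size(-1)),
        teacher_trace_tokens.view(-1),
    )
    return alpha * label_kl + (1 - alpha) * trace_ce
```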
The final gate is two-sided. The quantitative side runs the checkpoint against six held-out evaluation sets — taint-path recall, exploit-hypothesis accuracy, remediation patch-pass-test, adversarial prompt resistance, refusal-rate on legitimate research, and CVE-classification calibration — with thresholds linked directly to the published benchmarks. The qualitative side is a manual audit of 300 reasoning traces by the engineering team. Auditors grade trace structure, evidence citation, disproof attempts, and patch-proposal quality. A capability win that comes with a trace-quality regression is rejected the same way a capability regression would be. Only when both sides clear does the checkpoint enter staged rollout.
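The quantitative side reduces to a threshold check per eval set. The threshold values below are placeholders, not the published benchmark thresholds.

```python
# Illustrative two-sided release gate; threshold values are placeholders.
def release_gate(scores: dict, trace_audit_pass: bool) -> bool:
    quantitative = (
        scores["taint_path_recall"] >= 0.90 and
        scores["exploit_hypothesis_accuracy"] >= 0.85 and
        scores["remediation_patch_pass_test"] >= 0.80 and
        scores["adversarial_prompt_resistance"] >= 0.99 and
        scores["refusal_rate_legitimate_research"] <= 0.02 and   # lower is better
        scores["cve_classification_calibration"] >= 0.95
    )
    return quantitative and trace_audit_pass    # both sides must clear
```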
The same training checkpoint flows to FP16 for the default precision path, INT8 for the production cost path, and FP8 for sovereign deployments on H100-class hardware. Each path is calibrated against the same held-out eval set so accuracy parity is verifiable before shipping.
The reference precision. Every checkpoint is trained and evaluated in FP16; benchmark numbers are reported against this path. Customers who want maximum reasoning fidelity run this shape.
Post-training quantisation with security-eval calibration. Activation-aware weight quantisation preserves the sink-handling experts in Zero and the dataflow-attention heads in Eagle. Throughput roughly doubles at accuracy parity on the security suite.
H100-only path. FP8 is the shape we ship into sovereign and air-gapped deployments where customers want lower memory footprint than FP16 but stricter calibration than INT8. Accuracy delta against FP16 is published per release.
| Variant | FP16 | INT8 | FP8 | Notes |
|---|---|---|---|---|
| Griffin Lite (8B) | Default | Production cost path | Sovereign | All three paths share the same RLHF checkpoint. |
| Griffin S (14B) | Default | Production cost path | Sovereign | FP8 only on H100-class hardware. |
| Griffin M (32B) | Default | Production cost path | Sovereign | INT8 is the typical shared-cloud shape. |
| Griffin L (70B) | Default | Production cost path | Sovereign | Default Safeguard production tier. |
| Griffin Zero (671B-MoE) | Sparse FP16 | — | Sovereign | MoE shapes ship FP16 or FP8 only. |
| Eagle (13B) | Available | Default | — | Batched INT8 is the shipping shape. |
| Lino (1B) | — | Default | — | INT8 is the only shipped shape. |
Quantisation paths are post-training. The same RLHF-completed weights are calibrated independently for each numerical format.
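The parity claim reduces to a simple per-release check: run the same held-out set through the FP16 reference and each quantised path, then compare. The tolerance below is a placeholder, not the published delta.

```python
# Minimal accuracy-parity check between the FP16 reference and a quantised path;
# the tolerance value is an illustrative placeholder.
def accuracy_parity(fp16_correct: list[bool], quantised_correct: list[bool],
                    tolerance: float = 0.005) -> bool:
    fp16_acc = sum(fp16_correct) / len(fp16_correct)
    quant_acc = sum(quantised_correct) / len(quantised_correct)
    return (fp16_acc - quant_acc) <= tolerance   # quantised path may not regress beyond tolerance
```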
GPU shapes per variant. Dense Griffin on A100 / H100, sparse Zero on multi-GPU clusters, Eagle as a batched INT8 service, Lino entirely on the developer machine.
Dense Griffin variants run on standard datacenter-class GPUs. Lite and S fit on a single A100-80G with INT8; M and L are tensor-parallel across multiple H100s for the production FP16 / FP8 paths.
Griffin Zero is a 671B-parameter mixture-of-experts head with eight experts and top-2 routing per token. Sparse activation keeps roughly 5.5% of parameters live per forward pass, so cluster sizing tracks active parameters, not nominal count.
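Top-2 routing over eight experts looks like the sketch below; load balancing and the sink-handling routing bias are omitted. This is a minimal illustration, not the production layer.

```python
# Minimal top-2 MoE routing sketch; the real layer adds load-balancing losses
# and a routing bias toward sink-handling experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: [tokens, d_model]
        weights, idx = self.gate(x).topk(2, dim=-1)  # top-2 experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out
```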
Eagle runs as a batched INT8 service across the whole repo on every push. Batching is across packages, not across tenants — each batch carries one tenant identity and one signed call-graph slice from the deterministic engine.
Lino is the only family member that does not require datacenter hardware. The INT8 1B head runs locally inside the IDE extension, the CLI, and the pre-commit hook — source code never leaves the developer machine. Apple Silicon is a first-class target.
The trace contract is enforced at the decoder, sampling is constrained against the CWE / CVE token classes, and Griffin Zero runs an adversarial disproof pass in parallel with the main decode.
The trace contract is enforced at the decoder, not learned in expectation. Every Griffin call emits a structured trace — hypothesise, cite, disprove, patch — and the decoder is constrained so that a malformed trace cannot be produced. Reviewers always see all four stages or the call fails; there is no probabilistic best-effort.
Sampling against CWE and CVE identifiers is constrained to the canonical token classes that the security tokeniser exposes. The decoder physically cannot emit a CVE identifier that is not in the published namespace, which is what closes the door on hallucinated CVE numbers — the most embarrassing failure mode for a security model.
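Namespace constraining amounts to masking the next-token distribution to continuations that exist in a trie built over the published CVE identifiers. The sketch below simplifies the tokenisation and trie layout.

```python
# Minimal namespace-constrained sampling sketch; tokenisation and trie layout
# are simplified relative to the shipped decoder.
import torch

class CVETrie:
    def __init__(self, cve_token_sequences: list[list[int]]):
        self.root: dict = {}
        for seq in cve_token_sequences:              # token ids of canonical CVE identifiers
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: list[int]) -> set[int]:
        node = self.root
        for tok in prefix:
            node = node.get(tok)
            if node is None:
                return set()
        return set(node.keys())

def mask_to_namespace(logits: torch.Tensor, trie: CVETrie, prefix: list[int]) -> torch.Tensor:
    allowed = trie.allowed_next(prefix)
    masked = torch.full_like(logits, float("-inf"))
    for tok in allowed:
        masked[tok] = logits[tok]                    # only in-namespace continuations survive
    return masked
```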
Zero runs an adversarial disproof pass in parallel with the main decode. Once the primary trace lands on a verdict, a second decode is conditioned on the same input plus the trace and asked to refute the conclusion. If the disproof pass succeeds, the survivor is downgraded; if it fails to refute, the verdict ships with the disproof attempt log attached.
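A minimal wrapper for the disproof pass, assuming the decode callable returns a structured refutation object; the prompt, result shape, and downgrade rule below are placeholders.

```python
# Illustrative disproof-pass wrapper; `decode` is a placeholder for the second
# constrained decode and the result shape is an assumption.
def disproof_pass(decode, finding: str, primary_trace: str, verdict: str) -> dict:
    refutation = decode(f"{finding}\n{primary_trace}\nRefute the verdict above.")
    if refutation.get("refuted"):
        return {"verdict": "downgraded", "disproof": refutation}
    return {"verdict": verdict, "disproof_attempt": refutation}   # attempt log ships with the verdict
```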
Trace structure is checked online during decode. If a trace drops a stage or fails the citation pass — for instance, claiming reachability without producing a call-graph path — the decoder rolls back to the last valid checkpoint and re-samples. The user never sees a malformed trace; they either see a valid one or no answer.
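The rollback behaviour can be sketched as a staged decode loop: each stage is validated as it completes, a failed stage is re-sampled from the last valid checkpoint, and a trace that never validates returns nothing. The stage decoder and validators are placeholders for the constrained decoder and the citation checks.

```python
# Illustrative decode loop with online trace validation and rollback;
# `decode_stage` and `validate_stage` are placeholders.
STAGES = ["hypothesise", "cite", "disprove", "patch"]

def decode_with_rollback(decode_stage, validate_stage, max_resamples: int = 3):
    trace, checkpoint = [], []
    for stage in STAGES:
        for _ in range(max_resamples):
            candidate = decode_stage(stage, checkpoint)   # decode one stage from the last valid state
            if validate_stage(stage, candidate):          # e.g. reachability claims must cite a call-graph path
                trace.append((stage, candidate))
                checkpoint = list(trace)
                break
        else:
            return None   # no valid trace: the caller sees no answer, never a malformed one
    return trace
```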
No release skips a gate. Telemetry from one release feeds curation for the next — the cycle is closed but customer code never enters it.
Ingest, dedup, decontamination, labelling. Per-release dataset SHA pinned.
Aegis pretraining followed by security-domain RLHF on engineer-labelled preferences.
1,800-item suite. Any regression blocks release, including refusal-rate regressions.
Six held-out evals plus 300-trace manual audit. Both sides must clear.
Shared cloud → dedicated → VPC-isolated → sovereign, 14-day soak per tier.
Anonymised eval signal feeds the next curation pass. Customer code is never part of it.
Staged rollout uses a 14-day soak per tier. A regression caught in soak rolls back the tier before promotion to the next.
Recipes, datasets, and weights are versioned per release. The provenance bundle is shippable to customers for in-house verification.
Every release ships with a recipe hash covering tokeniser version, RLHF rubric version, red-team suite version, and eval set versions. Recipes are stored in an append-only ledger and are reviewable under NDA.
The exact set of source documents used for pretraining and labelling is fixed at a SHA per release. We can show you which 11M documents went into release N and which delta was added for release N+1.
Weights are signed at the artefact level. The signature plus the hardware attestation chain means a customer can verify the weights running in their VPC match the weights shipped from the build pipeline.
On request we ship a provenance bundle: recipe hash, dataset SHA, RLHF rubric, red-team suite version, eval scores, trace audit notes, and the signed weight manifest. Verification is in-customer.
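In-customer verification of the weights reduces to a hash comparison against the signed manifest. The sketch below assumes a simple JSON manifest of shard digests and elides signature verification of the manifest itself; file names are placeholders.

```python
# Minimal in-customer weight verification sketch; manifest layout is an assumption
# and signature verification of the manifest itself is elided.
import hashlib
import json
from pathlib import Path

def verify_weights(manifest_path: str, weights_dir: str) -> bool:
    manifest = json.loads(Path(manifest_path).read_text())
    for shard, expected_sha in manifest["shards"].items():
        digest = hashlib.sha256(Path(weights_dir, shard).read_bytes()).hexdigest()
        if digest != expected_sha:
            return False    # deployed weights do not match the shipped artefact
    return True
```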
The base pretraining corpus excludes the following categories through the ingest allowlist and ingest-time filters.
Fine-tuning, RLHF, and trace-distillation passes carry the same exclusion list, plus tighter restrictions on disclosure-derived material.
Griffin, Eagle, Lino — one corpus, three speeds. The architectural overview of the lineup.
The eleven-million-document defender-frame corpus the family is built on, what's in and what's out.
How Lino inherits Griffin's reasoning shape — joint label and intermediate-trace objectives.
Numbers next to the alternatives across six held-out evaluation sets. Methodology under NDA.
Provenance bundles, recipe hashes, dataset SHAs, red-team suites, and eval scores are available under NDA. Bring your security and ML teams; we will walk them through the pipeline.