A general-internet-trained model drifts on security questions because most of its training data is not about security. It has seen orders of magnitude more cooking recipes and product tutorials than CVE patches. When you ask it about a deserialisation gadget, it answers from inference rather than memory — and that is where hallucinations come from.
A curated security corpus shifts that distribution materially. The model has actually read the disclosure, seen the patch, and been graded on whether it can describe the reachability path. Refusal rates fall, hallucination rates fall, and the tokeniser stops shattering CWE IDs into byte-pair noise. The Griffin, Eagle, and Lino lineup is trained on such a corpus, and its contents matter more than the parameter count.
Each class earns its place because it carries supervision signal a general corpus cannot supply. Every document has provenance metadata, a security tag, and a label that lets it participate in training rather than merely inflate token counts.
Each disclosure is linked to its remediating commit, so the model sees the vulnerable code, the fix, and the diff between them — not just an abstract description.
Exploit proof-of-concept code is curated, deduplicated, and stripped of weaponised payloads; the signal is the reasoning chain, not a ready-to-run weapon.
Authoritative disclosure text from upstream maintainers and coordination bodies — the ground truth for how a class of bug is described publicly.
Real analyzer output graded by senior engineers as truly exploitable, latent, or false positive — the supervision signal that teaches the model what reachable actually means.
Annotated source-to-sink paths across hundreds of thousands of public repositories, with sanitiser edges, sink severities, and CWE classes attached.
Tactic, technique, and procedure mappings link source-level patterns to the adversary behaviour they enable, so the model can talk about both.
Input grammars, crash classifications, and triage decisions from public fuzz campaigns — coverage of the bugs that pattern scanners cannot find.
Typosquats, dependency-confusion incidents, and post-install script analyses — the literature of supply-chain attacks rather than benign README files.
Selected reverse-engineering snippets with vulnerability annotations, so the model gains intuition for binary-level patterns and not just source-level ones.
Public discussion of how a bug was found, triaged, and fixed — including the dead ends — captures the reasoning style of real triage, not just the verdict.
Canonical guidance text grounds the model in the vocabulary defenders actually use, instead of approximations inferred from general web text.
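The taint-graph class above can be made concrete with a small sketch. This is a hypothetical record shape for one annotated source-to-sink path; the class name, field names, and the reachability rule are illustrative assumptions, not the product's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical record for one annotated source-to-sink path; the
# field names are illustrative, not the corpus's actual schema.
@dataclass
class TaintPath:
    repo: str                 # public repository the path was mined from
    source: str               # taint source, e.g. an HTTP parameter read
    sink: str                 # dangerous sink, e.g. a SQL execute call
    sanitisers: list[str] = field(default_factory=list)  # sanitiser edges on the path
    sink_severity: str = "high"
    cwe: str = "CWE-89"       # CWE class attached to the path

    def is_reachable(self) -> bool:
        # Simplifying assumption: a path with no sanitiser edge between
        # source and sink is treated as reachable; sanitised paths
        # become negative examples for training.
        return not self.sanitisers

path = TaintPath(
    repo="example/webapp",
    source="request.args['q']",
    sink="cursor.execute",
)
print(path.is_reachable())  # prints True: an unsanitised source-to-sink path
```

A record in this shape carries exactly the labels the text describes: the sanitiser edges, the sink severity, and the CWE class all travel with the path rather than being inferred later.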
Everything below is filtered out at ingest, not after the fact. The point isn't purity — it's that contamination from these classes degrades the model's behaviour on security tasks in ways that are hard to undo with later fine-tuning.
Repos and findings that pass through Safeguard's scanners stay in the customer's tenant. They are never used as training data, at any tier.
Common-crawl-style dumps drag in marketing fluff, low-signal forum text, and untrustworthy code: exactly the noise we want kept out of a security model.
Q&A code lifted out of disclosure context teaches a model patterns that look reasonable but ship CVEs in production. We exclude it by default.
Personal data and private conversations have no place in a model that reasons about exploit primitives. Ingestion explicitly screens them out.
Disassembly we don't have the right to redistribute stays out, regardless of how useful it might be — provenance has to be auditable.
Synthetic security text loops a model's mistakes back into itself. We refuse the convenience and pay for human-labelled exemplars instead.
Vendor decks describe a sanitised world. Defenders don't live in that world, and a model trained on it learns to flinch at the wrong words.
General-internet-trained models flinch at words like 'exploit' and 'gadget' because their RLHF rewarded refusal. A security corpus weighted toward disclosures, write-ups, and patch diffs supplies canonical answers and lets the model help defenders instead of stonewalling them.
CWE-89, CVE-2024-XXXX, source→sink arrows, sanitiser markers — these become single tokens with neighbours in embedding space that are other vulnerable patterns. Long-context behaviour improves because the model isn't burning attention on shattered byte-pair fragments.
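The tokenisation effect can be illustrated with a toy greedy longest-match tokeniser. Both vocabularies below are invented for the illustration, not the models' actual merge tables; the point is only that a vocabulary derived from a security corpus keeps an identifier like CWE-89 whole.

```python
# Toy greedy longest-match tokeniser; the vocabularies are invented
# illustrations, not any model's actual merge table.
def tokenise(text: str, vocab: set[str]) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i,
        # falling back to a single character when nothing matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

general = {"CW", "E-", "89", "8", "9"}   # generic byte-pair fragments
security = general | {"CWE-89"}          # corpus-derived whole token

print(tokenise("CWE-89", general))   # ['CW', 'E-', '89']: shattered
print(tokenise("CWE-89", security))  # ['CWE-89']: one token, one embedding
```

The shattered form forces the model to reassemble the identifier across three attention positions; the whole token gets its own embedding, with neighbours that are other vulnerability identifiers.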
Every plausible-sounding statement about a CWE class or reachability pattern is anchored to labelled exemplars in training. The model has seen the canonical answer enough times that it stops inventing alternatives that sound right but aren't.
The pipeline runs end-to-end on every training cut. Provenance metadata flows from ingest into the labelled exemplars and into the RLHF preference data, so any claim about the corpus can be traced back to a document and an annotator.
Disclosures, patch diffs, advisories, fuzzing corpora, and taint graphs are pulled from authoritative sources with provenance metadata attached at the document level.
Near-duplicate disclosures, mirrored advisories, and reposted PoCs are collapsed. The dedup keys are CVE-aware so we don't accidentally erase a critical variant.
Senior security engineers grade exemplars for reachability, exploitability, CWE class, and sanitiser coverage. The labels become supervision signal — not annotation gig-work output.
Preference data is collected against a rubric written by offensive-security engineers. Plausible-sounding hallucinations on CWE classification are penalised; unverified reachability claims are treated as failures.
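A CVE-aware dedup key of the kind the pipeline describes can be sketched as follows. The normalisation rules and key shape are assumptions for illustration; the CVE IDs in the example are real (the Log4Shell disclosure and its follow-up).

```python
import hashlib
import re

# Sketch of a CVE-aware dedup key; the normalisation rules and key
# shape are illustrative assumptions, not the pipeline's actual logic.
def dedup_key(text: str) -> tuple[str, str]:
    # CVE IDs anchor the key, so two advisories for different CVEs
    # never collapse even when their boilerplate is nearly identical.
    cves = sorted({m.upper() for m in
                   re.findall(r"CVE-\d{4}-\d{4,}", text, re.IGNORECASE)})
    # Normalise whitespace and case before hashing, so mirrored copies
    # with trivial formatting differences do collapse.
    normalised = " ".join(text.lower().split())
    digest = hashlib.sha256(normalised.encode()).hexdigest()
    return ("|".join(cves), digest)

a = "Advisory for CVE-2021-44228:   remote code execution via JNDI lookup."
b = "advisory for cve-2021-44228: remote code execution via JNDI lookup."
c = "Advisory for CVE-2021-45046: remote code execution via JNDI lookup."

print(dedup_key(a) == dedup_key(b))  # True: mirrored copy collapses
print(dedup_key(a) == dedup_key(c))  # False: different CVE survives dedup
```

Making the CVE ID part of the key is what prevents the failure mode the text warns about: a critical variant advisory that differs from the original by only a few words still hashes to a distinct key.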
The corpus is one half of the story. The other half is how the lineup turns it into three different models — each with a different latency and reasoning budget — without losing the security taste the corpus encodes.
Ask Griffin a question a general model would refuse. Compare the answer. The corpus is the difference.