Concept · Model Distillation

How a 1B model inherits a 70B model's security taste.

Lino runs inline on a developer's laptop in under 100 ms. Griffin L is seventy times its size and takes seconds to answer. The student inherits most of the teacher's judgement on a narrow but useful slice of security tasks — sink detection, sanitiser scoring, inline triage — and concedes the rest. The technique that makes that trade work is distillation.

What distillation does

Three sentences, no hand-waving.

A smaller "student" model is trained to mimic the input-to-output behaviour of a larger "teacher", recovering most of the teacher's capability at a fraction of the parameter count.

In plain label distillation, the student only sees the teacher's final answer. In trace distillation — the variant Lino uses — the student also sees the teacher's intermediate reasoning, so it learns the reasoning shape and not just the verdict.

The result is a model that is dramatically smaller and faster, that gives up some of the teacher's reach — long context, deep multi-hop reasoning, novel-pattern generalisation — and keeps the parts that matter for its specific job.
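
The student-mimics-teacher idea can be made concrete with the classic distillation loss: a weighted sum of cross-entropy against the hard label and KL divergence from the teacher's temperature-softened output distribution. This is a minimal sketch of that standard formulation, not Lino's actual training code; the function names, temperature, and weighting are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing more of the teacher's relative preferences.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy against the hard label and
    (b) KL divergence from the teacher's softened distribution."""
    p_student = softmax(student_logits, temperature)
    p_teacher = softmax(teacher_logits, temperature)
    # Hard-label cross-entropy at T=1: the verdict signal.
    hard_ce = -math.log(softmax(student_logits)[hard_label])
    # Soft-target KL(teacher || student), scaled by T^2 so the two
    # terms stay on comparable gradient scales.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return alpha * hard_ce + (1 - alpha) * (temperature ** 2) * kl
```

When the student's logits match the teacher's, the KL term vanishes and only the hard-label term remains; as the distributions diverge, the soft-target term dominates and pulls the student toward the teacher's judgement.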

The Lino recipe

Four steps from Griffin L to Lino.

Order matters. Skipping the trace step produces a faster classifier that confidently misjudges sanitised flows. Skipping quantisation produces a model that's too slow to live inline. Skipping the realistic prompt distribution produces a model that fails on the prompts it will actually see.

Step 01

Sample security-relevant prompts

Inputs are drawn from real engineering workflows: sink detections, sanitiser-quality checks, dangerous-import flags, suspicious deserialisation patterns, and inline questions a developer would actually ask their IDE. We don't sample from synthetic prompt collections — the distribution has to look like what the inline model will face in production.

Step 02

Run Griffin L on each prompt

Griffin L produces the answer and, crucially, the structured reasoning trace that led to it: the hypothesised exploit, the cited path through the call graph, the disproof attempt, the proposed patch. The trace is the supervision signal — not just the final label — and is what turns label distillation into trace distillation.
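
A structured trace like the one described might be captured as a small record: hypothesis, cited path, disproof attempt, verdict, patch. The field names and example values below are illustrative, not Griffin L's actual schema.

```python
from dataclasses import dataclass, asdict, field

@dataclass
class ReasoningTrace:
    # Hypothetical schema for one teacher trace.
    hypothesis: str              # the hypothesised exploit
    cited_path: list = field(default_factory=list)  # source-to-sink call-graph steps
    disproof_attempt: str = ""   # why the attempted refutation failed
    verdict: str = ""            # e.g. "vulnerable" / "sanitised"
    proposed_patch: str = ""

trace = ReasoningTrace(
    hypothesis="user input reaches os.system via build_cmd()",
    cited_path=["handler.parse", "build_cmd", "os.system"],
    disproof_attempt="shlex.quote is not applied anywhere on the taint path",
    verdict="vulnerable",
    proposed_patch="wrap the argument in shlex.quote()",
)

# One distillation record: the prompt plus the full trace as supervision,
# not just trace.verdict.
record = {
    "prompt": "is the call to os.system in build_cmd reachable from user input?",
    "target": asdict(trace),
}
```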

Step 03

Train the 1B student on labels AND traces

The student model is optimised against two objectives in parallel: (input, final-label) pairs give it the verdict, while (input, intermediate-trace) pairs force it to learn the reasoning shape. Both signals are weighted; trace distillation prevents the student from collapsing into a confident-but-shallow classifier.
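
One way to realise the dual objective is to expand each teacher record into two weighted supervision examples, one per signal. A sketch under assumed record keys (`prompt`, `label`, `trace`) and illustrative weights:

```python
def to_training_examples(record, label_weight=0.3, trace_weight=0.7):
    """Expand one teacher record into the two supervision signals:
    (input, final-label) and (input, intermediate-trace). The weights
    here are illustrative, not Lino's tuned values."""
    prompt = record["prompt"]
    return [
        {"input": prompt, "target": record["label"], "weight": label_weight},
        {"input": prompt, "target": record["trace"], "weight": trace_weight},
    ]

def batch_loss(example_losses, examples):
    # Per-example losses are combined under the same weights, so the
    # trace signal dominates the gradient.
    return sum(w_loss * ex["weight"]
               for w_loss, ex in zip(example_losses, examples))
```

The higher trace weight reflects the point above: the verdict alone is cheap supervision, and over-weighting it is exactly what produces the confident-but-shallow classifier.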

Step 04

Quantise to INT8 and ship

Once the student passes the eval harness against Griffin, its weights are quantised to INT8 for on-device inference, packaged with the IDE extension and CLI, and pinned by SHA. The same artifact runs identically on every developer machine — no cloud round-trip, no per-call cost, no source code leaving the laptop.
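
INT8 quantisation maps each float weight to an 8-bit integer plus a scale. The sketch below shows the simplest symmetric per-tensor scheme; real pipelines typically use per-channel scales and calibration data, and nothing here is Lino's actual packaging code.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantisation: map floats onto
    [-127, 127] with a single shared scale factor."""
    # Guard against an all-zero tensor (scale would be 0).
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Inference multiplies the int8 value back by the scale.
    return [x * scale for x in q]
```

The win is fourfold smaller weights and integer arithmetic on-device; the cost is a bounded rounding error of at most half a scale step per weight, which is why the quantised student must re-pass the eval harness before shipping.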

Why trace distillation matters

The label is cheap. The reasoning shape is the asset.

Plain label distillation hands the student a stack of (input, answer) pairs and tells it to fit them. The student learns to predict the verdict — and forgets the shape of the reasoning that produced it. That is fine for image classification. It is not fine for security, because security verdicts depend on conditions the model has to actively check: is this sink reachable, is this sanitiser sufficient, does this dangerous import actually get called.

Trace distillation forces the student to walk through the same intermediate steps the teacher walked through. The supervision signal includes the hypothesis, the cited path, the sanitiser check, the disproof attempt. The student doesn't just learn the answer — it learns the procedure that produces the answer, which is what generalises to inputs it hasn't seen.

That procedure is what gives Lino its accuracy at sub-100 ms. The student is small, but it's following a reasoning recipe inherited from a model seventy times its size.

Label distillation vs. trace distillation

Teacher (Griffin L)
   prompt ─┐
           ▼
     hypothesise
           │
           ▼
    cite path
           │
           ▼
  attempt disproof
           │
           ▼
       verdict

────────────────────────────

Plain label distillation
   prompt ──────────► verdict
   (student never sees the steps)

Trace distillation (Lino)
   prompt ──► hypothesise
              │
              ▼
            cite path
              │
              ▼
       attempt disproof
              │
              ▼
            verdict
   (student is supervised on
    every intermediate step)

Keeps vs. gives up

Honest about the trade.

A 1B student does not match a 70B teacher across the board. The point of the lineup is that it doesn't have to — Eagle and Griffin pick up everything Lino concedes, and the routing layer knows when to defer.

Keeps

  • Sink detection across the common dangerous APIs.
  • Sanitiser-quality scoring for known sanitiser libraries.
  • Fast triage of inline patterns at commit time.
  • Conservative refusal behaviour — defers to Griffin when uncertain.
  • On-device inference with no network egress.
  • Tokeniser fluency on CWE / CVE IDs and taint operators.

Gives up

  • Deep multi-hop call-graph reasoning across packages.
  • Adversarial disproof passes that try to refute a hypothesis.
  • Long-context behaviour — only ~8k tokens are usable.
  • CWE classification on novel patterns it hasn't seen during distillation.
  • Repo-wide ranking — that's Eagle's job, not Lino's.
  • Cross-language inference of the same vulnerability class.

Related concepts

Connect the lineup.

Distillation is what makes Lino possible. The corpus is what makes Griffin worth distilling from. The evaluation harness is what proves the student didn't silently regress on the cases that matter.

Run the student against your own code.

Drop Lino into the IDE. Measure its latency, refusal rate, and agreement with Griffin on the cases that matter. The distillation work shows up in those three numbers.
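
Those three numbers are easy to compute once you treat the student as a callable. A hypothetical harness, assuming answers are plain strings and a sentinel `"REFUSE"` marks a deferral to Griffin:

```python
import time

def evaluate_student(student, prompts, teacher_answers, budget_ms=100):
    """Measure latency, refusal rate, and agreement with the teacher.
    `student` is any callable prompt -> answer; everything here is an
    illustrative harness, not the shipped eval code."""
    latencies, refusals, agree = [], 0, 0
    for prompt, expected in zip(prompts, teacher_answers):
        t0 = time.perf_counter()
        answer = student(prompt)
        latencies.append((time.perf_counter() - t0) * 1000.0)
        if answer == "REFUSE":
            refusals += 1            # deferred to the teacher
        elif answer == expected:
            agree += 1               # matched the teacher's verdict
    answered = len(prompts) - refusals
    return {
        "max_latency_ms": max(latencies),
        "refusal_rate": refusals / len(prompts),
        "agreement": agree / answered if answered else 0.0,
        "within_budget": all(l <= budget_ms for l in latencies),
    }
```

Agreement is computed only over the prompts the student actually answered, so a well-calibrated refusal policy raises agreement without hiding regressions: the refusal rate is reported alongside it.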
