Model Family · Lion

Lion. The commit-time gut check.

Lion is the ~1B-parameter inline model distilled from Griffin. It runs locally inside the IDE, CLI, and pre-commit hook with sub-100 ms latency and zero source-code egress, so a developer never has to choose between speed and a real second pair of eyes.

~1B · Distilled parameters
INT8 · Quantised weights
<80ms · p95 inline latency
100% · On-device inference

What Lion does

Fast, local, never the bottleneck.

Three jobs in the editor. Sub-100 ms so the developer never disables it.

Inline sink detection

Catches obvious dangerous sinks — unsafe deserialization, SSRF-able URL builders, unsanitised SQL, command-exec, path traversal — before code ever reaches CI.
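
To make "unsanitised SQL" concrete, here is a small sketch of the kind of pattern an inline sink check is meant to catch, next to the safe form. It illustrates the vulnerability class only and does not show Lion's detection logic or output; the function names and the `users` table are assumptions made up for the example.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Dangerous sink: caller-controlled text is concatenated into the SQL string,
    # so a crafted value can change the structure of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised query: the value is bound by the driver and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Passing a value such as `x' OR '1'='1` to the unsafe version returns every row, while the parameterised version treats it as an ordinary (non-matching) name.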

Sanitiser awareness

Flags weak or missing sanitiser usage in known dangerous flows. Knows the difference between a real allow-list and a check that looks like one.
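
The difference between a real allow-list and a look-alike is easiest to see side by side. The sketch below is an illustration of that distinction, not Lion's implementation; `ALLOWED_HOSTS` and both function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list of hosts the application may call.
ALLOWED_HOSTS = {"api.example.com"}


def looks_like_allow_list(url: str) -> bool:
    # Weak check: a prefix test on the raw string. It accepts
    # "https://api.example.com.evil.test/" and "https://api.example.com@evil.test/".
    return url.startswith("https://api.example.com")


def real_allow_list(url: str) -> bool:
    # Stronger check: parse the URL, then compare the scheme and the exact host.
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

Both functions accept `https://api.example.com/v1`, but only the second rejects URLs whose real host merely starts with, or hides behind, the allowed name.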

Local-only inference

Model weights ship with the IDE extension and CLI. No source code, prompts, or embeddings leave the developer machine. No network access is required.

Distillation lineage

Griffin's reasoning, pruned to inline shape.

How Lion inherits Griffin

  • Distilled from Griffin L with security-task distillation: both label distillation and intermediate-trace distillation.
  • INT8 quantised weights with calibration on security-task tensors.
  • Runs on CPU and modern Apple Silicon / x86 GPU laptops with no separate runtime.
  • Ships in the IDE extension, CLI, and pre-commit hook from one signed artefact.
  • Same security-augmented tokeniser as Griffin, pruned to the inline-relevant vocab.
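
As a rough picture of what "INT8 quantised weights with calibration" can mean, the sketch below uses symmetric per-tensor quantisation, with the scale chosen from a set of calibration values. The page does not state which quantisation scheme Lion uses, so the scheme, the function names, and the sample numbers are all assumptions.

```python
def calibrate_scale(calibration_values: list[float]) -> float:
    """Pick a scale so the largest observed magnitude maps to 127 (assumed scheme)."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantise(values: list[float], scale: float) -> list[int]:
    # Round to the nearest step, then clamp into the symmetric INT8 range.
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantise(quantised: list[int], scale: float) -> list[float]:
    # Recover approximate floats; in-range values are off by at most half a step.
    return [q * scale for q in quantised]
```

Calibrating on values drawn from the target task keeps the scale matched to the range the model actually sees there; values outside that range are clamped.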
Privacy posture

On the laptop, or not at all.

What stays on the developer machine

  • Weights are signed and verified at install; tampered binaries refuse to load.
  • No telemetry by default. The IDE extension makes no outbound requests at rest.
  • Opt-in anonymised telemetry, off until explicitly enabled, scoped to model latency stats — never source text.
  • Air-gapped operation supported out of the box; no online activation, no licence phone-home.
  • Source code, prompts, and embeddings never leave the developer machine under any configuration.
  • Update channel is verifiable: signed manifests, reproducible builds, content-addressed weight hashes.
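
A content-addressed weight hash can be checked with nothing more than a digest comparison. The sketch below assumes the expected SHA-256 value comes from a manifest whose signature has already been verified; signature verification itself is out of scope here, and the function names are hypothetical rather than part of any Lion API.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the weight bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_weights(weight_bytes: bytes, expected_sha256: str) -> bool:
    # Compare digests in constant time; the expected value is assumed to come
    # from an already-verified manifest.
    return hmac.compare_digest(sha256_hex(weight_bytes), expected_sha256.lower())


def load_weights(weight_bytes: bytes, expected_sha256: str) -> bytes:
    # Refuse to hand back weights whose hash does not match the manifest entry.
    if not verify_weights(weight_bytes, expected_sha256):
        raise ValueError("weight hash mismatch; refusing to load")
    return weight_bytes
```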
Where Lion fits

Before the queue ever exists.

01 IDE / CLI / pre-commit
Developer edits

Lion runs as the developer types and on every staged change.

02 Lion
Local inference

Sub-80 ms sink + sanitiser check, fully on device, no network call.

03 Editor
Editor surface

Finding shows up inline with a one-line explanation and the offending span.

04 Developer
Fix before push

Issue is gone before CI ever sees it. No queue, no triage tax.

Catch it inline, fix it in the editor, never spend a triage hour on it.

Development history

From label distillation to trace distillation.

How Lion was built, milestone by milestone, and what is on the bench right now.

  1. Q2 2025

    First Lion prototype distilled from Griffin S.

    The motivation was to get reasonable security-task accuracy into the IDE without round-tripping to a cloud model. The first prototype was distilled from a Griffin S (14B) checkpoint via straight label distillation — student matches teacher's final label on a curated prompt set. Latency on Apple Silicon laptops hit ~140ms p95. Accuracy was acceptable, but the trace-quality story was missing.

  2. Q3 2025

    Trace distillation pipeline.

    Plain label distillation discarded the reasoning path the larger model followed. The pipeline was extended to trace distillation: the student is supervised on both (input, final label) and (input, intermediate reasoning trace) pairs from the teacher. This is what gave Lion its accuracy at sub-100ms: the student inherits the reasoning steps, not just the answer.
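
    One way to picture the two supervision signals is as a single training loss with a label term and a trace term. The sketch below is an assumption about the general shape, not the actual pipeline: the cross-entropy formulation, the `trace_weight` parameter, and the per-step averaging are all illustrative choices.

```python
import math


def cross_entropy(probs: list[float], target_index: int) -> float:
    """Negative log-probability the student assigns to the teacher's choice."""
    return -math.log(max(probs[target_index], 1e-12))


def trace_distillation_loss(
    label_probs: list[float],
    teacher_label: int,
    trace_step_probs: list[list[float]],
    teacher_trace: list[int],
    trace_weight: float = 0.5,  # hypothetical weighting between the two terms
) -> float:
    # Label term: match the teacher's final label.
    label_loss = cross_entropy(label_probs, teacher_label)
    # Trace term: match each step of the teacher's intermediate reasoning trace,
    # averaged over the number of steps.
    step_losses = [
        cross_entropy(probs, target)
        for probs, target in zip(trace_step_probs, teacher_trace)
    ]
    trace_loss = sum(step_losses) / max(len(step_losses), 1)
    return label_loss + trace_weight * trace_loss
```

    Setting `trace_weight` to 0 reduces this to plain label distillation, which is the setup the first prototype used.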

  3. Q4 2025

    Lion 1.0 GA.

    Lion 1.0 shipped with the VS Code extension, distilled from Griffin L (70B) rather than Griffin S, with INT8 weights. Sub-100ms p95 on M-series Apple Silicon and recent x86 GPU laptops. Sink detection F1 on the held-out evaluation set crossed 0.78.

  4. Q1 2026

    Signed weights + JetBrains + Cursor.

    Lion weights now ship as Sigstore-signed bundles; the IDE extension verifies them on install and refuses an unsigned weight file. The JetBrains plugin (IntelliJ IDEA, PyCharm, GoLand, WebStorm) and the Cursor extension also shipped, with an identical inline feature surface across editors.

  5. Now

    Current research direction.

    Three tracks: (1) longer reasoning depth without breaking the latency budget — distilling deeper traces while quantising more aggressively; (2) language-specific Lion heads — a JVM-focused student, a Python-focused student, a Go-focused student, with shared base weights but task-specific fine-tunes; (3) sanitiser-quality scoring — moving beyond binary sink detection to a graded score of how robust the sanitiser path is.

Why on-device, why now

Four constraints, one model.

Each design constraint Lion is built around, and what falls out of it.

Latency budget

Sub-100ms p95 because anything slower interrupts the developer's flow. This is the load-bearing constraint that drives the parameter count, the quantisation, and the runtime choice.
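
For readers unfamiliar with p95, the sketch below shows one common way to compute it (the nearest-rank method) and to check a set of latency samples against a budget. The function names and the choice of method are assumptions; the page does not say how Lion's latency is measured.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample that at least 95% of samples do not exceed."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def within_budget(samples_ms: list[float], budget_ms: float = 100.0) -> bool:
    # The budget is met only when the p95 is strictly below it.
    return p95(samples_ms) < budget_ms
```

Because p95 ignores the slowest 5% of samples, a single outlier in a hundred runs does not break the budget, but a consistently slow tail does.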

No source code egress

Lion weights ship with the IDE extension. Prompts and code never leave the developer machine. Air-gapped operation is supported with no additional install.

Signed weights

Every weight bundle is Sigstore-signed. The extension verifies the signature before loading and refuses to run unsigned weights. Lion doesn't ship through pip or npm.

Distilled from the production teacher

The teacher is the Griffin L variant currently in production. When the teacher changes, the distillation pipeline re-runs and the next Lion weight bundle ships with the IDE extension's next release.

Put Lion in the editor.

Sub-80 ms on the developer machine, no egress, distilled from the same brain as the rest of the lineup.