Lion is the ~1B-parameter inline model distilled from Griffin. It runs locally inside the IDE, CLI, and pre-commit hook with sub-100 ms latency and zero source-code egress, so a developer never has to choose between speed and a real second pair of eyes.
Three jobs in the editor. Sub-100 ms so the developer never disables it.
Catches obvious dangerous sinks — unsafe deserialization, SSRF-able URL builders, unsanitised SQL, command-exec, path traversal — before code ever reaches CI.
Flags weak or missing sanitiser usage in known dangerous flows. Knows the difference between a real allow-list and a check that looks like one.
Model weights ship with the IDE extension and CLI. No source code, prompts, or embeddings leave the developer machine. Zero network egress required.
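To make the allow-list point concrete, here is a minimal sketch of a check that only looks like an allow-list next to one that behaves like one. The host names and function names are hypothetical, chosen for illustration; this is not Lion's detection logic, only the kind of code difference it is described as telling apart.

```python
from urllib.parse import urlparse

# Hypothetical allow-list for illustration only.
ALLOWED_HOSTS = {"api.example.com", "cdn.example.com"}


def looks_like_allow_list(url: str) -> bool:
    # Weak: substring match on the raw URL. A host such as
    # "api.example.com.evil.test" still contains the trusted name and passes.
    return "api.example.com" in url


def real_allow_list(url: str) -> bool:
    # Stronger: parse first, then require https and an exact hostname match.
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

Both functions accept the legitimate URL, but only the second rejects a look-alike host or a trusted name smuggled into the query string.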
Lion runs as the developer types and on every staged change.
Sub-80 ms sink + sanitiser check, fully on device, no network call.
Finding shows up inline with a one-line explanation and the offending span.
Issue is gone before CI ever sees it. No queue, no triage tax.
Catch it inline, fix it in the editor, never spend a triage hour on it.
How Lion was built, milestone by milestone, and what is on the bench right now.
The motivation was to get reasonable security-task accuracy into the IDE without round-tripping to a cloud model. The first prototype was distilled from a Griffin S (14B) checkpoint via straight label distillation: the student matches the teacher's final label on a curated prompt set. Latency on Apple Silicon laptops came in at ~140 ms p95, above the sub-100 ms target. Accuracy was acceptable, but the student learned only the final label, not the reasoning behind it.
Plain label distillation discarded the reasoning path the larger model followed. The pipeline was extended to trace distillation: the student is supervised on both (input, final label) and (input, intermediate reasoning trace) pairs from the teacher. This is what gave Lion its accuracy at sub-100 ms: the student inherits the reasoning steps, not just the answer.
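The two supervision signals can be sketched as one combined objective. This is a minimal illustration, assuming a cross-entropy term on the final label plus an averaged per-token cross-entropy term on the teacher's trace; the function names and the `trace_weight` parameter are hypothetical, and the actual Lion training loss is not described in this document.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def trace_distillation_loss(
    label_probs: Sequence[float],
    teacher_label: int,
    trace_probs: Sequence[Sequence[float]],
    teacher_trace: Sequence[int],
    trace_weight: float = 1.0,  # assumed knob; 0.0 falls back to label distillation
) -> float:
    if len(trace_probs) != len(teacher_trace):
        raise ValueError("one probability vector is needed per trace token")
    # Term 1: match the teacher's final label.
    label_loss = cross_entropy(label_probs, teacher_label)
    # Term 2: match each token of the teacher's intermediate reasoning trace.
    if teacher_trace:
        trace_loss = sum(
            cross_entropy(p, t) for p, t in zip(trace_probs, teacher_trace)
        ) / len(teacher_trace)
    else:
        trace_loss = 0.0
    return label_loss + trace_weight * trace_loss
```

Setting `trace_weight` to zero recovers the label-only objective of the first prototype, which makes the difference between the two stages easy to see.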
Lion 1.0 shipped with the VS Code extension. It was distilled from Griffin L (70B) rather than Griffin S and uses INT8 weights. Sub-100 ms p95 on M-series Apple Silicon and recent x86 GPU laptops. Sink detection F1 on the held-out evaluation set crossed 0.78.
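For readers unfamiliar with INT8 weights, the sketch below shows symmetric per-tensor quantisation, one common scheme. Which scheme Lion actually uses is not stated here, so treat this as an assumption for illustration. As rough arithmetic, one byte per weight puts a ~1B-parameter model at about 1 GB, against about 4 GB at 32-bit floats.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from the INT8 values."""
    return [q * scale for q in quantized]
```

The rounding error per weight is at most half the scale, which is the accuracy given up in exchange for the smaller memory footprint.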
Lion weights now ship as signed sigstore bundles; the IDE extension verifies on install and refuses an unsigned weight file. JetBrains plugin (IntelliJ IDEA, PyCharm, GoLand, WebStorm) and Cursor extension shipped. Identical inline feature surface across editors.
Three tracks: (1) longer reasoning depth without breaking the latency budget — distilling deeper traces while quantising more aggressively; (2) language-specific Lion heads — a JVM-focused student, a Python-focused student, a Go-focused student, with shared base weights but task-specific fine-tunes; (3) sanitiser-quality scoring — moving beyond binary sink detection to a graded score of how robust the sanitiser path is.
Each design constraint Lion is built around, and what falls out of it.
Sub-100ms p95 because anything slower interrupts the developer's flow. This is the load-bearing constraint that drives the parameter count, the quantisation, and the runtime choice.
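A p95 budget can be checked with a few lines of code. The sketch below assumes the nearest-rank definition of a percentile and a hypothetical list of per-check latencies in milliseconds; it is not Lion's benchmarking harness.

```python
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(samples_ms: Sequence[float], budget_ms: float = 100.0) -> bool:
    """True when the p95 latency is strictly under the budget."""
    return p95(samples_ms) < budget_ms
```

Under this definition, up to 5% of checks may exceed the budget before the p95 itself crosses it.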
Lion weights ship with the IDE extension. Prompts and code never leave the developer machine by default. Air-gapped operation is supported with no additional install.
Every weight bundle is sigstore-signed. The extension verifies before loading and refuses to run unsigned weights. Lion doesn't ship through pip or npm.
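The verify-before-load rule is a fail-closed gate, which can be sketched as follows. The names `load_verified_weights` and `UnsignedWeightsError` are hypothetical, and the `verify` callable stands in for real sigstore verification, which this sketch deliberately does not implement.

```python
from typing import Callable, Optional


class UnsignedWeightsError(Exception):
    """Raised when weights have no bundle or the bundle fails verification."""


def load_verified_weights(
    weights: bytes,
    bundle: Optional[bytes],
    verify: Callable[[bytes, bytes], bool],  # placeholder for sigstore verification
) -> bytes:
    # Fail closed: a missing bundle is treated the same as a bad signature.
    if bundle is None:
        raise UnsignedWeightsError("no signature bundle found for weight file")
    if not verify(weights, bundle):
        raise UnsignedWeightsError("signature bundle does not match weight file")
    # Only verified bytes are handed on to the model loader.
    return weights
```

The point of the design is that there is no code path that returns weights without a successful verification.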
The teacher is the Griffin L variant currently in production. When the teacher changes, the distillation pipeline re-runs and the next Lion weight bundle ships with the IDE extension's next release.
Sub-80 ms on the developer machine, no egress, distilled from the same brain as the rest of the lineup.