AI Security

Griffin AI vs Gemma for Lightweight Scanning

Gemma is built for efficiency. Can a small open-weight model replace Griffin AI for lightweight scanning workflows, or does the engine still matter?

Shadab Khan
AI Platform Engineer
7 min read

Google's Gemma family, from the original 2B and 7B releases through Gemma 2 and the later variants, is built around a specific premise: capable open-weight models that run well on modest hardware. Gemma 2 9B and 27B in particular have earned a reputation for strong output per watt, which makes them attractive for always-on scanning workloads where running a 70B model would be overkill.

For security teams, the natural question is whether Gemma can replace Griffin AI for lightweight scanning: the high-volume, low-latency work of processing every commit, every pull request, every SBOM ingestion, and every dependency update. This is a fair question, and the answer is more nuanced than it is for comparisons with larger models.

What lightweight scanning actually is

Before comparing tools, we need to agree on what "lightweight scanning" means. In our usage, it covers:

  • Per-commit static pattern scanning for secrets, risky APIs, and known anti-patterns
  • SBOM ingestion and normalisation as new builds complete
  • Dependency manifest diffing when a PR touches package.json, requirements.txt, go.mod, or equivalent
  • Initial triage of new findings to decide which deserve deeper review
  • Routine classification tasks: is this a vulnerability, is this an informational finding, is this a duplicate?

The workload shape is high-volume, short per-call, and latency-sensitive. A heavy reasoning model is overkill. A small, fast model is the right tool, if it is accurate enough.
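To make the first bullet concrete, here is a minimal regex candidate pass of the kind that feeds a small classifier. The patterns are illustrative only; a production ruleset carries far more, and the function names are ours, not any particular tool's:

```python
import re

# A few illustrative secret shapes; real rulesets carry hundreds of patterns.
CANDIDATE_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(?:api[_-]?key|token)\s*[:=]\s*\S{16,}"),
]

def secret_candidates(diff_text: str):
    """Yield (line_number, line) pairs worth sending to the classifier."""
    for n, line in enumerate(diff_text.splitlines(), start=1):
        if any(p.search(line) for p in CANDIDATE_PATTERNS):
            yield n, line
```

The point of the candidate pass is volume reduction: the model only ever sees lines that already matched a cheap pattern, which is what keeps the per-commit latency budget intact.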

Where Gemma is genuinely good

Gemma 2 9B, in our internal evaluations, hits surprisingly high accuracy on:

  • Pattern-driven secret detection where the regex layer has already surfaced candidates
  • SBOM component name normalisation across CycloneDX and SPDX formats
  • Single-line classification like "does this line import a dangerous module"
  • Short-form explanations of a finding, suitable for a Slack alert

That capability is real. If you strip a lightweight scanning workflow down to its raw classification steps, Gemma 2 can do most of them well at a fraction of the cost of a frontier model.

Where the engine still matters

The subtlety is that lightweight scanning, in a production context, is not actually stateless classification. It is classification plus context plus routing plus feedback. Griffin AI's lightweight path looks like:

  1. A fast extractor pulls structured data from the input (a diff, an SBOM, a manifest)
  2. A retrieval step augments the extraction with relevant context from the asset graph
  3. A small classifier makes the initial judgement
  4. A routing step decides whether the finding is confidently resolved or needs escalation
  5. A feedback step records the outcome for future calibration

Gemma 2 slots naturally into steps one and three. Steps two, four, and five are engine work. You can build them yourself, and for focused use cases that is a reasonable investment. For broad coverage across a full security surface area, the engine work dominates the model cost.
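Those five steps can be sketched as a single function. Every name below is hypothetical, and the stubs stand in for real models and a real asset graph; the shape, not the implementation, is the point:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # hypothetical routing cutoff

@dataclass
class Finding:
    label: str         # e.g. "vulnerability", "informational", "duplicate"
    confidence: float  # classifier's self-reported confidence, 0..1

# Stub stages; a real deployment backs these with models and a graph store.
def extract(raw):                    return {"text": raw}
def retrieve_context(extracted):     return {"asset": "unknown"}
def classify_small(extracted, ctx):  return Finding("informational", 0.6)
def classify_large(extracted, ctx):  return Finding("informational", 0.95)
def record_outcome(extracted, finding):  pass

def scan(raw_input: str) -> Finding:
    extracted = extract(raw_input)                 # 1. fast extraction
    context = retrieve_context(extracted)          # 2. asset-graph context
    finding = classify_small(extracted, context)   # 3. small classifier (Gemma-class)
    if finding.confidence < CONFIDENCE_THRESHOLD:  # 4. route: ship or escalate
        finding = classify_large(extracted, context)
    record_outcome(extracted, finding)             # 5. feedback for calibration
    return finding
```

Steps one and three are where Gemma slots in; the threshold, the escalation target, and the feedback store are the engine work the surrounding text describes.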

Latency budgets, realistically

One of the reasons small models are attractive is the latency budget. A secrets scanner that runs on every commit needs to respond in under a second to avoid becoming a bottleneck. A 70B model with chain-of-thought reasoning will not meet that bar. A Gemma 2 9B on a single GPU, or even on a well-tuned CPU deployment, can.

Griffin AI's lightweight path is specifically designed to hit sub-second latencies. Internally, the engine might use Gemma-class models for the fast path and escalate only the ambiguous cases to larger models. The customer does not see the routing; they see a fast, accurate result.

A team running Gemma directly can hit the same latency, but they cannot hit the same accuracy, because the accuracy comes from the routing and the escalation, not from the model alone. A small model used standalone will produce a confidently wrong answer on the hard cases. A small model used as the fast path in a tiered engine will hand those cases to something larger.

The accuracy floor

There is an honest floor to what any small model can do. On genuinely ambiguous cases, where the correct answer depends on context that does not fit in a short prompt, a 9B model will guess. The guess might be right, but the confidence will not be well-calibrated.

Griffin AI measures calibration continuously. When the fast-path model is confident and historically correct on similar inputs, the output ships. When the confidence is lower, the case routes to a heavier model. When the heavier model is still uncertain, the case routes to a human queue. The result is a system whose accuracy exceeds the accuracy of any individual model in it.

Gemma on its own cannot do this, because calibration requires a population of outcomes to measure against, and a stateless model does not have access to that population. You can build the calibration layer, but the calibration layer is the engine.
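One way to see what that calibration layer has to do: bucket past predictions by stated confidence and compare against observed outcomes. This reliability check is a generic sketch, not Griffin AI's implementation:

```python
from collections import defaultdict

def reliability(history, bucket_width=0.1):
    """history: iterable of (stated_confidence, was_correct) pairs.
    Returns {bucket_floor: observed_accuracy}. A well-calibrated model
    shows observed accuracy close to each bucket's stated confidence."""
    totals, correct = defaultdict(int), defaultdict(int)
    top = round(1 / bucket_width) - 1  # clamp confidence 1.0 into the last bucket
    for conf, ok in history:
        bucket = round(min(int(conf / bucket_width), top) * bucket_width, 10)
        totals[bucket] += 1
        correct[bucket] += int(ok)
    return {b: correct[b] / totals[b] for b in sorted(totals)}
```

When the 0.9 bucket keeps landing at 0.65 observed accuracy on a given input class, that is the signal to route that class to a heavier model rather than ship the small model's answer.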

SBOM normalisation and the long tail

SBOM normalisation is a good concrete example. A raw SBOM arrives with component names like @scope/package, scope:package:version, or pkg:npm/%40scope/package@version. A dependency graph has to use a canonical representation. A classifier that gets this right 99 percent of the time still leaves thousands of broken edges in a large graph.

Griffin AI's normalisation pipeline combines deterministic rules for the known patterns with a model-based fallback for the ambiguous cases. The model output is validated against a canonical schema, and inputs that fail validation are either routed to a slower path or flagged for human review.

Gemma on its own can normalise the common cases but will produce plausible-looking garbage on the edges. Without the validation layer, the garbage lands in the graph and silently degrades every downstream query.

Cost per finding, not cost per token

When comparing costs, the wrong unit is cost per token. The right unit is cost per correctly handled finding. A cheaper model with a higher error rate can easily produce a more expensive system once you count the cost of human review on the errors.

In our internal measurements, the Gemma-as-fast-path configuration inside Griffin AI is dramatically cheaper per correct finding than Gemma used standalone, even accounting for the occasional escalation to larger models. The escalation path only fires on the cases where it is needed, and the cases where it fires are the cases where a standalone small model would have been wrong anyway.
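The unit-economics point is easy to make concrete. With hypothetical numbers (model costs, accuracies, and review cost all invented for illustration), a tiered configuration beats a cheaper standalone model once review costs are counted:

```python
def cost_per_finding(model_cost: float, accuracy: float, review_cost: float) -> float:
    """Expected cost per finding once humans review the error fraction."""
    return model_cost + (1 - accuracy) * review_cost

# Hypothetical figures: the small model costs $0.001/finding at 92% accuracy;
# the tiered setup escalates 10% of cases to a model costing $0.02/call and
# reaches 99% accuracy; human review costs $5 per erroneous finding.
standalone = cost_per_finding(model_cost=0.001, accuracy=0.92, review_cost=5.00)
tiered = cost_per_finding(model_cost=0.001 + 0.10 * 0.02, accuracy=0.99, review_cost=5.00)
```

Under these assumptions the standalone configuration costs about $0.40 per finding against roughly $0.05 for the tiered one: the review cost on the 8 percent error rate dwarfs every model cost in the calculation.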

When Gemma used directly is the right fit

There are real scenarios where Gemma deployed directly is the right tool:

  • Embedded scanning inside developer tooling where the full engine is overkill
  • On-device scanning in constrained environments where network calls are impractical
  • Batch reprocessing of historical data where the per-item accuracy requirement is lower
  • Research workflows where understanding small-model behaviour is the point

For production security scanning across a real portfolio, Gemma is a useful component but not a complete system. Griffin AI uses Gemma-class models heavily. Griffin AI is not Gemma.

The pragmatic framing

The question "should I use Gemma or Griffin AI?" often resolves to "should I build a scanning engine, or use one?" Gemma is a good ingredient. A scanning engine is the dish. Teams that have the bandwidth to build and maintain the dish from ingredients can do great work. Teams that want to focus on their own product usually discover, after a year or two of trying, that the engine was the thing they were missing. Griffin AI is that engine, with Gemma-class models as one of the backends that make the fast path fast.
