Matching the right model family to the right security task for cost and quality.
LLM selection is the engineering discipline of choosing which model runs which step of an AI workflow — based on the quality bar, latency budget, and cost envelope of that specific step. It is not a vendor decision made once at the top of the architecture diagram. It is a task-by-task routing question, answered with evals.
Modern model families — Anthropic's Opus, Sonnet, and Haiku classes; OpenAI's comparable tiers; strong open-source options — span two orders of magnitude in cost and a wide spread in reasoning depth. Using the top-tier model for every step wastes budget on tasks that don't need it. Using the smallest for every step misses the problems that actually require reasoning. The win is routing.
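Routing can be as simple as a per-step table. The sketch below is a minimal illustration of the idea; the step names, model-class labels, latency budgets, and per-case costs are invented for this example, not real pricing or a real API.

```python
# Minimal sketch of per-task model routing. All figures are
# illustrative assumptions, not real pricing.
from dataclasses import dataclass

@dataclass
class Route:
    model: str            # model class assigned to this step
    max_latency_s: float  # latency budget for the step
    usd_per_case: float   # cost envelope per processed case

# Task-by-task routing: each pipeline step gets its own tier.
ROUTES = {
    "triage":      Route("haiku-class",   1.0, 0.002),
    "hypothesis":  Route("opus-class",   30.0, 0.150),
    "patch_draft": Route("sonnet-class",  8.0, 0.020),
    "reviewer":    Route("sonnet-class",  8.0, 0.020),
}

def model_for(step: str) -> str:
    """Look up which model class a pipeline step is bound to."""
    return ROUTES[step].model
```

The point of the table is that it is a per-step decision, revisited per step: changing one row is a routing change, not an architecture change.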
At security-team scale, "always use the best model" is a cost line that will eventually be questioned by finance, and "always use the cheapest" is a quality line that will eventually be questioned by engineering. Task-by-task selection avoids both conversations by making the tradeoff explicit and per-step.

The deeper reason it matters: model families drift, new models ship, prices move. A pipeline with per-task selection can absorb those changes incrementally — swap the drafting step, re-run evals, ship — instead of re-architecting end to end. Selection is a lever, not a commitment.

A working playbook for a security pipeline:

- Swapping drafting and classification steps from Opus-class to Sonnet- or Haiku-class routinely cuts end-to-end spend by an order of magnitude with no measurable quality drop.
- Concentrating the big-model budget on the genuinely hard reasoning (exploit hypothesis, complex remediation) lifts the scores that matter instead of spreading them thin.
- Sonnet- and Haiku-tier models return in a fraction of the time an Opus-tier model takes. Interactive surfaces become possible on steps that used to be "come back in 30 seconds."
- When a provider retires or reprices a model, you replace it per-step against the eval suite instead of doing a full pipeline rewrite. Migration costs amortise.
- "Here's the model assigned to each step, the eval score we require, and the cost per case" is a conversation you can have with a CFO. "We're paying a lot for AI" is not.
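A back-of-envelope calculation shows how the order-of-magnitude cost claim can arise when cheap, high-volume steps dominate traffic. The prices, token counts, and case volumes below are invented for illustration; the shape of the arithmetic, not the specific numbers, is the point.

```python
# Illustrative daily cost: route everything to the top tier vs.
# reserve it for the hard reasoning step. All numbers are invented.
PRICE_PER_MTOK = {"opus-class": 75.0, "sonnet-class": 15.0, "haiku-class": 4.0}

# Every finding is triaged; only escalated findings reach the
# expensive reasoning and remediation steps.
VOLUME = {"triage": 10_000, "hypothesis": 50, "patch_draft": 50, "reviewer": 50}
TOKENS = {"triage": 2_000, "hypothesis": 20_000, "patch_draft": 10_000, "reviewer": 5_000}
ROUTED = {"triage": "haiku-class", "hypothesis": "opus-class",
          "patch_draft": "sonnet-class", "reviewer": "sonnet-class"}

def daily_cost(assignment):
    """Sum per-step cost: volume x tokens x price per million tokens."""
    return sum(VOLUME[s] * TOKENS[s] * PRICE_PER_MTOK[assignment[s]] / 1_000_000
               for s in VOLUME)

all_opus = daily_cost({s: "opus-class" for s in VOLUME})
routed = daily_cost(ROUTED)
print(f"all-opus ${all_opus:.2f}/day vs routed ${routed:.2f}/day")
# With these assumed volumes, triage dominates token traffic, so
# moving it off the top tier drives nearly all of the savings.
```

Under these assumptions the routed pipeline comes in close to ten times cheaper, while the hypothesis step still runs on the top tier.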
Inside Griffin AI, each step of the pipeline — triage, hypothesis, patch draft, reviewer — is bound to a specific model family chosen against the eval rubric for that step. Every routing decision in AI remediation is gated by the eval harness so a cheaper model can't silently take over a step it shouldn't.
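One way to make that gating concrete is a cheapest-first selection rule: walk the tiers from least to most expensive and pick the first whose eval score for the step clears the required bar. This is a hedged sketch of the pattern, not Griffin's actual harness; the scores and thresholds are invented.

```python
# Eval-gated selection sketch: cheapest model class that meets the
# eval bar for a given step. Scores and bars are invented examples.
TIERS = ["haiku-class", "sonnet-class", "opus-class"]  # cheapest first

# Per-(step, model) scores as an eval harness might report them.
EVAL_SCORES = {
    ("triage", "haiku-class"): 0.96,
    ("triage", "sonnet-class"): 0.97,
    ("triage", "opus-class"): 0.98,
    ("hypothesis", "haiku-class"): 0.61,
    ("hypothesis", "sonnet-class"): 0.78,
    ("hypothesis", "opus-class"): 0.92,
}

REQUIRED = {"triage": 0.95, "hypothesis": 0.90}  # quality bar per step

def select(step: str) -> str:
    """Return the cheapest model that clears the step's eval bar."""
    for model in TIERS:
        if EVAL_SCORES.get((step, model), 0.0) >= REQUIRED[step]:
            return model
    raise ValueError(f"no model clears the eval bar for {step!r}")
```

Because selection walks cheapest-first and stops at the bar, escalation to a bigger model happens only when the eval evidence forces it, and a cheaper model can never silently take over a step it fails on.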
See how Safeguard routes each Griffin step to the cheapest model that meets the eval bar — and escalates only when the evidence says to.