OpenAI's per-token API pricing is transparent and competitive on individual calls. For security workloads at enterprise scale, though, total spend looks different from what the per-token table suggests: every scan is dozens of calls, every finding triggers more calls, and every incident spike multiplies the rate. Griffin AI's pricing model reflects an engine-plus-LLM architecture that gates LLM calls to high-leverage points. The difference lies not in the raw token price but in how many tokens the workload actually consumes.
## Where pure-LLM security workloads burn tokens
Three high-volume patterns:
- Per-finding analysis. Every finding is sent to the model for triage; a codebase with 1,200 findings requires at least 1,200 calls.
- Conversation iteration. Multi-turn triage conversations multiply call counts.
- Incident spike amplification. During incident response, every analyst query is a model call, and spike ratios of 5-10x over baseline are common.
Per-token pricing at fractions of a cent adds up quickly at this volume.
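The arithmetic above can be sketched as a back-of-envelope cost model. The function and all price and volume figures are illustrative assumptions, not OpenAI's actual rate card:

```python
def monthly_token_cost(
    findings_per_month: int,
    calls_per_finding: float,      # multi-turn triage multiplies this
    tokens_per_call: int,          # prompt + completion tokens (assumed)
    price_per_1k_tokens: float,    # blended $/1K tokens (assumed)
    incident_spike_factor: float = 1.0,
) -> float:
    """Estimate monthly token spend for a pure-LLM triage workload."""
    calls = findings_per_month * calls_per_finding * incident_spike_factor
    return calls * tokens_per_call / 1000 * price_per_1k_tokens

# 1,200 findings, 3 triage turns each, ~4K tokens/call, $0.01/1K blended:
baseline = monthly_token_cost(1200, 3, 4000, 0.01)   # $144/month
# The same workload under a 5x incident spike:
spiked = monthly_token_cost(1200, 3, 4000, 0.01, incident_spike_factor=5)
```

Even at these modest assumed rates, the spike multiplies the bill linearly, and real workloads add scan-time calls on top of triage.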
## Where Griffin AI's architecture keeps cost bounded
Three structural gates:
- Reachability filtering. The 1,200 findings shrink to ~150 reachable ones before any LLM call.
- Cached reasoning. Similar finding shapes reuse prior analysis rather than re-querying.
- Model tiering. Routine analysis uses smaller, cheaper models; only exploit-hypothesis work runs on Opus-class.
The result is workload-level token spend an order of magnitude lower than under naive per-finding analysis.
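The three gates can be sketched as a single triage loop. Field names (`reachable`, `rule_id`, `sink_type`, `needs_exploit_hypothesis`) and the model-tier names are hypothetical stand-ins, not Griffin AI's actual schema:

```python
CHEAP_MODEL = "small-triage-model"   # hypothetical tier names
OPUS_MODEL = "opus-class-model"

def plan_llm_calls(findings: list[dict]) -> list[str]:
    """Return the model calls a gated triage pass would actually make."""
    cache: dict[tuple, str] = {}     # gate 2: shape-keyed reasoning cache
    calls = []
    for f in findings:
        if not f["reachable"]:       # gate 1: reachability filtering
            continue
        key = (f["rule_id"], f["sink_type"])
        if key in cache:             # gate 2: reuse prior analysis
            continue
        # gate 3: model tiering -- only exploit-hypothesis work gets Opus-class
        model = OPUS_MODEL if f["needs_exploit_hypothesis"] else CHEAP_MODEL
        cache[key] = model
        calls.append(model)
    return calls
```

Of 1,200 raw findings, only the reachable, uncached ones generate calls at all, and most of those route to the cheaper tier.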
## A concrete number
A 300-engineer organisation running raw GPT-5 for security triage: approximately $5,000-$15,000 monthly in token spend, depending on workload.
The same organisation on Griffin AI: approximately $500-$2,000 monthly in token spend (included in platform pricing).
The difference comes from the gating architecture. Neither approach is doing anything wrong; the architectures simply call the model at different frequencies.
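As a sanity check on the order-of-magnitude claim, the ratio between the two ranges above works out as follows:

```python
raw = (5_000, 15_000)    # raw GPT-5 token spend range, $/month
gated = (500, 2_000)     # Griffin AI token spend range, $/month

low_ratio = raw[0] / gated[1]                        # most conservative: 2.5x
high_ratio = raw[1] / gated[0]                       # most favourable: 30x
midpoint_ratio = (sum(raw) / 2) / (sum(gated) / 2)   # 8x on range midpoints
```

Depending on where a given workload sits in each range, the gap spans roughly 2.5x to 30x, with midpoints around 8x.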
## What to evaluate
Three questions:
- How many model calls per scan? How does that scale with codebase size?
- How does token spend change during incident response spikes?
- What caching is in place? Is similar analysis reused?
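The first two questions are directly measurable with a thin counting wrapper around whatever client a vendor exposes. `FakeClient` and the `complete` method here are hypothetical stand-ins for illustration:

```python
class CountingClient:
    """Wraps an LLM client and counts calls, so 'calls per scan' and
    spike behaviour can be measured rather than taken on faith."""
    def __init__(self, client):
        self._client = client
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return self._client.complete(prompt)

class FakeClient:
    """Stand-in for a real LLM client."""
    def complete(self, prompt: str) -> str:
        return "analysis"

client = CountingClient(FakeClient())
for finding in ["sqli", "xss", "ssrf"]:
    client.complete(f"triage: {finding}")
# client.calls is now 3 -- run the same harness on a full scan, then
# again during a simulated incident, to get real per-scan numbers.
```

Running the same instrumented scan at two codebase sizes answers the scaling question empirically.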
## How Safeguard Helps
Safeguard's pricing reflects the engine-plus-LLM architecture: bounded token spend, predictable scale-up behaviour, incident spike containment. For security workloads at enterprise scale, the architecture choice shows up directly in the monthly invoice.