Enterprise LLM spend is a new line item for finance teams, and first-year forecasts are typically wrong by uncomfortable margins. Organisations that have been running AI workloads at scale for 18 months or more have converged on patterns for producing predictable LLM budgets. The patterns are specific, they work, and they port across vendors.
What produces unpredictability
Three drivers:
- Nonlinear usage scaling. Successful deployments spawn new use cases faster than any projection anticipates.
- Model price changes. Vendors reprice existing models, invalidating per-token cost assumptions.
- Incident spikes. Incident-response (IR) bursts consume more tokens than steady-state operation.
Each must be managed separately.
Patterns that work
Six patterns:
- Per-use-case budgets. Allocate LLM spend to specific use cases and track per-use-case burn rate (a tracking sketch follows this list).
- Rate limits at the platform level. Prevent runaway usage in any single workflow (see the token-bucket sketch below).
- Hard caps with alerting. Alerts fire at staged thresholds before a budget is exhausted, so the cap is a backstop, not a surprise.
- Batch API for non-time-sensitive work. Roughly 50% savings on applicable workloads.
- Prompt caching for repeated context. Up to 90% savings on cached tokens.
- Task routing to appropriately sized models. Haiku for bulk work, Opus for complex reasoning (the API sketch below shows caching, batching, and routing together).
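A minimal sketch of the first and third patterns together: per-use-case allocations with staged alert thresholds and a hard cap. The class names, thresholds, and alert callback are illustrative assumptions, not Safeguard's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UseCaseBudget:
    """Monthly allocation for one use case, with staged alert thresholds."""
    monthly_cap_usd: float
    spent_usd: float = 0.0
    alert_thresholds: tuple[float, ...] = (0.5, 0.8, 0.95)  # 50/80/95% burn
    fired: set = field(default_factory=set)

class BudgetTracker:
    def __init__(self, on_alert):
        self.budgets: dict[str, UseCaseBudget] = {}
        self.on_alert = on_alert  # e.g. a pager or chat-webhook callback

    def allocate(self, use_case: str, monthly_cap_usd: float) -> None:
        self.budgets[use_case] = UseCaseBudget(monthly_cap_usd)

    def record(self, use_case: str, cost_usd: float) -> bool:
        """Record spend; return False if the hard cap blocks the request."""
        b = self.budgets[use_case]
        if b.spent_usd + cost_usd > b.monthly_cap_usd:
            self.on_alert(use_case, "hard cap reached; request blocked")
            return False
        b.spent_usd += cost_usd
        burn = b.spent_usd / b.monthly_cap_usd
        for t in b.alert_thresholds:
            if burn >= t and t not in b.fired:
                b.fired.add(t)
                self.on_alert(use_case, f"{t:.0%} of monthly budget consumed")
        return True

tracker = BudgetTracker(on_alert=lambda uc, msg: print(f"[ALERT] {uc}: {msg}"))
tracker.allocate("phishing-triage", 2_000.0)
tracker.record("phishing-triage", 1_100.0)  # 55% burn fires the 50% alert
```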
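For the second pattern, a token bucket is one common way to cap per-workflow throughput; the rates below are placeholder values, not recommendations.

```python
import time

class TokenBucket:
    """Caps LLM tokens per workflow: refills at `rate` tokens/sec up to `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_consume(self, n_tokens: int) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n_tokens <= self.tokens:
            self.tokens -= n_tokens
            return True
        return False  # caller should queue or shed the request

# ~60k tokens/minute steady state, 120k token burst for one workflow
limiter = TokenBucket(rate=1_000, burst=120_000)
if not limiter.try_consume(8_000):
    pass  # defer to the batch queue instead of retrying immediately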
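Since the list names Haiku and Opus, here is a sketch of the last three patterns against Anthropic's Python SDK. The `messages.create` and `messages.batches.create` calls are real SDK surface; the model IDs, shared context, and routing rule are illustrative assumptions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SHARED_CONTEXT = "...long, stable system prompt: policies, schemas, runbooks..."

def route_model(task: str) -> str:
    # Illustrative routing rule: bulk work to Haiku, complex reasoning to Opus.
    return "claude-opus-4-1" if task == "deep-analysis" else "claude-3-5-haiku-latest"

def run(task: str, user_input: str):
    return client.messages.create(
        model=route_model(task),
        max_tokens=1024,
        # cache_control marks the stable prefix; subsequent cache reads are
        # billed at a fraction of the normal input-token price.
        system=[{"type": "text", "text": SHARED_CONTEXT,
                 "cache_control": {"type": "ephemeral"}}],
        messages=[{"role": "user", "content": user_input}],
    )

# Non-time-sensitive work goes through the Batch API at roughly half price.
batch = client.messages.batches.create(
    requests=[
        {"custom_id": f"scan-{i}",
         "params": {"model": "claude-3-5-haiku-latest",
                    "max_tokens": 512,
                    "messages": [{"role": "user", "content": item}]}}
        for i, item in enumerate(["log line 1", "log line 2"])
    ]
)
```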
Safeguard implements all six as platform features.
What finance should track
Five metrics:
- Cost per finding
- Cost per scan
- Monthly spend trajectory
- Peak spend during IR
- Cost per use case
Tracked together, these produce forecasts that hold; the sketch below shows how the first two can be derived from per-request usage records.
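A sketch of the derivation, assuming per-request usage records; the record shape and the per-million-token prices are assumptions, not a standard schema or a current rate card.

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    scan_id: str
    use_case: str
    input_tokens: int
    output_tokens: int
    findings: int  # confirmed findings attributed to this request

# Illustrative per-million-token prices; substitute the current rate card.
PRICE_IN, PRICE_OUT = 1.00, 5.00

def cost_usd(r: UsageRecord) -> float:
    return (r.input_tokens * PRICE_IN + r.output_tokens * PRICE_OUT) / 1_000_000

def cost_per_scan(records: list[UsageRecord]) -> dict[str, float]:
    totals: dict[str, float] = {}
    for r in records:
        totals[r.scan_id] = totals.get(r.scan_id, 0.0) + cost_usd(r)
    return totals

def cost_per_finding(records: list[UsageRecord]) -> float:
    findings = sum(r.findings for r in records)
    return sum(cost_usd(r) for r in records) / max(findings, 1)
```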
How Safeguard helps
Safeguard's pricing model reflects the patterns above. Per-use-case tracking, rate limits, caching, task routing — all are part of normal platform operation. For finance teams budgeting for AI-for-security, Safeguard produces predictable line items.