Anthropic's prompt caching cuts the cost of cached input tokens by roughly 90% and reduces latency. For security workloads, the cacheable surface is large: system prompts, codebase context, SBOM data, advisory references. Griffin AI uses caching aggressively; teams calling the API directly often do not, because an effective caching strategy takes deliberate engineering work.
What caching saves
Two dimensions:
- Cost. Cache reads are billed at roughly 10% of the normal input-token rate; writing a prefix to the cache carries a one-time premium over base.
- Latency. Cached context is not reprocessed on a hit, so time-to-first-token drops.
For workloads that reuse large context across many requests, the savings are material: 40-70% lower total cost in practice for security analysis. A rough cost model is sketched below.
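As a rough illustration, the arithmetic below assumes one shared context prefix, a ~10% cache-read rate, and a ~25% one-time cache-write premium; substitute current per-model prices before trusting the absolute numbers.

```python
# Relative cost of reused context with and without prompt caching.
BASE_INPUT = 1.00    # cost per token, uncached (normalised)
CACHE_WRITE = 1.25   # one-time premium when the prefix is first cached
CACHE_READ = 0.10    # cost per token on a cache hit

def relative_cost(context_tokens: int, requests: int) -> tuple[float, float]:
    """Total context cost without caching vs. with caching, assuming the
    same prefix is reused across all requests within the cache TTL."""
    uncached = context_tokens * requests * BASE_INPUT
    cached = context_tokens * (CACHE_WRITE + (requests - 1) * CACHE_READ)
    return uncached, cached

# 50k tokens of SBOM + policy context reused across 200 scans:
uncached, cached = relative_cost(50_000, 200)
print(f"savings on reused context: {1 - cached / uncached:.0%}")  # ~89%
```

On top of that, the per-query question and the model's output still bill at normal rates, which is why end-to-end savings land in the 40-70% band rather than at 90%.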
What security workloads can cache
Five categories:
- System prompts. Griffin AI's security-specific prompts are large and stable.
- SBOM context. The organisation's dependency inventory changes slowly.
- Policy context. Organisational policy rarely changes between scans.
- Recent finding history. Prior findings inform current triage.
- Codebase snapshots. Per-repo context can be cached across analyses of the same repo.
Each represents substantial token volume that caching converts from full price to the discounted cached rate. A sketch of a request that layers several of these levels follows.
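To make that concrete, here is a minimal sketch of a multi-level cached request using the Anthropic Python SDK. The prompt variables and file paths are hypothetical placeholders rather than Griffin AI internals; each `cache_control` marker sets a cache breakpoint covering the whole prefix up to that block.

```python
import anthropic

# Hypothetical sources for three of the stable context levels above.
SECURITY_SYSTEM_PROMPT = open("prompts/security_system.txt").read()
ORG_POLICY = open("context/org_policy.txt").read()
SBOM_CONTEXT = open("context/sbom.json").read()

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system=[
        # Most stable first: shared by every query on the platform.
        {"type": "text", "text": SECURITY_SYSTEM_PROMPT,
         "cache_control": {"type": "ephemeral"}},
        # Shared by every query in the organisation.
        {"type": "text", "text": ORG_POLICY,
         "cache_control": {"type": "ephemeral"}},
        # Shared until the dependency inventory changes.
        {"type": "text", "text": SBOM_CONTEXT,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user",
               "content": "Triage the new advisory affecting our lockfile."}],
)
print(response.content[0].text)
```

Note that a block must clear the model's minimum cacheable length (1,024 tokens on Sonnet-class models) to be written to the cache; shorter blocks are simply processed at the normal rate.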
How Griffin AI uses it
Three integrated cache strategies:
Hierarchical caching. System prompt, organisation policy, and codebase context are cached in that order, most stable first, so each outer level is shared by a progressively wider set of queries: the system prompt by everything, policy by one organisation, codebase context by one repository. The request sketch in the previous section shows this layering.
TTL tuning. Caches live only as long as the underlying data is stable: codebase caches invalidate on commit, policy caches on policy change. A sketch of commit-keyed invalidation follows.
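Because cache entries match on the exact prompt prefix, invalidation can happen by construction: embed the commit hash in the codebase block, and a new commit yields a new prefix while the stale entry ages out on its own. A sketch with hypothetical helpers; the `"ttl": "1h"` option is Anthropic's extended-TTL variant of `cache_control` and may require a beta header depending on SDK version.

```python
def codebase_block(repo: str, commit: str, snapshot: str) -> dict:
    """Codebase context block. A new commit changes the text, so the old
    cached prefix simply stops matching; no explicit invalidation call."""
    return {
        "type": "text",
        "text": f"# Repository {repo} @ {commit}\n{snapshot}",
        "cache_control": {"type": "ephemeral"},  # default ~5-minute TTL
    }

def policy_block(policy_text: str) -> dict:
    """Policy changes rarely, so a longer TTL keeps the entry warm across
    scans that are spread out over an hour."""
    return {
        "type": "text",
        "text": policy_text,
        "cache_control": {"type": "ephemeral", "ttl": "1h"},
    }
```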
Cache-aware request routing. Queries are routed to reuse warm cache entries rather than writing duplicate ones; a sketch follows.
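A routing sketch under the same assumptions: pending queries are grouped by the repository whose context prefix they share and dispatched back-to-back, so every request after the first in a group reads the warm cache instead of writing a duplicate entry. Function and field names are illustrative.

```python
from collections import defaultdict
from typing import Callable

def route_by_cache_prefix(
    queries: list[dict],               # each: {"repo": ..., "question": ...}
    dispatch: Callable[[dict], None],  # sends one request to the API
) -> None:
    """Group queries by shared cache prefix, then run each group
    consecutively so hits land inside the cache TTL window."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for query in queries:
        groups[query["repo"]].append(query)
    for repo_queries in groups.values():
        # First request warms (or rewrites) the repo's cached prefix;
        # the rest reuse it at the discounted read rate.
        for query in repo_queries:
            dispatch(query)
```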
When direct API use misses the caching benefit
Teams using the Claude API directly for security workloads often:
- Don't structure prompts to maximise cache hits (stable content first, volatile content last).
- Don't invalidate caches at appropriate times.
- Don't route requests to exploit existing caches.
The result is paying full price for tokens that could have been served from cache. The prompt-structure mistake is the cheapest to fix; it is sketched below.
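Because matching is on the exact leading tokens, any per-request value placed ahead of the cache breakpoint (a scan ID, a timestamp) guarantees a miss on the entire prefix. A sketch of the anti-pattern and its fix, with illustrative names:

```python
import datetime

def bad_system(policy: str, scan_id: str) -> list[dict]:
    # Volatile metadata first: the prefix differs on every call, so the
    # large policy block is re-billed at full price each time.
    header = f"Scan {scan_id} at {datetime.datetime.now().isoformat()}\n"
    return [{"type": "text", "text": header + policy,
             "cache_control": {"type": "ephemeral"}}]

def good_system(policy: str, scan_id: str) -> list[dict]:
    # Stable policy first and cached; volatile metadata after the breakpoint.
    return [
        {"type": "text", "text": policy,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text",
         "text": f"Scan {scan_id} at {datetime.datetime.now().isoformat()}"},
    ]
```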
How Safeguard helps
Safeguard's Griffin AI implements prompt caching as a first-class architectural feature. Customers get the caching benefit automatically, without building cache orchestration themselves. For security workloads with large reusable context (which is most of them), that shows up as lower effective token costs on platform pricing than an equivalent direct-API approach.