
Griffin AI vs Claude Prompt Caching: Security

Claude's prompt caching gives a 90% discount on cached tokens, and security workloads have massive cacheable surface area. Griffin AI takes advantage of it; direct API use often does not.

Nayan Dey
Senior Security Engineer
2 min read

Anthropic's prompt caching reduces cost by 90% on cached tokens and reduces latency. For security workloads, the cacheable surface is large — system prompts, codebase context, SBOM data, advisory references. Griffin AI uses caching aggressively; direct API consumers often do not because the caching strategy requires specific engineering work.

What caching saves

Two dimensions:

  • Cost. Cache reads are billed at roughly 10% of the normal input rate.
  • Latency. Cached context is served from a precomputed prefix rather than reprocessed, reducing time to first token.

For workloads where large context is reused across requests, the savings are material — 40-70% lower total cost in practice for security analysis workloads.
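A rough cost model makes the range above concrete. This is an illustrative sketch, not platform pricing: the 10% cache-read rate and 25% one-time cache-write premium are assumptions based on Anthropic's published pricing structure, and the token counts and hit rate are invented for the example.

```python
# Illustrative cost model for prompt caching on a security-analysis workload.
# Assumed pricing ratios: cache reads ~10% of the base input rate,
# cache writes ~125% (one-time premium when the prefix is first cached).

def blended_input_cost(total_tokens: float, cacheable_fraction: float,
                       hit_rate: float, base_rate: float = 1.0) -> float:
    """Return total input cost in units of base_rate per token."""
    cacheable = total_tokens * cacheable_fraction
    fresh = total_tokens - cacheable
    hits = cacheable * hit_rate          # served from cache at the ~10% rate
    writes = cacheable * (1 - hit_rate)  # written to cache at the ~125% rate
    return fresh * base_rate + hits * 0.10 * base_rate + writes * 1.25 * base_rate

# Example: 100k input tokens per request, 80% of them reusable context
# (system prompt, policy, SBOM, repo snapshot), 90% cache hit rate.
uncached = blended_input_cost(100_000, 0.0, 0.0)
cached = blended_input_cost(100_000, 0.8, 0.9)
print(f"savings: {1 - cached / uncached:.0%}")  # → savings: 63%
```

Under these assumptions the blended saving lands inside the 40-70% band; it grows with the cacheable fraction and the hit rate.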

What security workloads can cache

Five categories:

  • System prompts. Griffin AI's security-specific prompts are large and stable.
  • SBOM context. The organisation's dependency inventory changes slowly.
  • Policy context. Organisational policy rarely changes between scans.
  • Recent finding history. Prior findings inform current triage.
  • Codebase snapshots. Per-repo context can be cached across analyses of the same repo.

Each represents substantial token volume that caching converts from full-cost to discounted.
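As a sketch of how these categories map onto the API: Anthropic's Messages API lets stable content blocks be marked with `cache_control` of type `ephemeral`, so subsequent requests that share the same prefix reuse them at the discounted rate. The prompt text, context strings, and model ID below are placeholders; only the request shape is the point.

```python
# Hypothetical request showing cacheable security context marked for caching.
# Stable blocks come first so they form a reusable prefix; the per-request
# question goes last, outside the cached region.

SYSTEM_PROMPT = "You are a security triage assistant..."    # large, stable
POLICY_CONTEXT = "<policy>...org policy text...</policy>"   # rarely changes
SBOM_CONTEXT = "<sbom>...dependency inventory...</sbom>"    # changes slowly

request = {
    "model": "claude-sonnet-4-20250514",  # placeholder model ID
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": SYSTEM_PROMPT,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": POLICY_CONTEXT + SBOM_CONTEXT,
         "cache_control": {"type": "ephemeral"}},
    ],
    "messages": [
        {"role": "user", "content": "Triage the new findings in this repo."}
    ],
}
```

Because the cache is keyed on the exact prompt prefix, any edit to a cached block (a policy change, a new SBOM) automatically produces a cache miss and a fresh write.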

How Griffin AI uses it

Three integrated cache strategies:

Hierarchical caching. System prompt, organisation policy, and codebase context are cached as nested prefixes, ordered from most to least widely shared: the system prompt is reused by every query, policy context by every query in the organisation, and codebase context by every query against the same repo.

TTL tuning. Caches live as long as the underlying data is stable. Codebase caches invalidate on commit; policy caches invalidate on policy change.

Cache-aware request routing. Queries are routed to leverage available cache entries rather than producing duplicate cache slots.
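The routing and invalidation ideas can be sketched together. This is a hypothetical illustration, not Griffin AI's implementation: since cache entries are keyed on the exact prompt prefix, deriving a prefix identity from content hashes means a new commit or policy change naturally yields a new key, and grouping queries by that key lets one request warm the cache before the rest read from it.

```python
# Hypothetical cache-aware grouping: queries that share a prompt prefix
# (system prompt + policy + repo snapshot) are batched under one key.
# The key changes whenever any input changes, so stale entries are simply
# never requested again and expire on their own.

import hashlib
from collections import defaultdict

def prefix_key(system_prompt: str, policy: str, commit_sha: str) -> str:
    """Stable identity for a cacheable prefix; changes when any input changes."""
    h = hashlib.sha256()
    for part in (system_prompt, policy, commit_sha):
        h.update(part.encode())
        h.update(b"\x00")  # separator so concatenation is unambiguous
    return h.hexdigest()[:16]

def group_by_prefix(queries):
    """queries: iterable of (system_prompt, policy, commit_sha, question)."""
    groups = defaultdict(list)
    for sp, pol, sha, question in queries:
        groups[prefix_key(sp, pol, sha)].append(question)
    return groups

# Within each group, issue the first query alone (one cache write), then
# fan out the remainder concurrently (cache reads at the discounted rate).
```

The fan-out ordering matters: sending all queries at once before the prefix is cached pays the full input rate on every one of them.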

When direct API use misses caching benefit

Teams using the Claude API directly for security workloads often:

  • Don't structure prompts to maximise cache hits.
  • Don't invalidate caches at appropriate times.
  • Don't route requests to exploit existing caches.

The result is paying full price on tokens that could be cached.

How Safeguard Helps

Safeguard's Griffin AI implements prompt caching as a first-class architectural feature, so customers get the caching benefit automatically without building cache orchestration themselves. For security workloads with large reusable context (which is most of them), this shows up as lower effective token costs than equivalent direct-API approaches.
