Responsibility · Transparency Report

What we count. What we publish. What we get wrong.

Aggregate platform numbers, government requests, incidents, and — the section nobody likes writing — the commitments we missed. Published quarterly. Methodology is open.

Last updated: Q2 2026
Aggregate Platform · Last 12 Months

The numbers, in one place.

Anonymised, aggregate. No individual-customer attribution. Methodology link below.

Active tenants

1,240

Single- and multi-tenant deployments combined.

Findings shipped

3.42M

Survived adversarial disproof; reached a customer queue.

Auto-remediated

61%

Findings closed by an automatic Safeguard patch.

Adversarial resistance

0.948

Mean score across the Griffin family this quarter.

Red-team first-pass

82%

Model releases that cleared the gate on the first attempt. Not 100%, on purpose.

Coordinated disclosures

147

Published with upstream maintainer in the loop.

Threat-feed items

412

Public posts with cited evidence and reproducible artefact.

Corpus rotations

4

Training-corpus refresh events; each one is recipe-versioned.

Government Requests

Requests received, complied with, challenged.

Single-digit counts. Illustrative; the live numbers are updated each quarter. The principle is: minimal compliance, robust challenge of overbroad requests, and customer notification wherever legally permitted.

Jurisdiction      Received   Complied   Challenged
United States         7          4          2
European Union        4          3          1
United Kingdom        2          1          1
Singapore             1          1          0
Other                 3          1          1

Illustrative counts. The live report updates these on a quarterly cadence with sealed-request caveats noted where applicable.

Incidents and Post-mortems

What broke and what we changed.

Illustrative excerpts from recent quarters. The live incident log lives on the status page.

SEV-2 · INC-2026-014

Eagle triage stream backlog after upstream advisory burst

Scope · Eagle queue latency rose to 14 minutes for ~3 hours on 12-Mar; multi-tenant only.

Root cause · Advisory-feed burst exceeded queue-worker concurrency cap; rate limiter held the queue rather than the producer.

Full RCA
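The fix for the root cause above is to move backpressure from the queue to the producer: a saturated worker pool should push back on the advisory feed rather than let the backlog grow. A minimal sketch follows; `AdvisoryIngest` and `MAX_INFLIGHT` are hypothetical names, not the actual service.

```python
import queue

class AdvisoryIngest:
    """Illustrative sketch of producer-side admission control.

    A bounded queue rejects new work once workers are saturated,
    so the producer backs off instead of queue latency inflating.
    """
    MAX_INFLIGHT = 100  # hypothetical worker concurrency cap

    def __init__(self):
        # Bounded queue: put_nowait() fails fast rather than buffering.
        self.q = queue.Queue(maxsize=self.MAX_INFLIGHT)

    def submit(self, advisory) -> bool:
        """Admit an advisory, or signal the producer to retry with backoff."""
        try:
            self.q.put_nowait(advisory)
            return True
        except queue.Full:
            return False  # backpressure lands on the producer, not the queue


ingest = AdvisoryIngest()
accepted = sum(ingest.submit(n) for n in range(150))
# Only the first MAX_INFLIGHT submissions are admitted; the burst is
# pushed back to the feed instead of becoming queue latency.
```

The design point is where the limit lives: rate-limiting the queue holds already-accepted work, while rate-limiting the producer keeps the backlog bounded during an advisory burst.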
SEV-3 · INC-2026-009

Griffin reasoning-trace truncation regression

Scope · 0.4% of Griffin M traces truncated below the documented 8k token contract for 9 hours.

Root cause · Inference batching change silently lowered the per-trace token budget; caught by a downstream contract test, not the release gate.

Full RCA
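A downstream contract test of the kind that caught this regression can be sketched as below. Only the 8k-token contract comes from the incident summary; the function names, the shared token-pool model, and the batch sizes are assumptions for illustration.

```python
TRACE_TOKEN_CONTRACT = 8_192  # documented per-trace minimum (from the incident)

def effective_trace_budget(batch_size: int, per_batch_tokens: int) -> int:
    """Model the regression: a batching change divides a shared token
    pool across the batch, silently shrinking each trace's budget."""
    return per_batch_tokens // batch_size

def test_trace_budget_contract():
    # A release gate that only checks batch_size=1 misses this class of
    # bug; the contract test exercises the batch sizes seen in serving.
    for batch_size in (1, 2, 4, 8):
        budget = effective_trace_budget(batch_size, per_batch_tokens=65_536)
        assert budget >= TRACE_TOKEN_CONTRACT, (
            f"trace budget {budget} < contract {TRACE_TOKEN_CONTRACT} "
            f"at batch_size={batch_size}"
        )

test_trace_budget_contract()
```

The lesson encoded here is that a documented contract needs an explicit test at the configurations production actually runs, not only at the release gate's defaults.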
SEV-1 · INC-2026-002

Cross-tenant trace metadata leak in the audit export

Scope · Two enterprise tenants saw tenant-id metadata fields belonging to other tenants in an exported audit bundle. No findings content leaked.

Root cause · Export bundler shared a non-thread-local context across tenant boundaries during parallel export. Affected customers were notified within 18 hours.

Full RCA
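The conventional fix for this class of bug is to make the export context thread-local, so parallel workers cannot observe each other's tenant metadata. A minimal sketch, with `export_bundle` and the context shape as hypothetical stand-ins for the real bundler:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local storage: each worker thread gets its own context slot,
# so tenant metadata cannot bleed across parallel exports.
_ctx = threading.local()

def export_bundle(tenant_id: str) -> str:
    _ctx.tenant_id = tenant_id  # visible only to this worker thread
    # ... build the audit bundle using _ctx.tenant_id ...
    return f"bundle for {_ctx.tenant_id}"

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(export_bundle, ["t-a", "t-b", "t-c", "t-d"]))
# Each bundle carries only its own tenant id, even under parallel export.
```

A shared mutable context (the pre-fix state) would let one thread overwrite another's tenant id between assignment and use; thread-local storage removes that interleaving by construction.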
Where We Did Not Meet a Commitment

The honest section.

Two misses in the last reporting period. Listed in plain language. No hedge.

Model-tier parity SLA missed by 12 days on the Griffin L → Griffin Zero promotion

Our public commitment is full-lineup parity across deployment shapes within 30 days of a tier promotion. On the most recent Griffin L → Griffin Zero promotion, sovereign customers waited 42 days. The cause was an unplanned export-control review on one corpus subset; we mis-scoped the review window when we announced the date. The revised commitment, learned from this miss, includes the review window inside the SLA rather than outside it.

Customer-facing latency regression on Eagle confidence calls

A 280ms p95 regression on Eagle confidence-score calls persisted for 11 days in Q1. Our commitment for a customer-facing performance regression of that magnitude is to root-cause and ship a fix within 5 business days. We took longer because the trace was mis-triaged into the standard backlog rather than the regression queue. The triage rule has been changed; the miss is logged here.
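The revised triage rule can be expressed as a routing predicate. This is a hedged sketch only: the 100ms materiality threshold, the field names, and the queue names are assumptions, not our actual policy encoding.

```python
# Hypothetical encoding of the revised triage rule: a customer-facing
# performance regression routes to the regression queue, never the
# standard backlog.
P95_REGRESSION_MS = 100  # assumed materiality threshold, illustrative

def route(issue: dict) -> str:
    if issue.get("customer_facing") and issue.get("p95_delta_ms", 0) >= P95_REGRESSION_MS:
        return "regression-queue"  # the 5-business-day clock starts here
    return "standard-backlog"

# The Q1 miss, replayed under the new rule:
assert route({"customer_facing": True, "p95_delta_ms": 280}) == "regression-queue"
```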

How we count what we count

Methodology, in two paragraphs.

Findings are counted at the moment they survive adversarial disproof and become available to a customer queue, not at the moment a candidate is generated. Auto-remediation is counted only when a Safeguard-proposed patch is the one that lands in the customer's main branch. Adversarial-resistance scores are mean scores across the held-out evaluation set; the per-model breakdown is in the research notes for that release. Threat-feed items are counted once per first publication. Disclosure counts include only items the upstream maintainer was contacted on; silent posts are not counted, because we do not do silent posts.

The full data-flow diagram, the metric definitions, and the pipeline that produces this page live on the architecture page. Each number on this page is reproducible from the source events. Where a count cannot be reproduced — because of a sealed-disclosure or sealed-request constraint — that limitation is noted next to the number rather than papered over.
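Under assumed event fields (not the real schema; `survived_disproof`, `queued_at`, `landed`, and `patch_source` are illustrative names), the first two counting rules reduce to a sketch like:

```python
def count_findings(events: list[dict]) -> int:
    """A finding counts once it survives adversarial disproof AND reaches
    a customer queue; generated-only candidates do not count."""
    return sum(1 for e in events
               if e.get("survived_disproof") and e.get("queued_at"))

def auto_remediation_rate(findings: list[dict]) -> float:
    """Counted only when the Safeguard-proposed patch is the one that
    lands in the customer's main branch."""
    closed = [f for f in findings if f.get("landed")]
    auto = [f for f in closed if f.get("patch_source") == "safeguard"]
    return len(auto) / len(closed) if closed else 0.0
```

The point of counting at queue time rather than generation time is that candidate volume is cheap to inflate; the published number measures only work that survived disproof and reached a customer.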

Want to see the full quarterly data set? Ask.