Scaling model capability has to be paired with scaling our ability to red-team, disprove, and govern. We scale only when the safety posture scales with the capability. This page is the public framework.
On-device only. No exfiltration paths. Sub-100ms p95. Signed weights verified at install. Deployable on any developer machine; no network egress required.
Multi-tenant inference; structured trace contract; per-tenant audit log; standard adversarial-resistance gate (≥0.94 prompt-injection block-rate).
Single-tenant inference cluster; advanced adversarial-resistance gate (≥0.97); coordinated-disclosure obligations; quarterly red-team rotation; structured-trace human audit on 300 samples per release.
Air-gapped or VPC-isolated. Full red-team and manual trace audit per release. Customer-controlled key material. ITAR/export-control review. Adversarial-resistance gate ≥0.99.
If safety regresses, the tier ceiling lowers. The model continues to ship — but at the lower safety level — until the posture catches up. The triggers:
Weekly review of red-team findings, eval regressions, and customer safety reports. Membership rotates across engineering, research, and security.
Quarterly rotation of independent red teams with sector-specific specialisations (offensive security, prompt injection, AI safety).
This page is the public commitment. Updated quarterly. Material changes flagged on the changelog.
Annual third-party audit of the eval methodology, the corpus curation, and the release pipeline. Summary published in the transparency report.