Anthropic published Responsible Scaling Policy version 3.0 in November 2025, with an effective date of February 24, 2026. RSP v3 is the most consequential rewrite of the document since the original September 19, 2023 release: it introduces a new capability threshold for CBRN development uplift, disaggregates the prior AI R&D threshold into two distinct levels, and formalizes the production of Risk Reports for every frontier model deployment. For procurement, model risk, and policy teams that already rely on the RSP as a contractual anchor when buying Anthropic models, v3 changes the shape of what you are promised. This post walks through the substantive differences.
Why a third revision now?
Anthropic activated ASL-3 safeguards for Claude Opus 4 in May 2025 and extended them to Claude Sonnet 4.5 in September. Once a deployment is under ASL-3, the practical operational pressure shifts from "are we ready for ASL-3?" to "what are the next tripwires?" RSP v3 answers that question by spelling out two specific successor capabilities that, if reached, would push deployment into the still-undefined ASL-4 tier: full automation of entry-level AI research work, and a dramatic acceleration in the rate of effective scaling. The previous policy lumped these together as a single AI R&D threshold; v3 splits them because Anthropic argues they require different mitigations. Full automation of junior research has economic and labor-market implications that justify governance review; acceleration of effective scaling has safety implications that justify capability evaluations and possible pauses.
The new CBRN development threshold
RSP v3 adds a CBRN development uplift threshold to the existing CBRN weaponization threshold. The distinction matters. Weaponization uplift covers models that materially help an end-user assemble or deploy a CBRN weapon. Development uplift covers models that help "moderately-resourced state programs" advance their underlying CBRN capability — synthesizing precursors, designing payloads, optimizing dispersal mechanisms. The threshold is structured as a tripwire: if internal evaluations or external red-team reports indicate the model crosses it, the deployment tier escalates and additional safeguards are required, including stricter weight-storage protocols and a moratorium on broad API exposure pending mitigation.
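A buyer-side policy engine can mirror that tripwire logic directly. Below is a minimal sketch in Python; the class, function, and tier-string names are our own illustration, not anything Anthropic publishes:

```python
from dataclasses import dataclass

# Hypothetical buyer-side model of the v3 tripwire logic. Names are
# illustrative; this is not Anthropic's internal representation.
@dataclass
class EvalFinding:
    source: str             # "internal_eval" or "external_red_team"
    crosses_threshold: bool  # did this finding indicate the threshold was crossed?

def escalate_tier(current_tier: str, findings: list[EvalFinding]) -> str:
    """If any evaluation or red-team report indicates the CBRN development
    threshold is crossed, escalate past ASL-3: stricter weight-storage
    protocols apply and broad API exposure pauses pending mitigation."""
    if any(f.crosses_threshold for f in findings):
        return "ASL-4-pending"
    return current_tier

print(escalate_tier("ASL-3", [EvalFinding("external_red_team", True)]))
# -> ASL-4-pending
```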
Risk Reports as a delivery vehicle
The most operationally significant addition in v3 is the formalization of Risk Reports. Previously, Anthropic published a system card per model release. Risk Reports are described as "a more systematic, comprehensive approach that will provide detailed information on the safety profile of models at the time of publication, going beyond describing model capabilities to explain how capabilities, threat models, and active risk mitigations fit together." In effect, the Risk Report is the system card's threat-model-aware sibling: it documents not just "here is what the model can do" but "here is the threat landscape we modeled, here are the residual scenarios, and here are the deployment-specific mitigations that reduce them." For enterprise buyers filling out third-party-risk questionnaires, that connective tissue is exactly what earlier system cards lacked.
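For teams that want to ingest Risk Reports into a TPRM system, a hedged sketch of a record type follows. Anthropic has not published a machine-readable Risk Report format; the field names below simply mirror the quoted description:

```python
from dataclasses import dataclass, field

# Hypothetical schema: field names follow the quoted description
# ("capabilities, threat models, and active risk mitigations"), not a
# published Anthropic format.
@dataclass
class RiskReport:
    model: str
    capabilities: list[str]
    threat_models: list[str]       # the threat landscape that was modeled
    residual_scenarios: list[str]  # risks that survive mitigation
    mitigations: dict[str, str] = field(default_factory=dict)  # scenario -> control

    def questionnaire_rows(self) -> list[tuple[str, str]]:
        """Pair each residual scenario with its deployment-specific
        mitigation, the linkage a third-party-risk questionnaire asks for."""
        return [(s, self.mitigations.get(s, "UNMITIGATED"))
                for s in self.residual_scenarios]
```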
How the v3 thresholds map to deployment safeguards
The deployment safeguards lattice in v3 looks roughly like this. ASL-2 covers most current Anthropic models (Haiku, older Sonnet variants) and requires standard secure development practices, usage policies, and abuse monitoring. ASL-3 covers Opus 4 and Sonnet 4.5 today and adds hardened weight storage, Constitutional Classifiers++ in production, deployment-time policy gates, and a Responsible Scaling Officer with sign-off authority on material configuration changes. ASL-4 is undefined in v3 but is explicitly described as the tier triggered by the new CBRN development threshold or either of the disaggregated AI R&D thresholds. Anthropic commits to publishing ASL-4 safeguards before any model crosses into that tier; the honest read is that no date is committed, and the implicit expectation is that ASL-4 specifications will be ready before the next 10x effective-compute training run.
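That lattice is easy to encode as data for automated checks. A sketch, using only the safeguards named in the paragraph above (the dictionary layout and key names are our own encoding, not an Anthropic artifact):

```python
# Safeguard lattice from the paragraph above, encoded for policy checks.
ASL_SAFEGUARDS: dict[str, set[str] | None] = {
    "ASL-2": {"secure_development", "usage_policies", "abuse_monitoring"},
    "ASL-3": {"secure_development", "usage_policies", "abuse_monitoring",
              "hardened_weight_storage", "constitutional_classifiers",
              "deployment_policy_gates", "rso_signoff"},
    # ASL-4 is undefined in v3; Anthropic commits to publishing its
    # safeguards before any model crosses the tier.
    "ASL-4": None,
}

def missing_safeguards(tier: str, attested: set[str]) -> set[str]:
    """Return safeguards the vendor has not attested to for this tier."""
    required = ASL_SAFEGUARDS[tier]
    if required is None:
        raise ValueError(f"{tier} safeguards are not yet published")
    return required - attested
```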
Reading v3 alongside the OpenAI Preparedness Framework
Anthropic and OpenAI now have parallel but non-identical safety policies. The Preparedness Framework v2 (effective April 15, 2025) uses two tiers (High, Critical) versus RSP v3's three (ASL-2, ASL-3, ASL-4-pending). Preparedness covers biological, chemical, cybersecurity, and AI self-improvement capabilities; the RSP covers CBRN weaponization, CBRN development, entry-level AI R&D automation, and AI R&D scaling acceleration. The cybersecurity capability domain is conspicuously absent from the RSP. Anthropic addresses cyber risk through its Acceptable Use Policy and through model-card-level evaluations, but does not commit to a dedicated tripwire there. If your model risk policy requires vendor safety policies to include cyber capability tripwires, that is a contract-language item to raise with Anthropic during procurement.
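That coverage comparison can be made mechanical. The sketch below encodes each framework's stated domains as a set and flags gaps against an illustrative internal policy; the domain labels and the policy are ours, and the two frameworks use different taxonomies, so treat any cross-framework diff as a starting point, not a verdict:

```python
# Capability domains as stated by each framework (labels are ours).
RSP_V3 = {"cbrn_weaponization", "cbrn_development",
          "ai_rd_entry_level_automation", "ai_rd_scaling_acceleration"}
PREPAREDNESS_V2 = {"biological", "chemical", "cybersecurity",
                   "ai_self_improvement"}

# Domains an internal model-risk policy might require a vendor
# framework to cover (illustrative, not a standard).
REQUIRED = {"cbrn_weaponization", "cybersecurity"}

def coverage_gaps(framework_domains: set[str]) -> set[str]:
    """Required domains the vendor framework has no tripwire for."""
    return REQUIRED - framework_domains

print(coverage_gaps(RSP_V3))  # {'cybersecurity'}: raise in contract language
```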
What to include in your vendor due-diligence checklist
A practical checklist when reviewing an Anthropic deployment against RSP v3:
```yaml
vendor_review:
  model: claude-opus-4
  rsp_version: "3.0"
  effective_date: 2026-02-24
  evidence_required:
    - latest_system_card_or_risk_report
    - asl_tier_declaration
    - rso_signoff_log_for_material_changes
    - third_party_red_team_summary
    - deployment_safeguards_attestation
  policy_alignment:
    cbrn_weaponization_tripwire: covered
    cbrn_development_tripwire: covered
    ai_rd_entry_level_tripwire: covered
    ai_rd_scaling_tripwire: covered
    cyber_capability_tripwire: not_covered  # raise question with vendor
  contractual_anchors:
    - data_retention: ZDR
    - region: EU OR US
    - model_version_pinning: dated_string
```
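A short sketch of how that checklist could be validated before a deployment review closes. It assumes the YAML above is saved as vendor_review.yaml and that PyYAML is installed; the failure rules are illustrative, not part of RSP v3:

```python
import yaml

# Load the vendor_review checklist above; the file name and the
# escalation rules below are our own convention.
with open("vendor_review.yaml") as f:
    review = yaml.safe_load(f)["vendor_review"]

findings = []
for item in review["evidence_required"]:
    findings.append(f"collect: {item}")
for tripwire, status in review["policy_alignment"].items():
    if status != "covered":
        findings.append(f"escalate: {tripwire} is {status}")

print("\n".join(findings))
```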
How Safeguard Helps
Safeguard maintains a vendor policy registry that tracks each frontier lab's safety framework version, effective date, and capability tiers. When Anthropic publishes RSP v3.1, or any other lab revises its framework, Safeguard normalizes the diff against the prior version and surfaces it in the vendor scorecard. Policy gates let security teams enforce that any model in production must come from a vendor whose current safety framework satisfies a minimum capability-tripwire coverage matrix. Safeguard also cross-references model system cards and Risk Reports against the AIBOM, flagging products that depend on a model whose deployment tier has changed. TPRM workflows raise findings when a vendor's RSP or Preparedness Framework cadence falls behind your policy SLA, giving compliance teams a defensible audit trail.