Anthropic published Responsible Scaling Policy version 3.0 in November 2025, with an effective date of February 24, 2026. RSP v3 is the most consequential rewrite of the document since the original September 19, 2023 release: it introduces a new capability threshold for CBRN development uplift, disaggregates the prior AI R&D threshold into two distinct levels, and formalizes the production of Risk Reports for every frontier model deployment. For procurement, model risk, and policy teams that already rely on the RSP as a contractual anchor when buying Anthropic models, v3 changes the shape of what you are promised. This post walks through the substantive differences.
Why a third revision now?
Anthropic activated ASL-3 safeguards for Claude Opus 4 in May 2025 and extended them to Claude Sonnet 4.5 in September. Once a deployment is under ASL-3, the practical operational pressure shifts from "are we ready for ASL-3?" to "what are the next tripwires?" RSP v3 answers that question by spelling out two specific successor capabilities that, if reached, would push deployment into the still-undefined ASL-4 tier: full automation of entry-level AI research work, and a dramatic acceleration in the rate of effective scaling. The previous policy lumped these together as a single AI R&D threshold; v3 splits them because Anthropic argues they require different mitigations. Full automation of junior research has economic and labor-market implications that justify governance review; acceleration of effective scaling has safety implications that justify capability evaluations and possible pauses.
The new CBRN development threshold
RSP v3 adds a CBRN development uplift threshold to the existing CBRN weaponization threshold. The distinction matters. Weaponization uplift covers models that materially help an end-user assemble or deploy a CBRN weapon. Development uplift covers models that help "moderately-resourced state programs" advance their underlying CBRN capability — synthesizing precursors, designing payloads, optimizing dispersal mechanisms. The threshold is structured as a tripwire: if internal evaluations or external red-team reports indicate the model crosses it, the deployment tier escalates and additional safeguards are required, including stricter weight-storage protocols and a moratorium on broad API exposure pending mitigation.
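A buyer-side policy engine can mirror that tripwire logic directly. Below is a minimal sketch in Python; the class, function, and tier-string names are our own illustration, not anything Anthropic publishes:

```python
from dataclasses import dataclass

# Hypothetical buyer-side model of the v3 tripwire logic. Names are
# illustrative; this is not Anthropic's internal representation.
@dataclass
class EvalFinding:
    source: str             # "internal_eval" or "external_red_team"
    crosses_threshold: bool  # did this finding indicate the threshold was crossed?

def escalate_tier(current_tier: str, findings: list[EvalFinding]) -> str:
    """If any evaluation or red-team report indicates the CBRN development
    threshold is crossed, escalate past ASL-3: stricter weight-storage
    protocols apply and broad API exposure pauses pending mitigation."""
    if any(f.crosses_threshold for f in findings):
        return "ASL-4-pending"
    return current_tier

print(escalate_tier("ASL-3", [EvalFinding("external_red_team", True)]))
# -> ASL-4-pending
```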
Risk Reports as a delivery vehicle
The most operationally significant addition in v3 is the formalization of Risk Reports. Previously, Anthropic published a system card per model release. Risk Reports are described as "a more systematic, comprehensive approach that will provide detailed information on the safety profile of models at the time of publication, going beyond describing model capabilities to explain how capabilities, threat models, and active risk mitigations fit together." In effect, the Risk Report is the system card's threat-model-aware sibling: it documents not just "here is what the model can do" but "here is the threat landscape we modeled, here are the residual scenarios, and here are the deployment-specific mitigations that reduce them." For enterprise buyers filling out third-party-risk questionnaires, that connective tissue is exactly what earlier system cards lacked.
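For teams that want to ingest Risk Reports into a TPRM system, a hedged sketch of a record type follows. Anthropic has not published a machine-readable Risk Report format; the field names below simply mirror the quoted description:

```python
from dataclasses import dataclass, field

# Hypothetical schema: field names follow the quoted description
# ("capabilities, threat models, and active risk mitigations"), not a
# published Anthropic format.
@dataclass
class RiskReport:
    model: str
    capabilities: list[str]
    threat_models: list[str]       # the threat landscape that was modeled
    residual_scenarios: list[str]  # risks that survive mitigation
    mitigations: dict[str, str] = field(default_factory=dict)  # scenario -> control

    def questionnaire_rows(self) -> list[tuple[str, str]]:
        """Pair each residual scenario with its deployment-specific
        mitigation, the linkage a third-party-risk questionnaire asks for."""
        return [(s, self.mitigations.get(s, "UNMITIGATED"))
                for s in self.residual_scenarios]
```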
How the v3 thresholds map to deployment safeguards
The deployment safeguards lattice in v3 looks roughly like this. ASL-2 covers most current Anthropic models (Haiku, older Sonnet variants) and requires standard secure development practices, usage policies, and abuse monitoring. ASL-3 covers Opus 4 and Sonnet 4.5 today and adds hardened weight storage, Constitutional Classifiers++ in production, deployment-time policy gates, and a Responsible Scaling Officer with sign-off authority on material configuration changes. ASL-4 is undefined in v3 but is explicitly described as the tier triggered by the new CBRN development threshold or either of the disaggregated AI R&D thresholds. Anthropic commits to publishing ASL-4 safeguards before any model crosses into that tier; the honest read is that no date is committed, and the implicit expectation is that ASL-4 specifications will be ready before the next 10x effective-compute training run.
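That lattice is easy to encode as data for automated checks. A sketch, using only the safeguards named in the paragraph above (the dictionary layout and key names are our own encoding, not an Anthropic artifact):

```python
# Safeguard lattice from the paragraph above, encoded for policy checks.
ASL_SAFEGUARDS: dict[str, set[str] | None] = {
    "ASL-2": {"secure_development", "usage_policies", "abuse_monitoring"},
    "ASL-3": {"secure_development", "usage_policies", "abuse_monitoring",
              "hardened_weight_storage", "constitutional_classifiers",
              "deployment_policy_gates", "rso_signoff"},
    # ASL-4 is undefined in v3; Anthropic commits to publishing its
    # safeguards before any model crosses the tier.
    "ASL-4": None,
}

def missing_safeguards(tier: str, attested: set[str]) -> set[str]:
    """Return safeguards the vendor has not attested to for this tier."""
    required = ASL_SAFEGUARDS[tier]
    if required is None:
        raise ValueError(f"{tier} safeguards are not yet published")
    return required - attested
```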
Reading v3 alongside the OpenAI Preparedness Framework
Anthropic and OpenAI now have parallel but non-identical safety policies. The Preparedness Framework v2 (effective April 15, 2025) uses two tiers (High, Critical) versus RSP v3's three (ASL-2, ASL-3, ASL-4-pending). Preparedness covers biological, chemical, cybersecurity, and AI self-improvement capabilities; the RSP covers CBRN weaponization, CBRN development, entry-level AI R&D automation, and AI R&D scaling acceleration. The cybersecurity capability domain is conspicuously absent from the RSP. Anthropic addresses cyber risk through its Acceptable Use Policy and through model-card-level evaluations, but does not commit to a dedicated tripwire there. If your model risk policy requires vendor safety policies to include cyber capability tripwires, that is a contract-language item to raise with Anthropic during procurement.
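That coverage comparison can be made mechanical. The sketch below encodes each framework's stated domains as a set and flags gaps against an illustrative internal policy; the domain labels and the policy are ours, and the two frameworks use different taxonomies, so treat any cross-framework diff as a starting point, not a verdict:

```python
# Capability domains as stated by each framework (labels are ours).
RSP_V3 = {"cbrn_weaponization", "cbrn_development",
          "ai_rd_entry_level_automation", "ai_rd_scaling_acceleration"}
PREPAREDNESS_V2 = {"biological", "chemical", "cybersecurity",
                   "ai_self_improvement"}

# Domains an internal model-risk policy might require a vendor
# framework to cover (illustrative, not a standard).
REQUIRED = {"cbrn_weaponization", "cybersecurity"}

def coverage_gaps(framework_domains: set[str]) -> set[str]:
    """Required domains the vendor framework has no tripwire for."""
    return REQUIRED - framework_domains

print(coverage_gaps(RSP_V3))  # {'cybersecurity'}: raise in contract language
```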
What to include in your vendor due-diligence checklist
A practical checklist when reviewing an Anthropic deployment against RSP v3:
```yaml
vendor_review:
  model: claude-opus-4
  rsp_version: "3.0"
  effective_date: 2026-02-24
  evidence_required:
    - latest_system_card_or_risk_report
    - asl_tier_declaration
    - rso_signoff_log_for_material_changes
    - third_party_red_team_summary
    - deployment_safeguards_attestation
  policy_alignment:
    cbrn_weaponization_tripwire: covered
    cbrn_development_tripwire: covered
    ai_rd_entry_level_tripwire: covered
    ai_rd_scaling_tripwire: covered
    cyber_capability_tripwire: not_covered  # raise question with vendor
  contractual_anchors:
    - data_retention: ZDR
    - region: EU OR US
    - model_version_pinning: dated_string
```
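A short sketch of how that checklist could be validated before a deployment review closes. It assumes the YAML above is saved as vendor_review.yaml and that PyYAML is installed; the failure rules are illustrative, not part of RSP v3:

```python
import yaml

# Load the vendor_review checklist above; the file name and the
# escalation rules below are our own convention.
with open("vendor_review.yaml") as f:
    review = yaml.safe_load(f)["vendor_review"]

findings = []
for item in review["evidence_required"]:
    findings.append(f"collect: {item}")
for tripwire, status in review["policy_alignment"].items():
    if status != "covered":
        findings.append(f"escalate: {tripwire} is {status}")

print("\n".join(findings))
```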
How Safeguard Helps
Safeguard maintains a vendor policy registry that tracks each frontier lab's safety framework version, effective date, and capability tiers. When Anthropic publishes RSP v3.1, or any other lab revises its framework, Safeguard normalizes the diff against the prior version and surfaces it in the vendor scorecard. Policy gates let security teams enforce that any model in production must come from a vendor whose current safety framework satisfies a minimum capability-tripwire coverage matrix. Safeguard also cross-references model system cards and Risk Reports against the AIBOM, flagging products that depend on a model whose deployment tier has changed. TPRM workflows raise findings when a vendor's RSP or Preparedness Framework cadence falls behind your policy SLA, giving compliance teams a defensible audit trail.