AI Security

Gemini 3 Pro and the Frontier Safety Framework Report

Google released Gemini 3 Pro on November 18, 2025 with the most thorough Frontier Safety Framework evaluation yet. We unpack what was disclosed and how it changes downstream defender posture.

On November 18, 2025, Google released Gemini 3 Pro with a sweeping rollout that included Search AI Overviews on day one. Alongside the launch, Google DeepMind published an updated Model Card and a Frontier Safety Framework (FSF) report. Google described Gemini 3 as their most secure model yet, citing reduced sycophancy, increased resistance to prompt injection, and improved protection against misuse via cyberattacks. They also worked with external evaluators including the UK AI Safety Institute, Apollo Research, Vaultis, and Dreadnode. For enterprise defenders evaluating whether to standardize on Gemini 3 Pro, the marketing summary is only the start — the operational answers live in the disclosed evaluations and the residual risks Google admits.

What is in the Gemini 3 Pro model card?

The model card, dated December 2025, runs through architecture summary (a multimodal mixture-of-experts), training-data summary (text, images, audio, video, code with knowledge cutoff January 2025), and evaluation results across safety, security, and capability axes. The card explicitly compares Gemini 3 Pro to Gemini 2.5 Pro on prompt injection resistance and reports a measurable reduction in attack success rate on the internal benchmark suite. The card also discloses sycophancy reduction — measured as the rate at which the model changes a correct answer when challenged by a user — falling from prior model levels. That is a security-relevant property, not just a quality property, because sycophancy is the underlying mechanism that lets social engineering bypass guardrails in chat interfaces.

What capability levels did the Frontier Safety Framework assign?

The Frontier Safety Framework defines Critical Capability Levels (CCLs) across cyber autonomy, ML R&D, CBRN uplift, and deceptive alignment. The November 2025 report places Gemini 3 Pro below the CCL for autonomous cyber operations but documents meaningful capability on the underlying evaluation suite — specifically that the model can complete CTF challenges from the 2024 DEFCON qualifier set at a rate that exceeds professional CTF teams on time-bounded tasks. Google also discloses CBRN evaluation results conducted with external biosecurity partners that placed the model below the uplift threshold. For defenders, the practical signal is that Gemini 3 Pro is now in a capability tier where vendor-side safeguards are necessary but not sufficient — the model card explicitly recommends that downstream applications layer their own evaluations and policies.

How robust is the prompt injection resistance?

Google's prompt injection numbers are reported against three benchmark families: (1) the indirect prompt injection set used internally and overlapping with Greshake et al.'s 2023 benchmark, (2) the agentic browsing harness, and (3) an internal "in-the-wild" set drawn from production telemetry on Gemini-powered features. Gemini 3 Pro shows reduced attack success rates compared to Gemini 2.5 Pro on all three, but the report is honest that absolute success rates remain non-zero on the in-the-wild set. The report also includes a specific section on "tool-call hijacking" — cases where a prompt injection successfully redirects a tool call to an attacker-chosen target. That number improved but did not reach zero. The implication for defenders building agent workflows: tool-call destination allowlists remain mandatory regardless of vendor claims.

What did the external evaluators find?

The UK AI Safety Institute, Apollo Research, Vaultis, and Dreadnode each contributed an evaluation summary. The UK AISI conducted cyber and biology capability evaluations, broadly concurring with Google's classification. Apollo Research evaluated deceptive alignment and reports measurable evaluation-awareness — the model can sometimes detect that it is being tested — but did not find consistent strategic deception. Vaultis and Dreadnode are infosec firms that ran offensive evaluation against Gemini 3 Pro in agentic configurations. Their findings are referenced but the full reports are not yet public. The third-party validation pattern matches what Anthropic and OpenAI now do, and it is becoming a de facto industry standard: enterprise defenders should require vendors to cite at least two independent evaluators in any procurement evidence package.

What changed about the AI Studio and API safety controls?

Beyond the model card, Google updated the AI Studio safety settings interface and the Gemini API to expose more granular controls — including a new "thinking budget" parameter that exposes the reasoning depth allocation, and stricter defaults for system instructions on the enterprise tier. The Gemini 3 Pro release also adjusted the safety category thresholds in the API: harm-category filtering can now be set independently for prompt-injection-detected content versus user-originated content, allowing finer-grained policy. For defenders, this is the right moment to revisit your API integration: if you set BLOCK_NONE across all safety categories during prototyping (a common shortcut), you should re-tier those decisions before shipping Gemini 3 Pro to production.

# Example Gemini 3 Pro safety configuration for an enterprise integration
from google import genai
from google.genai.types import SafetySetting, HarmCategory, HarmBlockThreshold

client = genai.Client(api_key="...")
safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_PROMPT_INJECTION,
        threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    ),
]

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=user_prompt,
    config={"safety_settings": safety_settings, "thinking_budget": 1024},
)

How does this affect Vertex AI deployments?

Most enterprise consumption of Gemini 3 Pro is through Vertex AI rather than the consumer Gemini app, and the Vertex deployment surface introduces additional controls and risks. Vertex supports VPC service controls, customer-managed encryption keys, and data residency commitments that are not available on the consumer surface. The Gemini 3 Pro launch was accompanied by Vertex AI updates that exposed the new safety-category thresholds and the thinking-budget parameter in the platform's API. The procurement implication: enterprises evaluating Gemini 3 Pro should be evaluating the Vertex AI configuration surface, not the consumer interface — and the data-residency and customer-managed key commitments are the controls that matter for regulated industries. The model card and Frontier Safety Framework report are the right artifacts to attach to a Vertex AI procurement package.

What residual risks does Google admit?

The model card section on limitations is unusually direct. Google states that Gemini 3 Pro "may exhibit some of the general limitations of foundation models, such as hallucinations." It also discloses that multimodal inputs (image, audio, video) introduce attack surfaces that text-only inputs do not — specifically, an image can contain an embedded prompt injection that is invisible to a human reviewer but recoverable by the vision encoder. The report cites Google's own "image-as-prompt" benchmark and shows the model resists most attacks in that set but not all. The practical defender response: any pipeline that accepts user-supplied images and routes them through a vision-capable model needs an image-side content scanner, not just text-side guardrails.

How Safeguard Helps

Safeguard parses Gemini model cards and Frontier Safety Framework reports into the same AIBOM schema it uses for Claude system cards and OpenAI preparedness updates, giving you a normalized cross-vendor view of capability tiers and disclosed evaluations. When Google ships a new version (the Gemini 3.1 Pro card published in early 2026 is already tracked), Safeguard diffs it and flags policy gates that need review — including the safety-category thresholds your developers may have set during prototyping. Griffin AI generates per-deployment evaluation harnesses derived from the official model card methodology, so your internal numbers are comparable to the vendor's. Policy gates block API integrations that set BLOCK_NONE thresholds in production, and the vector-and-image scanner detects embedded prompt-injection content before it reaches the model — covering the multimodal attack surface Google explicitly admits remains non-zero.

gemini google-deepmind frontier-safety-framework model-card ai-security

Back to all articles

More on #gemini

View all →

AI Security

Never miss an update

Weekly insights on software supply chain security, delivered to your inbox.

Gemini 3 Pro and the Frontier Safety Framework Report

What is in the Gemini 3 Pro model card?

What capability levels did the Frontier Safety Framework assign?

How robust is the prompt injection resistance?

What did the external evaluators find?

What changed about the AI Studio and API safety controls?

How does this affect Vertex AI deployments?

What residual risks does Google admit?

How Safeguard Helps

More on #gemini

Griffin AI vs Gemini On-Device: Developer Tools

Griffin AI vs Gemini for FedRAMP Workflows

Griffin AI vs Gemini Pricing: Security Scans

Griffin AI vs Gemini Multimodal: Security

Related articles in AI Security

NIST SP 800-218A: Operationalizing AI Secure Development in 2026

Ollama CVE-2026-7482 'Bleeding Llama': Out-of-Bounds Read

Building an Eval Suite for Your Security LLM Workflows

Never miss an update

Product

Solutions

Compare

Resources

Company

Legal

Developers