Leakage Testing Methods For Security Benchmarks

A benchmark that the model has seen in training is a benchmark of memorisation. Specific leakage-testing methods separate generalisation from recall.

Shadab Khan
Security Engineer

A security benchmark that the evaluated model has seen in training produces inflated accuracy numbers. Detecting this leakage requires specific methods: not just asking "is this CVE public?" but running statistical tests of whether the model's responses exhibit memorisation signatures. The methods are well understood in ML research and underapplied in AI-for-security marketing. For procurement, whether a vendor runs leakage tests is a signal of evaluation competence.

What leakage looks like

Three signatures:

  • Verbatim recall. The model reproduces text from training data word-for-word.
  • Detail-fidelity mismatch. The model gets specific numbers exactly right but reasons weakly about new scenarios.
  • Prompt-sensitivity gap. Small prompt variations produce large accuracy swings.

Each is measurable.
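As one illustration of "measurable", verbatim recall can be scored as the share of a reference text's word n-grams that reappear unchanged in the model's output. This is a minimal sketch, not a standard metric: the function names, the whitespace tokenisation, and the default window of 8 words are all assumptions chosen for the example.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-token tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(model_output: str, reference: str, n: int = 8) -> float:
    """Fraction of the reference's n-grams that appear verbatim in the output.

    Tokenisation is a plain lowercase whitespace split (an assumption);
    a real harness would likely normalise punctuation as well.
    """
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        # Reference shorter than n words: nothing to compare against.
        return 0.0
    out = ngrams(model_output.lower().split(), n)
    return len(ref & out) / len(ref)
```

A score near 1.0 on text the model was never shown in the prompt points towards recall; the threshold that counts as "high" is a judgment call and should be calibrated on content known to be unseen.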

Specific methods

Four methods:

  • Canary string insertion. Embed unique strings in held-out content; if the model reproduces them, that content has leaked into its training data.
  • Paraphrase consistency. Ask the same question in different phrasings; large variance in accuracy across phrasings suggests memorisation of specific wordings.
  • Temporal comparison. Measure accuracy on pre-cutoff vs post-cutoff data. A large drop on post-cutoff data suggests the pre-cutoff score reflects memorisation.
  • Membership inference attacks. Statistical tests for whether a specific example was in the training set.

Each produces a different signal; used together, they triangulate leakage.
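The first two methods are simple enough to sketch. The code below is a hedged illustration, not a description of any vendor's harness: `make_canary`, `canary_leaked`, and `paraphrase_gap` are hypothetical names, and the input format (one row of correct/incorrect flags per benchmark item, one column per phrasing) is an assumption.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Generate a unique, unguessable marker to embed in held-out content."""
    return f"{prefix}-{secrets.token_hex(16)}"


def canary_leaked(model_output: str, canary: str) -> bool:
    """True if the model's output contains the canary string."""
    return canary in model_output


def paraphrase_gap(results: list[list[bool]]) -> float:
    """Gap between the best and worst per-phrasing accuracy.

    results[i][j] is whether item i was answered correctly under phrasing j.
    Every row is assumed to have the same number of phrasings.
    """
    per_phrasing = [sum(col) / len(col) for col in zip(*results)]
    return max(per_phrasing) - min(per_phrasing)
```

A canary hit is strong evidence on its own, while a paraphrase gap is only suggestive: phrasings can differ in difficulty, so the gap is best compared against the same measurement on items known to be unseen.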

Why it matters for security specifically

Security eval benchmarks draw heavily from public CVE and advisory data, which is well represented in training corpora. The contamination floor is therefore high. A vendor quoting 95% on a CVE-based benchmark without leakage testing is very likely reporting an inflated number.

What customers should ask

Three questions:

  1. What leakage testing has been done on the quoted benchmarks?
  2. What is the accuracy delta between contaminated and uncontaminated subsets?
  3. Are post-cutoff numbers published separately?
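An answer to the second and third questions can be as small as two accuracy figures and a significance check. The sketch below assumes the benchmark has already been split into pre-cutoff and post-cutoff items and uses a pooled two-proportion z-statistic; the function name and the choice of test are assumptions, and other tests would serve equally well.

```python
import math


def accuracy_delta(pre_correct: int, pre_total: int,
                   post_correct: int, post_total: int) -> tuple[float, float]:
    """Compare accuracy on pre-cutoff vs post-cutoff items.

    Returns (delta, z): delta is pre-cutoff accuracy minus post-cutoff
    accuracy, and z is a two-proportion z-statistic using pooled accuracy.
    """
    p_pre = pre_correct / pre_total
    p_post = post_correct / post_total
    pooled = (pre_correct + post_correct) / (pre_total + post_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pre_total + 1 / post_total))
    # se is zero only when every item is right or every item is wrong.
    z = (p_pre - p_post) / se if se > 0 else 0.0
    return p_pre - p_post, z


# Hypothetical counts for illustration only, not measured results.
delta, z = accuracy_delta(pre_correct=95, pre_total=100,
                          post_correct=70, post_total=100)
```

A large positive delta with a large z suggests the pre-cutoff score is partly recall. Post-cutoff items can also be harder for unrelated reasons, so the delta is a prompt for investigation, not proof.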

How Safeguard helps

Safeguard's Griffin AI eval methodology includes temporal splits, paraphrase variance testing, and private-dataset evaluation. The published benchmark numbers reflect uncontaminated capability to the extent these methods can certify it. For customers whose procurement prioritises evidence quality, these are the details that separate defensible numbers from inflated ones.
