Griffin AI vs GPT-5: Context Grounding

A million-token context window is a tool, not a solution. Context grounding for security requires architecture, not just capacity.

Shadab Khan
Head of AI Research

One of the more common misconceptions about modern frontier models is that large context windows have solved the grounding problem. It is an understandable intuition. If a model can read a million tokens, surely you can just hand it everything relevant and let it reason over the whole picture. In practice, this works about as well as giving a new employee every document the company has ever produced and asking them to make a decision by lunch.

Griffin AI's position on this is straightforward. We use frontier models—specifically Anthropic's Claude family—for the reasoning layer, and we do not compete with GPT-5 or any other frontier model as a general intelligence. But context grounding for security is a problem we have spent real engineering effort on, and we think the distinction between "large context window" and "grounded context" is worth drawing clearly.

What a context window actually is

A context window is the working memory a model can reference during a single inference. Larger windows let the model hold more material at once: longer documents, more conversation history, bigger codebases. This is genuinely useful. Tasks that previously required summarization or chunking can now run end-to-end.

What a large context window does not do is retrieve the right material. The model is still reasoning over whatever the application chose to put in front of it. If the application puts the wrong documents in, the model reasons over the wrong documents. The fact that it can hold a million tokens of wrong material is not a feature.

The pure-GPT-5 pattern and its limits

The common pattern with a large-context model looks like this: the engineer, or their application, dumps a lot of material into the prompt—the whole repository, the full CVE database entry, a month of logs, the entire policy document—and asks a question. The model reads it all and answers.

For some questions, this works. Recent model generations are good at finding needles in haystacks. But the pattern has several failure modes that become obvious in production security work:

  • Relevance decay. The model attends to everything, so important signal gets diluted by noise. A critical VEX statement buried on page forty of a pasted bundle is present in the context, but it may not carry the weight it should in the conclusion.
  • Stale snapshots. Whatever was in the prompt at the moment of the query is what the model sees. An SBOM that was accurate last week is wrong this week, and the model has no way to know.
  • Unaudited grounding. The application chose what to include. That choice is not visible to the security team reviewing the output. If a finding is missing from the answer, the team cannot tell whether the model erred or the prompt-builder erred.
  • Permission blur. Everything in the prompt is available to the model regardless of who is asking. A developer and the CISO end up with the same context, which is a problem for least privilege.

None of these are model problems. They are architecture problems. They are what happens when context grounding is treated as a prompt-engineering exercise rather than an engineering discipline.

What Griffin does instead

Griffin treats context grounding as a first-class subsystem, not a prompt pattern. Every query goes through a scoped retrieval layer before it reaches the frontier model. That layer is responsible for four things:

  1. Identity scoping. Who is asking, and what are they permitted to see? The user's role, team membership, and explicit access grants determine which projects, products, and findings are in play.
  2. Freshness. The retrieval layer pulls the latest version of each record at query time. An SBOM is not a copy in the prompt; it is a live reference. A finding's status is whatever it is right now, not whatever it was when the conversation started.
  3. Relevance. Griffin does not paste everything. It pulls the records that relate to the specific question, joined across entities—SBOM to product, product to policy, policy to gate, gate to finding—so the model reasons over a small, focused, meaningful context.
  4. Auditability. Every record pulled is logged. If someone asks why the model reached a particular conclusion, Griffin can replay the retrieval and show exactly what the model saw.

This is what we mean by an engine around the model. The frontier reasoning is the same. The quality of the context it reasons over is not.
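The four responsibilities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Griffin's implementation: the `Record`, `User`, and `RetrievalLayer` names, the in-memory `store`, and the `is_relevant` callback are all hypothetical stand-ins for whatever a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Record:
    record_id: str
    project: str
    version: int
    body: str


@dataclass(frozen=True)
class User:
    name: str
    projects: frozenset  # projects this user is permitted to read


@dataclass
class RetrievalLayer:
    store: dict  # hypothetical live store: record_id -> latest Record
    audit_log: list = field(default_factory=list)

    def retrieve(self, user: User, is_relevant: Callable[[Record], bool]) -> list:
        selected = []
        # Freshness: read from the live store at query time, not from a cached prompt.
        for record in self.store.values():
            # Identity scoping: skip anything outside the caller's grants.
            if record.project not in user.projects:
                continue
            # Relevance: keep only records related to this question.
            if not is_relevant(record):
                continue
            # Auditability: log exactly which record and version was used.
            self.audit_log.append((user.name, record.record_id, record.version))
            selected.append(record)
        return selected


store = {
    "sbom-payments": Record("sbom-payments", "payments", 7, "openssl 3.0.1"),
    "sbom-search": Record("sbom-search", "search", 3, "openssl 3.0.1"),
}
layer = RetrievalLayer(store)
developer = User("dev", frozenset({"payments"}))
context = layer.retrieve(developer, lambda r: "openssl" in r.body)
```

Here `context` holds only the payments SBOM, because the developer has no grant on the search project, and `layer.audit_log` records which record and version was handed over. The frontier model would receive `context`, never the whole store.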

A concrete example

Suppose a developer asks, "What is the status of CVE-2025-67890 across my services?"

In a pure large-context workflow, the prompt-builder has to decide what to include. It might include a list of the developer's services, the SBOMs for each, any findings matching the CVE, and some policy text. If it misses one service, the model misses that service. If an SBOM is stale, the answer is stale. If the policy snippet is out of date, the recommendation is wrong.

In a Griffin workflow, the retrieval layer resolves the question. It identifies the developer's services based on their access, fetches the current SBOMs, joins to active findings, pulls any VEX statements, and includes the applicable policy. The frontier model then reasons over a focused, fresh, scoped context and produces an answer whose evidence is traceable end to end.

The model did not get smarter between the two workflows. The context did.
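To make the grounded path concrete, here is a rough Python sketch of how a retrieval layer might resolve that question before the model sees anything. All data and names (`ACCESS`, `SBOMS`, `FINDINGS`, `VEX`, `cve_status`) are invented for illustration, and the rule that a VEX statement overrides a finding's status is an assumption of this sketch, not a description of Griffin's logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    service: str
    cve: str
    status: str  # e.g. "open" or "fixed"


# Hypothetical tenant data; a real system would query these live at request time.
ACCESS = {"dev-1": ["billing", "search"]}
SBOMS = {
    "billing": ["openssl@3.0.1"],
    "search": ["zlib@1.3"],
    "admin": ["openssl@3.0.1"],
}
FINDINGS = [
    Finding("billing", "CVE-2025-67890", "open"),
    Finding("admin", "CVE-2025-67890", "open"),
]
VEX = {("billing", "CVE-2025-67890"): "not_affected"}


def cve_status(user: str, cve: str) -> dict:
    """Join access -> services -> SBOMs -> findings -> VEX for one CVE."""
    status = {}
    for service in ACCESS.get(user, []):  # only services the user may see
        if service not in SBOMS:
            status[service] = "no_sbom"
            continue
        match = next(
            (f for f in FINDINGS if f.service == service and f.cve == cve), None
        )
        if match is None:
            status[service] = "no_finding"
        else:
            # Assumption: a VEX statement, if present, overrides the raw finding status.
            status[service] = VEX.get((service, cve), match.status)
    return status


result = cve_status("dev-1", "CVE-2025-67890")
```

The `admin` service has an open finding for the same CVE, but it never enters `result` because `dev-1` has no access to it. The set of services comes from the access grant, so a prompt-builder cannot accidentally leave one out or include one it should not.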

Why this matters for security specifically

Security is an unusually unforgiving domain for context errors. A general question answered over slightly stale data is still usually useful. A vulnerability question answered over slightly stale SBOM data is wrong, and the error is hard to notice until something breaks. The cost of getting grounding wrong in security is not a subpar answer; it is a false sense of safety.

This is the deeper reason Griffin invests in grounding architecture even though the underlying model is perfectly capable of holding large contexts. The model is capable. What the organization cannot do, without a system doing the grounding work, is guarantee that the right things land in that context. A large-context model is a large dinner table. It does not decide what to serve.

The "retrieval-augmented generation" question

Readers familiar with the AI space will recognize this as a variant of retrieval-augmented generation, and ask how Griffin's grounding differs from a standard RAG setup bolted on top of a large-context model.

The difference is that a standard RAG system typically retrieves from a single vector store based on semantic similarity. Griffin's grounding is structural. It knows that an SBOM belongs to a product, that a product has findings, that a finding may have a VEX, that a policy references a category of finding. The retrieval walks these relationships rather than matching embeddings. For security, this matters because the right answer often requires joining across entities the way a SQL query would, not fetching the nearest neighbor in a vector space.

Griffin does use semantic retrieval for free-text content like advisories and policy language. But the backbone of grounding is the structured relationships between tenant records. A general RAG pattern, no matter how well tuned, does not capture those relationships out of the box.
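The contrast with nearest-neighbor lookup is easier to see in code. The sketch below walks explicit relationships with a breadth-first traversal; the `EDGES` map, the identifier format, and the `walk` function are hypothetical, chosen only to mirror the SBOM, product, finding, VEX, and policy relationships described above.

```python
from collections import deque

# Hypothetical relationship graph between tenant records.
EDGES = {
    "sbom:web-1": ["product:web"],
    "product:web": ["finding:F-1", "policy:P-1"],
    "finding:F-1": ["vex:V-1"],
    "policy:P-1": [],
    "vex:V-1": [],
    "product:other": ["finding:F-9"],
}


def walk(start: str, max_hops: int) -> list:
    """Collect records reachable from `start` within `max_hops` relationship hops."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in EDGES.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, hops + 1))
    return order


related = walk("sbom:web-1", max_hops=3)
```

Starting from one SBOM, the walk reaches its product, that product's finding and policy, and the finding's VEX statement. `finding:F-9` is never returned, however similar its text might be, because no relationship connects it to the starting record.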

Where frontier reasoning carries the day

The reasoning layer still matters. Once Griffin has assembled a grounded, scoped, fresh context, the frontier model does the work of weighing the evidence, writing the explanation, and drafting the recommendation. This is the part where model quality matters, and it is the part where Griffin benefits from using a capable frontier model. We are not trying to replace GPT-5 or any other model as a reasoning engine. We are making sure the reasoning engine sees the right input.

The test to run

If you are comparing a pure large-context workflow to a grounded workflow like Griffin's, the useful test is the "what did the model see?" test. Ask the system a question. Get the answer. Then ask, "show me every record the model used to reach that conclusion." A grounded system produces a list of specific, versioned records. A workflow that pastes material into a large context can produce only the prompt itself, which is a very different artifact and a much less useful one for audit.
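One way to picture the artifact a grounded system returns is an evidence trail keyed by answer. The following Python sketch is an assumption-laden illustration: `Evidence`, `TRAIL`, `snapshot`, `record_answer`, and `what_did_the_model_see` are made-up names, and hashing the content with SHA-256 is simply one plausible way to pin down exactly what was shown to the model.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    record_id: str
    version: int
    digest: str  # hash of the exact content shown to the model


def snapshot(record_id: str, version: int, content: str) -> Evidence:
    """Capture which record, which version, and a digest of its content."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return Evidence(record_id, version, digest)


# Hypothetical audit trail: answer_id -> evidence used for that answer.
TRAIL = {}


def record_answer(answer_id: str, evidence: list) -> None:
    TRAIL[answer_id] = list(evidence)


def what_did_the_model_see(answer_id: str) -> list:
    """Return the (record_id, version) pairs behind an answer."""
    return [(e.record_id, e.version) for e in TRAIL.get(answer_id, [])]


record_answer(
    "answer-1",
    [
        snapshot("sbom:billing", 12, "openssl@3.0.1"),
        snapshot("vex:V-1", 2, "not_affected"),
    ],
)
seen = what_did_the_model_see("answer-1")
```

Here `seen` is a short list of record identifiers and versions that a reviewer can check one by one, rather than a block of pasted text.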

Context grounding is not a model feature. It is a system property. Griffin is a system built to have it.
