AI Security

Griffin AI vs GPT-5: Context Grounding

A million-token context window is a tool, not a solution. Context grounding for security requires architecture, not just capacity.

One of the more common misconceptions about modern frontier models is that large context windows have solved the grounding problem. It is an understandable intuition. If a model can read a million tokens, surely you can just hand it everything relevant and let it reason over the whole picture. In practice, this works about as well as giving a new employee every document the company has ever produced and asking them to make a decision by lunch.

Griffin AI's position on this is straightforward. We use frontier models—specifically Anthropic's Claude family—for the reasoning layer, and we do not compete with GPT-5 or any other frontier model as a general intelligence. But context grounding for security is a problem we have spent real engineering effort on, and we think the distinction between "large context window" and "grounded context" is worth drawing clearly.

What a context window actually is

A context window is the working memory a model can reference during a single inference. Larger windows let the model hold more material at once: longer documents, more conversation history, bigger codebases. This is genuinely useful. Tasks that previously required summarization or chunking can now run end-to-end.

What a large context window does not do is retrieve the right material. The model is still reasoning over whatever the application chose to put in front of it. If the application puts the wrong documents in, the model reasons over the wrong documents. The fact that it can hold a million tokens of wrong material is not a feature.

The pure-GPT-5 pattern and its limits

The common pattern with a large-context model looks like this: the engineer, or their application, dumps a lot of material into the prompt—the whole repository, the full CVE database entry, a month of logs, the entire policy document—and asks a question. The model reads it all and answers.

For some questions, this works. Recent model generations are good at finding needles in haystacks. But the pattern has several failure modes that become obvious in production security work:

Relevance decay. The model attends to everything, and important signal gets diluted by noise. A critical VEX statement buried in page forty of a pasted bundle is there, but it might not drive the conclusion it should.
Stale snapshots. Whatever was in the prompt at the moment of the query is what the model sees. An SBOM that was accurate last week is wrong this week, and the model has no way to know.
Unaudited grounding. The application chose what to include. That choice is not visible to the security team reviewing the output. If the model missed a finding, it is impossible to tell whether the model erred or the prompt-builder erred.
Permission blur. Everything in the prompt is available to the model regardless of who is asking. A developer and the CISO end up with the same context, which is a problem for least privilege.

None of these are model problems. They are architecture problems. They are what happens when context grounding is treated as a prompt-engineering exercise rather than an engineering discipline.

What Griffin does instead

Griffin treats context grounding as a first-class subsystem, not a prompt pattern. Every query goes through a scoped retrieval layer before it reaches the frontier model. That layer is responsible for four things:

Identity scoping. Who is asking, and what are they permitted to see? The user's role, team membership, and explicit access grants determine which projects, products, and findings are in play.
Freshness. The retrieval layer pulls the latest version of each record at query time. An SBOM is not a copy in the prompt; it is a live reference. A finding's status is whatever it is right now, not whatever it was when the conversation started.
Relevance. Griffin does not paste everything. It pulls the records that relate to the specific question, joined across entities—SBOM to product, product to policy, policy to gate, gate to finding—so the model reasons over a small, focused, meaningful context.
Auditability. Every record pulled is logged. If someone asks why the model reached a particular conclusion, Griffin can replay the retrieval and show exactly what the model saw.

This is what we mean by an engine around the model. The frontier reasoning is the same. The quality of the context it reasons over is not.
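The four responsibilities above can be made concrete with a small sketch. This is an illustration only, not Griffin's implementation: the `Record` and `ScopedRetriever` names, the in-memory `store`, the `grants` mapping, and the keyword match standing in for relevance are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    record_id: str
    project: str
    version: int
    body: str


@dataclass
class ScopedRetriever:
    # Hypothetical live store: record_id -> latest Record.
    store: dict[str, Record]
    # Hypothetical access grants: user -> set of permitted project names.
    grants: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def retrieve(self, user: str, keyword: str) -> list[Record]:
        allowed = self.grants.get(user, set())
        # Identity scoping: skip anything outside the caller's projects.
        # Freshness: read the store at query time, not a copy made earlier.
        # Relevance: a keyword match stands in for a real relevance step.
        hits = [
            rec for rec in self.store.values()
            if rec.project in allowed and keyword.lower() in rec.body.lower()
        ]
        # Auditability: log exactly which record versions were returned.
        self.audit_log.append({
            "user": user,
            "keyword": keyword,
            "at": datetime.now(timezone.utc).isoformat(),
            "records": [(rec.record_id, rec.version) for rec in hits],
        })
        return hits
```

The design point is the order of operations: scoping and logging happen in the retrieval layer, before any text reaches the model, so neither depends on how the prompt is later assembled.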

A concrete example

Suppose a developer asks, "What is the status of CVE-2025-67890 across my services?"

In a pure large-context workflow, the prompt-builder has to decide what to include. It might include a list of the developer's services, the SBOMs for each, any findings matching the CVE, and some policy text. If it misses one service, the model misses that service. If an SBOM is stale, the answer is stale. If the policy snippet is out of date, the recommendation is wrong.

In a Griffin workflow, the retrieval layer resolves the question. It identifies the developer's services based on their access, fetches the current SBOMs, joins to active findings, pulls any VEX statements, and includes the applicable policy. The frontier model then reasons over a focused, fresh, scoped context and produces an answer whose evidence is traceable end to end.

The model did not get smarter between the two workflows. The context did.
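As a rough sketch of the grounded workflow for this question, the function below resolves the developer's services from their access, looks up the current SBOM for each, and joins matching findings and VEX statements. The function name, the dictionary shapes, and the field names are hypothetical, and the policy lookup is left out for brevity.

```python
def build_cve_context(user, cve, access, sboms, findings, vex):
    """Assemble a per-service context for one CVE question.

    access:   user -> set of service names the user may see
    sboms:    service -> current SBOM dict with a "version" key
    findings: list of dicts with "service", "cve", "status"
    vex:      (service, cve) -> VEX status string
    """
    rows = []
    for service in sorted(access.get(user, set())):
        sbom = sboms.get(service)
        rows.append({
            "service": service,
            # A missing SBOM is surfaced as None instead of silently
            # dropping the service from the answer.
            "sbom_version": sbom["version"] if sbom else None,
            "findings": [
                f for f in findings
                if f["service"] == service and f["cve"] == cve
            ],
            "vex": vex.get((service, cve)),
        })
    return rows
```

A call such as `build_cve_context("dev", "CVE-2025-67890", access, sboms, findings, vex)` returns one row per service the developer can see, including services with no SBOM or no findings, so gaps are visible to the model and to the reviewer.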

Why this matters for security specifically

Security is an unusually unforgiving domain for context errors. A general question answered over slightly stale data usually still yields a useful answer. A vulnerability question answered over slightly stale SBOM data can be flatly wrong, and the error is hard to notice until something breaks. The cost of getting grounding wrong in security is not a subpar answer; it is a false sense of safety.

This is the deeper reason Griffin invests in grounding architecture even though the underlying model is perfectly capable of holding large contexts. The model is capable. The organization is not capable of making sure the right things land in that context without a system doing the grounding work. A large-context model is a large dinner table. It does not decide what to serve.

The "retrieval-augmented generation" question

Readers familiar with the AI space will recognize this as a variant of retrieval-augmented generation, and ask how Griffin's grounding differs from a standard RAG setup bolted on top of a large-context model.

The difference is that a standard RAG system typically retrieves from a single vector store based on semantic similarity. Griffin's grounding is structural. It knows that an SBOM belongs to a product, that a product has findings, that a finding may have a VEX, that a policy references a category of finding. The retrieval walks these relationships rather than matching embeddings. For security, this matters because the right answer often requires joining across entities the way a SQL query would, not fetching the nearest neighbor in a vector space.

Griffin does use semantic retrieval for free-text content like advisories and policy language. But the backbone of grounding is the structured relationships between tenant records. A general RAG pattern, no matter how well tuned, does not capture those relationships out of the box.
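To illustrate what walking relationships means, here is a minimal sketch of retrieval over typed edges. The record identifiers, the relation names (`belongs_to`, `has_finding`, `has_vex`), and the `walk` helper are invented for this example and do not describe Griffin's actual schema.

```python
def walk(edges, start, path):
    """Follow typed relations from `start`; return the set of records reached."""
    frontier = {start}
    for relation in path:
        # Each step is a join: replace every node with its targets
        # along the named relation.
        frontier = {
            target
            for node in frontier
            for target in edges.get((node, relation), [])
        }
    return frontier


# Hypothetical tenant records; identifiers are illustrative only.
EDGES = {
    ("sbom:api@12", "belongs_to"): ["product:payments"],
    ("product:payments", "has_finding"): ["finding:F-1", "finding:F-2"],
    ("finding:F-1", "has_vex"): ["vex:V-9"],
}

vex_for_sbom = walk(EDGES, "sbom:api@12", ["belongs_to", "has_finding", "has_vex"])
```

Nothing in this lookup depends on how similar the text of the SBOM is to the text of the VEX statement. The result follows from the declared relationships alone, which is the property a nearest-neighbor search does not guarantee.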

Where frontier reasoning carries the day

The reasoning layer still matters. Once Griffin has assembled a grounded, scoped, fresh context, the frontier model does the work of weighing the evidence, writing the explanation, and drafting the recommendation. This is the part where model quality matters, and it is the part where Griffin benefits from using a capable frontier model. We are not trying to replace GPT-5 or any other model as a reasoning engine. We are making sure the reasoning engine sees the right input.

The test to run

If you are comparing a pure large-context workflow to a grounded workflow like Griffin's, the useful test is the "what did the model see?" test. Ask the system a question. Get the answer. Then ask, "show me every record the model used to reach that conclusion." A grounded system produces a list of specific, versioned records. A large-context-with-prompt-pasting system produces a prompt, which is a very different artifact and a much less useful one for audit.
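One way to picture the artifact a grounded system returns for this test is a per-answer trace of versioned record references. The sketch below is hypothetical: `RecordRef`, `RetrievalTrace`, and the version-drift check in `changed_since` are assumptions for illustration, not a description of Griffin's audit format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordRef:
    kind: str        # e.g. "sbom", "finding", "vex", "policy"
    record_id: str
    version: int


class RetrievalTrace:
    """Keeps, per answer, the exact record versions handed to the model."""

    def __init__(self):
        self._by_answer = {}

    def log(self, answer_id, ref):
        self._by_answer.setdefault(answer_id, []).append(ref)

    def records_used(self, answer_id):
        # The reply to "show me every record the model used".
        return list(self._by_answer.get(answer_id, []))

    def changed_since(self, answer_id, current_versions):
        # Records whose current version differs from the one the model saw.
        return [
            ref for ref in self.records_used(answer_id)
            if current_versions.get(ref.record_id) != ref.version
        ]
```

Because each entry names a record and a version, a reviewer can tell whether an answer was wrong at the time it was given or has merely been overtaken by newer data.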

Context grounding is not a model feature. It is a system property. Griffin is a system built to have it.
