AI Security

Context Window As A Security Limit

The context window is usually marketed as a capability parameter. In a security setting, it behaves like a budget, a forgetting function, and an attack surface all at once.

When vendors publish model cards, the context window is listed next to parameters like latency and throughput, as if it were just another capability dial. For a security team trying to reason about what a frontier model will and will not do, this framing is misleading in ways that matter. The context window is not simply the maximum length of a prompt. It is the mechanism by which the model holds every rule, every piece of evidence, every tool description, and every piece of user-supplied input simultaneously in view, and it is the place where most practical failures of AI security controls begin.

The context window is a shared bus

A useful mental model is to treat the context window as a shared memory bus inside the model. Everything the model considers at decision time has to live on that bus: the system prompt, the safety policy, the tool schemas, the retrieved documents, the conversation history, the user's latest message, and any intermediate reasoning the model has produced. There is no privilege separation on this bus. A token placed at position five hundred by an engineer writing a system prompt and a token placed at position five thousand by an attacker who controls a retrieved web page are, from the model's perspective, indistinguishable as data.

This is the first structural consequence. Whatever isolation you thought you had between "trusted instructions" and "untrusted input" does not exist inside the model. It exists only in your mental model of the system. The model itself sees one long sequence of tokens and attends to all of them.

Forgetting is not the end of the risk

Context windows have limits, and when those limits are exceeded, something has to give. Most systems use some combination of truncation, summarization, and sliding windows to keep the prompt within the model's capacity. Each of these strategies has security implications that are rarely stated out loud.

Truncation is the simplest: drop the oldest turns. If your safety policy is injected once at the start of the conversation and then falls off the back of the window, the model is now operating without it. Many early agent systems exhibited exactly this failure, where constraints that applied to the first few turns quietly disappeared by turn thirty.

Summarization moves the problem rather than solving it. When the model compresses earlier turns into a summary, it is making an editorial decision about what was important. An attacker who knows this can craft inputs designed to survive the summarization step, while the safety caveats that accompanied them are compressed away. The summary looks clean. The instruction to ignore previous rules lives on inside it.

Sliding windows with anchored system prompts help, but they introduce a different problem. The anchored region consumes a fixed portion of the budget forever, which means that as the window fills, the model has less and less room to attend to relevant evidence. The safety policy is preserved; the capacity to reason about whether a given action complies with it is degraded.

Longer windows do not solve it

The obvious response is to buy a larger context window. Frontier models now advertise windows measured in millions of tokens, and the marketing copy suggests that the forgetting problem is solved. It is not, for two reasons.

The first is that attention is not uniform across the window. Empirical evaluations consistently show that models attend more strongly to tokens near the start and end of the context, and less strongly to tokens in the middle. A fact placed in the middle of a million-token window is present but not necessarily influential. Security-relevant information that lands in the attention trough is effectively invisible to the decision, even though it is technically "in context."

The second is that larger windows expand the attack surface. Every additional token is another opportunity for an attacker to inject an instruction, a poisoned document, or a misleading piece of retrieval. Agents that pull in tens of thousands of tokens from untrusted sources are not safer than agents that pull in hundreds; they are less safe, because the volume of untrusted input has grown and the model's ability to attend carefully to any one piece of it has not.

The budget framing

A more useful way to think about the context window is as a budget. Every token you spend on a system prompt, a tool description, a retrieved document, or a conversation turn is a token you cannot spend on something else. Security-relevant content is competing for space with everything else the agent needs to do its job.

This matters because it makes security visible as an engineering tradeoff. Making the safety policy twice as long costs tokens that could have gone to retrieval. Adding a second tool schema costs tokens that could have gone to conversation history. Injecting a lengthy audit log of previous decisions costs tokens that could have gone to the user's actual question.

Teams that do not think about the context window as a budget end up in one of two failure modes. Either they pack it too tightly, and the model runs out of capacity to reason about the problem at hand, or they pack it too loosely, and the signal they care about drowns in noise.

The attack surface framing

The budget framing is about efficiency. The attack surface framing is about risk. Every byte that enters the context window is a potential injection point, and the question to ask about each one is: who controls this, and what would happen if they were hostile?

The system prompt is controlled by the developer. Tool schemas are usually controlled by the developer, though third-party tools change this. Conversation history is a mix of user and assistant, and can include pasted content from anywhere. Retrieved documents are often controlled by arbitrary parties on the internet. Tool outputs are controlled by whatever server responded to the tool call, which may itself be compromised or hostile.

A context window that is dominated by developer-controlled content is easier to reason about. A context window that is dominated by retrieved documents and tool outputs is a context window in which an attacker is writing most of what the model reads. The security properties of the agent are determined much more by this composition than by the cleverness of the system prompt.

What to actually do

Given all this, a few practices distinguish teams that use context windows defensively from teams that use them naively.

The first is to treat every source of context as a trust tier, and to track the composition of the window over time. If seventy percent of your agent's context is coming from untrusted retrieval, you have a retrieval-dominated agent, and the security properties you care about are almost entirely determined by what can enter your retrieval pipeline.

The second is to avoid putting anything in context that you would not want an attacker to see or exploit. Internal credentials, user PII, and sensitive business logic do not belong in prompts that will also contain untrusted input. The model cannot be relied on to keep them secret, because the model does not have a concept of secrecy that is stronger than the concept of helpfulness.

The third is to measure attention empirically rather than assuming it. For any safety-critical instruction, test whether the model actually follows it when the window is full, when it is sparse, when the instruction is at the top, and when it is at the bottom. The answers are often surprising, and they change between model versions.

The structural point

The context window is not a number on a spec sheet. It is the arena in which every frontier model decision is made, and the rules of that arena are not the rules of classical security. There is no memory protection, no privilege separation, no enforced ordering. There is only attention, distributed across a long sequence of tokens whose provenance the model cannot verify.

Accepting this is the first step toward designing systems that are safe to deploy. Pretending otherwise is how agents end up following instructions they were never supposed to see.

ai-security frontier-models limitations structural

Back to all articles

More on #ai-security

View all →

AI Security

Never miss an update

Weekly insights on software supply chain security, delivered to your inbox.

Context Window As A Security Limit

The context window is a shared bus

Forgetting is not the end of the risk

Longer windows do not solve it

The budget framing

The attack surface framing

What to actually do

The structural point

More on #ai-security

API Surface Reviewed: Griffin AI vs Mythos

Real-World Deployment: Griffin AI vs Mythos

Scaling Across Repos: Griffin AI vs Mythos

Tool-Call Hijacking: Griffin AI vs Mythos

Related articles in AI Security

Building an Eval Suite for Your Security LLM Workflows

Zero-Day Discovery With LLM-Augmented Reachability: A Safeguard Engine Walkthrough

Frontier LLM Vendors Are Not Your Supply Chain Security Vendor

Never miss an update

Product

Solutions

Compare

Resources

Company

Legal

Developers