AI Security

Tool-Call Privilege Escalation In Practice

When an agent can call tools, the permission boundary is no longer between the user and the system. It is between the model's current beliefs and everything the model can reach. That is a much harder boundary to defend.

A frontier model that can only produce text is a contained thing. It can be wrong, offensive, or persuasive, but the damage it can do is bounded by what a human will do with its output. A frontier model that can call tools is a different kind of artifact. It reaches into systems, moves data, and triggers actions. The permission model of the surrounding environment has to stretch to accommodate it, and the way that stretching typically happens is how privilege escalation gets into production.

The fundamental asymmetry

The classical picture of authorization has a user requesting an action and a system deciding whether the user is allowed to take it. The decision is based on identity, role, and policy, and the audit log records who did what. Tool-calling agents break this picture in a specific way. The user's identity is known to the system, but the actual decision about what action to take is made by the model, on the basis of whatever happens to be in its context at the moment.

This produces an asymmetry that is easy to miss. The user's permissions are the upper bound on what the agent can do, but the agent's choice of what to do is not constrained by the user's intent in any reliable way. An attacker who cannot log in as the user can still influence the agent's behavior by writing content that the agent will ingest: a document the user asks the agent to summarize, a web page the user asks the agent to analyze, an email the user asks the agent to act on. The attacker's leverage is not on the authorization system. It is on the model's beliefs.

Where the privilege actually lives

In a traditional system, the privilege to perform a sensitive action lives in a credential that is held, presented, and checked. An API key grants what an API key grants, and the security of the action depends on the security of the key.

In an agent system, the privilege lives in the space of "actions the model might take." Each tool the agent has access to is a gun that the model can fire, pointing in whatever direction the tool allows. The relevant security question is not whether the credentials are valid, because by the time the tool is called, the credentials are always valid. The question is whether the action the model chose was one the user would have endorsed if they had been asked.

That question does not have a clean answer. The user did not inspect every tool call. The model did not ask for each one. The system prompt implicitly authorized a broad class of actions by granting the tool, and now the model is picking from that class. The escalation path is not "attacker acquires credential." It is "attacker influences the picking."

The typical shapes of the attack

Several attack patterns recur in deployed agent systems.

The first is the cross-tool jump. An agent is given access to a tool that reads data and a tool that writes data, with the intention that the model use them for legitimate workflows. An attacker provides input through the read channel that causes the model to decide that the right next step is a write. The authorization system sees two correctly credentialed tool calls. The user sees a consequence they did not intend.

The second is the scope creep within a single tool. An agent has access to a search API that can query customer records. The policy allows querying for legitimate business reasons. The attacker crafts input that convinces the model that it has a legitimate business reason to enumerate records, and the model issues a query pattern that exfiltrates data within the rate limits the API allows. No tool was called outside its permitted scope, and yet the effect is a data breach.

The third is the delegated escalation. An agent is given access to a tool that can spawn another agent or delegate to another model with different permissions. The outer agent's permissions are narrow, but the inner call has broader access, and the attacker's input is designed to cause the outer agent to hand off in a way that lets the attacker's intent survive into the inner call. The permission boundary is crossed at the handoff, and the audit trail makes this look like a normal nested invocation.

Why "just restrict the tools" is harder than it sounds

The obvious mitigation is to restrict what the agent can do. Give it only the tools it needs, with only the scopes it needs, for only the duration it needs. This is correct in principle and essential in practice, but it runs into a product pressure that is easy to underestimate.

Agents are purchased and deployed because they can do useful things. Each useful thing corresponds to a tool, or to a set of tools, or to a range of parameters. Restricting the tools reduces the utility. The team that restricts too much ends up with an agent that its users stop using. The team that restricts too little ends up with an agent that can be weaponized.

The practical result is that most agent deployments sit at a local maximum of utility and a local maximum of risk. The tools are broad enough to be useful and broad enough to be dangerous. The security program's job is to find architectural patterns that keep the broadness from turning into damage.

The confused deputy in new clothes

The classical confused deputy problem is a good lens for thinking about tool-call privilege escalation. The agent has permissions that the attacker does not, and the attacker is tricking the agent into using those permissions on the attacker's behalf. This is not a new problem. It is a new instance of a very old problem, now running in a substrate where the "tricking" is done through natural language and retrieved content rather than through URL parameters and session tokens.

What is new is the scale and the flexibility of the tricking mechanism. The attacker does not need to understand the agent's internal state. They just need to produce content that is plausibly within the agent's distribution of inputs and that nudges the model toward the action they want. The art of exploitation is mostly prompt craft, and the art of defense is mostly reducing the agent's dependence on its own judgment for consequential choices.

Patterns that actually reduce risk

A few patterns have proven useful in practice, and they tend to share a common thread: they move consequential decisions out of the model and into deterministic layers.

Explicit user confirmation for high-impact actions is the oldest and still the most effective. If a tool call would move money, change access controls, or send external communication, a deterministic policy layer can pause the flow and require the user to approve the specific action with the specific parameters. The model proposes; the user disposes.

Tool-level allow-lists and parameter validation are a second layer. Rather than giving the agent a tool that can query arbitrary records, give it a tool that can query records matching a specific pattern. The pattern is enforced outside the model, and no amount of prompt cleverness changes what the tool accepts.

Separation of read and write pipelines is a third pattern. Make the agent that reads untrusted content a different process from the agent that takes actions, and pass only structured intermediate output between them. The action-taking agent never sees the raw attacker-controlled text.

Rate limiting and anomaly detection on tool usage catch the attacks that slip through the other layers. If the agent usually issues one or two write calls per session and suddenly issues a hundred, the monitoring layer should react regardless of what the model thinks it is doing.

The structural point

Tool-call privilege escalation is not a vulnerability class that will be eliminated by better models. It is a structural consequence of handing action-taking capabilities to a component whose decisions are influenced by untrusted input. The mitigations are architectural, not model-level. The teams that deploy agents safely are the ones that treat each tool as a privilege, each tool call as a decision with consequences, and each piece of context as a potential attacker.

The alternative is to trust the model's judgment, which is exactly the thing the model itself will not promise you.

ai-security frontier-models limitations structural

Back to all articles

More on #ai-security

View all →

AI Security

Never miss an update

Weekly insights on software supply chain security, delivered to your inbox.

Tool-Call Privilege Escalation In Practice

The fundamental asymmetry

Where the privilege actually lives

The typical shapes of the attack

Why "just restrict the tools" is harder than it sounds

The confused deputy in new clothes

Patterns that actually reduce risk

The structural point

More on #ai-security

API Surface Reviewed: Griffin AI vs Mythos

Real-World Deployment: Griffin AI vs Mythos

Scaling Across Repos: Griffin AI vs Mythos

Tool-Call Hijacking: Griffin AI vs Mythos

Related articles in AI Security

Building an Eval Suite for Your Security LLM Workflows

Zero-Day Discovery With LLM-Augmented Reachability: A Safeguard Engine Walkthrough

Frontier LLM Vendors Are Not Your Supply Chain Security Vendor

Never miss an update

Product

Solutions

Compare

Resources

Company

Legal

Developers