AI Agent Blast Radius Management

Every agent in production has a blast radius. Most teams have not measured theirs. Here is how to measure it and how to bring it under control.

Nayan Dey
Senior Security Engineer

Why blast radius is the right unit

When teams talk about agent security, the conversation usually defaults to either model-level concerns like prompt injection or process-level concerns like vulnerability scans. Both are important, but neither captures the metric that matters most in production: blast radius, the maximum harm a single agent can cause if its decisions are entirely wrong.

Blast radius is what a security engineer measures when deciding whether an agent is safe to deploy. It is what an incident responder measures when deciding whether an active incident is bounded or unbounded. It is what an executive should be asking about when evaluating whether to expand an agent's role. And it is the metric that drives every other defence in the stack, because the goal of those defences is to keep the blast radius bounded.

This post walks through how to measure agent blast radius and the four levers that bring it under control.

Measuring blast radius

Blast radius for an agent is the union of every action the agent can take, weighted by the harm of each action. In practice, you measure it by enumerating the tools the agent can call, the credentials those tools use, the resources those credentials can reach, and the side effects of the actions those resources support.

The output of the measurement is a list. For a documentation agent, the list might be read access to public documentation and nothing else. For a deployment agent, the list might include read and write access to a specific cluster, the ability to trigger rollbacks, and the ability to pause deployments. For a customer support agent, the list might include read access to a customer database, the ability to issue refunds up to a limit, and the ability to send emails on behalf of the company. Each list is the blast radius of the agent.

The measurement has to be honest. The temptation is to describe the agent in terms of what it usually does. The blast radius is what it could do, not what it usually does. An agent that usually only reads but has the credentials to write has the blast radius of a writer.
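To make the enumeration concrete, here is a minimal sketch in Python. The `Grant` record, the tool names, and the credential names are all hypothetical and exist only for illustration. The sketch lists every (resource, action) pair an agent could perform, then compares that list with what the agent has actually been observed doing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One row of the enumeration (hypothetical structure)."""
    tool: str            # tool the agent can call
    credential: str      # credential that tool uses
    resource: str        # resource the credential can reach
    actions: frozenset   # actions the credential permits on that resource


def blast_radius(grants):
    """Every (resource, action) pair the agent could perform, sorted."""
    return sorted({(g.resource, a) for g in grants for a in g.actions})


def unused_capability(grants, observed):
    """Pairs the agent could perform but has not been observed performing."""
    return [pair for pair in blast_radius(grants) if pair not in observed]


# Hypothetical support agent: it usually only reads, but its credential can write.
support_agent = [
    Grant("lookup_customer", "crm-service-account", "customer_db",
          frozenset({"read", "write"})),
    Grant("send_email", "mail-api-key", "outbound_email", frozenset({"send"})),
]
observed = {("customer_db", "read"), ("outbound_email", "send")}

print(blast_radius(support_agent))
print(unused_capability(support_agent, observed))
```

The first list is the blast radius. The second is the gap between what the agent could do and what it usually does, which is exactly the part an honest measurement has to include.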

The four levers

Once you have measured blast radius, four levers reduce it. Each lever applies to a different part of the agent's authority.

The first lever is tool selection. The agent should have access only to the tools it needs for its declared purpose. This sounds obvious and is regularly violated, because adding tools is easy and removing them creates resistance from users who liked having them. A periodic review of which tools an agent actually called over the last quarter is the cheapest way to find tools that can be removed without anyone noticing.
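The periodic review can be as simple as a set difference between the tools an agent is granted and the tools it actually called. The sketch below assumes a call log of `(tool_name, timestamp)` pairs and a 90-day window standing in for "the last quarter". The log format and the tool names are assumptions, not a real logging schema.

```python
from datetime import datetime, timedelta


def removal_candidates(granted_tools, call_log, now, window_days=90):
    """Granted tools with no recorded call inside the review window."""
    cutoff = now - timedelta(days=window_days)
    used = {tool for tool, called_at in call_log if called_at >= cutoff}
    return sorted(set(granted_tools) - used)


# Hypothetical data for illustration.
granted = ["search_docs", "create_ticket", "delete_ticket"]
call_log = [
    ("search_docs", datetime(2024, 12, 20)),
    ("search_docs", datetime(2024, 11, 2)),
    ("create_ticket", datetime(2024, 6, 1)),   # older than the window
]

print(removal_candidates(granted, call_log, now=datetime(2025, 1, 1)))
```

Tools that show up in the result are candidates for removal, not automatic removals. A tool that is only needed during rare events may legitimately have no calls in a given quarter.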

The second lever is scope. Tools the agent does need should be scoped to the smallest envelope that still allows the agent to do its job. A deployment agent that always operates on the checkout cluster should not have credentials that can deploy to other clusters. The scope of the credentials defines the upper bound on the agent's blast radius regardless of what the model decides to do.
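As an illustration of scope acting as an upper bound, the sketch below checks the target cluster against the credential's allowlist before any deployment happens. `ScopedCredential`, `ScopeError`, and `deploy` are hypothetical names. In a real system the same check would ideally be enforced by whatever issues the credential rather than by application code alone.

```python
from dataclasses import dataclass


class ScopeError(PermissionError):
    """Raised when a call falls outside the credential's scope."""


@dataclass(frozen=True)
class ScopedCredential:
    name: str
    allowed_clusters: frozenset


def deploy(credential, cluster, image):
    """Deploy only if the cluster is inside the credential's scope.

    The check does not depend on what the model asked for or why.
    """
    if cluster not in credential.allowed_clusters:
        raise ScopeError(f"{credential.name} may not deploy to {cluster}")
    return f"deployed {image} to {cluster}"


cred = ScopedCredential("checkout-deployer", frozenset({"checkout"}))
print(deploy(cred, "checkout", "web:1.4.2"))

try:
    deploy(cred, "payments", "web:1.4.2")
except ScopeError as err:
    print(f"blocked: {err}")
```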

The third lever is rate limiting and value capping. Many actions are recoverable as long as they are bounded in volume or in monetary value. An agent that can issue refunds is fine if the refund limit per customer per day is small. An agent that can send emails is fine if the email send rate is bounded. The bounds turn unbounded blast radius into bounded blast radius without removing the action entirely.
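A value cap can be sketched in a few lines. The example below tracks refunds per customer per day and rejects any refund that would push the running total over the limit. The class name, the limit, and the in-memory storage are assumptions for illustration. A production version would need durable, concurrency-safe storage.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


class RefundCap:
    """Per-customer, per-day cap on total refund value (in-memory sketch)."""

    def __init__(self, daily_limit):
        self.daily_limit = Decimal(daily_limit)
        self._issued = defaultdict(Decimal)  # (customer_id, day) -> total so far

    def try_refund(self, customer_id, amount, day):
        """Record the refund and return True only if it fits under the cap."""
        amount = Decimal(amount)
        if amount <= 0:
            return False
        key = (customer_id, day)
        if self._issued[key] + amount > self.daily_limit:
            return False
        self._issued[key] += amount
        return True


cap = RefundCap("50.00")
today = date(2025, 1, 1)
print(cap.try_refund("cust-1", "30.00", today))  # fits under the cap
print(cap.try_refund("cust-1", "30.00", today))  # would exceed the cap
print(cap.try_refund("cust-1", "20.00", today))  # exactly reaches the cap
```

However wrong the agent's decisions are, the worst case for one customer on one day is the cap, which is what turns the action from unbounded into bounded.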

The fourth lever is confirmation. Actions that cannot be made safe through scope or rate limiting should require a human in the loop. The confirmation flow we covered in out-of-band-confirmation-for-irreversible-tool-calls is the right pattern, applied only to the small set of actions that genuinely warrant it.
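A rough sketch of a confirmation gate is shown below. It is not the flow from the earlier post, only a simplified illustration of the idea: actions on a short list are parked until a human approves them through a separate channel, and everything else runs immediately. The action names and the `ConfirmationGate` class are hypothetical.

```python
import uuid

# Hypothetical list of actions that cannot be made safe by scope or limits.
REQUIRES_CONFIRMATION = {"delete_database", "wire_transfer"}


class ConfirmationGate:
    def __init__(self, execute):
        self._execute = execute   # callable(action, args) that does the real work
        self._pending = {}        # token -> (action, args) awaiting approval

    def request(self, action, args):
        """Run safe actions immediately; park risky ones behind a token."""
        if action not in REQUIRES_CONFIRMATION:
            return ("executed", self._execute(action, args))
        token = uuid.uuid4().hex
        self._pending[token] = (action, args)
        return ("pending", token)

    def approve(self, token):
        """Called from the human approval channel. Each token works once."""
        action, args = self._pending.pop(token)  # KeyError if unknown or reused
        return self._execute(action, args)


executed = []
gate = ConfirmationGate(lambda action, args: executed.append((action, args)) or action)

print(gate.request("read_report", {"id": 7}))
status, token = gate.request("wire_transfer", {"amount": 900})
print(status, executed)   # the transfer has not run yet
gate.approve(token)
print(executed)           # now it has
```

The agent never holds the ability to call `approve`. That separation is what makes the human a real control rather than a formality.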

Per-user vs per-tenant blast radius

A subtle point that catches many teams is that blast radius has to be measured at multiple levels. There is the blast radius of a single agent acting on behalf of a single user. There is the blast radius of all the agents in a tenant in aggregate. And there is the blast radius across all tenants.

A single low-privilege user's agent might have a small blast radius. The aggregate blast radius across all users in a tenant might still be large if there are enough users. And if the agent platform itself can be compromised, the blast radius across all tenants is everything every agent could do.

The implication is that the controls have to apply at all three levels. Per-user controls bound what one user can do. Per-tenant controls bound what one tenant can do, which matters for incidents that cross multiple users. Platform controls bound what the entire system can do, which matters for incidents at the platform layer. Skipping any level leaves a gap that an attacker will eventually find.
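One way to picture controls at all three levels is a single check that an action must pass at the user, tenant, and platform level before it is counted anywhere. The sketch below uses simple action counts with made-up limits. The class and the numbers are assumptions, and a real system would add time windows and persistent storage.

```python
from collections import Counter


class LayeredLimits:
    """Action budget enforced per user, per tenant, and platform-wide."""

    def __init__(self, per_user, per_tenant, platform):
        self.limits = {"user": per_user, "tenant": per_tenant, "platform": platform}
        self.counts = Counter()

    def allow(self, tenant, user):
        """Return (True, None) if allowed, else (False, level that blocked)."""
        keys = {
            "user": ("user", tenant, user),
            "tenant": ("tenant", tenant),
            "platform": ("platform",),
        }
        # Check every level first so a rejected action consumes no budget.
        for level, key in keys.items():
            if self.counts[key] >= self.limits[level]:
                return False, level
        for key in keys.values():
            self.counts[key] += 1
        return True, None


limits = LayeredLimits(per_user=2, per_tenant=3, platform=4)
print(limits.allow("t1", "alice"))
print(limits.allow("t1", "alice"))
print(limits.allow("t1", "alice"))   # blocked at the user level
print(limits.allow("t1", "bob"))
print(limits.allow("t1", "carol"))   # blocked at the tenant level
```

In this run, carol has done nothing herself, yet her action is blocked because the tenant as a whole has used its budget. That is the aggregate case a per-user limit alone would miss.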

Reducing blast radius without losing the agent

The hardest part of blast radius management is reducing it without making the agent useless. An agent with no tools cannot do anything. An agent with read-only tools can answer questions but cannot take action. The point of the work is to find the smallest blast radius that still lets the agent be useful for its declared purpose.

The pattern that works is to start small and expand deliberately. A new agent is launched with a minimal toolset, scoped credentials, and tight rate limits. Over time, as the agent's behavior is observed and the team gains confidence in its decisions, the toolset can be expanded with explicit reviews. The reverse direction, starting with broad authority and trying to narrow it later, almost never works, because users have already built workflows that depend on the broader authority and removing it breaks them.

Blast radius in incident response

When an incident happens, blast radius is the first thing the responder needs to know. If the agent's blast radius is small and well-understood, the incident is bounded almost by definition. The responder can move quickly to contain it because they know what the upper bound looks like. If the blast radius is large or poorly understood, the incident expands into an investigation that has to figure out what the agent could have done before it can decide what it actually did.

The investment in measuring blast radius in advance pays off most clearly during an incident. A team that has not done the measurement spends the first day of an incident doing the measurement under pressure, with all the errors that come with that. A team that has done the measurement starts the incident with an answer and spends the day responding rather than reconstructing.

What good blast radius management looks like

A team that has blast radius under control has a few visible signs. There is a documented blast radius for every agent in production, reviewed quarterly and updated when the agent changes. There are explicit limits at the agent level, the user level, the tenant level, and the platform level. New tools require an explicit blast radius review before they can be added. Periodic exercises test the limits to make sure they actually hold under realistic conditions.

If you can see all of those signs, your blast radius is bounded and you can defend that boundedness in front of an auditor or a board. If any of them are missing, the blast radius is whatever the system happens to allow today, which is the wrong place to be.

How Safeguard Helps

Safeguard computes blast radius automatically from your agent and tool configurations, surfaces the highest-risk agents in your environment, and lets you set limits at every level the policy needs. Scope analysis runs as part of MCP server registration, rate limits and value caps are policy primitives rather than custom code, and the audit log gives you the data to validate that real traffic stays within the declared blast radius. When an incident happens, the answer to the first question your responder asks is already in the platform.
