Capability Scoping

Restricting what each MCP server or AI tool can actually do — even if the model asks for more.

What is capability scoping?

Capability scoping is the principle — and the set of mechanisms — that enforce what an AI tool is allowed to do, independent of what a model asks it to do. The distinction matters: the model is the requester, not the arbiter. A scope sits between the two and says yes or no based on policy, not on persuasion.

The shorthand: the model asks, the scope decides. If your safety story is "we told the model in the system prompt not to do that," you do not have capability scoping — you have a suggestion. Capability scoping is what happens when the suggestion fails and the tool still refuses.

How it works

Enforcement lives at the tool layer, not the model layer. Three techniques carry most of the load:

  1. Per-capability IAM. Each tool exposes fine-grained operations — repo:read, repo:write, secret:read — and holds a service identity scoped to exactly the subset its bundle needs. A call for a capability outside the identity's grant returns a hard no from the API, not a polite refusal from the model.
  2. Policy evaluation before invocation. Every tool call passes through a policy engine that checks agent, tool, arguments, and context against rules ("this agent may write to repos it owns, but not to main branches, and never outside business hours"). The engine decides before the side effect happens.
  3. Data envelope enforcement. Beyond "can the agent call this?" comes "what data is allowed through the boundary?" Secrets, PII, and sensitive fields get redacted or blocked at the tool layer so a clever prompt can't smuggle them out in arguments or results.
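The three techniques above amount to a single pre-invocation check at the tool layer. Here is a minimal sketch of that check — the agent name, capability strings, policy rules, and sensitive-key list are all illustrative, not Safeguard's actual API:

```python
from dataclasses import dataclass

@dataclass
class ServiceIdentity:
    name: str
    grants: frozenset  # the exact capability subset this tool bundle holds

@dataclass
class Decision:
    allowed: bool
    reason: str

# Illustrative data-envelope rule: argument keys that never cross the boundary.
SENSITIVE_KEYS = {"api_key", "password", "ssn"}

def evaluate(identity: ServiceIdentity, capability: str, args: dict) -> Decision:
    # 1. Per-capability IAM: a request outside the grant is a hard no.
    if capability not in identity.grants:
        return Decision(False, f"{identity.name} holds no grant for {capability}")
    # 2. Policy evaluation before invocation: rules run before any side effect.
    if capability == "repo:write" and args.get("branch") == "main":
        return Decision(False, "writes to main are denied by policy")
    # 3. Data envelope enforcement: block sensitive fields at the boundary.
    leaked = SENSITIVE_KEYS & set(args)
    if leaked:
        return Decision(False, f"blocked sensitive argument(s): {sorted(leaked)}")
    return Decision(True, "allowed")

ci_agent = ServiceIdentity("ci-agent", frozenset({"repo:read", "repo:write"}))

print(evaluate(ci_agent, "secret:read", {}).reason)                 # no grant
print(evaluate(ci_agent, "repo:write", {"branch": "main"}).reason)  # policy deny
print(evaluate(ci_agent, "repo:write", {"branch": "feature/x"}).reason)
```

Note that every denial carries a reason: the same function that enforces the scope produces the audit trail.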

Why it matters

Models are non-deterministic, their context windows are untrusted, and their system prompts lose to sufficiently motivated user input. These are not failures of individual models — they are properties of the technology. A control strategy that depends on the model refusing is not a strategy.

Capability scoping is the inverse: it assumes the model will eventually get confused, tricked, or jailbroken, and it builds the controls around the tool layer instead. That is where they can be tested deterministically, audited, and upgraded without depending on which model shipped this quarter.

What value it adds

  • Enforcement survives model failure

    Whether the model hallucinates, the context gets injected, or the prompt gets jailbroken, the scope holds. Controls that depend on model compliance do not.

  • Testable, not aspirational

    A scope can be unit-tested: given this agent and this request, the answer is deterministically allow or deny. System-prompt "rules" cannot be tested that way.
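What "deterministically allow or deny" looks like in practice: even a time-dependent rule like the business-hours example stays testable if the clock is an input rather than ambient state. A hypothetical sketch (not a real Safeguard policy):

```python
from datetime import datetime

def scope_allows(grants: set, capability: str, now: datetime) -> bool:
    """Pure function of its inputs: same agent, request, and clock -> same answer."""
    if capability not in grants:
        return False
    if capability == "repo:write" and not (9 <= now.hour < 17):
        return False  # "never outside business hours"
    return True

# Deterministic unit tests: with the clock injected, even the
# time-based rule evaluates to a fixed allow or deny.
noon = datetime(2025, 1, 6, 12, 0)
midnight = datetime(2025, 1, 6, 0, 0)
assert scope_allows({"repo:write"}, "repo:write", noon) is True
assert scope_allows({"repo:write"}, "repo:write", midnight) is False
assert scope_allows({"repo:read"}, "repo:write", noon) is False
```

There is no equivalent test for a system-prompt rule, because the model's answer is not a pure function of the request.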

  • Blast radius collapses to the capability set

    The worst thing a compromised agent can do is the worst action in its scope — not the worst action in your entire stack.

  • Model swaps become safe

    Moving from Claude to GPT to an open-source model does not change your security posture, because the enforcement layer lives below the model. You re-evaluate quality, not controls.

  • Compliance evidence is concrete

    "Here is the policy, here is the evaluation log, here is the denied call" maps directly onto audit expectations. "Here is the system prompt we wrote" does not.
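Concretely, the evidence for a denied call is a structured record of the evaluation. The shape below is a hypothetical illustration, not Safeguard's actual log schema:

```json
{
  "timestamp": "2025-06-03T02:14:07Z",
  "agent": "ci-agent",
  "tool": "github-mcp",
  "capability": "repo:write",
  "args_summary": { "branch": "main" },
  "policy_rule": "deny-writes-to-main",
  "decision": "deny"
}
```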

How Safeguard uses it

Capability scoping is the enforcement core of MCP server security. Every tool call on the Safeguard control plane is policy-evaluated before it runs, and the platform security layer adds tenant-level and data-envelope checks on top.

Stop trusting the system prompt.

See how Safeguard enforces capability scoping at the tool layer — where policy is testable, deterministic, and model-independent.