Model Context Protocol Permissions Model Explained

MCP's permissions model is subtle. Here is a careful walkthrough of how tool scoping, sampling, and resource access actually work in production.

Nayan Dey
Senior Security Engineer
6 min read

The Model Context Protocol's permissions model is one of those designs that looks deceptively simple on first read and turns out to contain most of the actual security decisions an agent platform has to make. The protocol itself is short. The specification for tools, resources, and prompts fits in a couple of dozen pages. But the behavior a host application has to implement to turn that specification into a safe runtime is considerably richer, and it is where production teams spend most of their security engineering effort.

This post unpacks the permissions model as it stands in the April 2026 protocol revision. We cover tool invocation, resource access, sampling, the OAuth 2.1 flow, and the way host applications are expected to mediate between them. The goal is to equip security engineers to review an MCP deployment and identify the controls that are missing, because the protocol is deliberately quiet about many of them.

How does tool invocation permission work?

Tool invocation permission works as a three-layer gate: server advertises, host selects, user approves. The server publishes a list of tools it exposes, each with a name, description, and input schema. The host filters that list according to its policy, which might allowlist specific tools or scope them to the current user. The user, or a policy acting on their behalf, approves the actual invocation.

The subtlety is that the protocol does not dictate where the approval happens. A CLI host might surface every tool call as a prompt. A server-side host might pre-approve entire categories. An enterprise host might route high-risk calls to a separate approval queue. The protocol is neutral, and the security posture of an MCP deployment is almost entirely determined by the approval logic in the host. Teams that skim the spec and assume "the protocol handles it" ship agents with wide-open tool invocation, which is the single most common MCP misconfiguration we see.
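
The three-layer gate described above can be sketched in a few lines. This is a minimal illustration, not the protocol's API: the `Tool` record, `select_tools`, `invoke`, and the tool names are all hypothetical, and the `approve` callback stands in for whatever approval surface the host actually provides (a prompt, a pre-approved category, or a review queue).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """A tool as advertised by a server: name, description, input schema."""
    name: str
    description: str
    input_schema: dict


def select_tools(advertised: list[Tool], allowlist: set[str]) -> list[Tool]:
    """Layer 2: the host keeps only the tools its policy allows."""
    return [tool for tool in advertised if tool.name in allowlist]


def invoke(
    tool_name: str,
    arguments: dict,
    selected: list[Tool],
    approve: Callable[[str, dict], bool],
) -> str:
    """Layer 3: deny by default, then ask the approver (user or policy)."""
    if tool_name not in {tool.name for tool in selected}:
        return "denied: tool not selected by host"
    if not approve(tool_name, arguments):
        return "denied: approval refused"
    return "allowed"


# Layer 1: what a hypothetical server advertises.
advertised = [
    Tool("search_issues", "Search issues", {"type": "object"}),
    Tool("delete_project", "Delete a project", {"type": "object"}),
]
selected = select_tools(advertised, allowlist={"search_issues"})
```

The point of the sketch is the default: a tool the host never selected cannot be invoked, no matter what the approver says.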

What about resource access?

Resource access is a read-only surface for attaching context, and its permission model mirrors tool invocation but with lighter defaults. Resources are identified by URIs, the server advertises which ones are accessible, and the host decides which to include in a conversation. Unlike tools, resources do not execute; they return content. That sounds safer, and it is, but it is not without risk.

The risk comes from two places. First, a resource that returns attacker-controlled content is still a prompt injection vector, so the "read-only" label is misleading in terms of downstream behavior. Second, resource URIs can encode parameters that look like low-risk paths but actually trigger server-side work. We have seen a Jira MCP server where reading a resource URI with a specific query parameter caused the server to run a synchronous search across every project the user had access to. Read access is not the same as free access, and resource permissions should be scoped with the same care as tool permissions.
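
One way a host might scope resource reads is to allowlist both the URI prefix and the query parameters, so that an innocuous-looking path cannot smuggle in an expensive server-side operation. The sketch below assumes a hypothetical `jira://` URI scheme and a hypothetical `fields` parameter; real servers define their own URI formats.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical host policy: which resource prefixes and query keys are allowed.
ALLOWED_PREFIXES = ("jira://issues/",)
ALLOWED_QUERY_KEYS = {"fields"}


def is_resource_allowed(uri: str) -> bool:
    """Return True only if the URI matches an allowed prefix and uses
    no query parameters outside the allowlist."""
    parts = urlsplit(uri)
    # Reject path traversal before comparing prefixes.
    if ".." in parts.path.split("/"):
        return False
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if not base.startswith(ALLOWED_PREFIXES):
        return False
    keys = {key for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return keys <= ALLOWED_QUERY_KEYS
```

A check like this does nothing about prompt injection in the returned content; it only limits which reads the host will issue in the first place.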

How does sampling change the trust model?

Sampling changes the trust model by inverting the direction of inference. In a standard MCP conversation, the host controls the model and the server is a tool. Sampling lets the server ask the host to run inference on its behalf. A server that needs a summary of some internal data can request a completion, and the host decides whether to honor the request.

Sampling is powerful and genuinely useful, especially for servers that need to do language-heavy work without exposing raw content outside the host's trust boundary. It is also the feature most likely to surprise security reviewers who focused on tool invocation. The host should apply at least the same approval logic to sampling requests as to tool calls, and it should tag the resulting completion with provenance so downstream handling knows the prompt came from a server rather than a user. The current spec supports this, but the default host behaviors are underdeveloped.
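
A rough sketch of what provenance tagging could look like on the host side follows. The `SamplingRequest` and `TaggedCompletion` types and the `handle_sampling` function are illustrative assumptions, not structures defined by the protocol, and `run_model` is a stand-in for the host's real inference call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SamplingRequest:
    """A server's request for the host to run inference (hypothetical shape)."""
    server_id: str
    prompt: str


@dataclass(frozen=True)
class TaggedCompletion:
    """A completion plus a record of where its prompt came from."""
    text: str
    origin: str               # "server" or "user"
    server_id: Optional[str]  # set when origin is "server"


def handle_sampling(
    request: SamplingRequest,
    approve: Callable[[SamplingRequest], bool],
    run_model: Callable[[str], str],
) -> Optional[TaggedCompletion]:
    """Apply approval first; only run inference for approved requests,
    and tag the result so downstream code knows a server wrote the prompt."""
    if not approve(request):
        return None
    text = run_model(request.prompt)
    return TaggedCompletion(text=text, origin="server", server_id=request.server_id)
```

Downstream handlers can then branch on `origin`, for example by refusing to let a server-originated completion trigger a tool call without a fresh approval.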

How does OAuth 2.1 fit in?

OAuth 2.1 in MCP supports servers that need to authenticate with external resources on behalf of the user. The flow follows the OAuth 2.1 draft: dynamic client registration, PKCE, resource indicators, and short-lived tokens. The host implements the client side of the flow, handles the redirect, and stores the resulting token. The server uses the token to make authenticated calls downstream.

The common mistake is granting broad scopes at token issuance time. An MCP server for Google Workspace that asks for drive.readonly, gmail.modify, and calendar all at once has a much bigger blast radius than three separate servers with one scope each. The cost of splitting is operational, but the benefit is that a compromised server can only exercise its narrow scope. For any MCP deployment that touches production data, we strongly recommend splitting servers along scope boundaries even when it means running more processes.
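
The blast-radius argument can be made concrete with a small check that a CI gate might run over server configurations. The configuration shape (a mapping from server name to requested scopes) and the function names are assumptions made for illustration; the scope names are the ones from the example above.

```python
def blast_radius(servers: dict[str, set[str]], compromised: str) -> set[str]:
    """Scopes an attacker can exercise if one server is compromised."""
    return set(servers[compromised])


def over_scoped(servers: dict[str, set[str]], max_scopes: int = 1) -> list[str]:
    """Names of servers that request more scopes than the policy permits."""
    return sorted(
        name for name, scopes in servers.items() if len(scopes) > max_scopes
    )


# One server holding every scope versus three servers with one scope each.
combined = {"workspace": {"drive.readonly", "gmail.modify", "calendar"}}
split = {
    "drive": {"drive.readonly"},
    "gmail": {"gmail.modify"},
    "calendar": {"calendar"},
}
```

With the combined configuration, compromising `workspace` yields all three scopes; with the split configuration, compromising `gmail` yields only `gmail.modify`. A one-scope limit is deliberately strict here, and a real policy would likely set the threshold per server.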

How should hosts mediate all of this?

Hosts should mediate through a policy engine that evaluates every tool call, resource read, sampling request, and OAuth exchange against a declarative policy. The policy should be versioned, human-readable, and auditable. It should differentiate between users, between tools, and between data classifications. Production hosts we have reviewed typically express policies in a rule language like Rego or Cedar and log every decision to a central audit sink.
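
To show the shape of such an engine without committing to Rego or Cedar syntax, here is a plain-Python stand-in: an ordered list of declarative rules, first match wins, default deny, and every decision appended to an audit log. The rule fields, role names, and `POLICY_VERSION` value are hypothetical, and a production host would send the log to an external sink rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_role: str
    action: str          # "tool_call", "resource_read", "sampling", "oauth_exchange"
    target: str          # tool name, resource URI, or server id
    classification: str  # "public", "internal", "restricted"


POLICY_VERSION = "example-v1"  # hypothetical version label

# Ordered rules: every key except "effect" must equal the request field.
RULES = [
    {"effect": "deny", "classification": "restricted", "user_role": "contractor"},
    {"effect": "allow", "action": "resource_read", "classification": "public"},
    {"effect": "allow", "action": "tool_call", "user_role": "engineer",
     "target": "search_issues"},
]

audit_log: list[dict] = []


def evaluate(request: Request) -> str:
    """Return "allow" or "deny"; the first matching rule wins, default is deny."""
    decision, matched = "deny", None
    for index, rule in enumerate(RULES):
        conditions = {k: v for k, v in rule.items() if k != "effect"}
        if all(getattr(request, k) == v for k, v in conditions.items()):
            decision, matched = rule["effect"], index
            break
    # Log every decision, including default denies (rule is None).
    audit_log.append({
        "policy_version": POLICY_VERSION,
        "request": request,
        "decision": decision,
        "rule": matched,
    })
    return decision
```

Recording the policy version and the matched rule index alongside each decision is what makes the log useful in an audit: a reviewer can reconstruct why a call was allowed, not just that it was.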

The policy engine should also be where human-in-the-loop approvals are implemented. High-risk actions, such as writing to production systems, touching financial data, or issuing refunds, should route to a human reviewer before execution. MCP's request-response structure can accommodate an asynchronous approval pattern, but the host has to wire up the UI and the queue. Skipping this wiring is the most common reason we see agent deployments stall in pilot.
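
The queue half of that wiring might look roughly like the sketch below. It assumes an in-memory store and hypothetical action names (`write_production`, `issue_refund`); a real host would persist pending items and authenticate the reviewer.

```python
import uuid
from typing import Any, Callable

# Hypothetical set of actions that require a human decision.
HIGH_RISK = {"write_production", "issue_refund"}

# Pending approvals keyed by ticket id: (action, arguments, executor).
pending: dict[str, tuple[str, dict, Callable[[dict], Any]]] = {}


def submit(action: str, args: dict, execute: Callable[[dict], Any]) -> dict:
    """Run low-risk actions immediately; park high-risk ones for review."""
    if action not in HIGH_RISK:
        return {"status": "executed", "result": execute(args)}
    ticket = str(uuid.uuid4())
    pending[ticket] = (action, args, execute)
    return {"status": "pending", "ticket": ticket}


def review(ticket: str, approved: bool) -> dict:
    """Resolve a pending ticket; the action runs only if a reviewer approves."""
    _action, args, execute = pending.pop(ticket)
    if not approved:
        return {"status": "rejected"}
    return {"status": "executed", "result": execute(args)}
```

The agent receives the `pending` status and ticket id immediately, and the side effect happens only inside `review`, after a human decision.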

Where does the protocol still fall short?

The protocol still falls short on data classification, cross-server isolation, and long-running operations. There is no standard way to tag a tool or resource with a data classification, so hosts have to maintain their own side tables. There is no built-in isolation between servers running in the same host process, so a crash in one can affect others. And long-running operations (anything that takes more than a few seconds) lack a standardized progress and cancellation model, which makes timeouts and partial failures harder to reason about.
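
A classification side table can be as simple as a lookup keyed by server and tool or resource name. The entries and labels below are hypothetical; the one design choice worth copying is that anything missing from the table falls back to the most restrictive label.

```python
# Hypothetical side table: (server, tool name or resource URI) -> classification.
CLASSIFICATIONS = {
    ("jira", "search_issues"): "internal",
    ("docs", "docs://public/handbook"): "public",
}

# Unknown tools and resources are treated as the most sensitive class.
DEFAULT_CLASSIFICATION = "restricted"


def classify(server: str, name: str) -> str:
    """Look up the data classification the host has assigned to a target."""
    return CLASSIFICATIONS.get((server, name), DEFAULT_CLASSIFICATION)
```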

These are all solvable at the host layer, and the better MCP implementations address them, but they are worth being aware of when evaluating a host for production use.

How Safeguard Helps

Safeguard treats every MCP server as a first-class supply chain asset. We generate SBOMs for each server, run reachability analysis to confirm which CVEs are callable from the tool handlers, and score TPRM risk for the supplier. Griffin AI reviews tool and resource metadata for injection patterns before deployment, and our policy gates in CI block server configurations that violate your scope, classification, or approval rules. Audit logs from MCP hosts are ingested and correlated with the rest of your supply chain telemetry, so incident response has a single pane of glass. The result is an MCP permissions model you can actually enforce, not just describe.