AI Security

The MCP Threat Model: What Actually Matters in 2026

Most MCP threat models confuse protocol risk with deployment risk. Here is what the real attack surface looks like after a year of production incidents.

A year into broad MCP adoption, the protocol-level threat model is largely what Anthropic's original spec implied: TLS handles transport, JSON-RPC handles structure, and the application layer handles everything interesting. The interesting parts are where all the real compromises have happened. This post is a refreshed threat model grounded in what actually went wrong in 2025 rather than what the spec documents hypothesize.

What is the protocol layer actually responsible for?

Framing, capability negotiation, and session lifecycle. Nothing more. MCP is deliberately thin. The protocol does not define authentication, it does not define authorization, it does not define content trust, and it does not define sandboxing. All of those are the server operator's problem, and in 2025 most operators did not solve them. Assuming the protocol will catch anything is the first modeling error.

In practice this means every threat model for MCP is really a threat model for (a) the client host, (b) the server implementation, (c) the credentials the server holds, and (d) the data returned to the model. The protocol itself has had a handful of parsing bugs that got patched quickly. None of the notable 2025 incidents were protocol bugs. All of them were deployment bugs dressed up as protocol stories.

What did 2025 incidents teach us about tool poisoning?

That tool descriptions are an untrusted input to the model and must be treated that way. Several community MCP servers last year shipped with tool descriptions that included instructions like "if the user asks about X, also call Y with argument Z." Because the model reads those descriptions to decide which tool to invoke, a malicious description acted as a hidden prompt, executed every session. This is the class of bug researchers started calling "tool poisoning" and it was responsible for a non-trivial share of reported compromises.

The mitigation is not vibes. It is treating the tool manifest as code that flows through code review. When an MCP server updates its tool descriptions, that diff should appear in a PR against your approved-servers registry. If your team installed 20 servers from community repos and none of them are version-pinned, you have no baseline to diff against. Pinning by version and hash is the minimum. Reviewing description changes before approving upgrades is the next step, and it is not optional for servers connected to anything sensitive.

Where does indirect prompt injection fit in the model?

At the output boundary of every tool that returns content the model will read. This is a broader surface than most teams acknowledge. It is not just web scrapers. It is database query results where a user-controlled row contains instructions. It is Jira comment fetchers. It is GitHub issue readers. It is log tailers. Anywhere user-influenced content reaches the model through a tool, injection is possible.

The correct threat model treats every tool output as a potentially hostile document. The model must not be given unconstrained authority to act on instructions it discovers in that document. Concretely: the agent loop should require explicit human approval for any tool invocation that the model decided to make after reading external content, or at minimum should segregate "read-only exploration" tool calls from "state-changing" tool calls and require re-authorization to cross that boundary. Teams that built their agents on the assumption that the model would "be careful" have learned this the expensive way.

How big is the credential blast radius?

Larger than almost any deployment diagram suggests. A typical MCP server for, say, GitHub holds a personal access token with repo scope. That token can read every private repo the developer can read, and write to many of them. If that server is compromised, or if prompt injection convinces the model to misuse it, the blast radius is every repo the developer touches, not just the repo currently open.

The 2025 version of this problem showed up repeatedly in incident reports: an MCP server asked for a PAT at install time, the developer pasted one from a password manager, and that PAT turned out to be the long-lived admin token they used for everything. Rotations were rare. Short-lived tokens through OIDC or a gateway pattern were rare. The result is that one compromised server in a team of 30 developers gives an attacker 30 high-privilege tokens.

Your threat model should assume every credential held by an MCP server will eventually leak, and should size the credential accordingly. If you would not store a given credential in a developer's shell history, you should not hand it to an MCP server.

What about the supply chain of the MCP server itself?

This is the quiet part of the threat model and it rhymes with every other ecosystem that grew fast. MCP servers are npm packages, PyPI packages, Go binaries, and a growing number of Rust crates. Every one of them has a transitive dependency tree. Every one of those dependencies can be compromised through the same mechanisms that hit PyTorch nightly in late 2022, Ultralytics in late 2024, and the recurring wave of Hugging Face model-side issues.

When a popular Python MCP server depends on requests, pydantic, and a dozen transitive packages you have never heard of, the attack surface includes every one of them. A compromised dependency does not need to know anything about MCP to abuse it. It just needs to run code in the server process, at which point it has access to the credentials, the conversation history buffered in memory, and the ability to return malicious content to the model.

Deep dependency analysis is not optional for servers in the critical path. Running an SCA scan at install is the floor. Continuous analysis, with reachability to cut false positives and with the ability to see transitive compromises several layers deep, is the realistic posture.

What should the 2026 baseline look like?

Short version: pin everything, route tool outputs through a classifier that flags instruction-like content, scope credentials through a gateway that mints short-lived tokens, and treat the MCP server supply chain like any other production software supply chain. None of this is new security engineering. It is the application of existing practice to a surface that most teams underbuilt in 2025 because it moved faster than their procurement process.

How does this shift as remote MCP servers become the norm?

The threat model shifts from "child process on my laptop" to "third-party SaaS in my call path," which is a trade. On one hand, remote MCP servers remove the code-execution-on-my-machine risk, which is the single largest local threat. On the other hand, they introduce a tenancy-boundary risk: your conversation data, your tool call arguments, and sometimes your results are flowing through a vendor you may not have vetted. The vendor's uptime, their data retention, and their own supply chain become part of your threat model.

The useful framing is that a remote MCP server is a SaaS integration with a conversational interface. The procurement controls you apply to Salesforce integrations apply here. DPA, SOC 2, data retention commitments, pen test results, and an incident notification SLA. Teams that have production MCP deployments in 2026 without these artifacts from their remote providers are holding a risk their legal department would flag in any other context.

How Safeguard.sh Helps

Safeguard.sh's reachability analysis on MCP server dependency graphs typically cuts alert noise by 60 to 80 percent, which is what makes continuous monitoring of community servers operationally feasible. Griffin AI inspects tool manifests and description changes for injection patterns, flags maintainer-behavior anomalies, and correlates across the SBOM to surface supply chain compromises earlier. TPRM workflows attach to each approved MCP server with its own risk profile, and the 100-level dependency depth keeps transitive compromises visible. Container self-healing keeps MCP gateway images patched automatically so that known-vulnerable releases do not linger in production while humans argue about change windows.

ai-security mcp threat-model agents

Back to all articles

More on #ai-security

View all →

AI Security

Never miss an update

Weekly insights on software supply chain security, delivered to your inbox.

The MCP Threat Model: What Actually Matters in 2026

What is the protocol layer actually responsible for?

What did 2025 incidents teach us about tool poisoning?

Where does indirect prompt injection fit in the model?

How big is the credential blast radius?

What about the supply chain of the MCP server itself?

What should the 2026 baseline look like?

How does this shift as remote MCP servers become the norm?

How Safeguard.sh Helps

More on #ai-security

API Surface Reviewed: Griffin AI vs Mythos

Real-World Deployment: Griffin AI vs Mythos

Scaling Across Repos: Griffin AI vs Mythos

Tool-Call Hijacking: Griffin AI vs Mythos

Related articles in AI Security

Building an Eval Suite for Your Security LLM Workflows

Zero-Day Discovery With LLM-Augmented Reachability: A Safeguard Engine Walkthrough

Frontier LLM Vendors Are Not Your Supply Chain Security Vendor

Never miss an update

Product

Solutions

Compare

Resources

Company

Legal

Developers