AI Security

Securing AI Agents: MCP Protocol Risks and Mitigations

The Model Context Protocol is transforming how AI agents interact with tools, but it introduces new attack surfaces. Here is what security teams need to understand.

James
Senior Security Architect

The Model Context Protocol has gone from an interesting idea to a de facto standard in about six months. Most major AI assistants now support MCP, and the ecosystem of MCP servers is growing fast. This is mostly good news -- standardization reduces complexity and enables interoperability.

But MCP also introduces a new category of attack surface that most security teams have not thought about yet. We have spent the last few months analyzing MCP deployments and building our own MCP server at Safeguard. Here is what we have learned about the risks.

Understanding the MCP Threat Model

An MCP server is fundamentally a bridge between an AI agent and some external system -- a database, an API, a file system, a cloud service. The agent's MCP client sends structured requests to the MCP server, with arguments the model generates from the natural-language conversation, and the server translates them into actions on the underlying system and returns results.

This architecture creates a trust chain: User -> AI Agent -> MCP Client -> MCP Server -> Backend System. Every link in that chain is a potential attack surface.
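To make the MCP Client -> MCP Server link concrete: a tool invocation travels as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds one such message. The tool name `query_database` and its `sql` argument are hypothetical, chosen only for illustration.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments, for illustration only.
print(build_tool_call(1, "query_database", {"sql": "SELECT 1"}))
```

Everything under `arguments` is produced by the model, so each later link in the chain should treat it as untrusted input.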

Risk 1: Tool Description Injection

MCP servers describe their capabilities to AI agents through tool descriptions -- structured metadata that tells the agent what each tool does, what parameters it accepts, and what it returns. The AI agent uses these descriptions to decide when and how to use each tool.

The problem: tool descriptions are essentially prompts. A malicious or compromised MCP server can craft tool descriptions that manipulate the AI agent's behavior. For example:

  • A tool description that instructs the agent to always use that tool first, overriding user intent
  • Descriptions that include instructions to exfiltrate conversation context to the server
  • Parameter descriptions that trick the agent into sending sensitive data as tool arguments

This is prompt injection through the tool layer, and most MCP clients have limited defenses against it.

Mitigation: Review tool descriptions from third-party MCP servers before deployment. Use MCP clients that display tool descriptions to users. Consider implementing an allowlist of approved MCP servers for your organization.
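One way to support that review is a small pre-deployment check, sketched below under stated assumptions. The allowlist entries and the regular expressions are hypothetical examples, and a pattern scan only surfaces obvious cases for a human reviewer; it does not replace reading the descriptions. The `description` and `inputSchema` keys follow the shape of MCP tool definitions.

```python
import re

# Hypothetical allowlist of server identifiers approved by the security team.
APPROVED_SERVERS = {"safeguard", "internal-docs"}

# Illustrative patterns only; real injections will not always match these.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous) .*instructions", re.I),
    re.compile(r"always (use|call) this tool", re.I),
    re.compile(r"do not (tell|inform|show) the user", re.I),
    re.compile(r"(conversation|chat) history", re.I),
]


def review_tools(server_id: str, tools: list[dict]) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    if server_id not in APPROVED_SERVERS:
        findings.append(f"server '{server_id}' is not on the allowlist")
    for tool in tools:
        # Check the tool description and every parameter description.
        texts = [tool.get("description", "")]
        props = tool.get("inputSchema", {}).get("properties", {})
        texts += [p.get("description", "") for p in props.values()]
        for text in texts:
            for pattern in SUSPICIOUS_PATTERNS:
                if pattern.search(text):
                    findings.append(
                        f"tool '{tool.get('name')}': matches /{pattern.pattern}/"
                    )
    return findings
```

Running this check again whenever a server is updated helps catch descriptions that change after the initial approval.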

Risk 2: Excessive Permission Scope

Many MCP servers request broad permissions by design. A database MCP server might need read access to query tables, but does it also get write access? A file system MCP server might be scoped to a project directory, but what prevents path traversal?

The MCP specification does not mandate a granular permission model. Each MCP server implements its own authorization logic, and the quality varies wildly. We reviewed 40 popular open-source MCP servers and found that:

  • 68% had no configurable permission scoping
  • 45% could be used to access resources outside their intended scope
  • 22% had no authentication mechanism at all

Mitigation: Run MCP servers with the principle of least privilege. Use network segmentation to limit what backend systems an MCP server can reach. Audit MCP server source code before deploying in production environments.
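For the path traversal question raised above, a file system MCP server can resolve every requested path and refuse anything that lands outside its configured root. This is a minimal sketch, not taken from any particular server; `resolve_within` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_within(root: Path, requested: str) -> Path:
    """Resolve `requested` under `root`; raise if the result escapes the root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, and joining an
    # absolute path replaces the root entirely, so both cases are caught below.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} is outside {root}")
    return candidate
```

Checking the resolved path rather than the raw string matters: filtering for `..` in the input misses absolute paths and symlinks that point outside the project directory.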

Risk 3: Data Exfiltration Through Context

When an AI agent uses an MCP tool, the model fills in the tool's arguments from the conversation, so relevant context often travels along with the tool call. This is partly by design -- richer arguments help the MCP server provide better results. But it also means that every MCP server you connect potentially receives fragments of your conversations, code, and data.

Consider this scenario: you are discussing a security vulnerability in a private codebase with your AI assistant. You then ask a question that triggers an MCP tool call to a third-party service. The tool call may include context about the vulnerability you were just discussing.

Mitigation: Be aware of what context flows to which MCP servers. Use local MCP servers for sensitive operations. Consider MCP client configurations that limit context sharing.
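Where the MCP client, or a proxy placed in front of it, lets you inspect arguments before they leave, a filter along the following lines can limit what reaches a third-party server. That hook is an assumption about your deployment, not a feature of MCP itself, and the secret patterns are illustrative rather than exhaustive.

```python
import re

# Illustrative patterns for secret-like strings; tune for your environment.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"\b(password|token|secret)\s*[=:]\s*\S+", re.I),
]


def minimize_arguments(arguments: dict, allowed: set[str], max_len: int = 500) -> dict:
    """Keep only allowlisted arguments, redact secrets, and cap string length."""
    cleaned = {}
    for key, value in arguments.items():
        if key not in allowed:
            continue  # drop anything the tool does not strictly need
        if isinstance(value, str):
            for pattern in SECRET_PATTERNS:
                value = pattern.sub("[REDACTED]", value)
            value = value[:max_len]
        cleaned[key] = value
    return cleaned
```

The per-tool `allowed` set does most of the work here: an argument that is never forwarded cannot leak, whereas pattern-based redaction will always miss some secrets.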

Risk 4: Server-Side Request Forgery via AI

This is a subtle one. If an MCP server makes HTTP requests based on AI agent instructions (fetching URLs, calling APIs, sending webhooks), an attacker who can influence the AI agent's input can potentially use the MCP server as an SSRF proxy.

The attack path: attacker provides crafted input to a user -> user shares it with AI assistant -> AI assistant calls MCP tool with attacker-controlled parameters -> MCP server makes request to internal network.

This is not theoretical. We demonstrated this attack path in a controlled environment against three different MCP servers.

Mitigation: MCP servers should validate and sanitize all parameters, especially URLs and identifiers. Implement egress filtering for MCP server processes. Never run MCP servers on internal networks without proper network controls.
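URL validation for this case can be sketched as follows: accept only `http` and `https`, resolve the host, and reject the request unless every resolved address is publicly routable. The function name `is_safe_url` and the injectable `resolve` parameter are choices made for this example, the latter so the check can be exercised without real DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only for http(s) URLs whose host resolves to public addresses."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False
        port = parts.port or (443 if parts.scheme == "https" else 80)
        infos = resolve(parts.hostname, port)
        # Each getaddrinfo entry ends with a sockaddr tuple; its first item is the IP.
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    except (OSError, ValueError):
        return False  # unparseable URL, bad port, or failed lookup
    # is_global is False for loopback, private, link-local, and reserved ranges.
    return bool(addresses) and all(ip.is_global for ip in addresses)
```

A check like this is only one layer. The HTTP client should connect to the address that was validated rather than resolving the name a second time, and redirects need the same check, which is why egress filtering at the network level remains necessary.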

Risk 5: Insecure Credential Storage

MCP servers need credentials to access backend systems. In many deployments, these credentials are stored in plain text in MCP client configuration files. Claude Desktop stores configuration in a JSON file in the user's home directory. VS Code stores it in workspace settings. These files are often readable by any process running as the same user.

Mitigation: Use environment variables or secret management systems for MCP credentials. Never commit MCP configuration files with credentials to version control. Rotate MCP server credentials on a regular schedule.
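On the server side, reading a credential from the environment instead of a configuration file can be as small as the sketch below. The variable name `SAFEGUARD_API_TOKEN` in the usage comment is hypothetical, and the optional `env` parameter exists only to make the function easy to test.

```python
import os


def load_credential(name: str, env=os.environ) -> str:
    """Read a required secret from the environment; fail loudly if it is absent."""
    value = env.get(name, "").strip()
    if not value:
        # Refuse to start rather than fall back to a default or a file on disk.
        raise RuntimeError(f"missing required environment variable {name}")
    return value


# Usage (hypothetical variable name):
# token = load_credential("SAFEGUARD_API_TOKEN")
```

This only helps if the variable is injected by a secret manager or the process supervisor at launch. Writing the same value into the client's JSON configuration to populate the environment puts it back on disk in plain text.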

Building Secure MCP Servers

Based on our experience building the Safeguard MCP Server, here are principles we follow:

Input validation at every boundary. Do not trust that the AI agent will send well-formed requests. Validate every parameter against expected types and ranges. Reject requests that do not match the expected schema.
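As an illustration of validating every parameter against expected types and ranges, here is a minimal sketch for a hypothetical "list findings" tool. The tool, its parameter names, and the limits are assumptions made for the example and do not describe the Safeguard MCP Server's actual schema.

```python
from dataclasses import dataclass

ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass(frozen=True)
class ListFindingsParams:
    """Hypothetical parameters for a 'list findings' tool."""
    project_id: str
    severity: str
    limit: int


def parse_params(raw: dict) -> ListFindingsParams:
    """Validate raw tool arguments; raise ValueError on any mismatch."""
    expected = {"project_id", "severity", "limit"}
    if not isinstance(raw, dict) or set(raw) != expected:
        raise ValueError(f"expected exactly the keys {sorted(expected)}")
    project_id, severity, limit = raw["project_id"], raw["severity"], raw["limit"]
    if not (isinstance(project_id, str) and project_id.isalnum() and len(project_id) <= 64):
        raise ValueError("project_id must be 1-64 alphanumeric characters")
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        raise ValueError("severity is not one of the allowed values")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 100:
        raise ValueError("limit must be an integer from 1 to 100")
    return ListFindingsParams(project_id, severity, limit)
```

Rejecting unexpected keys, rather than ignoring them, keeps the accepted surface identical to the documented schema.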

Scoped authentication. Our MCP server uses the same role-based access control as the Safeguard web interface. If a user does not have permission to view a project in the UI, they cannot query it through MCP.
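The idea can be sketched as a single authorization function that every tool handler calls before touching the backend. The in-memory role tables, user names, and handler below are hypothetical stand-ins; a real server would consult the same permission store as the web interface.

```python
# Hypothetical role data; a real server would query the same store as the UI.
PROJECT_ROLES = {
    ("alice", "payments"): "admin",
    ("bob", "payments"): "viewer",
}
ROLE_PERMISSIONS = {"viewer": {"read"}, "admin": {"read", "write"}}


def authorize(user: str, project: str, action: str) -> None:
    """Raise PermissionError unless the user's role on the project allows the action."""
    role = PROJECT_ROLES.get((user, project))
    if role is None or action not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"{user} may not {action} project {project}")


def handle_query_project(user: str, project: str) -> dict:
    """Hypothetical tool handler: check access first, then do the work."""
    authorize(user, project, "read")
    return {"project": project, "status": "ok"}
```

The user identity must come from the authenticated session, never from a tool argument, since arguments are generated by the model.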

Minimal context consumption. We designed our tools to require only the specific parameters they need, not broad conversation context. This limits data leakage.

Audit logging. Every tool call through our MCP server is logged with the user identity, parameters, and results. This is essential for incident investigation and compliance.
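A decorator is one straightforward way to guarantee that no tool call skips the log. The sketch below uses only the standard library; the logger name `mcp.audit`, the record fields, and the `get_project` tool are assumptions for illustration.

```python
import json
import logging
import time
from functools import wraps

audit_log = logging.getLogger("mcp.audit")


def audited(tool_name: str):
    """Decorator that writes one JSON audit record per tool call."""
    def decorator(func):
        @wraps(func)
        def wrapper(user: str, **params):
            record = {"ts": time.time(), "user": user, "tool": tool_name, "params": params}
            try:
                result = func(user, **params)
                record.update(outcome="ok", result=result)
                return result
            except Exception as exc:
                record.update(outcome="error", error=repr(exc))
                raise
            finally:
                # Runs on success and on failure, so every call is recorded.
                audit_log.info(json.dumps(record, default=str))
        return wrapper
    return decorator


@audited("get_project")
def get_project(user: str, project_id: str) -> dict:
    """Hypothetical tool used to demonstrate the decorator."""
    return {"project_id": project_id}
```

Because parameters and results can themselves contain sensitive data, the audit log needs the same access controls and retention policy as the systems it records.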

Rate limiting and anomaly detection. Unusual patterns of MCP tool usage -- high frequency, unusual parameter combinations, access to resources the user has never queried before -- trigger alerts.
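The rate-limiting half of this can be sketched as a per-user sliding window. This is a generic technique, not a description of Safeguard's implementation; the class name and the injectable `clock` parameter are choices made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per user within `window_seconds`."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.calls = defaultdict(deque)  # user -> timestamps of recent calls

    def allow(self, user: str) -> bool:
        now = self.clock()
        recent = self.calls[user]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # discard calls that fell out of the window
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True
```

A denied call is itself a useful signal: counting how often `allow` returns `False` for a user is a simple starting point for the alerting described above.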

The Bigger Picture

MCP is going to be the standard interface between AI agents and the tools they use. That makes MCP security a foundational concern, not a nice-to-have. The risks we outlined here are not reasons to avoid MCP -- they are reasons to deploy it thoughtfully, with the same rigor you would apply to any other interface that bridges trust boundaries.

How Safeguard Helps

The Safeguard MCP Server is built with security as a first-class concern. It uses scoped authentication, input validation, audit logging, and minimal context consumption. It connects your AI assistant to your supply chain security data without introducing new risks. And because Safeguard tracks your entire software inventory, you can use it to audit what MCP servers your organization is deploying and ensure they meet your security standards.
