The Model Context Protocol has become the default way AI agents talk to tools, databases, and APIs. That is mostly good — a standard beats every vendor rolling their own integration format — but it has also silently turned every MCP server into a production-grade integration surface with the security maturity of a 2023 hackathon project. In the last six months I have reviewed MCP deployments that handed a coding agent root on a prod database, and others that piped a customer support agent's tool-call output directly to bash.
This post is the MCP hardening checklist I use when the engineering team wants agents in production but also wants to keep their cloud bill and their customer data intact.
What Is the MCP Threat Model?
The MCP threat model has three main attackers: a malicious or compromised MCP server that returns doctored tool responses to influence agent behavior, a malicious prompt or document consumed by the agent that exfiltrates data through legitimate tools, and a compromised upstream dependency of the MCP server itself. All three converge at the same question — what can the agent do with the credentials the MCP server holds?
Why Does MCP Make This Worse Than Traditional Integrations?
MCP makes the agent's authority implicit. A human integrating a database API has a mental model of what queries are reasonable; an agent is optimizing against a prompt and whatever tool descriptions the MCP server advertises. Tool descriptions are prompt-injectable by definition. If the server is compromised or malicious, the descriptions can steer the agent toward destructive actions that look plausible from the model's perspective.
Step-by-Step Implementation
Step 1: Inventory Every MCP Server Your Agents Can Reach
List every MCP server in use, its source, its version, and what credentials it holds. Treat this inventory like you would an IAM role inventory — it is the basis of every control that follows. Any server whose provenance you cannot trace gets shut off until you can.
Step 2: Run Servers in Per-Tenant Isolation
MCP servers should not run as long-lived shared processes with pooled credentials. Run one server process per user or per tenant, spawned on demand, with the exact credentials that user should be able to wield. Kubernetes Jobs, Firecracker microVMs, or gVisor-isolated containers are all reasonable substrates. The goal is that a prompt injection in Alice's session cannot read Bob's data because the server holding Bob's credentials is a different process.
apiVersion: batch/v1
kind: Job
metadata:
name: mcp-session-{{ .UserID }}
spec:
ttlSecondsAfterFinished: 300
template:
spec:
serviceAccountName: mcp-session-{{ .UserID }}
containers:
- name: mcp
image: ghcr.io/acme/mcp-postgres@sha256:abc...
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: mcp-creds-{{ .UserID }}
key: database_url
resources:
limits:
memory: 256Mi
cpu: 500m
securityContext:
runAsNonRoot: true
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
restartPolicy: Never
Step 3: Scope Tool Permissions Tight
MCP tools should be narrow verbs with narrow parameters, not "run SQL" or "call any HTTP URL." A good test: write down, for each tool, the worst thing an adversarial agent could do with it. If the answer is "extract the customer table," redesign the tool. Ship query_customer_orders(customer_id) instead of query_database(sql). The contraction from general to specific is where most MCP security lives.
Step 4: Enforce Authorization at the Server, Not the Agent
Never trust the agent to decide whether a tool call is authorized. The MCP server must independently check that the authenticated identity associated with the session has permission to invoke this tool with these parameters. Express policy as code and evaluate it inside the server before the tool executes:
# mcp_server.py
from mcp.server import Server
from opa_client import check
server = Server("postgres-mcp")
@server.tool("query_customer_orders")
async def query_customer_orders(ctx, customer_id: str):
decision = await check(
policy="mcp.postgres.allow",
input={
"user": ctx.auth.user_id,
"tool": "query_customer_orders",
"customer_id": customer_id,
"tenant": ctx.auth.tenant_id,
},
)
if not decision.allow:
raise PermissionError(decision.reason)
return await db.fetch_orders(customer_id)
Step 5: Require Signed Tool Manifests
Agents should only trust tool descriptions from MCP servers whose manifests are signed. Every published MCP server should ship a signed manifest (cosign-signed JSON) that describes the tools, parameter schemas, and version. On the agent side, refuse to initialize a session with a server whose manifest signature does not verify against your allowed identity list.
Step 6: Defend Against Prompt Injection Through Tool Output
Tool output can contain prompt injections — a malicious email body the agent just fetched, a stack trace with attacker-controlled strings, or a compromised document. Treat every tool output as untrusted input. Apply input sanitization and, more importantly, confirm destructive operations with an explicit human-in-the-loop step before the agent executes them. A simple require_confirmation wrapper on every tool with non-reversible effects goes a long way.
Step 7: Log Everything, Actually Review It
Every MCP call — tool name, parameters, agent identity, session ID, timestamp, response hash — goes to a tamper-evident log. Ship it to a SIEM with anomaly detection on parameter patterns. The reason agents get away with data exfiltration for weeks is that nobody watches the tool-call logs. A simple alert on "this agent made 10x more read_file calls than its 30-day baseline" catches a huge class of attacks.
# fluent-bit config
[OUTPUT]
Name loki
Match mcp.*
Host loki.internal
Labels tool=$tool, agent=$agent_id, tenant=$tenant_id
Line_format json
Step 8: Rate Limit on Identity and Parameter Fingerprints
Apply rate limits not just per session but per tool-and-parameter-fingerprint. A single agent calling read_file 500 times with different paths in 60 seconds is almost always attacker behavior. Build the limiter into the MCP server framework so it applies uniformly to every tool you add.
Step 9: Contain Credentials With Short-Lived Tokens
MCP servers should not hold long-lived API keys. They should exchange the caller's session-scoped credential for a short-lived downstream token (STS, OIDC token exchange, or a native equivalent) at the moment of tool execution. If the server is compromised, the blast radius is one session's lifespan, not the keys' lifespans.
Step 10: Test the Agent-Plus-MCP System Against a Red Team
Static review of MCP server code catches some bugs. It does not catch emergent behavior of the model-plus-tool system. Run adversarial red-team exercises where a prompt injection payload lives in data the agent will fetch, and see what gets exfiltrated. Automate this with agent-level fuzzing harnesses that try to cause the agent to call destructive tools via indirect prompt injection.
What Are the MCP-Specific CVEs to Worry About?
The MCP specification itself has had several vulnerabilities in 2025-2026 around authentication flows, schema validation, and transport security. The most impactful have been around SSE transport session hijacking and JSON-RPC batching bypassing authorization. Track the modelcontextprotocol/spec repo advisories and subscribe to the MCP-Security mailing list. Pin your MCP SDK version like you would any other critical dependency.
How Do You Handle MCP Servers You Did Not Write?
Treat them like any third-party service with elevated privileges. Pin to a signed release, review the code, restrict what credentials the server gets, put an egress firewall in front of it, and log everything it does. For community-maintained MCP servers, fork the repo and maintain your own build — that way an upstream compromise cannot push new code into your environment without your review.
How Do You Tell If an Agent Has Been Prompt-Injected in Production?
Watch for three patterns: tool-call sequences that deviate from the agent's normal workflow (reading files outside the task scope, making external network calls), sudden increases in token usage per session, and model outputs that reference data the agent should not be emitting. Ship these as detection rules in your SIEM. The same tool-call log that handles audit doubles as your IDS for agent behavior.
How Safeguard.sh Helps
Safeguard.sh monitors MCP servers as first-class supply-chain components. Our reachability engine walks 100-level depth through MCP server code to determine which tool calls actually reach vulnerable dependencies, turning CVE triage on agent infrastructure from a firehose into a queue. Griffin AI flags anomalous tool-call sequences against baseline patterns and drafts incident response context automatically. The TPRM module tracks the identity and provenance of every third-party MCP server your agents reach, complete with SBOM diffing between versions. Our container self-healing runtime rebuilds and redeploys MCP server images the moment a reachable CVE or malicious-package signal lands, so compromised agent infrastructure does not stay compromised for more than a release cycle.