Best Practices

How to Secure AI Agents on the MCP Protocol

MCP gives AI agents real tools, real credentials, and real blast radius. Here is a hardening guide for running MCP servers in production without torching your environment.

The Model Context Protocol has become the default way AI agents talk to tools, databases, and APIs. That is mostly good — a standard beats every vendor rolling their own integration format — but it has also silently turned every MCP server into a production-grade integration surface with the security maturity of a 2023 hackathon project. In the last six months I have reviewed MCP deployments that handed a coding agent root on a prod database, and others that piped a customer support agent's tool-call output directly to bash.

This post is the MCP hardening checklist I use when the engineering team wants agents in production but also wants to keep their cloud bill and their customer data intact.

What Is the MCP Threat Model?

The MCP threat model has three main attackers: a malicious or compromised MCP server that returns doctored tool responses to influence agent behavior, a malicious prompt or document consumed by the agent that exfiltrates data through legitimate tools, and a compromised upstream dependency of the MCP server itself. All three converge at the same question — what can the agent do with the credentials the MCP server holds?

Why Does MCP Make This Worse Than Traditional Integrations?

MCP makes the agent's authority implicit. A human integrating a database API has a mental model of what queries are reasonable; an agent is optimizing against a prompt and whatever tool descriptions the MCP server advertises. Tool descriptions are prompt-injectable by definition. If the server is compromised or malicious, the descriptions can steer the agent toward destructive actions that look plausible from the model's perspective.

Step-by-Step Implementation

Step 1: Inventory Every MCP Server Your Agents Can Reach

List every MCP server in use, its source, its version, and what credentials it holds. Treat this inventory like you would an IAM role inventory — it is the basis of every control that follows. Any server whose provenance you cannot trace gets shut off until you can.

Step 2: Run Servers in Per-Tenant Isolation

MCP servers should not run as long-lived shared processes with pooled credentials. Run one server process per user or per tenant, spawned on demand, with the exact credentials that user should be able to wield. Kubernetes Jobs, Firecracker microVMs, or gVisor-isolated containers are all reasonable substrates. The goal is that a prompt injection in Alice's session cannot read Bob's data because the server holding Bob's credentials is a different process.

apiVersion: batch/v1
kind: Job
metadata:
  name: mcp-session-{{ .UserID }}
spec:
  ttlSecondsAfterFinished: 300
  template:
    spec:
      serviceAccountName: mcp-session-{{ .UserID }}
      containers:
      - name: mcp
        image: ghcr.io/acme/mcp-postgres@sha256:abc...
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: mcp-creds-{{ .UserID }}
              key: database_url
        resources:
          limits:
            memory: 256Mi
            cpu: 500m
        securityContext:
          runAsNonRoot: true
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      restartPolicy: Never

Step 3: Scope Tool Permissions Tight

MCP tools should be narrow verbs with narrow parameters, not "run SQL" or "call any HTTP URL." A good test: write down, for each tool, the worst thing an adversarial agent could do with it. If the answer is "extract the customer table," redesign the tool. Ship query_customer_orders(customer_id) instead of query_database(sql). The contraction from general to specific is where most MCP security lives.

Step 4: Enforce Authorization at the Server, Not the Agent

Never trust the agent to decide whether a tool call is authorized. The MCP server must independently check that the authenticated identity associated with the session has permission to invoke this tool with these parameters. Express policy as code and evaluate it inside the server before the tool executes:

# mcp_server.py
from mcp.server import Server
from opa_client import check

server = Server("postgres-mcp")

@server.tool("query_customer_orders")
async def query_customer_orders(ctx, customer_id: str):
    decision = await check(
        policy="mcp.postgres.allow",
        input={
            "user": ctx.auth.user_id,
            "tool": "query_customer_orders",
            "customer_id": customer_id,
            "tenant": ctx.auth.tenant_id,
        },
    )
    if not decision.allow:
        raise PermissionError(decision.reason)
    return await db.fetch_orders(customer_id)

Step 5: Require Signed Tool Manifests

Agents should only trust tool descriptions from MCP servers whose manifests are signed. Every published MCP server should ship a signed manifest (cosign-signed JSON) that describes the tools, parameter schemas, and version. On the agent side, refuse to initialize a session with a server whose manifest signature does not verify against your allowed identity list.

Step 6: Defend Against Prompt Injection Through Tool Output

Tool output can contain prompt injections — a malicious email body the agent just fetched, a stack trace with attacker-controlled strings, or a compromised document. Treat every tool output as untrusted input. Apply input sanitization and, more importantly, confirm destructive operations with an explicit human-in-the-loop step before the agent executes them. A simple require_confirmation wrapper on every tool with non-reversible effects goes a long way.

Step 7: Log Everything, Actually Review It

Every MCP call — tool name, parameters, agent identity, session ID, timestamp, response hash — goes to a tamper-evident log. Ship it to a SIEM with anomaly detection on parameter patterns. The reason agents get away with data exfiltration for weeks is that nobody watches the tool-call logs. A simple alert on "this agent made 10x more read_file calls than its 30-day baseline" catches a huge class of attacks.

# fluent-bit config
[OUTPUT]
    Name  loki
    Match mcp.*
    Host  loki.internal
    Labels tool=$tool, agent=$agent_id, tenant=$tenant_id
    Line_format json

Step 8: Rate Limit on Identity and Parameter Fingerprints

Apply rate limits not just per session but per tool-and-parameter-fingerprint. A single agent calling read_file 500 times with different paths in 60 seconds is almost always attacker behavior. Build the limiter into the MCP server framework so it applies uniformly to every tool you add.

Step 9: Contain Credentials With Short-Lived Tokens

MCP servers should not hold long-lived API keys. They should exchange the caller's session-scoped credential for a short-lived downstream token (STS, OIDC token exchange, or a native equivalent) at the moment of tool execution. If the server is compromised, the blast radius is one session's lifespan, not the keys' lifespans.

Step 10: Test the Agent-Plus-MCP System Against a Red Team

Static review of MCP server code catches some bugs. It does not catch emergent behavior of the model-plus-tool system. Run adversarial red-team exercises where a prompt injection payload lives in data the agent will fetch, and see what gets exfiltrated. Automate this with agent-level fuzzing harnesses that try to cause the agent to call destructive tools via indirect prompt injection.

What Are the MCP-Specific CVEs to Worry About?

The MCP specification itself has had several vulnerabilities in 2025-2026 around authentication flows, schema validation, and transport security. The most impactful have been around SSE transport session hijacking and JSON-RPC batching bypassing authorization. Track the modelcontextprotocol/spec repo advisories and subscribe to the MCP-Security mailing list. Pin your MCP SDK version like you would any other critical dependency.

How Do You Handle MCP Servers You Did Not Write?

Treat them like any third-party service with elevated privileges. Pin to a signed release, review the code, restrict what credentials the server gets, put an egress firewall in front of it, and log everything it does. For community-maintained MCP servers, fork the repo and maintain your own build — that way an upstream compromise cannot push new code into your environment without your review.

How Do You Tell If an Agent Has Been Prompt-Injected in Production?

Watch for three patterns: tool-call sequences that deviate from the agent's normal workflow (reading files outside the task scope, making external network calls), sudden increases in token usage per session, and model outputs that reference data the agent should not be emitting. Ship these as detection rules in your SIEM. The same tool-call log that handles audit doubles as your IDS for agent behavior.

How Safeguard.sh Helps

Safeguard.sh monitors MCP servers as first-class supply-chain components. Our reachability engine walks 100-level depth through MCP server code to determine which tool calls actually reach vulnerable dependencies, turning CVE triage on agent infrastructure from a firehose into a queue. Griffin AI flags anomalous tool-call sequences against baseline patterns and drafts incident response context automatically. The TPRM module tracks the identity and provenance of every third-party MCP server your agents reach, complete with SBOM diffing between versions. Our container self-healing runtime rebuilds and redeploys MCP server images the moment a reachable CVE or malicious-package signal lands, so compromised agent infrastructure does not stay compromised for more than a release cycle.

mcp ai-agents agent-security llm-security zero-trust

Back to all articles

More on #mcp

View all →

AI Security

Never miss an update

Weekly insights on software supply chain security, delivered to your inbox.

How to Secure AI Agents on the MCP Protocol

What Is the MCP Threat Model?

Why Does MCP Make This Worse Than Traditional Integrations?

Step-by-Step Implementation

Step 1: Inventory Every MCP Server Your Agents Can Reach

Step 2: Run Servers in Per-Tenant Isolation

Step 3: Scope Tool Permissions Tight

Step 4: Enforce Authorization at the Server, Not the Agent

Step 5: Require Signed Tool Manifests

Step 6: Defend Against Prompt Injection Through Tool Output

Step 7: Log Everything, Actually Review It

Step 8: Rate Limit on Identity and Parameter Fingerprints

Step 9: Contain Credentials With Short-Lived Tokens

Step 10: Test the Agent-Plus-MCP System Against a Red Team

What Are the MCP-Specific CVEs to Worry About?

How Do You Handle MCP Servers You Did Not Write?

How Do You Tell If an Agent Has Been Prompt-Injected in Production?

How Safeguard.sh Helps

More on #mcp

Claude MCP Tool Poisoning Threat Model 2026

AI Agent Tool Confused Deputy Problem in 2026

MCP Server Inventory: Griffin AI vs Mythos

Safeguard MCP Server: Public Release Details

Related articles in Best Practices

Open Source vs Commercial Security Scanners 2026

Buyer Guide: Software Supply Chain Security 2026

Best Secret Scanning Tools 2026 Comparison

Never miss an update

Product

Solutions

Compare

Resources

Company

Legal

Developers