AI Security

MCP Server Capability Drift Detection

MCP servers do not stay still. Tool surfaces drift, scopes expand, and the server you approved is not the server in production. Here is how to catch that.

Nayan Dey
Senior Security Engineer
7 min read

The drift problem

The MCP server you approved last quarter is not the MCP server in production today. That is the uncomfortable truth most teams discover six months into running an MCP program. Tools get added, descriptions get rewritten, scopes expand, and the manifest the registry has on file no longer matches the manifest the server is actually serving. The server has drifted.

Drift is not necessarily malicious. Most of it is the natural evolution of software. A team that owns an internal MCP server adds a new tool because the agents need it. A vendor ships an update that adjusts a tool's argument schema. A documentation generator regenerates the manifest with a slightly different format. None of these changes is wrong on its face. What is wrong is that none of them went through the review the original server went through, and that the registry's record of what the server does is now incorrect.

The fix is continuous drift detection, with rules for what to do when drift is found. This post walks through the mechanics.

What to compare

Drift detection requires two things. The first is a record of what the server was approved as. The second is a way to observe what the server is now. The record is the manifest captured at registration time, plus any subsequent updates that were explicitly approved. The observation is the manifest the server reports today, fetched on a schedule.

The comparison is not a string diff. A useful drift detector understands the structure of an MCP manifest and reports differences in terms of meaningful changes. A new tool was added. An existing tool's description was rewritten. A tool's argument schema gained a new field. A tool was removed. Each of these has a different security implication, and the detector should classify them rather than treating them all as text changes.
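As a sketch, a structural diff over a simplified manifest shape might look like the following. The `{"tools": {...}}` layout and the change-kind names are illustrative assumptions for this post, not the actual MCP manifest format:

```python
def diff_manifests(approved: dict, live: dict) -> list[dict]:
    """Compare two manifests structurally and report meaningful changes.

    Assumed shape: {"tools": {name: {"description": str,
    "input_schema": {"properties": {...}}}}} -- a simplification of a
    real MCP tool listing.
    """
    changes = []
    approved_tools = approved.get("tools", {})
    live_tools = live.get("tools", {})

    for name, live_tool in live_tools.items():
        old = approved_tools.get(name)
        if old is None:
            changes.append({"kind": "tool_added", "tool": name})
            continue
        if old.get("description") != live_tool.get("description"):
            changes.append({"kind": "description_changed", "tool": name})
        old_args = old.get("input_schema", {}).get("properties", {})
        new_args = live_tool.get("input_schema", {}).get("properties", {})
        for arg in new_args.keys() - old_args.keys():
            changes.append({"kind": "argument_added", "tool": name, "argument": arg})
        for arg in old_args.keys() - new_args.keys():
            changes.append({"kind": "argument_removed", "tool": name, "argument": arg})

    # Tools present in the approved record but missing from the live server.
    for name in approved_tools.keys() - live_tools.keys():
        changes.append({"kind": "tool_removed", "tool": name})

    return changes
```

Each change carries a kind rather than a text delta, which is what lets the later stages attach different security implications to different kinds.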

Categories of drift

Drift falls into a small number of categories, each with its own response.

Additive drift is when the server gains a new capability. A new tool, a new argument, a new return field. Additive drift expands the surface and almost always requires review. The risk is that the new capability has not been scope-checked, and that an agent will start calling it without anyone realizing.

Subtractive drift is when the server loses a capability. A tool is removed, an argument is dropped. Subtractive drift is generally lower risk from a security perspective but can break agents that were depending on the removed capability. It still needs to be tracked, because the registry's record needs to stay accurate.

Semantic drift is the trickiest category. A tool's name and signature are unchanged, but its behavior has shifted. The description has been rewritten to suggest the tool can do more than it used to. The argument schema has been loosened to accept inputs that were previously rejected. The return value contains data that was not there before. Semantic drift can be invisible to a structural diff, which is why drift detection has to look at descriptions, schemas, and observed behavior, not just tool names.

Implementation drift is when the manifest is unchanged but the underlying implementation has changed. The tool still claims to do what it always did, but the actual API call it makes is different, the data it touches is different, or the side effects are different. Implementation drift is the hardest category to detect from outside the server, and it is the strongest argument for using the audit log to validate what tools are actually doing.
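The first three categories can be made operational with a small classification table over structural change kinds. This is only a sketch: the kind names are assumed outputs of a structural diff, and semantic and implementation drift generally need richer signals (description diffs, audit-log analysis) than a table lookup can provide:

```python
# Illustrative mapping from structural change kinds to the drift
# categories above. Implementation drift never appears here, because a
# manifest diff cannot see it; it has to come from the audit log.
CATEGORY_BY_KIND = {
    "tool_added": "additive",
    "argument_added": "additive",
    "tool_removed": "subtractive",
    "argument_removed": "subtractive",
    "description_changed": "semantic",
    "schema_loosened": "semantic",
}

def categorize(change: dict) -> str:
    # Unknown change kinds fail toward "semantic" so they get review
    # rather than being silently waved through.
    return CATEGORY_BY_KIND.get(change["kind"], "semantic")
```

The deliberate choice here is the default: anything the detector does not recognize is treated as the trickiest category, not the safest.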

Detection mechanics

A drift detector needs three sources of input. The first is the live manifest, fetched from the server on a schedule. Polling every few minutes is reasonable for high-risk servers, hourly is fine for most, and daily is the floor. The second is the audit log, which provides ground truth for which tools were actually called and what they actually did. The audit log catches implementation drift that the manifest cannot reveal. The third is the policy log, which records when a tool call was denied, because a sudden spike in denials often signals that drift has expanded a tool's scope beyond what policy allows.

The detector compares the live manifest to the approved manifest, classifies the differences, and produces a drift report. The report flows into a queue. Low-risk drift, like a description rewording with no scope change, can be auto-approved with a record. Higher-risk drift, like a new write tool, lands in a review queue that suspends the affected credential until a human signs off.
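One way to make the report concrete is a small record type with a routing rule attached. The field names and the single `needs_review` heuristic below are illustrative, not a real Safeguard or MCP API; a production detector would weigh every change kind, not just new tools:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DriftReport:
    """One detection pass for one server, pinned to an approved version."""
    server: str
    approved_version: int
    changes: list  # change dicts from a structural diff
    detected_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def needs_review(self) -> bool:
        # Toy heuristic: any new tool goes to the human queue; everything
        # else is auto-approved with a record.
        return any(c["kind"] == "tool_added" for c in self.changes)
```

The point of pinning `approved_version` in the report is that the review happens against a known baseline, which matters again in the versioning section below.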

What to do when drift is detected

The response to drift has to be proportional. Cutting traffic on every minor change makes the system unusable. Letting everything through makes drift detection pointless. The pattern that has worked is a four-tier response.

For trivial drift, the registry updates its record automatically and the server keeps running. For minor drift, the registry updates its record but flags the change for the server's owner to review at their convenience. For significant drift, the server is moved into a degraded mode where existing approved tools work but new ones are not callable, and the owner has a window to either roll back the change or get the new capability approved. For major drift, the server's credential is suspended and traffic is blocked until a real review completes.

The thresholds between tiers are policy decisions. Adding a read-only tool that operates on data the server already has access to might be minor drift. Adding a write tool that operates on regulated data is major drift. The point is to be deliberate about where the lines are, not to pretend that all drift is equivalent.
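Those thresholds can be encoded as an explicit policy function. The rules below are examples of where a team might draw the lines, not a standard, and the change fields (`access`, `regulated_data`, and so on) are assumed annotations produced by the detector:

```python
def tier_for(change: dict) -> str:
    """Map one drift change to a response tier. Example policy only."""
    kind = change["kind"]
    if kind == "description_changed" and not change.get("scope_changed"):
        return "trivial"      # auto-update the record, keep running
    if kind == "tool_added" and change.get("access") == "read" \
            and not change.get("new_data"):
        return "minor"        # update the record, flag for owner review
    if kind == "tool_added" and change.get("access") == "write":
        if change.get("regulated_data"):
            return "major"    # suspend credential, block traffic
        return "significant"  # degraded mode with a review window
    # Anything unrecognized defaults to the cautious side.
    return "significant"
```

Keeping the policy in one legible function, rather than scattered through the detector, is what makes "be deliberate about where the lines are" enforceable.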

Versioning the manifest

A subtle but important practice is versioning the manifest the registry has approved. Every approved manifest gets a version, and every drift detection runs against a specific version rather than the current floating state. When a server is updated, the previous approved manifest is preserved, and the new manifest goes through review before becoming the new approved version.

The benefit is that drift becomes a clear, auditable event. There is a record of the moment the manifest changed from version 7 to version 8, of who reviewed the change, and of what differed between the two versions. Without versioning, drift becomes a continuous fog in which nobody can say definitively what was approved when.
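A minimal versioned store might look like the following sketch, with an in-memory list standing in for whatever the registry actually persists. Hashing the canonicalized manifest makes each approved version content-addressable, so "what was approved as version 7" has exactly one answer:

```python
import hashlib
import json

class ManifestStore:
    """Append-only history of approved manifests. Illustrative only."""

    def __init__(self):
        self._versions = []

    def approve(self, manifest: dict, reviewer: str) -> int:
        """Record a reviewed manifest and return its version number."""
        version = len(self._versions) + 1
        digest = hashlib.sha256(
            json.dumps(manifest, sort_keys=True).encode()
        ).hexdigest()
        self._versions.append({
            "version": version,
            "manifest": manifest,
            "sha256": digest,
            "reviewer": reviewer,
        })
        return version

    def approved(self, version=None) -> dict:
        """Fetch a specific approved version, or the latest if omitted."""
        record = self._versions[-1] if version is None \
            else self._versions[version - 1]
        return record["manifest"]
```

Because the history is append-only, rolling back after bad drift means re-approving an old version as a new one, which keeps the audit trail linear.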

Running drift detection at scale

Once you have more than a handful of MCP servers, manual review of every drift event stops scaling. The pattern that has worked is heavy automation for the common cases and human attention reserved for the unusual ones. Most drift is benign and can be classified by rules. The detector should aim to surface only the drift that needs human judgment, with everything else handled automatically and logged for later review.
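At that point the triage step reduces to a filter: rule-classified benign drift is handled and logged automatically, and only the remainder reaches a human. In this sketch, `is_benign` stands in for whatever rule set the program maintains, and the log and queue are plain lists:

```python
def triage(reports, is_benign, audit_log, review_queue):
    """Auto-handle benign drift with a record; queue the rest for humans.

    `reports` are drift-report dicts with a "changes" list; `is_benign`
    is a per-change rule predicate supplied by the program's policy.
    """
    for report in reports:
        if all(is_benign(change) for change in report["changes"]):
            audit_log.append({"auto_approved": report})
        else:
            review_queue.append(report)
```

The invariant worth preserving is that the auto-approved path still writes to the audit log, so "handled automatically" never means "handled invisibly".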

This is the same pattern that mature dependency-management programs use. Most dependency upgrades are safe and run on autopilot. A small fraction are risky and get human review. The same pattern works for MCP server drift, with the same productivity benefits.

How Safeguard Helps

Safeguard runs drift detection continuously across every registered MCP server, classifies changes into the categories the policy cares about, and triggers the right response automatically. Manifests are versioned, drift events are part of the audit record, and when a major change is detected, the server's credentials are suspended without anyone having to remember to do it. The registry's record stays in sync with what is actually running, which is the foundation of every other control in the program.
