When an application calls an AI model via API, it trusts that the response came from the model it asked for. Most of the time that trust is warranted. Under specific conditions (a compromised proxy, a misconfigured router, a man-in-the-middle at the network layer), the response can come from a different model. The application behaves normally from the user's perspective but is now executing against an attacker-controlled (or attacker-substituted) model. Model substitution is a structural risk that gets insufficient attention.
Why this is possible
Three paths:
- Network-layer substitution. A proxy in the path returns a response that doesn't originate from the intended endpoint.
- Configuration drift. The application's model-endpoint config changes without authorisation (a minimal integrity check is sketched after this list).
- Load-balancer routing errors. An infrastructure bug routes traffic to the wrong upstream.
Each has produced real incidents at scale.
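Of these, configuration drift is the cheapest to guard against in application code: treat the endpoint config as a pinned artifact and fail closed when it changes. A minimal sketch, assuming the config is a local JSON file whose approved SHA-256 digest was recorded at deploy time; the APPROVED_MODEL_CONFIG_SHA256 variable and load_model_config helper are hypothetical names, not part of any product mentioned here:
```python
import hashlib
import json
import os

def load_model_config(path: str) -> dict:
    """Load the model-endpoint config, failing closed if its digest no
    longer matches the one approved at deploy time."""
    approved = os.environ["APPROVED_MODEL_CONFIG_SHA256"]  # hypothetical name
    raw = open(path, "rb").read()
    digest = hashlib.sha256(raw).hexdigest()
    if digest != approved:
        raise RuntimeError(
            f"model config drift: expected {approved}, got {digest}"
        )
    return json.loads(raw)
```
This catches unauthorised edits to the file, not an attacker who can also rewrite the approved digest; the digest belongs wherever your change-management trail already lives.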
How to detect it
Three techniques:
- Response fingerprinting. Known behavioural signatures of the intended model can be checked per-response (first sketch after this list).
- Endpoint attestation. Some providers support signed responses; verifying the signature catches substitution (second sketch after this list).
- Distribution drift. Unexpected behaviour patterns on a known input set suggest the model has changed.
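First, a minimal fingerprinting sketch. The canary prompt and its expected signature are illustrative assumptions; in practice they come from profiling the intended model at temperature 0, and call_model stands in for whatever client the application already uses:
```python
from typing import Callable

# Canary prompts mapped to signatures the intended model reliably produces.
# These values are illustrative assumptions, derived in practice by
# profiling the intended model at temperature 0.
CANARIES: dict[str, list[str]] = {
    "Repeat the word 'aardvark' three times, comma-separated.": [
        "aardvark, aardvark, aardvark"
    ],
}

def fingerprint_ok(call_model: Callable[[str], str]) -> bool:
    """Return False if any canary response lacks a known signature of the
    intended model, which makes substitution one hypothesis to check."""
    for prompt, signatures in CANARIES.items():
        response = call_model(prompt).lower()
        if not any(sig in response for sig in signatures):
            return False
    return True
```
Run the canaries on a schedule or against a sampled fraction of traffic; a failed canary is a reason to investigate, not proof of substitution.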
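Second, a sketch of signature verification, using Ed25519 via the cryptography package. The algorithm choice, the assumption that the raw response body is what gets signed, and key distribution are all provider-specific; this shows the shape of the check rather than any particular provider's scheme:
```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def response_signature_valid(
    body: bytes, signature: bytes, provider_public_key: bytes
) -> bool:
    """Verify the provider's Ed25519 signature over the raw response body.
    Assumes the provider signs the exact body bytes; real schemes may sign
    a canonicalised payload instead."""
    key = Ed25519PublicKey.from_public_key_bytes(provider_public_key)
    try:
        key.verify(signature, body)
        return True
    except InvalidSignature:
        return False
```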
Griffin AI's eval harness serves partly as substitution detection: if the deployed model's behaviour drifts from the eval baseline, substitution is one hypothesis.
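A minimal sketch of that drift check, assuming the harness records a pass/fail result per eval case; the pass-rate framing and the five-point tolerance are illustrative assumptions, not Griffin AI's actual thresholds:
```python
def drift_detected(
    live_results: list[bool],   # pass/fail per case on the deployed model
    baseline_pass_rate: float,  # pass rate recorded for the intended model
    tolerance: float = 0.05,    # illustrative band, not a product default
) -> bool:
    """Flag drift when the deployed model's pass rate on a fixed eval set
    falls outside a band around the recorded baseline."""
    live_rate = sum(live_results) / len(live_results)
    return abs(live_rate - baseline_pass_rate) > tolerance
```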
How to prevent it
Four layers:
- TLS with certificate pinning for model endpoints (sketched after this list).
- Response signing where supported.
- Configuration change management with audit trail.
- Network monitoring for unexpected egress.
Each is standard infrastructure hygiene, applied here to the layer that calls model endpoints.
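Certificate pinning is the item most likely to live in application code rather than infrastructure, so a sketch may help. This pins the SHA-256 fingerprint of the endpoint's DER-encoded leaf certificate at the socket level; PINNED_SHA256 is a placeholder, and a real application would fold the same check into its HTTP client:
```python
import hashlib
import socket
import ssl

# Placeholder pin: the SHA-256 of the endpoint's DER-encoded leaf
# certificate, recorded out of band.
PINNED_SHA256 = bytes.fromhex("00" * 32)

def open_pinned_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection, failing closed unless the peer's leaf
    certificate matches the pinned fingerprint."""
    context = ssl.create_default_context()  # normal CA validation still runs
    sock = context.wrap_socket(
        socket.create_connection((host, port)), server_hostname=host
    )
    der_cert = sock.getpeercert(binary_form=True)
    if hashlib.sha256(der_cert).digest() != PINNED_SHA256:
        sock.close()
        raise ssl.SSLError(f"certificate pin mismatch for {host}")
    return sock
```
Pins have to rotate with the provider's certificates, or pinning becomes its own outage; pin a set of fingerprints and update it through the same change management as the endpoint config.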
How Safeguard Helps
Safeguard's model-endpoint calls include certificate pinning, response verification where supported, and continuous eval monitoring that catches substitution-induced behaviour drift. For customers whose AI threat model includes network-layer manipulation, these controls are the defence.