Direct prompt injection requires the attacker to get their payload in front of the user. Retrieval context poisoning requires them to get it into the RAG index, which is often more accessible. The attack then affects every query that retrieves the poisoned content. Leverage scales with ingest surface rather than with attacker-to-user proximity. This is the structural reason why RAG poisoning is a different class of problem than classic prompt injection.
Why scale is different
Three structural reasons:
- One payload, many victims. A poisoned document in a knowledge base affects every query that retrieves it. High leverage per attack.
- Persistence. Unlike a prompt injection that affects one session, a poisoned document persists across sessions, users, and updates.
- Indirection. The attacker is not the user. Detection requires reasoning about the content, not the user's behaviour.
Defences that work for direct prompt injection don't automatically work here.
Where frontier models struggle
Frontier models cannot distinguish poisoned content from legitimate content in the retrieved context window. The model sees text; it tries to be helpful. Adversarial text that looks like helpful content is followed.
The limit is structural. Model-level improvements help at the margin but don't close the gap.
Defences that work
Four layers:
- Ingest governance. Curated sources; provenance required.
- Source attribution in outputs. Users see where content came from; suspicious sources get reviewed.
- Retrieval anomaly detection. Unusual retrieval patterns flagged.
- Capability scoping. Even if the model is influenced, its authorised actions are bounded.
Each layer reduces exposure. Combined, they produce reasonable defence in depth.
How Safeguard Helps
Safeguard's RAG-adjacent features include ingest governance, source attribution, retrieval anomaly detection, and capability scoping. For customers deploying RAG in production, the defence-in-depth posture is what makes the deployment safe rather than the model's own instructions.