Tool-call hijacking is the attack variant where prompt injection or model manipulation induces the model to invoke tools the operator did not intend. Unlike prompt injection that produces a bad response, tool-call hijacking produces bad actions in the world — files deleted, emails sent, money moved. The defence requires the tool layer to enforce scope rather than trusting the model to request only authorised actions. Architectural choices matter here more than at the model layer.
What hijacking attacks look like
Three patterns:
- Scope-expansion attempts. The model is induced to call a tool that exists but is not authorised for the current session.
- Argument manipulation. The model calls an authorised tool but with manipulated arguments that expand the effective access.
- Cascading tool invocation. The model is induced to make a chain of tool calls that together produce unauthorised behaviour even if each individual call is authorised.
Each needs different defence.
Where model-level defences fail
"Ask the model nicely to only call authorised tools" is the weakest defence. It depends on the model's judgment, which is exactly what the attack exploits. Mythos-class tools that rely primarily on model-level enforcement have measurable vulnerability.
How Griffin AI handles it
Three architectural layers:
Authorisation at the tool layer. Each tool has scope, and the scope is enforced by the tool handler, not by the model. When the model emits a call to a tool outside the session's scope, the tool layer refuses regardless of how the model justified the call.
Argument validation against capability manifest. Each tool's capability manifest declares what arguments are acceptable. Argument values outside the declared range are rejected at the tool layer.
Cascading-invocation rate limits. Anomalous patterns of cascading tool calls (e.g., read followed immediately by exfiltrative write) trigger rate limits and optional out-of-band confirmation.
A concrete example
An MCP server exposes a list_files(directory) tool scoped to a specific directory. An attacker's prompt injection induces the model to call list_files(directory="../../etc/").
With model-level defences: the model may or may not refuse depending on training.
With Griffin AI's tool-layer enforcement: the tool handler rejects the call because the directory argument escapes the declared scope. The attacker's attempt is logged; the audit trail shows the model attempted an out-of-scope call.
What to evaluate
Three concrete checks:
- Configure an MCP server with narrow scope. Induce the model to attempt a scope-escape. Verify the scope holds.
- Attempt argument manipulation (path traversal, SQL injection in tool arguments). Verify the arguments are validated at the tool layer.
- Attempt cascading tool invocation. Verify rate limits trigger.
How Safeguard Helps
Safeguard's tool-call security is built on tool-layer enforcement of scope, argument validation, and cascading-call rate limits. The defence does not depend on the model cooperating. For organisations whose AI agents have access to real production tools, this architectural choice is the difference between a safe deployment and a latent incident.