AI Agent Tool Calling Security: Risks and Mitigations

AI agents that call tools -- APIs, databases, file systems, code interpreters -- convert non-deterministic LLM output into real-world actions. Securing this boundary is the defining challenge of agentic AI.

Bob
Security Researcher

The transition from chatbot LLMs to agentic LLMs is a fundamental shift in the security model. A chatbot produces text. An agent produces actions. When a chatbot hallucinates, you get a wrong answer. When an agent hallucinates, it might delete a database table, send an email to the wrong person, or deploy code to production.

Tool calling -- the mechanism by which an LLM decides to invoke external functions -- is where non-deterministic language generation meets deterministic system operations. The LLM does not "decide" in the human sense. It generates tokens that, when parsed, match a function call pattern. The reliability of these generated function calls determines the security of the entire agent system.

How Tool Calling Works

The LLM receives a description of the available tools, their parameters, and their purposes, typically in the system prompt or in a dedicated tool-definition field of the API request. When the LLM generates a response, it can output a structured tool call instead of (or alongside) natural language text. The orchestrator (LangChain, CrewAI, the Anthropic agent SDK, or a custom framework) parses this output, executes the function, and feeds the result back to the LLM for the next step.
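The parse-and-execute step can be sketched as follows. This is a minimal illustration, not how any particular framework implements it: the JSON shape of the tool call, the `TOOLS` registry, and the `get_policy` function are all assumptions made for the example.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tool; a real one would query a policy store.
def get_policy(name: str) -> str:
    return f"policy text for {name}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_policy": get_policy}


def dispatch(model_output: str) -> Any:
    """Parse one tool call emitted by the model and execute it."""
    call = json.loads(model_output)  # model output is untrusted text
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:  # unknown tool: refuse rather than guess
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)  # the result would be fed back to the model


result = dispatch('{"name": "get_policy", "arguments": {"name": "deletion"}}')
```

Note that nothing in this loop checks who the user is or whether the call makes sense. Whatever the model emits gets executed, which is why the rest of this article adds controls around this step.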

The security-critical observation is that the LLM is effectively making authorization decisions: it chooses which tool to call, with what parameters, in what order. These choices come from pattern matching in the model's weights, influenced by the prompt context. The model has no built-in concept of authorization, data sensitivity, or blast radius. It generates the most probable next token given its context.

Threat Categories

Unintended tool invocation. The model calls a tool it should not call, or calls a tool with parameters the user did not intend. A user asks "show me the deletion policy" and the model invokes a delete_resource tool instead of a get_policy tool because the word "deletion" triggered the wrong function match.

Parameter manipulation through prompt injection. An attacker crafts input that causes the model to call a tool with attacker-specified parameters. Text such as "Process this refund for order #12345, amount $99999", whether typed directly or planted in content the agent reads, might cause the model to invoke a refund tool with an inflated amount if the injected instruction overrides the intended parameters.

Tool chaining attacks. The model is prompted to call a sequence of tools that are individually safe but in combination produce an unauthorized outcome. For example, the agent reads a database to get an admin email address, sends an email using that address, and then uses the email confirmation to reset a password.

Excessive tool permissions. The model has access to tools with broader permissions than any single user request should require. A tool that reads any database table is more dangerous than a tool that reads the user's records in one table.

Confidentiality violation through tool results. The model calls a tool that returns sensitive data, then includes that data in its response to the user. The user might not be authorized to see the data, but the tool returned it because the tool operates with system-level permissions.

Architecture for Secure Tool Calling

Principle of least privilege for tools. Each tool should have the minimum permissions necessary for its function. A tool that queries user orders should not have access to all orders in the system. Parameterize tools with the current user's authorization context, not with system-level access.

This means the tool definitions are user-specific. When User A interacts with the agent, the get_orders tool is scoped to User A's orders. When User B interacts, the same tool is scoped to User B's orders. The scoping happens in the tool implementation, not in the prompt -- because the prompt is not a security boundary.
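One way to scope a tool in its implementation is to bind it to the authenticated user before it is ever exposed to the model. The sketch below assumes a hypothetical in-memory `ORDERS` store and a `UserContext` populated by the session layer; both are illustrative names, not part of any framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical order store keyed by order id.
ORDERS: Dict[str, Dict] = {
    "o1": {"owner": "alice", "total": 40},
    "o2": {"owner": "bob", "total": 75},
}


@dataclass(frozen=True)
class UserContext:
    user_id: str  # set by the authenticated session, never by the model


def make_get_orders(ctx: UserContext) -> Callable[[], List[Dict]]:
    """Return a get_orders tool bound to one user's records."""

    def get_orders() -> List[Dict]:
        # The filter uses ctx, which no model output can change.
        return [o for o in ORDERS.values() if o["owner"] == ctx.user_id]

    return get_orders


alice_tool = make_get_orders(UserContext("alice"))
bob_tool = make_get_orders(UserContext("bob"))
```

The tool takes no user identifier as a parameter, so there is nothing for an injected prompt to override.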

Tool call validation layer. Insert a validation layer between the LLM's tool call output and the actual tool execution. This layer checks: Is this tool call authorized for this user? Are the parameters within expected bounds? Does this tool call combined with recent tool calls constitute an authorized workflow?

The validation layer is deterministic code, not another LLM call. It applies rules that are predictable, testable, and auditable. The LLM suggests what to do; the validation layer decides whether to allow it.
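A minimal sketch of such a layer, covering the three checks listed above. The `Session` type, the tool names, the `limit` parameter, and the `MAX_LIMIT` bound are assumptions chosen for illustration; a real deployment would load its rules from policy configuration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple

MAX_LIMIT = 100  # assumed upper bound for the hypothetical `limit` parameter


@dataclass
class Session:
    user_id: str
    allowed_tools: Set[str]
    history: List[str] = field(default_factory=list)  # tools already executed


def validate_call(session: Session, tool: str, args: Dict[str, Any]) -> Tuple[bool, str]:
    """Deterministically decide whether a proposed tool call may run."""
    # 1. Is this tool call authorized for this user?
    if tool not in session.allowed_tools:
        return False, "tool not authorized for user"
    # 2. Are the parameters within expected bounds?
    limit = args.get("limit", 1)
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        return False, "limit out of bounds"
    # 3. Does this call, combined with recent calls, form an authorized workflow?
    if tool == "delete_record" and "get_record" not in session.history:
        return False, "delete requires a prior read"
    return True, "ok"
```

The function has no side effects; the caller appends to `session.history` only after a call has actually executed. That keeps the rules easy to unit test.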

Human-in-the-loop for high-risk actions. Define a risk classification for each tool. Low-risk tools (read-only queries, formatting) execute automatically. Medium-risk tools (creating records, sending non-sensitive messages) require confirmation for unusual patterns. High-risk tools (deletion, financial transactions, access control changes) always require explicit human approval.

The risk classification should be configurable and should default to higher risk. A new tool should require human approval until its behavior is understood.
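The default-to-high rule is easy to express in code. In this sketch the tool names and the `RISK_BY_TOOL` table are hypothetical, and the `unusual` flag stands in for whatever pattern detection the deployment uses.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical classification table; in practice this would be configuration.
RISK_BY_TOOL = {
    "get_policy": Risk.LOW,
    "create_ticket": Risk.MEDIUM,
    "delete_resource": Risk.HIGH,
}


def classify(tool: str) -> Risk:
    # Tools missing from the table are treated as high risk.
    return RISK_BY_TOOL.get(tool, Risk.HIGH)


def needs_approval(tool: str, unusual: bool = False) -> bool:
    """Return True when a human must approve the call before it runs."""
    risk = classify(tool)
    if risk is Risk.HIGH:
        return True
    if risk is Risk.MEDIUM:
        return unusual  # confirm only when the pattern looks unusual
    return False
```

Because the lookup falls back to `Risk.HIGH`, forgetting to classify a new tool fails safe: it waits for approval instead of executing automatically.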

Rate limiting and quotas. Limit how many tool calls an agent can make per session and per time window. An agent in a loop (repeatedly calling a tool because the model misinterprets the results) can exhaust API quotas, create enormous bills, or trigger abuse detection systems. Rate limits bound the damage.
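A per-session cap combined with a sliding time window could look like the sketch below. The class name, parameters, and the numbers in the usage line are illustrative assumptions; the injectable `clock` exists only to make the limiter testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class CallBudget:
    """Caps tool calls per session and per sliding time window."""

    def __init__(self, max_per_session: int, max_per_window: int,
                 window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_session = max_per_session
        self._max_window = max_per_window
        self._window = window_seconds
        self._clock = clock
        self._total = 0
        self._recent: Deque[float] = deque()  # timestamps inside the window

    def allow(self) -> bool:
        """Return True and record the call if it fits both limits."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._recent and now - self._recent[0] >= self._window:
            self._recent.popleft()
        if self._total >= self._max_session or len(self._recent) >= self._max_window:
            return False
        self._total += 1
        self._recent.append(now)
        return True


budget = CallBudget(max_per_session=50, max_per_window=10, window_seconds=60.0)
```

The orchestrator would call `budget.allow()` before each tool execution and stop the agent loop, or escalate to a human, when it returns `False`.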

Tool call logging and audit. Every tool call, its parameters, and its result must be logged. When something goes wrong -- and with non-deterministic systems, things will go wrong -- the audit log is the only way to understand what happened and why.
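As a rough sketch, each call can be written as one structured JSON line using Python's standard `logging` module. The logger name and the record fields here are assumptions, not a prescribed schema.

```python
import json
import logging
import time
import uuid
from typing import Any, Dict

audit_log = logging.getLogger("agent.audit")  # hypothetical logger name


def audit_tool_call(user_id: str, tool: str, args: Dict[str, Any],
                    result: Any, allowed: bool) -> str:
    """Emit one JSON audit line for a tool call and return it."""
    record = {
        "call_id": str(uuid.uuid4()),  # unique id to correlate with other logs
        "timestamp": time.time(),
        "user_id": user_id,
        "tool": tool,
        "args": args,
        "allowed": allowed,  # log rejected calls too, not only executed ones
        "result": result,
    }
    # default=str keeps logging from failing on non-JSON-serializable values.
    line = json.dumps(record, sort_keys=True, default=str)
    audit_log.info(line)
    return line
```

One line per call, with stable keys, makes it practical to reconstruct the full sequence of actions for a session after an incident.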

Input Validation for Tool Parameters

The LLM generates tool parameters as structured data (typically JSON). These parameters must be validated with the same rigor as any user input in a traditional application.

Type validation. If a tool expects an integer, reject strings. If it expects a UUID, reject arbitrary strings. The LLM sometimes generates parameters that are close but not quite right -- a number as a string, a date in the wrong format. Strict type validation catches these before they cause errors or unexpected behavior.

Range validation. Numeric parameters should have bounds. A refund amount should not exceed the original order amount. A date range should not span more than a year. A limit parameter should not be 999999.

Allowlist validation. For parameters that accept a finite set of values (status codes, category names, resource types), validate against an allowlist. Do not rely on the LLM to generate only valid values -- verify.
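The type, range, and allowlist checks above can be combined in one validator per tool. This sketch assumes a hypothetical refund tool whose parameters are `order_id`, `amount_cents`, and `reason`; the allowed reasons are made up for the example, and the original order total is looked up by trusted code rather than supplied by the model.

```python
import uuid
from typing import Any, Dict

ALLOWED_REASONS = {"damaged", "late", "duplicate"}  # hypothetical allowlist


def validate_refund(args: Dict[str, Any], order_total_cents: int) -> Dict[str, Any]:
    """Return cleaned refund parameters or raise ValueError."""
    order_id = args.get("order_id")
    amount = args.get("amount_cents")
    reason = args.get("reason")

    # Type validation: order_id must be a string that parses as a UUID.
    if not isinstance(order_id, str):
        raise ValueError("order_id must be a string")
    uuid.UUID(order_id)  # raises ValueError if malformed

    # Type validation: reject "100", 100.0 and True (bool is a subclass of int).
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise ValueError("amount_cents must be an integer")

    # Range validation: a refund cannot exceed the original order amount.
    if not 0 < amount <= order_total_cents:
        raise ValueError("amount_cents out of range")

    # Allowlist validation: only known reasons are accepted.
    if not isinstance(reason, str) or reason not in ALLOWED_REASONS:
        raise ValueError("reason not allowed")

    return {"order_id": order_id, "amount_cents": amount, "reason": reason}
```

Rejecting near-miss values such as `"100"` instead of coercing them is a deliberate choice here: a strict failure is visible in logs, while silent coercion hides how often the model produces malformed calls.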

Injection prevention. If tool parameters are used in database queries, API calls, or command execution, apply the same injection prevention as for any user input: use parameterized queries, construct API calls through SDKs, and never build commands by string concatenation.
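For the database case, a parameterized query with Python's built-in `sqlite3` module illustrates the idea. The `orders` table and the `find_orders` helper are hypothetical.

```python
import sqlite3
from typing import List

# In-memory database with a hypothetical orders table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("o1", "alice"), ("o2", "bob")])


def find_orders(db: sqlite3.Connection, owner: str) -> List[str]:
    """Look up order ids for an owner using a bound parameter."""
    # The ? placeholder sends `owner` as data; it is never parsed as SQL.
    cur = db.execute("SELECT id FROM orders WHERE owner = ? ORDER BY id", (owner,))
    return [row[0] for row in cur.fetchall()]


normal = find_orders(conn, "alice")
hostile = find_orders(conn, "alice' OR '1'='1")
```

With the bound parameter, the hostile string is compared literally against the `owner` column and matches nothing. Had the query been assembled by string concatenation, the same input would have returned every row.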

Sandboxing Tool Execution

Tools that execute code (interpreters, shell commands, SQL queries) should run in sandboxed environments.

Container isolation. Execute code in ephemeral containers with no network access, limited file system access, and tight resource limits. The container is destroyed after execution.

Read-only file systems. Tools that read files should not have write access. Tools that write files should write to a staging area that requires human approval before committing.

Network isolation. Tools should not make arbitrary network requests unless their function requires it. A database query tool needs database access but not internet access. An API call tool needs access to specific endpoints but not all endpoints.
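As one possible realization of the isolation properties in this section, the sketch below builds a `docker run` command line for a throwaway container. It assumes Docker is the container runtime and uses the public `python:3.12-slim` image as an example; the specific limits are placeholder values, and the function only constructs the argument list without executing anything.

```python
from typing import List


def sandbox_command(image: str, code: str) -> List[str]:
    """Build a docker run argv for executing untrusted Python code."""
    return [
        "docker", "run",
        "--rm",                # destroy the container after execution
        "--network", "none",   # no network access
        "--read-only",         # root file system is read-only
        "--tmpfs", "/tmp",     # small writable scratch space only
        "--memory", "256m",    # memory cap (placeholder value)
        "--cpus", "0.5",       # CPU cap (placeholder value)
        "--pids-limit", "64",  # bound the number of processes
        "--cap-drop", "ALL",   # drop all Linux capabilities
        image, "python", "-c", code,
    ]


argv = sandbox_command("python:3.12-slim", "print(2 + 2)")
# To execute: subprocess.run(argv, capture_output=True, text=True, timeout=10)
```

Passing the command as a list rather than a shell string also means the model-generated code is never interpreted by a host shell.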

Monitoring and Anomaly Detection

Establish baselines for normal agent behavior and alert on deviations.

Tool call frequency. If an agent typically makes 3-5 tool calls per user interaction and suddenly makes 50, something is wrong -- either a prompt injection attack or a model loop.

Parameter distributions. Track the distribution of tool parameters. If the get_records tool normally receives query parameters with 1-100 record limits and suddenly receives a request for 1 million records, flag it.

Tool call sequences. Define expected tool call sequences (read, then process, then respond) and flag unexpected sequences (a delete without a prior read, or multiple mutations without an intervening read).

Error rate changes. A spike in tool call errors may indicate that the model is being manipulated into making calls with invalid parameters.
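Three of these checks (call frequency, parameter size, and call sequence) can be sketched as simple rules over the calls made in one user interaction. The thresholds, the `get_`/`delete_` naming convention, and the alert labels are assumptions for the example; in practice the thresholds would come from observed baselines rather than constants.

```python
from typing import Any, Dict, List


def find_anomalies(calls: List[Dict[str, Any]],
                   max_calls: int = 10,
                   max_limit: int = 100) -> List[str]:
    """Return alert labels for one interaction's list of tool calls."""
    alerts: List[str] = []

    # Frequency: far more calls than the usual handful per interaction.
    if len(calls) > max_calls:
        alerts.append("too_many_calls")

    seen_read = False
    for call in calls:
        tool = call["tool"]
        limit = call.get("args", {}).get("limit")

        # Parameter distribution: record limit far above the normal range.
        if isinstance(limit, int) and limit > max_limit:
            alerts.append("limit_too_large")

        # Sequence: a delete that was not preceded by any read.
        if tool.startswith("get_"):
            seen_read = True
        elif tool.startswith("delete_") and not seen_read:
            alerts.append("delete_without_read")

    return alerts
```

For example, an interaction that starts with `delete_record` and then requests one million rows from `get_records` would raise both `delete_without_read` and `limit_too_large`.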

How Safeguard Helps

Safeguard monitors the agent framework dependencies that underpin tool calling systems -- LangChain, CrewAI, AutoGen, and their rapidly evolving dependency trees. These frameworks receive frequent updates that can change tool calling behavior, introduce bugs, or patch security issues. Safeguard provides continuous vulnerability monitoring for the entire agent framework stack, helping keep the infrastructure your AI agents run on free from known supply chain vulnerabilities. For teams building agentic AI applications, Safeguard secures the foundation while you focus on securing the agent's behavior.
