AI Security

AI Agent Tool Calling Security: Risks and Mitigations

AI agents that call tools -- APIs, databases, file systems, code interpreters -- convert non-deterministic LLM output into real-world actions. Securing this boundary is the defining challenge of agentic AI.

Bob
Security Researcher
7 min read

The transition from chatbot LLMs to agentic LLMs is a fundamental shift in the security model. A chatbot produces text. An agent produces actions. When a chatbot hallucinates, you get a wrong answer. When an agent hallucinates, it might delete a database table, send an email to the wrong person, or deploy code to production.

Tool calling -- the mechanism by which an LLM decides to invoke external functions -- is where non-deterministic language generation meets deterministic system operations. The LLM does not "decide" in the human sense. It generates tokens that, when parsed, match a function call pattern. The reliability of these generated function calls determines the security of the entire agent system.

How Tool Calling Works

The LLM receives a system prompt that describes available tools, their parameters, and their purposes. When the LLM generates a response, it can choose to output a structured tool call instead of (or alongside) natural language text. The orchestrator (LangChain, CrewAI, the Anthropic agent SDK, or a custom framework) parses this output, executes the function, and feeds the result back to the LLM for the next step.
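
A minimal sketch of that loop, assuming a hypothetical TOOLS registry and a model that emits either plain text or a JSON object of the form {"tool": ..., "arguments": ...}. Real frameworks use richer schemas, but the control flow is the same:

```python
import json

# Hypothetical tool registry: tool names map to plain Python functions.
TOOLS = {
    "get_policy": lambda name: f"Policy text for {name}",
    "get_orders": lambda user_id: [{"id": 1, "user_id": user_id}],
}

def run_agent_step(model_output: str) -> str:
    """Parse one model response; execute a tool call if one is present."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain natural-language response, no tool call

    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        raise ValueError(f"Unknown tool: {call.get('tool')!r}")

    result = tool(**call.get("arguments", {}))
    # In a real orchestrator this result is fed back into the model's
    # context as the input for the next step.
    return json.dumps({"tool_result": result})
```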

The security-critical observation is that the LLM is making authorization decisions. It decides which tool to call, with what parameters, in what order. These decisions are based on pattern matching in the model's weights, influenced by the prompt context. The model has no concept of authorization, data sensitivity, or blast radius. It generates the most probable next token given its context.

Threat Categories

Unintended tool invocation. The model calls a tool it should not call, or calls a tool with parameters the user did not intend. A user asks "show me the deletion policy" and the model invokes a delete_resource tool instead of a get_policy tool because the word "deletion" triggered the wrong function match.

Parameter manipulation through prompt injection. An attacker crafts input that causes the model to call a tool with attacker-specified parameters. "Process this refund for order #12345, amount $99999" might cause the model to invoke a refund tool with an inflated amount if the prompt injection overrides the intended parameters.

Tool chaining attacks. The model is prompted to call a sequence of tools that individually are safe but in combination produce an unauthorized outcome. Read a database to get an admin email, then send an email using that address, then use the email confirmation to reset a password.

Excessive tool permissions. The model has access to tools with broader permissions than any single user request should require. A tool that reads any database table is more dangerous than a tool that reads the user's records in one table.

Confidentiality violation through tool results. The model calls a tool that returns sensitive data, then includes that data in its response to the user. The user might not be authorized to see the data, but the tool returned it because the tool operates with system-level permissions.

Architecture for Secure Tool Calling

Principle of least privilege for tools. Each tool should have the minimum permissions necessary for its function. A tool that queries user orders should not have access to all orders in the system. Parameterize tools with the current user's authorization context, not with system-level access.

This means the tool definitions are user-specific. When User A interacts with the agent, the get_orders tool is scoped to User A's orders. When User B interacts, the same tool is scoped to User B's orders. The scoping happens in the tool implementation, not in the prompt -- because the prompt is not a security boundary.
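
One way to express this scoping, sketched here with sqlite3 and a hypothetical orders table: the tool function is built per session and closes over an authorization context that the application resolved, so the user filter never depends on model output.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class AuthContext:
    """Authorization context resolved by the application, never by the LLM."""
    user_id: str

def make_get_orders(conn: sqlite3.Connection, auth: AuthContext):
    """Build a get_orders tool hard-scoped to the current user."""
    def get_orders(status: str = "open"):
        # The user_id filter comes from the server-side auth context;
        # model-generated parameters cannot widen it to other users' rows.
        rows = conn.execute(
            "SELECT id, status, total FROM orders WHERE user_id = ? AND status = ?",
            (auth.user_id, status),
        )
        return rows.fetchall()
    return get_orders

# Registered per session: User A's agent gets a tool already bound to User A.
# tools = {"get_orders": make_get_orders(conn, AuthContext(user_id=current_user_id))}
```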

Tool call validation layer. Insert a validation layer between the LLM's tool call output and the actual tool execution. This layer checks: Is this tool call authorized for this user? Are the parameters within expected bounds? Does this tool call combined with recent tool calls constitute an authorized workflow?

The validation layer is deterministic code, not another LLM call. It applies rules that are predictable, testable, and auditable. The LLM suggests what to do; the validation layer decides whether to allow it.
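
A minimal sketch of such a layer, using hypothetical role-to-tool mappings and business limits:

```python
# Deterministic validation layer: the LLM proposes, this code disposes.
ALLOWED_TOOLS_BY_ROLE = {
    "support_agent": {"get_orders", "get_policy", "create_ticket"},
    "admin": {"get_orders", "get_policy", "create_ticket", "issue_refund"},
}

MAX_REFUND = 500.00  # hypothetical business limit

def validate_tool_call(role: str, tool: str, args: dict, recent_calls: list[str]) -> None:
    """Raise if the proposed tool call is not authorized; otherwise return None."""
    if tool not in ALLOWED_TOOLS_BY_ROLE.get(role, set()):
        raise PermissionError(f"Tool {tool!r} not allowed for role {role!r}")

    if tool == "issue_refund":
        if float(args.get("amount", 0)) > MAX_REFUND:
            raise ValueError("Refund amount exceeds policy limit")
        # Workflow rule: a refund must be preceded by a read of the order.
        if "get_orders" not in recent_calls:
            raise PermissionError("Refund requested without a prior order lookup")
```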

Human-in-the-loop for high-risk actions. Define a risk classification for each tool. Low-risk tools (read-only queries, formatting) execute automatically. Medium-risk tools (creating records, sending non-sensitive messages) require confirmation for unusual patterns. High-risk tools (deletion, financial transactions, access control changes) always require explicit human approval.

The risk classification should be configurable and should default to the highest tier: a new tool should require human approval until its behavior is understood.
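
A sketch of that classification, with hypothetical tool names, where anything not explicitly classified is treated as high risk:

```python
from enum import Enum

class Risk(Enum):
    LOW = 1      # read-only queries, formatting: execute automatically
    MEDIUM = 2   # creating records, non-sensitive messages: confirm unusual patterns
    HIGH = 3     # deletion, financial transactions, access control: always approve

# Explicit classifications; anything unlisted defaults to HIGH.
TOOL_RISK = {
    "get_orders": Risk.LOW,
    "create_ticket": Risk.MEDIUM,
    "issue_refund": Risk.HIGH,
    "delete_resource": Risk.HIGH,
}

def requires_human_approval(tool: str, unusual: bool = False) -> bool:
    risk = TOOL_RISK.get(tool, Risk.HIGH)  # unknown tools are treated as high risk
    if risk is Risk.HIGH:
        return True
    return risk is Risk.MEDIUM and unusual
```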

Rate limiting and quotas. Limit how many tool calls an agent can make per session and per time window. An agent in a loop (repeatedly calling a tool because the model misinterprets the results) can exhaust API quotas, create enormous bills, or trigger abuse detection systems. Rate limits bound the damage.
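
A simple sliding-window limiter is enough to bound a runaway loop; the thresholds below are illustrative, not recommendations:

```python
import time
from collections import deque

class ToolCallRateLimiter:
    """Sliding-window limit on tool calls per session (hypothetical thresholds)."""

    def __init__(self, max_calls: int = 30, window_seconds: float = 60.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: deque[float] = deque()

    def check(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            raise RuntimeError("Tool call rate limit exceeded; halting the agent loop")
        self.calls.append(now)
```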

Tool call logging and audit. Every tool call, its parameters, and its result must be logged. When something goes wrong -- and with non-deterministic systems, things will go wrong -- the audit log is the only way to understand what happened and why.
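
A sketch using Python's standard logging module, emitting one structured record per call. Logging a result summary rather than the raw result is an assumption about your data handling policy, not a requirement of the technique; it keeps sensitive tool output out of the audit trail.

```python
import json
import logging
import time

audit_log = logging.getLogger("agent.tool_audit")

def log_tool_call(session_id: str, user_id: str, tool: str,
                  params: dict, result_summary: str) -> None:
    """Emit one structured audit record per tool call."""
    audit_log.info(json.dumps({
        "ts": time.time(),
        "session_id": session_id,
        "user_id": user_id,
        "tool": tool,
        "params": params,
        "result_summary": result_summary,
    }))
```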

Input Validation for Tool Parameters

The LLM generates tool parameters as structured data (typically JSON). These parameters must be validated with the same rigor as any user input in a traditional application.

Type validation. If a tool expects an integer, reject strings. If it expects a UUID, reject arbitrary strings. The LLM sometimes generates parameters that are close but not quite right -- a number as a string, a date in the wrong format. Strict type validation catches these before they cause errors or unexpected behavior.

Range validation. Numeric parameters should have bounds. A refund amount should not exceed the original order amount. A date range should not span more than a year. A limit parameter should not be 999999.

Allowlist validation. For parameters that accept a finite set of values (status codes, category names, resource types), validate against an allowlist. Do not rely on the LLM to generate only valid values -- verify.
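
Type, range, and allowlist checks can be expressed together in a schema; the sketch below uses pydantic and a hypothetical refund tool:

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class RefundParams(BaseModel):
    """Schema for a hypothetical issue_refund tool."""
    order_id: int                                   # type check: must be an integer
    amount: float = Field(gt=0, le=10_000)          # range check: positive and bounded
    reason: Literal["damaged", "not_received", "duplicate_charge"]  # allowlist

def parse_refund_params(raw: dict) -> RefundParams:
    try:
        return RefundParams(**raw)
    except ValidationError as exc:
        # Reject the tool call rather than silently "fixing" the model's output.
        raise ValueError(f"Invalid refund parameters: {exc}") from exc
```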

Injection prevention. If tool parameters are used in database queries, API calls, or command execution, apply the same injection prevention as for any user input: parameterized queries, API call construction through SDKs, and never string concatenation for commands.
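
For command execution, that means passing model-supplied values as arguments to a fixed command rather than interpolating them into a shell string; a sketch with a hypothetical file-inspection tool:

```python
import subprocess

def inspect_file(path: str) -> bytes:
    """Run a fixed command with the model-supplied value as an argument."""
    # Safe: argument list, no shell involved.
    return subprocess.run(
        ["file", "--brief", path],
        capture_output=True, check=True, timeout=10,
    ).stdout

# Unsafe: subprocess.run(f"file --brief {path}", shell=True) -- a path like
# "report.txt; rm -rf /" would become a second command.
```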

Sandboxing Tool Execution

Tools that execute code (interpreters, shell commands, SQL queries) should run in sandboxed environments.

Container isolation. Execute code in ephemeral containers with no network access, limited file system access, and constrained resource limits. The container is destroyed after execution.
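
A sketch using the Docker SDK for Python (docker-py); the image name and resource limits are illustrative, not recommendations:

```python
import docker  # docker-py; assumes a local Docker daemon is available

def run_untrusted_code(code: str) -> str:
    """Execute model-generated Python in a throwaway, locked-down container."""
    client = docker.from_env()
    output = client.containers.run(
        image="python:3.12-slim",
        command=["python", "-c", code],
        network_disabled=True,   # no network access
        read_only=True,          # read-only root filesystem
        mem_limit="256m",        # constrained memory
        pids_limit=64,           # bound the process count
        remove=True,             # container is destroyed after execution
    )
    return output.decode()
```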

Read-only file systems. Tools that read files should not have write access. Tools that write files should write to a staging area that requires human approval before committing.

Network isolation. Tools should not make arbitrary network requests unless their function requires it. A database query tool needs database access but not internet access. An API call tool needs access to specific endpoints but not all endpoints.

Monitoring and Anomaly Detection

Establish baselines for normal agent behavior and alert on deviations.

Tool call frequency. If an agent typically makes 3-5 tool calls per user interaction and suddenly makes 50, something is wrong -- either a prompt injection attack or a model loop.

Parameter distributions. Track the distribution of tool parameters. If the get_records tool normally receives query parameters with 1-100 record limits and suddenly receives a request for 1 million records, flag it.

Tool call sequences. Define expected tool call sequences (read, then process, then respond) and flag unexpected sequences (delete without prior read, multiple mutations without read).

Error rate changes. A spike in tool call errors may indicate that the model is being manipulated into making calls with invalid parameters.
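
These signals do not require heavy machinery to start with; a sketch covering the first three, with hypothetical baseline thresholds that in practice would come from observed history:

```python
# Hypothetical baselines for one agent interaction.
MAX_CALLS_PER_INTERACTION = 10
MAX_RECORD_LIMIT = 100
READ_TOOLS = {"get_records", "get_orders"}  # a mutation should follow a read

def check_anomalies(calls: list[dict]) -> list[str]:
    """Return human-readable alerts for one interaction.

    Each call is a dict like {"tool": "get_records", "params": {"limit": 5}}.
    """
    alerts = []
    if len(calls) > MAX_CALLS_PER_INTERACTION:
        alerts.append(f"Tool call volume {len(calls)} exceeds baseline")

    seen = set()
    for call in calls:
        params = call.get("params", {})
        if int(params.get("limit", 0)) > MAX_RECORD_LIMIT:
            alerts.append(f"{call['tool']}: limit {params['limit']} outside normal range")
        if call["tool"].startswith("delete_") and not (seen & READ_TOOLS):
            alerts.append(f"{call['tool']}: mutation without a prior read")
        seen.add(call["tool"])
    return alerts
```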

How Safeguard.sh Helps

Safeguard.sh monitors the agent framework dependencies that underpin tool calling systems -- LangChain, CrewAI, AutoGen, and their rapidly evolving dependency trees. These frameworks receive frequent updates that can change tool calling behavior, introduce bugs, or patch security issues. Safeguard.sh provides continuous vulnerability monitoring for the entire agent framework stack, alerting you when the infrastructure your AI agents run on is affected by known supply chain vulnerabilities. For teams building agentic AI applications, Safeguard.sh secures the foundation while you focus on securing the agent's behavior.
