
Data Exfiltration via LLM Agents in 2026

Tool-using agents have become a viable exfiltration channel. The patterns showing up in incident reports, and the controls that contain them.

LLM agents have quietly become one of the most viable exfiltration channels available to an attacker in 2026. The reason is structural: agents combine read access to sensitive internal data, the ability to call tools that produce outbound network traffic, and a vulnerability to prompt injection that turns the model itself into an unwitting collaborator. Every meaningful incident response engagement we have run this year that involved an agent included exfiltration as one of the attack chains considered, and in several cases it was the primary one.

This post catalogs the exfiltration patterns we have seen in production incidents and the controls that have actually contained them. The framing is the practitioner's: what does the attack look like in real traffic, and what do you put in front of it to stop it.

What does the exfiltration attack chain look like?

The basic exfiltration chain has three stages and is depressingly consistent across the incidents we have reviewed. The attacker plants a prompt injection payload in content the agent will retrieve, often through a public-facing channel like a customer support email, a public webpage the agent crawls, or a shared document the user opens. The agent's retrieval or browsing tool ingests the payload, which contains instructions to gather specific sensitive data from other tools the agent has access to and to send it to an attacker-controlled endpoint. The agent executes the instructions, packages the sensitive data, and uses one of several available channels to transmit it outward. The channels vary by deployment: HTTP requests through a fetcher tool, image tags rendered in markdown output, base64-encoded content in tool-call parameters, even DNS queries through a tool that resolves hostnames. The MITRE ATLAS framework now documents this chain explicitly under technique groups for indirect prompt injection and tool exfiltration, and it is a useful map for defenders.

Which channels matter most in 2026?

Three exfiltration channels account for the bulk of the incidents we have triaged this year. The first is markdown image rendering: the agent emits a markdown image tag whose URL carries the exfiltrated content as a query parameter, and when the chat UI renders the response, the user's browser fetches that attacker-controlled URL and delivers the data in the request. This works against any chat UI that renders markdown without restricting image domains, which is most of them by default. The second is direct tool-call exfiltration: the agent has an HTTP fetch tool or a webhook-call tool, and the injection induces it to call an attacker URL with sensitive content in the body. The third is search-query exfiltration: the agent has a web search tool, and the injection encodes data into search queries that the attacker reads from access logs on an attacker-controlled domain. Each channel needs a distinct control, and the highest-impact deployments harden all three at once.
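As a rough illustration of a control for the first channel, the sketch below strips markdown images whose host is not on an allowlist before the response reaches the renderer. The allowlist contents, the function name, and the replacement text are all assumptions made for this example, and the regex only covers inline image syntax; reference-style images and raw HTML image tags would need separate handling.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_IMAGE_HOSTS = {"cdn.example.com"}

# Matches inline markdown images: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def strip_untrusted_images(markdown: str) -> str:
    """Replace images whose host is not allowlisted with a text placeholder."""

    def replace(match):
        alt, url = match.group(1), match.group(2)
        parsed = urlparse(url)
        # Keep the image only if it is HTTPS and served from an allowed host.
        if parsed.scheme == "https" and parsed.hostname in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        return f"[image removed: {alt}]"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Running this on agent output before rendering means an injected image tag pointing at an unknown host never triggers a browser request, while images from the allowlisted host pass through unchanged.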

How is egress filtering being applied?

Egress filtering is the load-bearing control for the tool-call exfiltration vector, and it has matured into a real discipline this year. The pattern is to route all agent-generated outbound traffic through a dedicated egress proxy with strict allowlists for destination domains. Internal-only tools should be on a wholly separate network segment with no internet egress at all. Tools that need external access should be limited to their specific destinations: the Stripe API client can reach api.stripe.com and nothing else. Cloud-native deployments are using VPC endpoint policies and Squid-style egress proxies in roughly equal measure, with the choice driven by existing infrastructure rather than security tradeoffs. The other increasingly common pattern is per-request egress decisions: the egress proxy authenticates each outbound request against the originating agent session and refuses traffic to domains the user does not have a legitimate business reason to reach.
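The per-tool allowlist idea can be sketched as a small decision function of the kind an egress proxy might apply to each outbound request. This is a minimal sketch under stated assumptions: the tool names, the mapping structure, and the function name are hypothetical, and a production proxy would enforce the decision at the network layer rather than in application code.

```python
from urllib.parse import urlparse

# Hypothetical per-tool allowlist. Tools absent from the map get no egress.
TOOL_EGRESS_ALLOWLIST = {
    "stripe_client": {"api.stripe.com"},
    "internal_search": set(),  # internal-only tool: no internet egress at all
}


def is_egress_allowed(tool_name: str, url: str) -> bool:
    """Return True only if the URL's host is allowlisted for this tool."""
    allowed_hosts = TOOL_EGRESS_ALLOWLIST.get(tool_name, set())
    parsed = urlparse(url)
    # Require HTTPS and a parseable host; everything else is denied.
    if parsed.scheme != "https" or parsed.hostname is None:
        return False
    # Exact host match, so lookalikes such as api.stripe.com.evil.example fail.
    return parsed.hostname in allowed_hosts
```

The check is deny-by-default and matches the full hostname exactly, so suffix tricks and userinfo tricks in the URL do not widen the allowlist.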

What about content-level exfiltration controls?

Content-level controls operate at the boundary where sensitive data could leave the system, regardless of channel. DLP for agent outputs is the emerging discipline here, and it borrows heavily from established DLP for email and chat. The agent's response and outbound tool-call parameters are scanned for known sensitive patterns, structured identifiers like credit card numbers and SSNs, internal document fingerprints, and content matching deployed DLP policies. Responses that match are blocked before delivery, logged, and routed to incident response. The catch is that DLP works well for structured PII and poorly for unstructured sensitive content like business plans or source code. The mature deployments combine pattern-based DLP with a classifier trained on the organization's own sensitive document corpus, and they accept the false positive rate as the cost of the false negative reduction. Microsoft Purview, Symantec DLP, and the newer Nightfall and Cyberhaven platforms all see meaningful deployment in this role.
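The pattern-based half of that combination can be sketched as follows, assuming a simple scanner run over agent responses and tool-call parameters. The function names and finding labels are hypothetical, and the sketch covers only card numbers (with a Luhn checksum to reduce false positives) and SSN-shaped strings; it does not represent how any of the named products work.

```python
import re
from typing import List

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
# US Social Security number in its common dashed form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_outbound(text: str) -> List[str]:
    """Return labels for sensitive patterns found in outbound text."""
    findings = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group(0))
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings.append("credit_card")
    if SSN_PATTERN.search(text):
        findings.append("ssn")
    return findings
```

A non-empty result would be the trigger to block, log, and escalate. Unstructured content such as source code passes straight through a scanner like this, which is why it is paired with a classifier.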

How are teams catching the indirect injection that enables this?

The exfiltration chain depends on indirect prompt injection landing successfully, and the upstream controls that prevent injection are the highest-leverage place to invest. We have covered injection defense in detail elsewhere; the summary is that source-level provenance tagging, content classifiers on retrieved data, and dual-LLM architectures where untrusted content never reaches the privileged model are the durable patterns. The other key control is human-in-the-loop approval for any outbound action that could plausibly carry sensitive content. Approval gates on agents that send email, post to external services, or fetch URLs add latency, but they cut exfiltration risk dramatically. The teams operating sensitive agents in production are increasingly defaulting to approval gates on outbound actions and treating fully autonomous outbound action as an exception requiring explicit risk acceptance.
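An approval gate of this kind can be sketched as a thin wrapper around tool execution. Everything named here is an assumption for illustration: the ToolCall type, the set of outbound tool names, and the approve callback, which stands in for whatever review interface a deployment actually uses.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical names of tools that can send data outside the system.
OUTBOUND_ACTIONS = {"send_email", "post_webhook", "fetch_url"}


@dataclass
class ToolCall:
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def execute_with_gate(
    call: ToolCall,
    run_tool: Callable[[ToolCall], Any],
    approve: Callable[[ToolCall], bool],
) -> Any:
    """Run a tool call, requiring approval first if it is an outbound action."""
    if call.name in OUTBOUND_ACTIONS and not approve(call):
        # Denied calls never reach the tool, so nothing leaves the system.
        return "denied: outbound action requires approval"
    return run_tool(call)
```

Non-outbound tools skip the gate entirely, which keeps the added latency confined to the actions that could carry data out.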

How Safeguard Helps

Safeguard maps the exfiltration risk surface of your agent deployments through reachability analysis and policy enforcement. Griffin AI traces which outbound network paths are reachable from each agent service and flags configurations missing egress restrictions or DLP coverage. Policy gates in CI block deployments that introduce new outbound tool capabilities without corresponding network and content controls. Our zero-day feed surfaces newly disclosed indirect injection techniques and exfiltration patterns within hours of publication, so detection rules and policies can be updated proactively. TPRM scoring evaluates the third-party services your agents call outward to, including their access logging and breach disclosure practices, and zero-CVE container images reduce the attack surface on the agent runtime itself, leaving fewer paths for the initial foothold that the exfiltration chain depends on.

