AI Security

Network egress controls for autonomous agent runtimes

Autonomous agents need network access to do useful work, and that access is exactly what attackers exploit when they trick an agent into exfiltrating data. Here is how to design egress controls that hold up under adversarial pressure.

The job description of an autonomous agent runtime almost always includes network access. Fetch this page, call that API, query the database, post to the webhook, hit the search endpoint: useful agents are useful because they can reach out. That same property makes the runtime a confused-deputy hazard the moment an attacker can influence what it reaches for. Egress is where prompt-injection attacks turn into actual data leaks, and the difference between a runtime that contains the damage and one that does not is almost always a question of how the network surface is shaped.

The challenge is that egress controls designed for ordinary workloads do not translate cleanly to agent runtimes. A backend service typically talks to a small, stable set of destinations that can be allowlisted statically; an agent talks to whatever destination the model decides to reach for, often based on user instructions, often to addresses that did not exist when the policy was written. Building controls that keep the agent useful without turning the runtime into an open relay requires thinking about egress as a structured, policy-aware layer rather than a flat allow-or-deny firewall.

Why doesn't a static allowlist work for agent runtimes?

Static allowlists are the default starting point for any security team because they are simple to reason about: enumerate the destinations the workload needs, deny everything else, done. For an agent that has to fetch arbitrary URLs to answer a research question, query unfamiliar APIs in support of a user's task, or read documentation hosted on whatever site the user mentioned, that approach collapses on the first real workload. Either the team keeps adding domains to the list every day, eroding its security value, or it holds the line and finds that the agent cannot complete most of the work it was deployed for.

A second problem is granularity. A backend service that needs to talk to api.example.com talks to a specific endpoint with a specific method and a stable shape; an agent that reaches api.example.com might do so for a hundred different purposes, some legitimate and some adversarial. A flat allowlist that permits api.example.com permits all of them equally, including the call that an injected prompt persuaded the agent to make. Real safety here requires either narrower allowlists at the API path level or a layer above the network that understands the semantics of the call.
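A path-level allowlist of the kind described above can be sketched in a few lines. This is a minimal illustration, not a complete policy engine: the rule table, the `/v1/search` and `/v1/tickets` paths, and the function name `is_call_allowed` are all hypothetical.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical rules: (host, HTTP method, path prefix). Illustrative only.
ALLOWED_CALLS = {
    ("api.example.com", "GET", "/v1/search"),
    ("api.example.com", "POST", "/v1/tickets"),
}


def is_call_allowed(method: str, url: str) -> bool:
    """Allow a call only if host, method, and path prefix all match a rule."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    path = unquote(parts.path) or "/"
    # Refuse dot segments outright so "/v1/search/../admin" cannot
    # slip past a prefix check.
    if any(segment in (".", "..") for segment in path.split("/")):
        return False
    host = parts.hostname  # urlsplit already lowercases the hostname
    for allowed_host, allowed_method, prefix in ALLOWED_CALLS:
        if host != allowed_host or method.upper() != allowed_method:
            continue
        # Match the prefix on a segment boundary, not as a raw string prefix.
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False
```

Matching on a segment boundary matters: a raw string prefix check would treat `/v1/searchable` as covered by the `/v1/search` rule.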

The third problem is that allowlists by themselves do not address exfiltration through allowed destinations. If api.example.com is allowed and the API accepts a free-form text field that gets logged or surfaced somewhere the attacker can read, an injected prompt can use that field as a covert channel. A defense that stops at "is this destination allowed?" misses the whole class of exfiltration that uses legitimate destinations for illegitimate purposes.

What does a signed-context proxy add that a firewall doesn't?

A signed-context egress proxy sits between the agent and the network and makes policy decisions based on more than the destination. The agent's request reaches the proxy with metadata describing the agent's session, the user who initiated it, the provenance of the immediately upstream context, the tool through which the call is being made, and a recent reasoning summary. The proxy authenticates this metadata, which is signed by the agent runtime rather than generated by the agent itself, and uses it to decide whether the request matches the policy for that session.
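One way to picture the signing step is a keyed MAC over a canonical serialization of the metadata. The sketch below assumes a shared secret between the runtime and the proxy and uses HMAC-SHA256; the article does not specify a scheme, and a real deployment might prefer asymmetric signatures plus a timestamp or nonce to resist replay. The field names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_context(context: dict, key: bytes) -> str:
    """Runtime side: sign a canonical JSON encoding of the call context."""
    payload = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_context(context: dict, signature: str, key: bytes) -> bool:
    """Proxy side: recompute the MAC and compare in constant time."""
    expected = sign_context(context, key)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Usage: the key lives in the runtime and the proxy, never in the
# agent's context window, so the agent cannot forge the metadata.
key = b"demo-key-not-for-production"
ctx = {"session_id": "s-1", "user_role": "analyst", "tool": "web_fetch"}
sig = sign_context(ctx, key)
```

Sorting keys and fixing separators gives both sides the same byte string to sign, which is what makes the comparison meaningful.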

The pattern unlocks several things a plain firewall cannot do. The proxy can permit calls to a class of destinations only when the session originated from a particular user role; it can require that any call carrying a payload above a size threshold trace back to a first-party user instruction rather than to third-party document content; it can rewrite or strip headers based on which tool is making the call. None of these decisions are expressible in IP-and-port firewall rules, but all of them are routine for HTTP-aware policy engines that have signed metadata to consume.
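The three decisions listed above might look like this in a toy policy function. Everything here is an assumption made for illustration: the role names, destination classes, the 2048-byte threshold, and the header list are invented, and the context is assumed to have already passed signature verification.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CallContext:
    user_role: str
    provenance: str         # "first_party" or "third_party"
    tool: str
    destination_class: str  # e.g. "public_web", "internal_api"
    payload_bytes: int


# Hypothetical policy tables.
ROLE_DESTINATIONS = {
    "analyst": {"public_web", "internal_api"},
    "guest": {"public_web"},
}
MAX_UNTRUSTED_PAYLOAD_BYTES = 2048
STRIPPED_HEADERS_BY_TOOL = {"web_fetch": {"authorization", "cookie"}}


def decide(ctx: CallContext) -> Tuple[bool, str]:
    """Return (allowed, reason) for a verified call context."""
    if ctx.destination_class not in ROLE_DESTINATIONS.get(ctx.user_role, set()):
        return False, "destination_class_not_permitted_for_role"
    if (ctx.payload_bytes > MAX_UNTRUSTED_PAYLOAD_BYTES
            and ctx.provenance != "first_party"):
        return False, "large_payload_without_first_party_provenance"
    return True, "allowed"


def filter_headers(tool: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Drop headers that the given tool is not permitted to send."""
    blocked = STRIPPED_HEADERS_BY_TOOL.get(tool, set())
    return {k: v for k, v in headers.items() if k.lower() not in blocked}
```

Returning a reason string alongside the verdict keeps the decision explainable in logs and in the response sent back to the agent.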

The other thing the proxy provides is a chokepoint for logging. Every outbound request flows through a single point that knows the full call context, which makes the resulting log far richer than a netflow record. When an incident requires answering "what did the agent send out, to whom, in response to what, with which user's authority?" the proxy log answers the question directly. Without it, the investigation is a reconstruction across half a dozen partial sources.

How do you catch DNS-based exfiltration before the data leaves?

DNS is a favorite covert channel for exfiltration: small amounts of data, encoded into subdomain queries, are sent to an attacker-controlled nameserver and rarely blocked, because DNS is almost universally allowed. Agent runtimes are particularly attractive targets because the agent often generates the strings that go into outbound calls and can be tricked into emitting structured data into a DNS name. A single innocuous-looking instruction can produce a sequence of queries to subdomains of an attacker domain, each carrying a few bytes of a session secret.

The detection pattern is to put the runtime's DNS resolution behind a controlled resolver that scores every query against a small set of signals: high entropy in the leftmost labels, queries to recently registered domains, queries with unusually long names, and repeated queries from the same session to subdomains of a single parent. None of these is conclusive on its own, but in combination they catch the canonical exfiltration profile. The resolver does not have to block on a single signal; it can quarantine the agent for review when the score crosses a threshold and let benign traffic through untouched.
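A rough sketch of such a scorer follows. The weights, the thresholds (16 characters, 3.5 bits, 80 characters, 30 days, 20 queries), and the quarantine cutoff are illustrative guesses rather than tuned values, and the domain age and per-parent query count are assumed to come from lookups the resolver already performs.

```python
import math
from collections import Counter
from typing import Optional

QUARANTINE_THRESHOLD = 3  # hypothetical cutoff


def shannon_entropy(label: str) -> float:
    """Bits of entropy per character in a DNS label."""
    if not label:
        return 0.0
    total = len(label)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(label).values())


def score_query(name: str, parent_query_count: int,
                domain_age_days: Optional[int]) -> int:
    """Add up weak signals; no single one is treated as conclusive."""
    score = 0
    leftmost = name.rstrip(".").split(".")[0]
    if len(leftmost) >= 16 and shannon_entropy(leftmost) >= 3.5:
        score += 2  # long, random-looking leftmost label
    if len(name) > 80:
        score += 1  # unusually long query name
    if domain_age_days is not None and domain_age_days < 30:
        score += 1  # recently registered parent domain
    if parent_query_count > 20:
        score += 1  # many queries from this session under one parent
    return score


def should_quarantine(score: int) -> bool:
    return score >= QUARANTINE_THRESHOLD
```

Entropy alone misfires on legitimate names such as content-hash hostnames used by CDNs, which is one reason to combine signals instead of blocking on any single one.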

The structural defense complements the detection. Agent runtimes that resolve DNS only through the controlled resolver, and that cannot fall back to system DNS or to an external resolver embedded in a library, narrow the exfiltration channel to the surface the defender can see. Containers that explicitly remove or override /etc/resolv.conf and that block UDP/TCP port 53 to anything other than the resolver give the runtime no plain-DNS path around the controls. Encrypted transports need separate handling, because a port 53 rule does not cover DNS over TLS (port 853) or DNS over HTTPS (port 443), which is what a resolver embedded in a library is likely to use. This is plumbing rather than novel defense, but it is the plumbing that determines whether the DNS controls are real or theatrical.
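A startup check can confirm that the resolver configuration really points only at the controlled resolver. The sketch below parses the text of a resolv.conf file; the resolver address `10.0.0.53` and the function names are hypothetical, and the check covers only the configuration file, not firewall rules or library-embedded resolvers.

```python
from typing import List


def configured_nameservers(resolv_conf_text: str) -> List[str]:
    """Extract the addresses on 'nameserver' lines, skipping comments."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def resolves_only_through(resolv_conf_text: str, controlled_resolver: str) -> bool:
    """True only if every configured nameserver is the controlled resolver.

    An empty list is treated as a failure, since the system resolver
    would then fall back to its own default rather than the controlled one.
    """
    servers = configured_nameservers(resolv_conf_text)
    return bool(servers) and all(s == controlled_resolver for s in servers)


# Usage with a hypothetical resolver address:
sample = "# managed by runtime\nnameserver 10.0.0.53\noptions timeout:2\n"
ok = resolves_only_through(sample, "10.0.0.53")
```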

How do you tune controls without breaking the agent?

The most common failure mode of any egress regime is over-blocking. A control that fires too often gets disabled by the people it inconveniences, and the resulting bypass is usually worse than the original threat. The path to tuning is observability before enforcement: run the policy in a logging-only mode for long enough to see what it would have blocked, segment those events into the ones that look like attacks and the ones that look like legitimate work, adjust the policy to permit the latter, and only then turn enforcement on.
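The observe-then-enforce sequence can be reduced to a mode switch around the policy verdict. In this minimal sketch the `Mode` enum, the `apply_policy` function, and the in-memory audit list are stand-ins for whatever the real proxy and log pipeline provide.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    LOG_ONLY = "log_only"
    ENFORCE = "enforce"


def apply_policy(policy_allows: bool, mode: Mode,
                 audit_log: List[Dict], request_id: str) -> bool:
    """Return True if the request should proceed.

    In LOG_ONLY mode a denied request is recorded but still passes,
    so the team can review what enforcement would have blocked.
    """
    if policy_allows:
        return True
    audit_log.append({
        "request_id": request_id,
        "would_block": True,
        "enforced": mode is Mode.ENFORCE,
    })
    return mode is not Mode.ENFORCE
```

Because both modes write the same record, the review data gathered before enforcement stays comparable with what is logged afterwards.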

The second tuning lever is granularity. A policy that distinguishes between session types — interactive user sessions versus background batch jobs versus scheduled reports — can be tight where tightness is cheap and loose where looseness is necessary. A background job that should never read external documents can have a much more aggressive egress policy than an interactive session that helps a user research a topic; treating both the same is the easy mistake that drives operators toward the more permissive setting.
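Per-session-type policy can be as simple as a lookup table with a strict default. The profile names mirror the session types mentioned above; the field names and byte limits are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressProfile:
    allow_external_fetch: bool
    max_payload_bytes: int


# Hypothetical profiles: tight where tightness is cheap.
PROFILES = {
    "interactive": EgressProfile(allow_external_fetch=True, max_payload_bytes=64_000),
    "batch": EgressProfile(allow_external_fetch=False, max_payload_bytes=4_000),
    "scheduled_report": EgressProfile(allow_external_fetch=False, max_payload_bytes=16_000),
}

# Unknown session types get the strictest profile, not the loosest.
STRICTEST = EgressProfile(allow_external_fetch=False, max_payload_bytes=0)


def profile_for(session_type: str) -> EgressProfile:
    return PROFILES.get(session_type, STRICTEST)
```

Defaulting to the strictest profile means a new or mislabeled session type fails closed rather than inheriting the interactive allowances.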

The third lever is feedback. When the policy blocks a request, the agent should know it was blocked, the user should know in terms they can act on, and the policy team should get a structured event they can review. A block that returns a generic timeout is worse than useless, because the agent's next attempt will look like a normal retry and the policy team will never hear about it. Closing the feedback loop is what lets a security team and an agent team converge on a configuration that both can live with.
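An explicit block response, as opposed to a silent timeout, might carry the same event to all three audiences. The sketch below assumes the proxy answers with HTTP 403 and a JSON body; the status code choice, field names, and `BlockEvent` type are illustrative, not a documented interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass
class BlockEvent:
    request_id: str
    session_id: str
    destination: str
    rule: str
    user_message: str


def block_response(event: BlockEvent) -> Tuple[int, Dict[str, str], str]:
    """What the agent sees: an explicit, non-retryable refusal."""
    body = json.dumps({
        "error": "egress_blocked",
        "rule": event.rule,
        "request_id": event.request_id,
        "retryable": False,
        "message": event.user_message,
    })
    return 403, {"Content-Type": "application/json"}, body


def review_record(event: BlockEvent) -> str:
    """What the policy team gets: the full structured event as JSON."""
    return json.dumps(asdict(event), sort_keys=True)


# Usage with made-up values:
event = BlockEvent("r-17", "s-1", "files.example.net",
                   "large_payload_without_first_party_provenance",
                   "This upload was blocked because it did not come from your own request.")
status, headers, body = block_response(event)
```

Marking the response as non-retryable gives the agent a reason to stop and report the block to the user rather than try again.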

How Safeguard Helps

Safeguard puts a signed-context egress layer in front of agent runtimes and turns the network surface into a place where policy can actually be enforced. Griffin AI evaluates every outbound request with full session context — provenance of upstream inputs, tool identity, user role, recent reasoning trace — so the policy decision depends on more than the destination IP. MCP server security policies and agent guardrails let teams compose allowlists, payload-size thresholds, and tool-specific egress rules without writing one-off code, and the DNS resolution path can be funneled through a controlled resolver that scores queries for exfiltration patterns. Runtime egress monitoring captures the full record of what each agent sent out, in response to what, under whose authority, so investigations are evidence-driven from the start. To shape an egress posture that fits your agent deployment, get in touch with our team.

