Inline ingress and egress detection with Lino, MCP-server tool-input inspection, and adversarially trained Griffin variants. Plus a public prompt-injection eval suite and customer-specific red-teaming so detection rates stay accurate against tomorrow's payloads.
Every tool-using LLM ingests text the developer did not write — search results, customer messages, scraped pages, PDF attachments. Each of those channels is a prompt-injection surface, and most agentic deployments have no inspection plane on either ingress or egress.
Static system-prompt hardening helps marginally; attackers iterate against publicly known guardrails the same week they ship. Defence has to evolve faster than the attack corpus, which means: inline detection on every model call, structured inspection at the tool-use boundary, and an adversarial training loop on the defending model.
Safeguard pairs Lino (inline on ingress and egress) with the MCP-server (structured tool-input inspection) and an adversarially trained Griffin variant, all measured against a public eval suite and your own red-team payloads on every release.
Email, search, attachments, transcripts, third-party APIs — every channel feeding the agent is a candidate injection surface. The set grows faster than any allow-list can keep up.
An agent that can write to your database, send email, or trigger external actions converts a successful injection into real-world damage in one step.
A fixed system prompt or rule-based filter is reverse-engineered within days of disclosure. Continuous training against fresh adversarial corpora is the only durable defence.
Filtering only the user input misses second-order injections — the case where the agent ingests poisoned text mid-session. Egress inspection is non-optional.
Every prompt and every response is scored for injection shape in under 40 ms by the on-device Lino classifier; high-confidence detections are gated or rewritten before the call completes.
Tool inputs flowing through the Safeguard MCP-server are inspected before dispatch; destructive tool calls require an explicit non-injected confirmation token.
A Griffin variant is trained continuously against fresh adversarial corpora plus customer red-team payloads, and is deployed as the second-pass verdict on borderline cases.
The open prompt-injection eval suite runs on every release; customer-specific red-team payloads run on customer tenants. Detection rates are public per release tag.
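The tool-input inspection and confirmation-token gate described above can be sketched as follows. This is an illustrative sketch, not the real Safeguard MCP-server API: the names (`inspect_tool_call`, `DESTRUCTIVE_TOOLS`, `Verdict`) and the regex patterns are assumptions for demonstration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of tools whose side effects are irreversible; calls to
# these require a confirmation token that did not originate in model-visible
# text (so an injected directive cannot supply it).
DESTRUCTIVE_TOOLS = {"send_email", "db_write", "delete_record"}

# Crude stand-in signatures for injected directives inside tool arguments.
# A production classifier would be a trained model, not a regex list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
]

@dataclass
class Verdict:
    allowed: bool
    reason: str

def inspect_tool_call(
    tool: str, args: dict, confirmation_token: Optional[str] = None
) -> Verdict:
    # Structural pass: scan every string-valued argument for injection shape
    # before the tool is dispatched.
    for key, value in args.items():
        if isinstance(value, str):
            for pat in INJECTION_PATTERNS:
                if pat.search(value):
                    return Verdict(False, f"injection-shaped text in argument {key!r}")
    # Destructive tools additionally need the out-of-band confirmation token.
    if tool in DESTRUCTIVE_TOOLS and not confirmation_token:
        return Verdict(False, f"{tool!r} is destructive and lacks a confirmation token")
    return Verdict(True, "clean")
```

In this sketch, a `send_email` call without a token is refused even when its arguments look clean, which is the property that turns a successful ingress injection into a no-op instead of an outbound email.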
Every prompt and every retrieved context chunk scored for injection shape in <40 ms on the inference path.
Tool calls dispatched through the MCP-server have arguments structurally inspected before the tool runs.
Borderline cases routed to an adversarially trained Griffin variant for a deeper verdict; sub-second on the critical path.
Verdict drives one of three actions: block, rewrite to remove the injected directive, or proceed with logging.
Model output re-scored before leaving the boundary; second-order injections caught even if ingress missed them.
Blocked attempts and false-positive reports feed the adversarial training corpus; new model snapshot ships against the public eval suite weekly.
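The two-stage verdict flow in the steps above can be sketched as a small routing function. The scoring functions here are stand-ins (a real deployment would call the Lino classifier and a Griffin variant); the threshold values and names are assumptions for illustration.

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"      # refuse the call outright
    REWRITE = "rewrite"  # strip the injected directive, keep the content
    PROCEED = "proceed"  # pass through, logged

# Hypothetical score thresholds in [0, 1]; tuning these is a deployment choice.
BLOCK_ABOVE = 0.9      # high-confidence injection
REWRITE_ABOVE = 0.6    # likely injection
ESCALATE_ABOVE = 0.3   # borderline: get a second opinion from the deep model

def fast_score(text: str) -> float:
    """Stand-in for the <40 ms inline classifier on the inference path."""
    return 0.95 if "ignore previous instructions" in text.lower() else 0.05

def deep_score(text: str) -> float:
    """Stand-in for the slower, adversarially trained second-pass model."""
    return fast_score(text)

def route(text: str) -> Action:
    score = fast_score(text)
    if ESCALATE_ABOVE < score <= BLOCK_ABOVE:
        # Borderline band: escalate and keep the more pessimistic verdict.
        score = max(score, deep_score(text))
    if score > BLOCK_ABOVE:
        return Action.BLOCK
    if score > REWRITE_ABOVE:
        return Action.REWRITE
    return Action.PROCEED
```

The same `route` call is applied on egress to the model output before it leaves the boundary, which is how a second-order injection that slipped past ingress is still caught.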
See Lino, the MCP-server, the Griffin family, plus ai-governance and guardrails-and-enforcement for the governance plane.
Bring a target agent; we'll fire the public corpus plus a customised payload set and ship you the detection-rate report.