Inline ingress and egress detection with Lino, MCP-server tool-input inspection, and adversarially trained Griffin variants. Plus a public prompt-injection eval suite and customer-specific red-teaming so detection rates stay accurate against tomorrow's payloads.
Every tool-using LLM ingests text the developer did not write — search results, customer messages, scraped pages, PDF attachments. Each of those channels is a prompt-injection surface, and most agentic deployments have no inspection plane on either ingress or egress.
Static system-prompt hardening helps marginally; attackers iterate against publicly known guardrails the same week they ship. Defence has to evolve faster than the attack corpus, which means: inline detection on every model call, structured inspection at the tool-use boundary, and an adversarial training loop on the defending model.
Safeguard pairs Lino (inline on ingress and egress) with the MCP-server (structured tool-input inspection) and an adversarially trained Griffin variant, all measured against a public eval suite and your own red-team payloads on every release.
Email, search, attachments, transcripts, third-party APIs — every channel feeding the agent is a candidate injection surface. The set grows faster than any allow-list can keep up.
An agent that can write to your database, send email, or trigger external actions converts a successful injection into real-world damage in one step.
A fixed system prompt or rule-based filter is reverse-engineered within days of disclosure. Continuous training against fresh adversarial corpora is the only durable defence.
Filtering only the user input misses second-order injections — the case where the agent ingests poisoned text mid-session. Egress inspection is non-optional.
Every prompt and every response is scored for injection shape in under 40 ms by the on-device Lino classifier; high-confidence detections are gated or rewritten before the call completes.
Tool inputs flowing through the Safeguard MCP-server are inspected before dispatch; destructive tool calls require an explicit non-injected confirmation token.
A Griffin variant is trained continuously against fresh adversarial corpora plus customer red-team payloads, and is deployed as the second-pass verdict on borderline cases.
The open prompt-injection eval suite runs on every release; customer-specific red-team payloads run on customer tenants. Detection rates are public per release tag.
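The tool-input inspection and confirmation-token gate described above can be sketched as follows. This is an illustrative sketch, not the real Safeguard MCP-server API: the names (`inspect_tool_call`, `DESTRUCTIVE_TOOLS`, `Verdict`) and the regex patterns are assumptions for demonstration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of tools whose side effects are irreversible; calls to
# these require a confirmation token that did not originate in model-visible
# text (so an injected directive cannot supply it).
DESTRUCTIVE_TOOLS = {"send_email", "db_write", "delete_record"}

# Crude stand-in signatures for injected directives inside tool arguments.
# A production classifier would be a trained model, not a regex list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now", re.I),
]

@dataclass
class Verdict:
    allowed: bool
    reason: str

def inspect_tool_call(
    tool: str, args: dict, confirmation_token: Optional[str] = None
) -> Verdict:
    # Structural pass: scan every string-valued argument for injection shape
    # before the tool is dispatched.
    for key, value in args.items():
        if isinstance(value, str):
            for pat in INJECTION_PATTERNS:
                if pat.search(value):
                    return Verdict(False, f"injection-shaped text in argument {key!r}")
    # Destructive tools additionally need the out-of-band confirmation token.
    if tool in DESTRUCTIVE_TOOLS and not confirmation_token:
        return Verdict(False, f"{tool!r} is destructive and lacks a confirmation token")
    return Verdict(True, "clean")
```

In this sketch, a `send_email` call without a token is refused even when its arguments look clean, which is the property that turns a successful ingress injection into a no-op instead of an outbound email.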
Every prompt and every retrieved context chunk scored for injection shape in <40 ms on the inference path.
Tool calls dispatched through the MCP-server have arguments structurally inspected before the tool runs.
Borderline cases routed to an adversarially trained Griffin variant for a deeper verdict; sub-second on the critical path.
Verdict drives one of three actions: block, rewrite to remove the injected directive, or proceed with logging.
Model output re-scored before leaving the boundary; second-order injections caught even if ingress missed them.
Blocked attempts and false-positive reports feed the adversarial training corpus; new model snapshot ships against the public eval suite weekly.
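The two-stage verdict flow in the steps above can be sketched as a small routing function. The scoring functions here are stand-ins (a real deployment would call the Lino classifier and a Griffin variant); the threshold values and names are assumptions for illustration.

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"      # refuse the call outright
    REWRITE = "rewrite"  # strip the injected directive, keep the content
    PROCEED = "proceed"  # pass through, logged

# Hypothetical score thresholds in [0, 1]; tuning these is a deployment choice.
BLOCK_ABOVE = 0.9      # high-confidence injection
REWRITE_ABOVE = 0.6    # likely injection
ESCALATE_ABOVE = 0.3   # borderline: get a second opinion from the deep model

def fast_score(text: str) -> float:
    """Stand-in for the <40 ms inline classifier on the inference path."""
    return 0.95 if "ignore previous instructions" in text.lower() else 0.05

def deep_score(text: str) -> float:
    """Stand-in for the slower, adversarially trained second-pass model."""
    return fast_score(text)

def route(text: str) -> Action:
    score = fast_score(text)
    if ESCALATE_ABOVE < score <= BLOCK_ABOVE:
        # Borderline band: escalate and keep the more pessimistic verdict.
        score = max(score, deep_score(text))
    if score > BLOCK_ABOVE:
        return Action.BLOCK
    if score > REWRITE_ABOVE:
        return Action.REWRITE
    return Action.PROCEED
```

The same `route` call is applied on egress to the model output before it leaves the boundary, which is how a second-order injection that slipped past ingress is still caught.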
See Lino, the MCP-server, the Griffin family, plus ai-governance and guardrails-and-enforcement for the governance plane.
Bring a target agent; we'll fire the public corpus plus a customised payload set and ship you the detection-rate report.