Griffin is the agent that sits beside a security engineer on Safeguard, answering questions like "why did this finding fire" and "what would it take to fix this across my fleet." It looks like a chat interface but the interesting work is in the loop behind it — how it plans, which tools it calls, how it decides when to stop, and how the whole thing stays auditable for a compliance review. This post is a tour of the design choices we made and the ones we explicitly rejected.
Why build a bespoke agent loop instead of using a framework?
We spent about two months early in the project with LangChain and then with a couple of alternatives before concluding we had to build our own. The frameworks were good starting points for prototyping but they hid two things we could not afford to leave hidden: the exact sequence of tool calls taken for any given response, and the decision points at which the agent could have done something else. Auditability is not optional in a security product — when a customer asks "why did Griffin recommend this remediation," we need a reproducible trace, not a summarized stack.
The other problem was determinism under bounded cost. We need every Griffin response to come back within a user-facing latency budget and within a per-request token budget. Off-the-shelf loops allow unbounded tool-call chains by default, and the customizations needed to bound them ended up invasive enough that writing the loop ourselves was the cleaner option.
What is the shape of the loop?
The loop is a plan-execute-verify cycle with explicit state checkpoints. Below is the high-level flow:
┌──────────────────┐
│  User Question   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│     Planner      │ ◀───── tenant context,
│   (structured)   │        history, policy
└────────┬─────────┘
         │ plan (steps)
         ▼
┌──────────────────┐
│     Executor     │ ◀───── tool registry,
│ (one step/turn)  │        quotas, safety
└────────┬─────────┘
         │ tool result
         ▼
┌──────────────────┐
│     Verifier     │ ◀───── ground-truth
│   (structured)   │        graph snapshot
└────────┬─────────┘
         │
    ┌────┴────┐
    │         │
  done?    retry /
    │      replan
    │
    ▼
┌──────────────────┐
│     Response     │
│   Synthesizer    │
└──────────────────┘
The Planner takes the user question and produces a structured plan — a list of tool calls with their expected inputs and outputs, plus branching conditions. The Executor runs one step at a time, capturing the actual tool result, and hands back to the Verifier. The Verifier asks "does the plan still make sense given this result, and are we closer to a grounded answer?" If yes, continue. If the plan is broken, replan. If the budget is spent, answer with what we have.
Crucially, the Planner and Verifier are separate LLM calls with distinct prompts. That separation is what lets us spot a plan that is not making progress — if the Verifier keeps saying "replan" without new information, we escalate to a bounded human fallback rather than looping forever.
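To make that concrete, here is a minimal sketch of the loop in Python. The names (run_loop, Verdict) and the callable-based wiring are illustrative, not Griffin's actual internals; only the one-step-per-turn structure and the verdict states mirror the description above.

from dataclasses import dataclass
from typing import Callable

MAX_STEPS = 12  # depth cap, discussed in the bounds section below

@dataclass
class Verdict:
    state: str         # "continue" | "converge" | "replan" | "give_up"
    confidence: float

def run_loop(question: str,
             plan: Callable[..., list],        # Planner LLM call -> ordered steps
             execute: Callable[..., dict],     # Executor -> one validated tool result
             verify: Callable[..., Verdict],   # Verifier LLM call, separate prompt
             synthesize: Callable[..., str]) -> str:
    steps = plan(question)
    evidence: list = []
    for _ in range(MAX_STEPS):
        if not steps:
            break
        result = execute(steps[0])             # exactly one tool call per turn
        evidence.append(result)
        verdict = verify(question, steps, evidence)
        if verdict.state in ("converge", "give_up"):
            break                              # answer with what we have
        if verdict.state == "replan":
            steps = plan(question, evidence=evidence)
        else:                                  # "continue": move to the next planned step
            steps = steps[1:]
    return synthesize(question, evidence)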
What tools does Griffin actually have?
The tool set is deliberately small. Too many tools create selection ambiguity; too few leave the agent reasoning around a tool it does not have. We currently ship nine tools:
graph.query — run an SGQL query against the tenant's knowledge graph
graph.traverse — follow a path from a node to a target class
finding.lookup — fetch a specific finding by ID with full context
policy.evaluate — run a named policy against a hypothetical input
remediation.suggest — ask the remediation service for suggestions given a finding
vuln.details — pull vulnerability details from our advisory store
runtime.evidence — query runtime trace evidence for an artifact
audit.log_access — record that the agent accessed a record (for compliance)
human.escalate — hand off to a human operator with context
Every tool has a strict JSON schema on its inputs and outputs, and the Executor validates both sides. A tool call with malformed inputs is rejected locally, and the Planner sees a typed error rather than a silent empty result. During development, this pattern caught roughly 12 percent of what would otherwise have been confusing responses.
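For illustration, this is roughly what two-sided validation looks like using the Python jsonschema package; the example schema and the TypedToolError class are invented for this post, not the actual Executor code.

from jsonschema import ValidationError, validate

# Illustrative input schema for a finding.lookup-style tool; the real schemas
# live in the tool registry and are stricter than this.
FINDING_LOOKUP_INPUT = {
    "type": "object",
    "properties": {"finding_id": {"type": "string", "pattern": "^f-[0-9]+$"}},
    "required": ["finding_id"],
    "additionalProperties": False,
}

class TypedToolError(Exception):
    """Structured error surfaced to the Planner instead of a silent empty result."""
    def __init__(self, tool: str, side: str, detail: str):
        super().__init__(f"{tool} {side} failed validation: {detail}")
        self.tool, self.side, self.detail = tool, side, detail

def call_tool(tool: str, args: dict, input_schema: dict, output_schema: dict, invoke):
    try:
        validate(instance=args, schema=input_schema)      # reject malformed inputs locally
    except ValidationError as exc:
        raise TypedToolError(tool, "input", exc.message)
    result = invoke(args)
    try:
        validate(instance=result, schema=output_schema)   # and malformed outputs
    except ValidationError as exc:
        raise TypedToolError(tool, "output", exc.message)
    return result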
How do you bound reasoning depth?
We use three complementary bounds. The first is depth — any single question is capped at a maximum of twelve executed steps. Around step eight the Verifier prompt shifts tone from "continue planning" to "converge on an answer with what you have." Empirically, questions that cannot be answered in eight steps are questions that need human help, not more looping.
The second is token budget. We track input and output tokens per request and hard-cap at a configurable tenant policy (default 100k tokens). When we hit 80 percent of the budget, the Planner is told to prune uncompleted branches and consolidate.
The third is confidence gating. The Verifier emits a confidence score with each turn. If confidence has not improved by more than a small delta over three consecutive turns, we stop and hand back what we have with an explicit "I was not able to fully answer this" header. Users have told us repeatedly that they prefer a truthful "I do not know" to a confident-sounding guess, so we invested in making the low-confidence signal honest.
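Taken together, the three bounds collapse into a small per-turn decision. Here is a sketch using the thresholds mentioned above; the function name and the exact value of the confidence delta are illustrative, not the production numbers.

DEPTH_CAP = 12          # maximum executed steps per question
CONVERGE_AFTER = 8      # Verifier prompt shifts to "converge on an answer"
TOKEN_BUDGET = 100_000  # default per-request cap from tenant policy
PRUNE_AT = 0.8          # prune uncompleted branches at 80% of budget
MIN_DELTA = 0.02        # illustrative value for the "small delta" on confidence
STALL_TURNS = 3

def next_directive(step: int, tokens_used: int, confidence_deltas: list) -> str:
    """Pick the directive for the next Planner/Verifier turn."""
    if step >= DEPTH_CAP or tokens_used >= TOKEN_BUDGET:
        return "stop"                    # answer with what we have
    recent = confidence_deltas[-STALL_TURNS:]
    if len(recent) == STALL_TURNS and all(d <= MIN_DELTA for d in recent):
        return "stop_low_confidence"     # honest "I was not able to fully answer this"
    if tokens_used >= PRUNE_AT * TOKEN_BUDGET:
        return "prune_and_consolidate"
    if step >= CONVERGE_AFTER:
        return "converge"
    return "continue"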
Here is the shape of a Verifier output:
{
  "state": "continue" | "converge" | "replan" | "give_up",
  "confidence": 0.74,
  "confidence_delta": 0.12,
  "evidence_remaining": [
    "need to confirm reachability status for Finding:f-8819"
  ],
  "suggested_next_step": {
    "tool": "graph.query",
    "args": { ... }
  },
  "reasoning_trace": "..."
}
The reasoning_trace is stored in the per-conversation audit log but is not surfaced to the user by default. Customers with stricter audit requirements can expose it.
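On the Executor side, that payload is parsed into a typed structure before anything acts on it. Something like the following, where the field names match the JSON above but the class and parser are illustrative:

from dataclasses import dataclass
from typing import Optional

VALID_STATES = {"continue", "converge", "replan", "give_up"}

@dataclass
class VerifierOutput:
    state: str
    confidence: float
    confidence_delta: float
    evidence_remaining: list
    suggested_next_step: Optional[dict]   # {"tool": ..., "args": {...}} or absent
    reasoning_trace: str                  # audit log only, not shown to users by default

def parse_verifier_output(payload: dict) -> VerifierOutput:
    if payload.get("state") not in VALID_STATES:
        raise ValueError(f"unexpected verifier state: {payload.get('state')!r}")
    return VerifierOutput(
        state=payload["state"],
        confidence=float(payload["confidence"]),
        confidence_delta=float(payload["confidence_delta"]),
        evidence_remaining=list(payload.get("evidence_remaining", [])),
        suggested_next_step=payload.get("suggested_next_step"),
        reasoning_trace=payload.get("reasoning_trace", ""),
    )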
How do you keep the tool calls auditable?
Every tool call Griffin makes is recorded in an immutable audit stream keyed by conversation ID. The record includes the exact tool name, the exact inputs (with sensitive values redacted to policy), the exact outputs, the prompt hash of the Planner/Verifier call that led to it, and the wall-clock timestamp. The stream is exportable as CSV, JSON Lines, and OSCAL, which matters for customers whose compliance framework expects evidence in a specific shape.
The reason we include the prompt hash rather than the full prompt is that the prompts themselves are considered platform IP and we do not want them exfiltrated via a ticket export. The hash lets us reconstruct the prompt server-side if there is ever a dispute, without leaking it. For FedRAMP HIGH tenants where the audit requirements are stricter, we support exporting the full prompt into an audit vault that lives in the same enclave as the rest of the tenant data.
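As a rough picture of what one record in that stream looks like, here is a sketch that emits a JSON Lines entry. The field names and the redact callable are invented for the example, and the prompt hash is assumed to be an ordinary SHA-256 over the rendered prompt text.

import hashlib
import json
import time

def prompt_hash(rendered_prompt: str) -> str:
    # Only the hash is exported; the prompt text can be reconstructed server-side.
    return hashlib.sha256(rendered_prompt.encode("utf-8")).hexdigest()

def audit_record(conversation_id: str, tool: str, inputs: dict, outputs: dict,
                 rendered_prompt: str, redact) -> str:
    record = {
        "conversation_id": conversation_id,
        "tool": tool,
        "inputs": redact(inputs),          # sensitive values redacted to policy
        "outputs": redact(outputs),
        "prompt_hash": prompt_hash(rendered_prompt),
        "timestamp": time.time(),          # wall-clock timestamp
    }
    return json.dumps(record)              # one line per tool call in the JSON Lines export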
We also emit a structured event on every Planner decision — what it considered, what it chose, and what it discarded. This is invaluable during post-incident review. If a customer's engineer says "Griffin missed this finding," we can walk back the conversation and see whether the Planner even surfaced that finding as a candidate.
How does Griffin stay inside the tenant boundary?
Every Griffin request carries a tenant context that is injected as a system-level claim into every tool call, and the tool servers enforce the claim at the API gateway — not inside the agent. If Griffin were compromised (a prompt injection, a malicious tool response, a model error), it could not reach outside the tenant's data because the authorization is structurally below the agent, not enforced by it. This is the same pattern we use everywhere in the platform: agents are untrusted callers, not privileged actors.
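A simplified sketch of what that gateway-side check amounts to; the claim name and error type are invented for the example, and the real enforcement lives in the gateway, not in agent code.

class TenantBoundaryError(Exception):
    pass

def enforce_tenant_claim(request_claims: dict, resource_tenant: str) -> None:
    """Runs at the API gateway for every tool call; the agent never sees this check."""
    claimed = request_claims.get("tenant_id")
    if claimed is None or claimed != resource_tenant:
        # The agent is just another untrusted caller: a prompt injection or a
        # model error still cannot reach another tenant's data.
        raise TenantBoundaryError(
            f"call for tenant {resource_tenant!r} rejected: claim is {claimed!r}"
        )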
We also run Griffin with a read-mostly tool set by default. Of the nine tools above, only audit.log_access and human.escalate produce side effects, and those are explicitly safe — one writes to an append-only log, the other sends a notification. Write operations (filing a ticket, applying a remediation, changing a policy) happen only when a human explicitly approves Griffin's suggestion in the UI, after which the UI issues the write directly with the user's credentials. Griffin never writes on its own behalf.
How Safeguard.sh Helps
Griffin pairs a senior engineer with an assistant that understands the full Safeguard knowledge graph, the tenant's policies, and the runtime state of their fleet, and it does so inside a loop that is bounded, verifiable, and auditable end to end. The design favors correctness over cleverness — we prefer the agent to admit it does not know, escalate to a human, or stop early rather than hallucinate. For teams that want to use AI to accelerate triage without giving up the ability to explain every decision to an auditor, Griffin is the pattern we have found works in practice. You can try it in any Safeguard workspace.