AI Security

Audit Trail Quality: Griffin AI vs Mythos

An audit trail is only useful if you can answer questions from it. Quality is not about volume — it's about the ability to reconstruct decisions after the fact.

Shadab Khan
Security Engineer
4 min read

Most AI-for-security tools produce audit logs. The volume is usually impressive. The quality — the ability to answer specific questions after the fact — varies dramatically. A trail that logs every API call but cannot reconstruct why a specific finding was surfaced, deprioritised, or closed is not a useful audit trail. Griffin AI and Mythos-class general-purpose tools diverge sharply on audit quality, and the difference shows up when an incident requires post-hoc investigation.

What good audit trails answer

Five questions a mature security program asks of its audit trail:

  • Why was this finding generated? What inputs triggered it, what analysis ran, what decision was made.
  • Why was this finding deprioritised or closed? What reviewer, what rationale, what evidence attached.
  • What changed about this finding over time? Version history — state transitions, reassignments, policy-evaluation deltas.
  • Who accessed this data, when, and why? User actions, API calls, automated processes.
  • What was the platform's state at a specific moment? Reconstruction of the world as of a given timestamp.

Audit trails that answer all five are operationally useful. Trails that answer only some of them produce investigation friction.
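The fifth question is the one most trails fail. If events are stored append-only and time-ordered, point-in-time reconstruction is just replay. A minimal sketch in Python, with illustrative field names rather than any vendor's actual schema:

    from datetime import datetime, timezone

    def state_as_of(events, subject_id, cutoff):
        # Rebuild a finding's state by replaying its events up to a timestamp.
        # Assumes an append-only, time-ordered list of event dicts.
        state = {}
        for event in events:
            if event["subject"] != subject_id:
                continue
            if event["timestamp"] > cutoff:
                break  # later events did not exist yet at the cutoff
            state.update(event["changes"])  # each event carries its state delta
        return state

    trail = [
        {"subject": "F-1001",
         "timestamp": datetime(2024, 10, 20, tzinfo=timezone.utc),
         "changes": {"status": "open", "severity": "high"}},
        {"subject": "F-1001",
         "timestamp": datetime(2024, 11, 5, tzinfo=timezone.utc),
         "changes": {"status": "closed", "rationale": "sanitizer present"}},
    ]

    # "What did F-1001 look like on 1 November?" -> still open.
    print(state_as_of(trail, "F-1001",
                      datetime(2024, 11, 1, tzinfo=timezone.utc)))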

Where the engine-plus-LLM architecture helps

Two structural advantages:

Deterministic engine outputs are reconstructible. The engine's call graph, taint analysis, and reachability computation are deterministic for a given codebase version. Given the same inputs, the same outputs are produced. Audit reconstruction is a matter of replay.
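Replayability reduces to recording the inputs precisely enough to recompute the output. A sketch of the idea, assuming (as an illustration, not a statement of Safeguard's actual design) that the engine's output depends only on the codebase revision, engine version, and configuration:

    import hashlib
    import json

    def input_fingerprint(codebase_rev, engine_version, config):
        # Hash everything the deterministic engine's output depends on.
        material = json.dumps(
            {"rev": codebase_rev, "engine": engine_version, "config": config},
            sort_keys=True,
        )
        return hashlib.sha256(material.encode()).hexdigest()

    # At audit time: if the fingerprint stored with the finding matches a
    # fresh fingerprint of the same inputs, re-running the engine reproduces
    # the original call graph, taint paths, and reachability verdicts.
    print(input_fingerprint("abc123", "2.4.1", {"rules": "default"}))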

Griffin AI reasoning is logged at decision points. Every LLM reasoning step is logged with the input brief, the model response, and the resulting decision. The logs include enough context to understand why the model produced the output, not just that it did.
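A decision-point record might look like the sketch below. The field names and values are invented for illustration, not Griffin AI's actual log format; the property that matters is that the brief, the response, and the decision travel together:

    llm_decision_event = {
        "event_id": "evt_000042",             # stable, sortable identifier
        "timestamp": "2024-11-05T14:02:11Z",
        "actor": "griffin-ai/triage-assist",  # hypothetical component name
        "action": "finding.triage_recommended",
        "subject": "F-1001",
        "input_brief": "SQL string built from request param; sink at db.exec()",
        "model_response": "Parameter is escaped upstream; recommend deprioritise.",
        "decision": "deprioritise",
    }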

Together these produce audit trails that answer all five questions above.

Where pure-LLM tools land

Mythos-class tools often log at the API boundary — the call went in, a response came out. The internal reasoning of the model is not captured. Asked "why was this finding flagged this way?", the answer is "the model said so."

This is not useless, but it is insufficient for serious post-hoc investigation. The customer's own review cannot reconstruct the decision; they can only know the decision was made.

A concrete example

Six weeks after deployment, a customer's compliance team asks: "Show us the audit trail for finding F-2024-847. We need to explain to an auditor why this finding was closed as a false positive three weeks ago."

Griffin AI's response: a complete trail showing the original finding detection (with call-graph reference), the reviewer assignment, the reviewer's deprioritisation with rationale, the supporting evidence attached (specific sanitizer detected, VEX statement applied), and the closure record. Every transition has a user, timestamp, and decision rationale.
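Rendered as a trail, that chain might read like the excerpt below. The timestamps, reviewer, and wording are invented for illustration; only the shape matters:

    2024-10-14T09:12:03Z  engine/scan        finding.generated      F-2024-847
                          input: call-graph path #4411 (source -> sink)
    2024-10-14T09:12:04Z  griffin-ai/triage  finding.assigned       F-2024-847 -> j.doe
    2024-10-21T16:40:55Z  j.doe              finding.deprioritised  F-2024-847
                          rationale: escape_html() sanitizer on path; VEX statement attached
    2024-10-21T16:41:10Z  j.doe              finding.closed         F-2024-847
                          resolution: false positive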

The Mythos-class response varies. Commonly: "the finding was reviewed and closed by User X on date Y." The rationale behind the closure is not in the trail; the investigator would need to interview the reviewer, who may have left the company.

What the audit trail should record

Seven fields per event:

  1. Event ID (stable, sortable, globally unique).
  2. Timestamp with timezone.
  3. Actor (user, service account, automated process).
  4. Action (finding.generated, finding.reassigned, finding.closed).
  5. Subject (which finding, package, rule, etc.).
  6. Input hash (what data was used to make the decision).
  7. Decision rationale (why this action was taken).

The seventh field is where tools differentiate. Without it, audit trails are event streams that tell you what happened but not why.
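As a sketch, the seven fields map directly onto a record type (illustrative Python, not an actual Safeguard schema):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)
    class AuditEvent:
        event_id: str        # stable, sortable, globally unique (e.g. a ULID)
        timestamp: datetime  # must be timezone-aware
        actor: str           # user, service account, or automated process
        action: str          # e.g. "finding.generated", "finding.closed"
        subject: str         # which finding, package, or rule
        input_hash: str      # hash of the data the decision was based on
        rationale: str       # why the action was taken; the field that
                             # separates an audit trail from a bare event stream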

Retention and access control

Two operational dimensions:

  • Retention period. Customer-configurable per event type. SOC 2, HIPAA, and the EU AI Act carry different retention expectations.
  • Access control. Audit trail access should be logged (meta-audit). Bulk export should require additional approvals. Legal-hold flags should prevent deletion regardless of retention policy.

Griffin AI ships access logging, export approvals, and legal hold as configurable defaults, alongside per-event-type retention. Mythos-class tools vary.
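The legal-hold invariant is simple to state in code. A minimal sketch with hypothetical per-event-type retention values; the point is that a hold always overrides retention expiry:

    from datetime import datetime, timedelta, timezone

    RETENTION = {  # per event type; customer-configurable in practice
        "finding.generated": timedelta(days=3 * 365),
        "finding.closed": timedelta(days=7 * 365),
    }

    def may_delete(event, now=None):
        # Deletable only if retention has lapsed AND no legal hold is set;
        # the hold wins regardless of retention policy.
        now = now or datetime.now(timezone.utc)
        if event.get("legal_hold"):
            return False
        retention = RETENTION.get(event["action"], timedelta(days=7 * 365))
        return now - event["timestamp"] > retention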

What to evaluate

Three concrete checks:

  1. Walk through an audit-trail query for a specific finding. Can you reconstruct the decision chain end-to-end?
  2. Test legal hold: does the platform prevent deletion of flagged events even past retention?
  3. Query audit access: who looked at these records last month? (A query sketch follows this list.)
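The third check is trivial to satisfy when access is itself logged as events. A sketch over the illustrative event shape used earlier, assuming a hypothetical "audit.accessed" action:

    def audit_accessors(events, year, month):
        # Who read audit records in a given month? Assumes meta-audit:
        # every read of the trail is itself an event.
        return sorted({
            e["actor"] for e in events
            if e["action"] == "audit.accessed"
            and e["timestamp"].year == year
            and e["timestamp"].month == month
        })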

The answers determine whether the audit trail supports the investigations the program actually needs.

How Safeguard Helps

Safeguard's audit trail records every finding lifecycle event with full decision rationale, input references, and reviewer context. The engine's deterministic outputs support full replay; Griffin AI's reasoning steps are logged with input briefs and responses. Legal-hold flags, per-event-type retention, and access logging ship as configurable defaults. For organisations whose compliance posture depends on answering "why did this decision get made" months or years after the fact, the trail quality is the architectural property that matters.
