AI Security

Sanitizer Detection: Griffin AI vs Mythos

A vulnerability that passes through a working sanitizer is not a vulnerability. Detecting that sanitizer accurately is the difference between actionable findings and noise.

Nayan Dey
Senior Security Engineer
5 min read

A SQL injection that passes through a parameterized query API is not a SQL injection. An XSS finding where the output is run through a context-appropriate encoder is not an XSS. A path traversal where the input has been canonicalized and validated against an allowlist is not a path traversal. The technique that distinguishes a real finding from a defended one is sanitizer detection — and it is one of the most operationally important capabilities a vulnerability analysis platform can have. Griffin AI and Mythos-class general-purpose AI-for-security tools approach sanitizer detection very differently, and the difference shows up directly in the false-positive rate that customers experience.

What sanitizer detection requires

Three capabilities working together:

  • Recognise common sanitizers by name and signature across libraries. mysql.escape, html.escape, path.normalize + allowlist check, parameterized query APIs, output encoders for specific contexts.
  • Reason about whether the sanitizer applies to the specific taint flowing through it. A sanitizer that escapes for HTML context does not protect against SQL injection.
  • Track sanitization across the taint path, including conditional branches where the sanitizer is applied on some paths and not others.

A platform that gets the first capability right but misses the other two produces sanitizer-aware false positives — which is better than no awareness but still noisy.
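The first two capabilities can be sketched as a small lookup-plus-check. This is an illustrative sketch only (the registry shape and entries are assumptions, not Griffin AI's actual schema): each entry records what the sanitizer protects against, so an applicability check can compare sanitizer scope to sink type.

```javascript
// Assumed registry shape: each sanitizer entry lists its protection scopes.
const registry = {
  "html.escape":    { scopes: ["html"] },
  "mysql.escape":   { scopes: ["sql"] },
  "path.normalize": { scopes: [] }, // normalizes, but protects nothing by itself
};

// Capability two: does the detected sanitizer's scope cover this sink type?
function covers(sanitizerName, sinkType) {
  const entry = registry[sanitizerName];
  return Boolean(entry && entry.scopes.includes(sinkType));
}

covers("html.escape", "sql");  // false: an HTML encoder does not cover a SQL sink
covers("mysql.escape", "sql"); // true
```

The point of keeping scope in the registry rather than in the model's head is that the mismatch check becomes deterministic: a sanitizer with no overlap with the sink type never suppresses a finding.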

Where general-purpose LLMs struggle

Mythos-class pure-LLM tools recognise common sanitizers from training data. The pattern recognition is generally good for popular libraries.

The breakdown happens at capabilities two and three. Asked whether a specific sanitizer is appropriate for a specific taint context, the model often produces confident output that mixes up the protection scope: it will assert "the input is sanitized" when the sanitizer is a trim() call that does nothing to protect against SQL injection. The output reads like analysis but never verifies that the protection matches the sink.
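The trim() failure mode is easy to demonstrate: the call changes the string cosmetically while the payload survives intact.

```javascript
// trim() strips leading and trailing whitespace and nothing else;
// the injection payload passes through untouched.
const userInput = "  ' OR '1'='1 --  ";
const trimmed = userInput.trim();
// trimmed is "' OR '1'='1 --", still a working SQL injection payload
```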

Capability three is harder still. Tracking whether a sanitizer applies to all paths through a function, or only some, requires precise control-flow reasoning that LLMs do not reliably perform at depth.

How Griffin AI handles it

Three deterministic steps:

Sanitizer signature detection. The engine maintains a curated registry of sanitizer signatures across major languages and frameworks. Each entry includes the sanitizer's protection scope (HTML, SQL, path, command, etc.), known bypasses, and version-specific behaviour where relevant.

Context-aware applicability check. When a sanitizer is detected on a taint path, the engine checks whether the sanitizer's protection scope matches the sink type. An HTML encoder on a path leading to a SQL sink does not suppress the finding.

Control-flow tracking. The engine analyses whether the sanitizer is reached on all paths from source to sink, only some paths, or none. Conditional sanitization (sanitize only if a flag is set) is surfaced as partial mitigation requiring review.

The output is a finding annotated with the sanitization status: fully mitigated, partially mitigated, or unmitigated.
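The annotation step can be reduced to a simple classification over path coverage. This is a simplified sketch under assumed inputs (the real engine works on a control-flow graph, not pre-counted paths): given how many source-to-sink paths pass through an applicable sanitizer, assign the status label.

```javascript
// Simplified sketch: classify a finding by how many of its source-to-sink
// paths pass through an applicable sanitizer.
function sanitizationStatus(totalPaths, sanitizedPaths) {
  if (sanitizedPaths === 0) return "unmitigated";
  if (sanitizedPaths === totalPaths) return "fully mitigated";
  return "partially mitigated"; // conditional sanitization: flag for review
}

sanitizationStatus(4, 4); // "fully mitigated"
sanitizationStatus(4, 2); // "partially mitigated"
```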

A concrete example

Consider a Node.js application that takes a filename parameter from a request, calls path.normalize() on it, then opens the file. The naive analysis says "user input reaches a file open — path traversal."

The sanitizer-aware analysis recognises path.normalize() and asks: does normalization alone protect against path traversal? The answer is no — path.normalize("../../etc/passwd") returns "../../etc/passwd". Normalization removes redundant components but does not enforce a directory containment policy.

The Griffin AI engine emits a finding that says "path.normalize() detected on the taint path, but normalization is not a sufficient sanitizer for path traversal — recommend allowlist validation against the intended directory." A pure-LLM tool, in our benchmarks, gets this case right about 60% of the time and incorrectly suppresses the finding the rest.

Now consider the same flow with path.normalize() plus a check that the resolved result starts with the allowed directory (including a trailing path separator, so a sibling directory such as /var/data-old cannot satisfy a bare prefix match). The two steps together do constitute adequate protection. The sanitizer-aware analysis recognises the pattern (normalization + containment) and suppresses the finding.

Why bypassed sanitizers are particularly important

Some sanitizers have well-known bypass conditions. mysql_real_escape_string is correct for character escaping but does not protect against second-order SQL injection. PHP's htmlspecialchars defaults to escaping for HTML but does not handle attribute or JavaScript contexts. Each of these has a documented bypass class.

The engine's sanitizer registry includes known bypasses. When the sanitizer is detected but the surrounding context matches a bypass condition, the finding is surfaced with a specific reference to the bypass class. This is information the security team needs to make the right call.
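The registry-entry shape this implies might look like the following. Field names and entries here are illustrative assumptions, not Safeguard's actual schema:

```javascript
// Illustrative entry: protection scope plus known bypass classes, so a
// detected sanitizer can be cross-checked against the sink context.
const htmlspecialcharsEntry = {
  scopes: ["html-body"],
  bypasses: [
    { context: "html-attribute", note: "quote handling depends on flags; unquoted attributes remain injectable" },
    { context: "javascript", note: "HTML entity encoding does not neutralize JavaScript string contexts" },
  ],
};

// Returns the bypass note if the sink context matches a known bypass class.
function bypassFor(entry, sinkContext) {
  const hit = entry.bypasses.find((b) => b.context === sinkContext);
  return hit ? hit.note : null;
}
```

With this shape, a finding in a JavaScript sink context carries the specific bypass note rather than a generic "sanitizer detected" annotation.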

A pure-LLM tool can know about sanitizer bypasses (the training data is rich on this point) but typically does not connect that knowledge to the specific code path under analysis. The result is that bypass-aware findings show up only when the model happens to recognise the pattern, not consistently.

What to evaluate

Three concrete checks during procurement:

  1. Show the platform a series of sanitized vulnerabilities. What percentage are correctly suppressed?
  2. Show the platform a sanitizer used outside its protection scope. Is the finding surfaced or incorrectly suppressed?
  3. Show the platform a known sanitizer bypass. Is the bypass class identified, or is the sanitizer treated as fully protective?

The answers determine whether the platform's false-positive rate is sustainable in production.
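Check 1 reduces to a suppression-rate measurement over a fixture set of known-sanitized cases. A minimal scoring sketch (the fixture shape is hypothetical):

```javascript
// Fixtures here are all correctly sanitized, so every surfaced finding is a
// false positive; the score is the fraction the platform correctly suppressed.
function suppressionRate(results) {
  const suppressed = results.filter((r) => r.suppressed).length;
  return suppressed / results.length;
}

suppressionRate([
  { id: "sqli-parameterized", suppressed: true },
  { id: "xss-encoded", suppressed: true },
  { id: "traversal-allowlisted", suppressed: false },
]); // 2 of 3 suppressed
```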

How Safeguard Helps

Safeguard's engine maintains a curated sanitizer registry covering major languages, frameworks, and known bypass classes. Sanitizer detection feeds Griffin AI's reasoning step with explicit annotations: fully mitigated, partially mitigated, or bypassed-due-to-context. The customer-facing benefit is a backlog where sanitized non-issues are correctly suppressed and partially-mitigated findings are surfaced for review with the specific bypass condition called out. For teams whose triage hours are dominated by sanitizer-related false positives, this is the architectural feature that returns the most engineering time per quarter.
