Detection & Analysis

Taint Analysis

Tracing untrusted input from source to sink to find exploitable data flows.

What is taint analysis?

Taint analysis is a classical program analysis technique that marks data from untrusted sources as "tainted" and then follows that data as it flows through the program. If tainted data reaches a sensitive operation — a database query, a shell command, a file path — without being sanitized along the way, the analyzer flags an exploitable data flow.

It is the technique behind most high-confidence SQL injection, command injection, path traversal, and SSRF findings. Unlike pattern scanners, taint analysis demonstrates that a concrete data-flow path exists from an attacker-controlled surface to a dangerous sink — which is exactly what a reviewer wants to see before triaging a finding as real.

How it works

The analysis is built around three concepts — sources, sinks, and sanitizers:

  1. Sources. Points where untrusted input enters the program — HTTP request parameters, headers, cookies, file uploads, database rows, message queue payloads. Every source is the root of a potential taint flow.
  2. Sinks. Operations where tainted input causes damage — eval, raw SQL concatenation, shell execution, file-system APIs, deserialization, template rendering. Each sink has a taint signature that tells the engine what kind of flow would be exploitable.
  3. Propagation and sanitizers. The engine walks the call graph and propagates taint through assignments, function returns, and structure fields. When it encounters a known sanitizer — parameterized queries, HTML escapers, validated allowlists — the taint is cleared for that downstream path. Unsanitized flows from source to sink become findings.
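The three concepts above can be sketched as a toy taint tracker. This is a minimal illustration, not Safeguard's engine: a real analyzer propagates taint statically over the call graph rather than at runtime, and every class and function name here is hypothetical. The source/sink/sanitizer roles, however, are exactly the ones described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A string carrying a taint label (attacker-controlled)."""
    value: str

def source_http_param(raw: str) -> Tainted:
    """Source: untrusted input entering via an HTTP request parameter."""
    return Tainted(raw)

def concat(*parts):
    """Propagation: the result is tainted if any part is tainted."""
    text = "".join(p.value if isinstance(p, Tainted) else p for p in parts)
    return Tainted(text) if any(isinstance(p, Tainted) for p in parts) else text

def sanitize_allowlist(part, allowed):
    """Sanitizer: a validated allowlist clears the taint label."""
    text = part.value if isinstance(part, Tainted) else part
    if text not in allowed:
        raise ValueError(f"rejected input: {text!r}")
    return text  # returned as a plain str: taint cleared

def sink_sql(query):
    """Sink: raw SQL execution; tainted input reaching it is a finding."""
    if isinstance(query, Tainted):
        raise RuntimeError("finding: tainted data reached SQL sink")
    return query

# Unsanitized flow: source -> concatenation -> sink. Flagged.
user = source_http_param("alice' OR '1'='1")
try:
    sink_sql(concat("SELECT * FROM users WHERE name = '", user, "'"))
except RuntimeError as e:
    print(e)  # finding: tainted data reached SQL sink

# Sanitized flow: the allowlist clears the taint, so the sink accepts it.
name = sanitize_allowlist(source_http_param("alice"), {"alice", "bob"})
sink_sql(concat("SELECT * FROM users WHERE name = '", name, "'"))
```

Note how taint survives concatenation but not the allowlist check: that asymmetry between propagation and sanitization is what lets the engine report only the flows that remain exploitable end to end.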

Why it matters

Most commercial SAST tools lean heavily on pattern matching. They find uses of dangerous APIs and flag them regardless of whether attacker-controlled data can actually reach them. The result is a backlog of mostly-false-positive findings and an engineering team that stops reading the alerts.

Taint analysis inverts that. It produces fewer findings, each one backed by a specific data flow you can replay in a debugger. That is the difference between a scanner that annoys engineers and an analyzer that changes how code gets shipped.

What value it adds

  • Deterministic exploit paths

    Every finding ships with a source-to-sink trace. A developer can see exactly how input travels and where to break the flow.

  • Drastically fewer false positives

    Pattern scanners flag every risky API call. Taint analysis only flags the ones reachable from untrusted input — usually 5–10x fewer findings.

  • Sanitizer-aware

    If the team already routes input through a validator, the engine understands that and stops flagging the sanitized flow. Existing defenses actually count.

  • Catches injection classes that patterns miss

    Second-order SQLi, template injection through config, and SSRF via indirect URL construction are taint problems by nature — and invisible to grep-style rules.

  • Feeds downstream automation

    Structured taint paths are the right input for remediation PRs, policy gates, and AI-generated fixes. Findings without traces cannot drive automation.
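As a rough illustration of why traces feed automation, a structured finding can drive a simple policy gate. The field names below are hypothetical, chosen for the sketch rather than taken from Safeguard's actual output schema:

```python
# Hypothetical shape of a structured taint finding: a source, a sink,
# and the ordered path between them (illustrative field names only).
finding = {
    "rule": "sql-injection",
    "source": {"file": "app/routes.py", "line": 42, "kind": "http-param"},
    "sink": {"file": "app/db.py", "line": 17, "kind": "raw-sql"},
    "path": [
        {"file": "app/routes.py", "line": 42, "expr": "request.args['name']"},
        {"file": "app/routes.py", "line": 45, "expr": "build_query(name)"},
        {"file": "app/db.py", "line": 17, "expr": "cursor.execute(query)"},
    ],
    "sanitizers_seen": [],
}

def gate(findings) -> bool:
    """A trivial policy gate: allow merge only if every reported
    flow passed through at least one sanitizer."""
    return all(f["sanitizers_seen"] for f in findings)

print("merge allowed:", gate([finding]))  # merge allowed: False
```

A pattern-only finding ("`execute` called on line 17") carries none of this structure, which is why it cannot drive a gate, a remediation PR, or an AI-generated fix the way a full source-to-sink trace can.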

How Safeguard uses it

Taint analysis is the core of Safeguard's reachability engine and the structured signal behind zero-day discovery. Every taint path becomes evidence that Griffin AI uses to reason about exploitability and draft remediations.

See taint paths in your code.

Point Safeguard at a repo. Get a list of source-to-sink flows back. Compare it to your current SAST output.