Taint analysis is only as accurate as its source and sink classification. A source is a program location that introduces untrusted data; a sink is a location where untrusted data becomes dangerous. Misclassify either and the entire analysis falls apart: miss a source or a sink and you call a real exploit unreachable; mislabel a benign function as one and you call a harmless path dangerous. This post compares how Griffin AI classifies sources and sinks using a curated catalog and how Mythos-class pure-LLM tools try to infer the same classifications on the fly.
Why classification matters
Classification sits at the foundation of reachability reasoning. The call graph tells you whether paths exist; the taint graph tells you whether attacker-controlled data flows along those paths; but both of those answers depend on knowing where the attacker-controlled data enters and where it becomes dangerous. A path from a logging API to a log file is structurally similar to a path from an HTTP request to a database write, but only one of them matters for injection. The difference is in the labels.
The CWE catalog is effectively a taxonomy of source-sink combinations. CWE-89 maps untrusted input to SQL execution; CWE-79 maps untrusted input to HTML rendering; CWE-918 maps untrusted input to outbound fetches; CWE-502 maps untrusted bytes to deserialization; CWE-611 maps untrusted XML to external entity resolution; CWE-22 maps untrusted paths to file system access. Each CWE is a pair of labels the classifier must get right.
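To make the pairing concrete, here is a minimal sketch in Python; the label names are invented stand-ins, not Griffin's actual taxonomy:

```python
# Illustrative pairing only: each CWE couples a source label with a sink label.
CWE_LABEL_PAIRS = {
    "CWE-89":  ("untrusted-input", "sql-execution"),
    "CWE-79":  ("untrusted-input", "html-rendering"),
    "CWE-918": ("untrusted-input", "outbound-fetch"),
    "CWE-502": ("untrusted-bytes", "deserialization"),
    "CWE-611": ("untrusted-xml", "external-entity-resolution"),
    "CWE-22":  ("untrusted-path", "filesystem-access"),
}
```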
Griffin AI's curated catalog
Griffin AI maintains an explicit catalog of sources and sinks for every supported language and framework. Sources include HTTP request components (body, query, path, headers, cookies), message queue payloads, file inputs where the path is attacker-influenced, environment variables with known user influence, serialized state from external systems, and command-line arguments. Sinks include database execution functions, shell execution, template rendering, HTML output, functions that consume cryptographic key material, file system writes to privileged locations, network fetches, deserialization functions, and many others.
Each catalog entry is structured. It names the specific function or method, specifies which arguments are the tainted inputs, and records the taint class (user-provided, attacker-controlled, partially trusted). The catalog is versioned per library and per framework so that a sink that was introduced in a specific version is not reported for code that uses an earlier version. The catalog is also extensible: teams can add internal sinks for their custom abstractions, and the additions are part of the policy configuration.
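A minimal sketch of what one entry might look like, assuming a Python representation; the field names are invented for illustration, not Griffin's actual schema:

```python
from dataclasses import dataclass

@dataclass
class SinkEntry:
    """Hypothetical entry shape; the fields mirror the description above."""
    library: str             # package that provides the callable
    version_range: str       # library versions in which this sink exists
    callable: str            # fully qualified function or method name
    tainted_args: list[int]  # argument positions that act as the sink
    taint_class: str         # e.g. "attacker-controlled", "partially-trusted"
    cwe: str                 # the CWE this sink maps to

# Untrusted bytes reaching pickle.loads is the classic deserialization sink.
PICKLE_SINK = SinkEntry(
    library="pickle",
    version_range="*",
    callable="pickle.loads",
    tainted_args=[0],
    taint_class="attacker-controlled",
    cwe="CWE-502",
)
```

Keeping the argument positions explicit is what lets two arguments of the same call carry different classifications.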
Classification is not a separate LLM call; it is a graph annotation. The taint engine looks up the call site in the catalog, marks the arguments or return values accordingly, and propagation proceeds. The LLM's reasoning consumes the labeled graph rather than re-deriving the labels, which keeps it fast and deterministic.
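A rough sketch of that lookup, assuming the entry shape above; call_site and catalog are stand-in objects, since the engine's internals are not the point here:

```python
def annotate_call_site(call_site, catalog):
    """Mark sink arguments at a call site via catalog lookup; no model call involved."""
    entry = catalog.get(call_site.resolved_callable)
    if entry is None:
        return  # not a known source, sink, or sanitizer; nothing to annotate
    for position in entry.tainted_args:
        call_site.args[position].labels.add((entry.taint_class, entry.cwe))
```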
Griffin's 2026 Q1 benchmark recorded 96 percent classification precision on a 312-item source-sink validation set across JavaScript, Python, Java, and Go codebases. Most of that precision came from the catalog's ability to distinguish sinks by specific argument (for example, classifying exec(cmd, args) differently for the cmd argument than for the args argument).
Mythos-class inference
Mythos-class pure-LLM tools do not carry an explicit catalog. Classification is part of the LLM's reasoning: the model reads a function name and arguments, infers whether it is a source or a sink, and proceeds. This works well for canonical names (exec, query, fetch, innerHTML) and struggles with everything else.
The failure modes are predictable. The first is the renamed export problem. A library wraps child_process.exec in a helper called runCommand, and the helper is re-exported under another name in a second-order dependency. The LLM sees the second-order name and does not recognize the sink underneath. The second is the internal abstraction problem. A team has an internal db.execute wrapper that the LLM may or may not classify correctly depending on the name. The third is the partial-trust problem: a call is dangerous along one argument but safe along another, and the LLM does not distinguish between them.
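A Python analogue of the renamed-export problem, with every name invented for illustration:

```python
# helpers/shell.py -- a first-order dependency
import subprocess

def run_command(cmd: str) -> str:
    # shell=True makes the command string an OS-command-injection sink
    return subprocess.check_output(cmd, shell=True, text=True)

# toolkit/__init__.py -- a second-order dependency
# The helper is re-exported under a name that no longer hints at shell execution.
from helpers.shell import run_command as refresh_cache
```

A resolver that follows the import chain lands on subprocess.check_output and applies the catalog entry; inference from the name refresh_cache has nothing to latch onto.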
Each failure mode produces either false positives or false negatives. In practice, Mythos-class tools tend toward false positives on classification because the LLM is biased toward flagging things that look suspicious. This is the opposite bias from call graph analysis, where the LLM tends toward false negatives because it cannot see the path. The combination produces a noisy, uncalibrated output.
A worked example
Consider a Python service that uses a custom ORM wrapper. The wrapper exposes a repo.run(query, params) method that, internally, calls cursor.execute(query, params). Some codepaths use repo.run_raw(sql), which skips parameter binding and passes the SQL directly.
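A sketch of the wrapper, with the class body invented to match that description:

```python
class Repo:
    """Invented wrapper; written to match the description above."""

    def __init__(self, cursor):
        self._cursor = cursor

    def run(self, query: str, params: tuple):
        # Parameter binding: values in `params` are never spliced into the SQL text.
        return self._cursor.execute(query, params)

    def run_raw(self, sql: str):
        # No binding: the SQL string is executed exactly as assembled by the caller.
        return self._cursor.execute(sql)
```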
Griffin's catalog has an entry for repo.run_raw classified as a SQL-injection sink on its first argument, and an entry for repo.run classified as safe when the second argument is used as parameters. The taint engine traces the call and applies the correct classification at each site. A finding that reaches repo.run with tainted SQL but untainted params is not raised; a finding that reaches repo.run_raw with tainted SQL is raised with a specific CWE-89 citation.
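Expressed in the hypothetical entry shape from earlier, the distinction might look like this:

```python
RUN_RAW_SINK = SinkEntry(
    library="internal-orm",        # invented name for the team's wrapper package
    version_range="*",
    callable="Repo.run_raw",
    tainted_args=[0],              # the raw SQL string
    taint_class="attacker-controlled",
    cwe="CWE-89",
)

# Repo.run carries no CWE-89 entry for its second argument: taint on `params`
# never reaches the query text, so tainted params with untainted SQL is not raised.
```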
A Mythos-class tool sees repo.run and repo.run_raw as similar names. It may flag both or flag neither, depending on how its retrieval surfaces the helper implementations. If it looks at repo.run_raw in isolation, it may miss the absence of parameter binding; if it looks at repo.run, it may incorrectly flag it because it looks like an exec-style API. The output is neither precise nor stable across runs.
The extensibility question
Enterprise codebases have their own sources and sinks. A security team may define an internal permissions API whose functions become authorization sinks: any reachable call with a tainted permission string is a finding. A data engineering team may define internal data access APIs that should be treated as sensitive sinks. Griffin's catalog is extensible through a policy file: a team can add entries for its internal abstractions, and every subsequent analysis respects them.
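In the hypothetical entry shape used above (not Griffin's actual policy syntax), a team-specific addition might look like this:

```python
CUSTOM_SINKS = [
    SinkEntry(
        library="acme-permissions",        # invented internal package
        version_range=">=4.0",
        callable="acme.permissions.grant",
        tainted_args=[0],                  # a tainted permission string is a finding
        taint_class="attacker-controlled",
        cwe="CWE-863",                     # Incorrect Authorization
    ),
]
```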
Mythos-class tools do not offer a comparable extension surface. Some tools accept prompts that describe custom patterns, but the LLM applies the prompt inconsistently and forgets it across sessions. The only durable way to capture team-specific classifications is through a structured catalog, and the only tools that have one are the grounded ones.
Sanitizer labeling
Sources and sinks are only part of the picture. Griffin also classifies sanitizers: functions that neutralize specific taint classes. A SQL parameter binder neutralizes SQL-injection taint on the bound arguments but not on the query string. An HTML escaper neutralizes XSS taint on the escaped text but not in attribute contexts that require a different escaper. The taint engine applies the sanitizer effects deterministically, which is the only way to get accurate verdicts on paths that cross validation layers.
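A sanitizer entry needs roughly one extra piece of information: which taint class it neutralizes, and in which output context. A hypothetical sketch, reusing invented names from the worked example:

```python
from dataclasses import dataclass

@dataclass
class SanitizerEntry:
    """Hypothetical shape; field names are illustrative."""
    callable: str
    neutralizes: str           # taint class removed from the listed arguments
    context: str               # output context in which the effect actually holds
    sanitized_args: list[int]

# Binding neutralizes SQL-injection taint on the values, not on the query string.
PARAM_BINDER = SanitizerEntry(
    callable="Repo.run",
    neutralizes="sql-injection",
    context="bound-parameters",
    sanitized_args=[1],
)

# An HTML-text escaper does not carry over to attribute or script contexts.
HTML_ESCAPER = SanitizerEntry(
    callable="html_escape",    # invented name
    neutralizes="xss",
    context="html-text",
    sanitized_args=[0],
)
```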
Mythos-class inference of sanitizers is especially weak because the sanitizer's effect depends on the specific context. A function called escape might neutralize XSS for HTML text but not for JavaScript event handlers. The LLM cannot tell without seeing the implementation, and the implementation is rarely in the retrieved context.
Labeling as an auditable artifact
Because Griffin's classifications are catalog-backed, every finding can cite the specific catalog entry that produced it. Auditors asking "why did the system classify this as a sink?" get a structured answer: "This function matches entry 14 in the SQL-sink catalog for version 2.3.1 of this library." That traceability is the difference between a finding a security team can defend in a regulated environment and a finding that reads as opinion.
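The citation itself can be a small structured record, along the lines of the following invented sketch (not Griffin's actual output format):

```python
finding_citation = {
    "finding_id": "GRF-2031",          # invented identifier
    "cwe": "CWE-89",
    "sink_catalog_entry": {
        "catalog": "sql-sinks",
        "entry": 14,
        "library_version": "2.3.1",
        "callable": "Repo.run_raw",
    },
}
```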
How Safeguard Helps
Safeguard ships Griffin AI's classification catalog and exposes it for extension through the policy editor. Teams can see every source and sink the analyzer recognizes, add their own internal abstractions, and pin specific versions. Every reachability finding links back to the catalog entries it used, which makes review and audit straightforward. If your current tool is inferring classifications ad hoc and producing inconsistent findings, Safeguard's catalog-backed workflow will stabilize the queue.