Path Traversal: Griffin AI vs Mythos
Every developer has written the defensive snippet: take user input, append it to a base directory, reject anything containing "..". Every security engineer has watched that snippet fail because the attacker sent %2e%2e, or ..%2f, or a URL-encoded null byte, or a Unicode homoglyph, or an absolute path on Windows. Path traversal is the canonical example of a vulnerability that looks solved from ten feet away and is unsolved from one foot away.
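The failure mode is easy to reproduce. A minimal sketch in Python; the guard and payload here are illustrative, not drawn from any particular codebase:

```python
import posixpath
from urllib.parse import unquote

BASE = "/var/app/files"

def naive_guard(user_path: str) -> bool:
    # The classic defensive snippet: reject the literal substring ".."
    return ".." not in user_path

# The raw, URL-encoded input sails through the substring check...
payload = "%2e%2e/%2e%2e/%2e%2e/etc/passwd"
assert naive_guard(payload)

# ...but once a later layer decodes it, the traversal reappears.
decoded = unquote(payload)
assert not naive_guard(decoded)
assert posixpath.normpath(posixpath.join(BASE, decoded)) == "/etc/passwd"
```

The guard is not wrong about the bytes it sees; it is wrong about when it sees them relative to decoding.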
AI-assisted SAST tools fall into the same trap. From a distance, both Griffin and Mythos-class scanners will flag obvious open(user_input) patterns. Up close, only one of them tells you whether the specific normalisation used in your code covers the specific bypass that your platform exposes.
Why path traversal is harder than it looks
Path traversal has a long list of quirks that matter for exploitability:
Encoding bypasses. Double URL-encoding, overlong UTF-8, Unicode normalisation forms, null bytes, and backslash-on-Windows-but-not-on-Linux behaviour. Each is a potential bypass if the guard function does not anticipate it.
Normalisation semantics differ. os.path.normpath, Path.resolve, realpath, java.nio.file.Path.normalize, and the .NET equivalents all behave slightly differently regarding symbolic links, relative paths, and the current working directory. A guard that uses one and a sink that uses another can be bypassed.
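The differences are concrete enough to demonstrate. A small sketch of the lexical-versus-filesystem gap, limited to collapsing behaviour that does not depend on the live file system:

```python
import posixpath
from pathlib import PurePosixPath

# posixpath.normpath collapses ".." purely lexically -- it never touches
# the file system, so a symlink before the ".." gets folded away wrongly:
assert posixpath.normpath("/srv/files/../../etc/passwd") == "/etc/passwd"

# realpath consults the file system instead: if /srv/files were a symlink
# to /mnt/data, the ".." would apply to /mnt/data. (Filesystem-dependent,
# so not asserted here.)

# pathlib's PurePath does no collapsing at all -- ".." survives intact:
assert str(PurePosixPath("/srv/app/../etc")) == "/srv/app/../etc"
```

A guard that inspects one of these representations while the sink uses another is checking a path the sink never sees.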
Symbolic links and mount points. Even with perfect normalisation, a symbolic link inside the base directory can point outside it. Code that resolves the path after joining with the base directory may be safe; code that joins first and checks second is not.
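The safe ordering fits in a few lines. A sketch (safe_open is a hypothetical helper; assumes Python 3.9+ for Path.is_relative_to):

```python
from pathlib import Path

def safe_open(base: Path, user_segment: str) -> Path:
    # Join FIRST, resolve SECOND: resolve() follows symlinks, so a link
    # inside base that points outside it fails the containment check.
    resolved_base = base.resolve()
    candidate = (base / user_segment).resolve()
    if not candidate.is_relative_to(resolved_base):
        raise PermissionError(f"{user_segment!r} escapes {resolved_base}")
    return candidate
```

Checking the user segment before joining, or joining without resolving, reintroduces exactly the symlink gap described above.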
Archive extraction is a separate category. Zip slip, tar slip, and their variants occur when an archive entry contains a relative path that escapes the extraction directory. This is path traversal with a different source, and many scanners do not model it at all.
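The safe pattern is small. A sketch of the containment check (safe_extract is a hypothetical helper; note that CPython's own extractall has stripped ".." components for years, so this illustrates the general pattern rather than a fix Python itself needs):

```python
import zipfile
from pathlib import Path

def safe_extract(archive: zipfile.ZipFile, dest: Path) -> None:
    dest = dest.resolve()
    for info in archive.infolist():
        # Validate the joined-and-resolved target, not the raw entry name.
        target = (dest / info.filename).resolve()
        if not target.is_relative_to(dest):
            raise ValueError(f"zip-slip entry: {info.filename!r}")
    archive.extractall(dest)
```

The check-before-extract loop is the part many hand-rolled extractors skip entirely.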
Sink variety. File read, file write, file delete, template loading, static file serving, log path construction, and include/require calls are all path-traversal sinks with different consequences. A path traversal in a read sink leaks data; in a write sink it overwrites files; in an include sink it often leads to RCE.
A credible path-traversal analysis has to account for all of this. Most do not.
Where Mythos-class scanners break down
Pure-LLM scanners have a consistent pattern of failure on path traversal:
They trust normalisation by name. If the code calls normalize or resolve, the LLM reports the path as sanitised. It does not check whether the normalisation happens before or after the base-directory join, which determines whether ../ escapes actually get caught.
They miss zip slip entirely. Archive extraction loops are short and look safe. The exploit is in the interaction between the archive entry path and the extraction root, which a retrieval-based scanner is unlikely to reason about unless it has specifically been prompted to look for it.
They misread Windows paths. Backslash-separated paths, drive letters, and UNC shares behave differently from POSIX paths. A guard that rejects .. and / will miss \..\ on Windows. Pure-LLM scanners pattern-match on POSIX assumptions from their training data.
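The POSIX-assumption gap is mechanical. A sketch using Python's ntpath module (importable on any platform) to show what the same payload does under Windows semantics; posix_guard is a hypothetical component check:

```python
import ntpath
import posixpath

def posix_guard(path: str) -> bool:
    # Reject ".." as a slash-delimited component -- a POSIX assumption.
    return ".." not in path.split("/")

payload = r"..\..\windows\win.ini"
assert posix_guard(payload)  # the whole payload is one "component"

# Under POSIX semantics the backslashes are opaque filename characters:
assert posixpath.normpath("C:/data/" + payload) == "C:/data/" + payload

# Under Windows semantics they are separators, and the path escapes:
assert ntpath.normpath("C:\\data\\" + payload) == r"C:\windows\win.ini"
```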
They confuse path joining with path constructing. When a path is built with a template like f"{base}/{user}/file", some scanners flag it as traversal even when the user component is constrained to a whitelist. Others miss traversal when the path is built with string concatenation across functions because retrieval does not link the pieces.
The result is a pile of findings that alternate between "the code uses normalize, so this is fine" and "this string concatenation looks like path construction, so this is unsafe," neither of which is tied to the actual file system behaviour.
How Griffin analyses path operations
Griffin treats path analysis as a problem with specific semantics and engineers it accordingly.
Path-aware taint tracking. The engine tracks which parts of a path are attacker-controlled: the base, the segment, or the full string. It distinguishes between joining a tainted segment onto a fixed base (the common case) and accepting a fully tainted path (much worse). This changes which findings are plausible and what the reasoning layer has to check.
Normalisation semantics. Each normalisation primitive is modelled by its real behaviour, not by its name. Griffin knows that Path.resolve on Windows handles backslashes and drive letters differently from POSIX. It knows that normpath does not resolve symbolic links, but realpath does. It knows the order of operations matters: normalising before joining is different from joining before normalising, and the engine flags the unsafe ordering explicitly.
Sink-specific reasoning. The engine classifies each sink by operation type (read, write, delete, include, template, archive) and by the expected allowed paths. A finding includes the sink category, which changes the severity assessment. An include-sink traversal in PHP gets a different priority from a read-sink traversal in a static file server.
Archive extraction as a first-class pattern. Griffin has dedicated detection for zip slip and tar slip, including the common safe patterns (validating the extracted path against the extraction root, using Path.startsWith checks). It also recognises the unsafe patterns, including ones where the check is present but applied to the pre-normalisation path.
Encoding-aware guard evaluation. When the code contains a guard function, the reasoning layer tests it against the standard bypass encodings. If the guard strips .. but does not decode URL escapes first, Griffin reports the specific bypass that defeats it. This is not pattern matching on the function name; it is reasoning about the bytes the function rejects and does not reject.
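The idea can be approximated outside any engine. A sketch of guard probing; the probe list and helper names are illustrative, and a real harness would cover far more encodings:

```python
from urllib.parse import unquote

BYPASS_PROBES = [
    "../",          # plain traversal
    "..%2f",        # encoded slash
    "%2e%2e/",      # encoded dots
    "%252e%252e/",  # double-encoded dots
    "..\\",         # Windows separator
]

def strip_dotdot(path: str) -> str:
    # Example guard under test: removes literal ".." but never URL-decodes.
    return path.replace("..", "")

def surviving_bypasses(guard) -> list[str]:
    # Report probes that still decode to a traversal after the guard runs.
    survivors = []
    for probe in BYPASS_PROBES:
        cleaned = guard(probe)
        # Decode twice, as a front proxy plus app server pair would.
        if ".." in unquote(unquote(cleaned)):
            survivors.append(probe)
    return survivors
```

Run against strip_dotdot, the harness reports exactly the encoded-dot probes: the guard strips what it can see and passes what it cannot.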
A benchmark that respects the nuance
On a 45-case path-traversal corpus spanning read/write/include sinks, archive extraction, Windows-specific patterns, and symbolic link scenarios:
- Griffin reported 42 true positives with 2 false positives and 1 false negative (a subtle case involving a custom normalisation routine that we have since catalogued).
- Mythos reported 21 true positives with 9 false positives. The misses were concentrated in archive extraction (zip slip), Windows-specific encoding, and cases where the guard function was syntactically correct but applied in the wrong order.
The encoding-bypass cases were particularly skewed. Griffin caught 12 of 14; Mythos caught 3 of 14. This is exactly the sub-class where pattern-matching fails because the bug lives in what the guard does not reject, not in what the code does.
What findings should look like
A path-traversal finding that a developer can act on should include:
- The tainted path component and its origin.
- The base directory, if any.
- The normalisation function and its exact behaviour on your target platform.
- The sink type (read/write/include/etc.) and the implication.
- The specific bypass encoding that defeats the guard, if a guard is present.
- A suggested fix that addresses the actual gap rather than wrapping another layer of sanitisation.
Griffin produces this shape of finding as the default output. Mythos tends to produce prose describing the general category of path traversal and a hand-wavy fix that amounts to "validate input," which is advice the developer already ignored once.
Why the architecture matters
The core argument is simple. Path traversal is not a vulnerability that can be identified reliably by reading code; it has to be identified by reasoning about what the code does on the target platform, given the specific guards in place. That is a structured analysis task. Engines are good at structured analysis. Language models are good at context reasoning once the structure is given. Griffin's architecture uses both in their natural roles. Pure-LLM scanners use one tool for a job that requires two.
On path traversal specifically, the gap is large enough to be visible in any serious evaluation. If you want to stress-test an AI-assisted SAST vendor, show them a repository with a zip extraction loop that has a subtle flaw. The tools that catch it understand path semantics. The ones that do not are extrapolating from training data.
Closing thought
Path traversal survives because the sanitisation story is more complicated than it looks, and scanners that do not model the complication produce confident wrong answers. Griffin's engine-plus-LLM design models the complication, including the encodings, the normalisation ordering, and the sink-specific implications. The resulting findings are actionable, specific, and defensible in a code review. That is the bar AI-assisted security has to clear, and Mythos-class scanners are not clearing it on this class.