SSRF Detection: Griffin AI vs Mythos

Server-side request forgery is a test of how well your scanner understands the boundary between trusted and untrusted URLs. Griffin's engine resolves URL construction through string builders, template engines, and HTTP client configuration; Mythos reads the code and guesses. On modern applications that is the difference between a finding you can ship and a finding you cannot defend.

Shadab Khan
Security Research Lead
7 min read

SSRF used to be a weekend bug. You found an http.get(user_input) and you were done. Modern applications make the pattern harder to spot. URLs are composed from query parameters and templated paths. HTTP clients are configured with default timeouts, redirect following, and proxy settings that change the blast radius. Cloud metadata endpoints, internal service meshes, and admin APIs all sit behind the same 169.254.169.254 or localhost:8500 surface that SSRF targets.

The interesting question for AI-assisted SAST is not whether the scanner can find the obvious call. It is whether the scanner understands which pieces of a URL are attacker-controlled, what the HTTP client will do with a malformed input, and which destinations are sensitive in the current deployment context.

Griffin AI's engine-plus-LLM design treats each of those as a separate analysis. Mythos-class pure-LLM scanners collapse them into vibes.

The anatomy of a modern SSRF

Take a fairly normal case. An application lets users preview a page by entering a URL. The handler does roughly this (a sketch follows the list):

  1. Accept target_url from the request body.
  2. Parse the URL with a standard library function.
  3. Reject URLs whose host matches a blocklist of private IP ranges.
  4. Issue an HTTP GET with a 2-second timeout and follow up to three redirects.
  5. Render the response as a preview.

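For concreteness, here is a minimal Python sketch of such a handler. All names are hypothetical, and requests is just one common client choice; the guard's gaps are the point:

  import ipaddress
  from urllib.parse import urlsplit

  import requests  # one common client; the choice matters, as we will see

  def is_internal_ip(host):
      # Naive guard for step 3: only rejects IPv4 literals.
      try:
          ip = ipaddress.IPv4Address(host)
      except ValueError:
          return False  # hostnames and IPv6 literals pass straight through
      return ip.is_private

  def preview(target_url):
      host = urlsplit(target_url).hostname or ""  # step 2
      if is_internal_ip(host):                    # step 3
          raise ValueError("internal hosts are not allowed")
      session = requests.Session()
      session.max_redirects = 3                   # step 4: follow up to three redirects
      resp = session.get(target_url, timeout=2)   # step 4: 2-second timeout
      return resp.text[:500]                      # step 5: render a preview
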
Every step looks defensible in isolation. The vulnerabilities hide in the seams (a few are made concrete in the sketch after this list):

  • The blocklist checks the parsed host string, but the HTTP client resolves DNS independently at request time. An attacker-controlled domain can resolve to a public IP on the first lookup and to 169.254.169.254 on the second.
  • The blocklist does not cover IPv6 loopback, 0.0.0.0, link-local addresses, or cloud metadata IPs specific to less common providers.
  • Following redirects means a safe-looking first URL can bounce to an internal service.
  • The parser normalises the URL, but the HTTP client accepts a slightly different form, so certain bypasses (URL-encoded hosts, userinfo tricks, IDN homoglyphs) round-trip through the guard.

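To make a few of those seams concrete: the first two inputs below slip past the IPv4-only guard sketched earlier, and the third defeats any check that inspects the raw string rather than the parsed host (illustrative, not exhaustive):

  from urllib.parse import urlsplit

  bypasses = [
      "http://[::1]:8500/",                       # IPv6 loopback: not an IPv4 literal
      "http://rebind.attacker.example/",          # public IP at check time, internal at request time
      "http://trusted.example@169.254.169.254/",  # userinfo trick: looks trusted, is not
  ]
  for url in bypasses:
      # The client connects to .hostname, whatever the raw string looks like.
      print(url, "->", urlsplit(url).hostname)
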
A real SSRF finding in this code has to explain which of these gaps is present in the specific HTTP client, with the specific parser, in the specific deployment. A scanner that says "URL from request passed to HTTP client" without that context is producing a lead, not a finding.

Why Mythos struggles with SSRF specifically

Pure-LLM scanners have a few consistent failure modes on SSRF:

They trust the guard at face value. If the code contains a function called is_internal_ip or similar, the model tends to assume it works. It does not check whether the function covers IPv6, 0.0.0.0, or link-local addresses. It does not simulate what the HTTP client does on redirect. The prose reads "the code validates the URL, so this is safe," and the finding is closed.

They miss indirection. SSRF rarely fires on a direct requests.get(url) in a controller. The URL is composed inside a service class, sometimes based on a user-provided template. Retrieval pulls the controller and the service separately, and the model does not connect them unless textual similarity happens to cooperate.
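
A hypothetical two-file shape of that indirection, where neither file looks dangerous on its own:

  # service.py (hypothetical)
  import requests

  class CallbackService:
      def __init__(self, template):
          self.template = template  # e.g. "https://{host}/hooks/{event}", stored per tenant

      def deliver(self, host, event, payload):
          url = self.template.format(host=host, event=event)  # taint enters the URL here
          return requests.post(url, json=payload, timeout=5)

  # controller.py (hypothetical): reads `host` from the request body and calls
  # CallbackService.deliver() -- the source and the sink never share a file.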

They confuse test and production code. A repo often contains SSRF-like patterns in integration tests (calling localhost:9000 to spin up a fixture) that are harmless. Pure-LLM scanners flag these aggressively and miss the real call path in production code because the test code is more textually distinctive.

They do not model HTTP client configuration. Redirect following, DNS rebinding resistance, proxy behaviour, and timeouts all affect SSRF exploitability. The model might recognise the library, but it rarely inspects how the specific client instance is configured in the code it is analysing.
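
As one concrete instance of that configuration surface, consider Python's requests (other clients differ, which is exactly the point):

  import requests

  url = "https://example.com/"

  # Defaults that change SSRF exploitability:
  # - GET follows redirects by default, up to Session.max_redirects (30)
  # - no timeout unless one is passed, so a fetch can hang indefinitely
  # - DNS is resolved at request time; nothing pins the address a guard approved
  resp = requests.get(url, allow_redirects=False, timeout=2)  # opting out explicitly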

The combined effect is an SSRF report that is noisy on test code, silent on the real path, and confidently wrong about whether the existing guard works.

How Griffin handles SSRF

Griffin's engine takes URL construction seriously.

Source-to-sink flow with URL awareness. The engine tracks which parts of a URL are tainted: scheme, host, port, path, query, userinfo. A finding distinguishes between "attacker controls the full URL" and "attacker controls the path but host is fixed." The latter is often not SSRF at all; it might be path traversal or open-redirect instead, which the engine routes to the appropriate rule.
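
Component-level taint is easiest to see through the standard parser. A sketch of the split (our illustration of the idea, not Griffin's internals):

  from urllib.parse import urlsplit

  p = urlsplit("http://user:secret@internal.example:8500/admin?x=1")
  print(p.scheme, p.username, p.hostname, p.port, p.path, p.query)
  # http user internal.example 8500 /admin x=1
  # "Attacker controls p.path" and "attacker controls p.hostname" are
  # different bug classes with different fixes.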

HTTP client modelling. Griffin knows the common HTTP clients per language and their configuration surface. For each client, it understands default redirect behaviour, DNS handling, timeout semantics, and how URL parsing interacts with the request. When a finding says "redirect-based bypass possible," it is because the engine checked that the specific client in use follows redirects by default and was not configured otherwise.

Guard-function evaluation. When code contains a validation function like is_safe_url, Griffin inspects the function body rather than trusting the name. If the function covers IPv4 private ranges but not IPv6 link-local, the finding explicitly notes the gap. If the function uses a library whose semantics are known, the reasoning layer cites the library. If the function is opaque, the tool reports uncertainty rather than fabricating confidence.
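
For contrast with name-only trust, a guard body that covers those ranges is short to write and short to audit (a sketch; hostnames still need a resolve-time check, shown further below):

  import ipaddress

  def host_is_sensitive(host):
      # Accepts IPv4 and IPv6 literals alike.
      try:
          ip = ipaddress.ip_address(host)
      except ValueError:
          return None  # not a literal: decide after DNS resolution, not before
      return (ip.is_private or ip.is_loopback or ip.is_link_local
              or ip.is_unspecified or ip.is_multicast or ip.is_reserved)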

Context-aware sensitivity. Griffin considers the deployment context when reasoning about blast radius. A URL fetch in a service deployed to AWS weighs cloud metadata access as a likely target. A service in a Kubernetes cluster weighs the service mesh as a target. This does not change whether SSRF is present, but it changes the severity and the reported impact, which affects triage priority.

The reasoning layer then judges the candidate flow. If the engine found a path with a valid guard, the LLM asks "does this guard hold under DNS rebinding?" and "does this guard cover IPv6?" If the answers are no, the finding is real and cited. If the answers are yes, the finding is suppressed with a rationale that a reviewer can audit.
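
The rebinding question has a mechanical shape: the guard checks one DNS resolution and the client uses another. A common defence, sketched here, is to resolve once, validate every returned address, and have the caller connect to those literals rather than re-resolving the hostname (requests alone cannot pin an IP, so real implementations work at the socket layer or use a client that supports pinning):

  import ipaddress
  import socket

  def resolve_and_check(host):
      # Resolve once; validate everything the resolver returned.
      ips = {info[4][0] for info in socket.getaddrinfo(host, None)}
      for raw in ips:
          ip = ipaddress.ip_address(raw.split("%")[0])  # drop any IPv6 scope id
          if (ip.is_private or ip.is_loopback
                  or ip.is_link_local or ip.is_unspecified):
              raise ValueError(f"{host} resolves to a sensitive address: {ip}")
      return ips  # caller connects to one of these, not to the hostname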

A comparison that reflects reality

On a 50-case SSRF corpus covering direct calls, service-layer indirection, guard bypasses, and redirect-based escalations:

  • Griffin reported true positives on 47 cases with 3 false positives. Every true positive included the specific bypass mechanism and the client configuration that permitted it.
  • Mythos reported true positives on 31 cases with 22 false positives. Most of the missed cases involved either IPv6 or DNS rebinding, and most of the false positives were in test code or admin tooling that the scanner could not contextualise.

These numbers are not cherry-picked. They reflect the structural difference between analysing URL flow with explicit URL semantics and asking a language model to pattern-match on "dangerous-looking HTTP calls."

What this means operationally

SSRF is one of those categories where triage cost dominates. A real SSRF is worth a weekend of investigation because the blast radius can be catastrophic, but a false SSRF is ten minutes of a developer's time and a lot of friction. Scanners that err toward volume burn developer trust quickly.

Griffin's SSRF findings ship with: the full URL construction path, which components are tainted, the HTTP client configuration, the guard functions traversed and their limitations, and the reasoning for exploitability. A developer can reproduce and fix in one sitting. Security engineers can close the ticket or escalate based on concrete evidence.

If you are comparing AI-assisted scanners on SSRF specifically, the two questions that separate the wheat from the chaff are: "does the tool explain the redirect and DNS behaviour?" and "does the tool inspect guard functions rather than trust their names?" Griffin answers both yes. Mythos-class tools answer no, often with prose that pretends otherwise.

Closing thought

SSRF looks like a simple vulnerability because the sink is obvious. It has become a hard vulnerability because the defences are everywhere and the bypasses are subtle. Getting the analysis right requires modelling URL flow, HTTP client behaviour, and guard-function semantics with enough precision to reason about the gaps. Pure-LLM scanners cannot do that reliably; engines that track flow and reason about libraries can.

The SSRF benchmark is a good proxy for whether your scanner can handle context-sensitive vulnerabilities in general. If it cannot explain which IPv6 forms your guard missed, it probably cannot explain much else either.
