When a frontier model is asked to find vulnerabilities in code without structured grounding, it produces findings, some real and some not. The rate of hallucinated findings (plausible-sounding vulnerabilities that don't actually exist in the code) varies with the task and model but consistently lands between 20% and 70% in published research. For production use, any rate above roughly 10% makes the output operationally unusable without heavy human filtering. Grounding is the architectural mechanism that drops the rate to production-acceptable levels.
What hallucinated findings look like
Three common patterns:
- Non-existent imports. The model reports a dangerous import that isn't in the file.
- Wrong function attribution. The model attributes a vulnerability to the wrong function.
- Confident-sounding non-vulnerabilities. A plausible-looking SQL injection report against code that actually uses parameterised queries.
Each is costly to triage because the finding reads like genuine analysis until an engineer checks it against the code.
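To make the third pattern concrete, here is a minimal illustrative snippet (the `get_user` function and schema are hypothetical, not taken from any real finding). Code like this is safe, yet a pure LLM will sometimes flag it because the surface pattern, a string handed to `execute`, resembles vulnerable code:

```python
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # Parameterised query: the driver binds the value; it is never
    # interpolated into the SQL string, so there is no injection here.
    cur = conn.execute("SELECT id, email FROM users WHERE name = ?", (username,))
    return cur.fetchone()
```

A hallucinated finding against this function costs real triage time precisely because the written-up "vulnerability" sounds credible until someone reads the query.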
Why hallucination happens
Three reasons:
- Models are completion engines. They produce the most likely next tokens given the input; the objective optimises for plausibility, not truth.
- Security-specific training data is limited. The model's training includes general security knowledge but not the specifics of any given codebase.
- Multi-hop reasoning is unreliable. Chains of inference accumulate error with every hop.
Each factor compounds the others.
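A toy calculation shows how fast the third factor alone bites. Assuming independent per-hop reliability (and the 0.9 figure here is an assumed number for illustration, not a measurement):

```python
# Toy model: if each inference hop is correct with probability p,
# an n-hop chain is correct with probability p ** n.
p, hops = 0.9, 4
print(f"{hops}-hop chain reliability: {p ** hops:.2f}")  # prints 0.66
```

Even at 90% per-hop accuracy, a four-hop chain of reasoning (entry point, to data flow, to sanitisation check, to sink) is wrong about a third of the time.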
How grounding fixes it
Griffin AI's grounding approach:
- The engine does reachability, taint, and call graph analysis deterministically.
- The model reasons over the engine's structured output, not over raw code.
The model is asked "given this specific taint path, what is the exploit hypothesis?" rather than "find vulnerabilities in this code." Models answer that narrower, grounded question far more reliably.
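A minimal sketch of what "reasoning over the engine's structured output" can look like. `TaintPath` and `build_prompt` are illustrative names, not Griffin AI's actual schema or API:

```python
from dataclasses import dataclass

@dataclass
class TaintPath:
    source: str      # e.g. a request parameter read (engine-verified)
    sink: str        # e.g. a query-execution call
    hops: list[str]  # intermediate call sites on the path

def build_prompt(path: TaintPath) -> str:
    # One verified path, one narrow question: no open-ended
    # "find vulnerabilities in this code" prompt.
    steps = " -> ".join([path.source, *path.hops, path.sink])
    return (
        "Given this verified taint path:\n"
        f"  {steps}\n"
        "State the exploit hypothesis and the single best-fitting CWE."
    )
```

The structural point is that the facts (the path exists, the sink is reachable) come from deterministic analysis; the model only supplies interpretation on top of them.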
Published Griffin AI hypothesis-accuracy numbers: 81% full agreement, 94% with partial CWE credit. The gap between those figures and the 30-80% pure-LLM baseline is the grounding effect.
What to evaluate
Three concrete checks:
- Ask the platform to analyse code with no structured grounding, and measure the hallucination rate against ground truth (one way to score it is sketched after this list).
- Add grounding (reachability, SBOM) and re-measure on the same code.
- Compare operational usability: how much human filtering each result set needs before the findings are actionable.
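For the first two checks, the scoring itself can be simple. A sketch of one way to do it; `hallucination_rate` and `verify` are illustrative helpers, not part of any platform's API:

```python
from typing import Callable

def hallucination_rate(findings: list, verify: Callable[[object], bool]) -> float:
    # `verify` is any deterministic ground-truth check, e.g. "does the
    # reported import actually appear in the file?" (pattern 1 above).
    if not findings:
        return 0.0
    unconfirmed = sum(1 for f in findings if not verify(f))
    return unconfirmed / len(findings)
```

Run it once on the ungrounded output and once on the grounded output; the difference between the two rates is the number you are buying with the architecture.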
How Safeguard helps
Safeguard's engine-plus-LLM grounding architecture measurably reduces the hallucination rate that afflicts pure-LLM security analysis. For teams whose triage time is dominated by false positives, grounding is the architectural property that changes the economics.