A question we get almost every week from security engineers evaluating Griffin AI is some version of: "If Griffin uses Claude underneath, why don't we just call Claude directly?" It's a fair question, and the honest answer is more nuanced than most vendor pitches suggest.
Griffin AI is not a competitor to Claude. Griffin uses Anthropic's Claude family — Opus, Sonnet, and Haiku — as its core reasoning engine. The comparison isn't Griffin versus Claude. It's "raw Claude API" versus "Claude plus security-specific grounding, evaluation harness, and workflow scaffolding." That distinction matters because it changes how you should evaluate the tool.
What Raw Claude Does Well
Claude on its own is an extraordinary general-purpose reasoning engine. Point it at a CVE description and it will produce a competent summary. Paste in a package.json and it will flag obvious problems. Ask it to write a remediation plan and you'll get something readable. For ad hoc security work, raw Claude is already better than many purpose-built tools that launched five years ago.
Teams that use Claude.ai or the Anthropic API directly for security work usually find that it handles well-defined, single-shot tasks beautifully. "Explain this vulnerability." "Suggest a fix for this insecure pattern." "Write a policy document for dependency management." These are the jobs where a general model with strong reasoning wins.
The friction shows up when the task stops being single-shot.
Where Raw Claude Starts To Strain
Security work is rarely a single question. A real triage session involves cross-referencing a CVE against your actual dependency graph, checking which services use the affected function, looking at exploit availability, and deciding whether to patch now or wait for a cleaner upgrade path. Every one of those steps needs data that isn't in Claude's training set — it's in your SBOM, your artifact registry, your issue tracker, and your runtime telemetry.
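To make the cross-referencing step concrete, here is a minimal sketch of matching one advisory against an SBOM and a service map. The data shapes (the `Advisory` class, the SBOM dict, the service map) are illustrative assumptions, not Griffin's or any standard schema, and the version comparison is deliberately naive.

```python
from dataclasses import dataclass

@dataclass
class Advisory:
    cve_id: str
    package: str
    fixed_in: tuple  # first safe version, e.g. (3, 1, 3); hypothetical shape

def parse_version(v: str) -> tuple:
    """Turn '3.1.2' into (3, 1, 2) for a naive tuple comparison."""
    return tuple(int(part) for part in v.split("."))

def impacted_services(advisory: Advisory, sbom: dict, service_map: dict) -> list:
    """Return the services that install a vulnerable version of the package.

    sbom: {package_name: installed_version_string}
    service_map: {package_name: [service names that use it]}
    """
    installed = sbom.get(advisory.package)
    if installed is None:
        return []  # package not installed anywhere
    if parse_version(installed) >= advisory.fixed_in:
        return []  # already at or past the fixed version
    return service_map.get(advisory.package, [])

adv = Advisory("CVE-2024-0001", "jinja2", fixed_in=(3, 1, 3))
sbom = {"jinja2": "3.1.2", "requests": "2.31.0"}
services = {"jinja2": ["billing-api", "report-worker"]}
print(impacted_services(adv, sbom, services))  # ['billing-api', 'report-worker']
```

Even this toy version shows why the data has to come from your environment: the answer depends entirely on the SBOM and service map, neither of which the model can know on its own.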
Raw Claude has no idea what you have installed. It has no memory of last week's triage decisions. It can't pull a fresh advisory from the OSV database. It can't open a Jira ticket when a critical finding lands. Each of those integrations is something you have to build yourself, and security teams quickly discover that the integration layer is where the real cost is.
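As a taste of what "build it yourself" looks like, here is a sketch of one such integration: a query against the real OSV.dev API (`POST https://api.osv.dev/v1/query`). We only construct the request payload here; actually sending it requires network access and response handling, and that is before you wire the result back into a prompt.

```python
import json

def osv_query_payload(name: str, ecosystem: str, version: str) -> dict:
    """Build the JSON body that OSV's /v1/query endpoint expects."""
    return {
        "version": version,
        "package": {"name": name, "ecosystem": ecosystem},
    }

body = osv_query_payload("jinja2", "PyPI", "3.1.2")
print(json.dumps(body))
# POST this body to https://api.osv.dev/v1/query; the response's "vulns"
# array lists matching advisories for that package version.
```

Multiply this by every data source a triage session touches, plus auth, retries, and caching, and the integration layer quickly dwarfs the prompting work.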
The second strain point is evaluation. Claude will confidently produce an answer even when it's hallucinating a CVE number or misattributing a fix to the wrong package version. In a security context, that's not a minor annoyance. A false remediation recommendation can break production or, worse, leave a real vulnerability unpatched while the team thinks they've handled it. Without a systematic evaluation harness, you won't know which of Claude's outputs you can trust.
What Griffin Adds On Top
Griffin wraps the same Claude models with three layers that specifically target these strain points.
The first layer is engine context. Griffin automatically injects the relevant slice of your tenant data into every Claude call — your projects, SBOMs, recent findings, integration state, and policy configuration. When you ask Griffin "what changed in our risk posture this week?" it isn't guessing. It's reasoning over the actual delta in your findings table. The same question asked to raw Claude returns a generic essay about how to measure risk posture.
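Context injection is conceptually simple, even if the production version is not. Here is a minimal sketch, assuming made-up tenant fields and a placeholder model id; Griffin's actual schema and prompts are not public. The idea is to prepend the relevant tenant slice to the system prompt so the model reasons over real data instead of guessing.

```python
import json

def build_grounded_request(question: str, tenant_slice: dict) -> dict:
    """Assemble a Messages-API-shaped request with tenant context injected."""
    system = (
        "You are a supply chain security assistant. Answer ONLY from the "
        "tenant data below; say 'unknown' if the data doesn't cover it.\n\n"
        "TENANT DATA:\n" + json.dumps(tenant_slice, indent=2)
    )
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 1024,
        "system": system,
        "messages": [{"role": "user", "content": question}],
    }

req = build_grounded_request(
    "What changed in our risk posture this week?",
    {
        # Hypothetical tenant slice; field names are illustrative.
        "new_findings": [{"cve": "CVE-2024-0001", "severity": "critical"}],
        "resolved_findings": 3,
        "policy_version": "2024-06",
    },
)
```

The hard part in practice is not the assembly but the selection: deciding which slice of tenant data is relevant to this question, and keeping it small enough to fit the context window.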
The second layer is the evaluation harness. Every Griffin-generated recommendation passes through a set of graders that check for hallucinated CVEs, broken package coordinates, policy conflicts, and reachability mismatches. When a grader catches a problem, Griffin retries with a corrected prompt or falls back to a deterministic path. Users never see the bad output. With raw Claude, you'd see every hallucination and have to catch them yourself.
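A grader pass can be sketched as a set of small checks that each return a list of problems; an empty combined list means the output is allowed through. The grader names and logic below are illustrative assumptions, not Griffin's actual implementation.

```python
import re

# CVE ids are "CVE-<4-digit year>-<4+ digit sequence>" per the CVE scheme.
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")

def grade_cve_ids(rec: dict, known_cves: set) -> list:
    """Flag malformed or unrecognized CVE ids in a recommendation."""
    problems = []
    for cve in rec.get("cves", []):
        if not CVE_RE.match(cve):
            problems.append(f"malformed CVE id: {cve}")
        elif cve not in known_cves:
            problems.append(f"CVE not in advisory feed: {cve}")
    return problems

def grade_package(rec: dict, sbom: dict) -> list:
    """Flag recommendations naming a package the tenant doesn't install."""
    pkg = rec.get("package")
    return [] if pkg in sbom else [f"package not in SBOM: {pkg}"]

def run_graders(rec: dict, known_cves: set, sbom: dict) -> list:
    """Collect all grader findings; an empty list means the output passes."""
    return grade_cve_ids(rec, known_cves) + grade_package(rec, sbom)

rec = {"package": "jinja2", "cves": ["CVE-2024-0001", "CVE-99-1"]}
issues = run_graders(rec, {"CVE-2024-0001"}, {"jinja2": "3.1.2"})
# issues == ['malformed CVE id: CVE-99-1']
```

The retry-or-fallback decision then hangs off this result: a non-empty list triggers a corrected prompt, and repeated failures drop to a deterministic path rather than shipping the bad output.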
The third layer is workflow scaffolding. Security work is multi-step: scan, triage, remediate, verify, document, notify. Griffin packages those steps into durable workflows that can pause, resume, and hand off between humans and agents. Raw Claude gives you a single turn of reasoning. Griffin gives you a triage that runs overnight, pages the right engineer at the right time, and leaves an audit trail.
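The multi-step lifecycle above can be sketched as a small state machine with an audit trail. The states, transitions, and actor labels are an assumption for illustration, not Griffin's actual workflow model.

```python
# Allowed transitions for a finding's lifecycle (illustrative).
ALLOWED = {
    "scanned": {"triaged"},
    "triaged": {"remediating", "accepted_risk"},
    "remediating": {"verified"},
    "verified": {"documented"},
    "documented": {"notified"},
    "notified": set(),
    "accepted_risk": set(),
}

class Finding:
    def __init__(self, cve: str):
        self.cve = cve
        self.state = "scanned"
        self.audit = [("scanned", "system")]  # durable audit trail

    def advance(self, new_state: str, actor: str):
        """Move to a new state, recording who did it; reject illegal jumps."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.audit.append((new_state, actor))

f = Finding("CVE-2024-0001")
f.advance("triaged", "griffin-agent")          # agent step
f.advance("remediating", "alice@example.com")  # human hand-off
```

Persist the state and audit list between steps and you get the pause/resume behavior the text describes: an overnight run is just a sequence of transitions, each attributable to an agent or a human.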
The Honest Trade-offs
Using Griffin means you're trading raw flexibility for opinionated structure. If your workflow is genuinely novel and doesn't fit any existing security pattern, Claude directly plus a few custom tools might serve you better. Griffin's assumptions — that you care about CVEs, that findings need to move through states, that remediation should be suggested before it's applied — are load-bearing. They only help if they match how your team works.
You're also paying for the scaffolding. The underlying Claude tokens cost the same either way, but Griffin's orchestration, context injection, and eval pipeline add latency and compute on top. For a one-off question, that overhead isn't worth it. For a recurring daily triage run, it pays for itself many times over.
And finally, you're accepting a vendor's opinion about what "good" security reasoning looks like. Griffin's prompts, graders, and tool wiring encode a particular philosophy of supply chain security. Most of that philosophy is uncontroversial in the industry, but if your team has strong, specific, contrarian views about how to handle, say, license risk or reachability, you may find yourself fighting the defaults.
When To Choose Which
Raw Claude is the right call when you need a smart reasoning partner for ad hoc questions, when you're doing one-time research, when you're building something truly custom and don't want an opinionated wrapper in the way, or when you're exploring what a security LLM can even do before committing to a platform.
Griffin is the right call when you've already decided that AI-assisted security is something you want to operationalize, when the volume of findings or the complexity of your supply chain exceeds what a small team can manually triage, when auditability and consistency matter more than novelty, and when you'd rather configure a workflow than build one from scratch.
Most teams we talk to end up using both. They use Claude directly for research, experimentation, and the edge cases Griffin doesn't cover. They use Griffin for the boring, repetitive, high-volume work where consistency and grounding pay off. That combination — human-in-the-loop Claude for the creative work, Griffin for the operational work — is where we see the best outcomes.
What We're Still Figuring Out
The line between "workflow scaffolding that helps" and "workflow scaffolding that gets in the way" shifts as Claude itself improves. Every time Anthropic ships a new model, some of the scaffolding Griffin used to do becomes unnecessary and some new capability becomes worth scaffolding for. Our job as a platform is to keep that layer thin where Claude has gotten stronger and thick where Claude still needs grounding.
We'd rather be honest about this than pretend Griffin is magic. Griffin is Claude doing what Claude already does, wrapped in the context, checks, and choreography that security teams would otherwise have to build themselves. That's the pitch, and that's the trade-off worth weighing.