Tomorrow, Anthropic ships Mythos to the world.
After unveiling the model in April and spending two months running it through Project Glasswing — a controlled program with roughly fifty defensive security partners — Anthropic is opening Mythos to general availability on June 10. If you work in security, you should understand what you are about to have access to and what it means for your program.
This is not a hype piece. We have been running Mythos in production through Glasswing since the early access program began, and the capabilities are real. So is the responsibility that comes with them.
What Mythos Actually Is
Mythos is not an incremental model release. It is Anthropic's first model positioned explicitly above the Opus and Sonnet lines — a frontier-tier system built with security capability as a first-class property, not an afterthought.
The headline numbers are striking. On the 2026 USA Mathematical Olympiad, Mythos scored 97.6 percent, roughly 55 percentage points above Opus 4.6, which landed at 42.3 percent (the exact gap is 55.3 points). That is not a marginal improvement on a hard benchmark. It represents a qualitative shift in the model's ability to reason through multi-step problems with precision.
The security numbers are even more significant. The UK AI Security Institute evaluated Mythos on expert-level hacking tasks — assessments that, until April 2025, no AI model could complete at all. Mythos completed them at a 73 percent success rate. Let that land: tasks previously requiring world-class human expertise are now completable by a model running inference in seconds.
Mozilla gave Mythos access to Firefox 150 and asked it to find vulnerabilities. It surfaced 271 findings, more than ten times what Opus 4.6 found in the previous Firefox release. A change in the software between releases does not explain a gap that large. The model does.
And then there is reverse engineering. Mythos has demonstrated the ability to take a closed-source, stripped binary — no symbols, no source, no debug information — and reconstruct legible source code from it. This is a capability that previously required specialized tooling, deep expertise, and significant time investment. Mythos does it at model speed.
Why Anthropic Held It Back
The two-month Glasswing window was not a standard beta program. Anthropic was explicit that Mythos's security capabilities exceeded what responsible public access would allow without additional safeguards. A model that can complete expert-level offensive security tasks at 73 percent success needs a different deployment framework than a model that can write an email.
Project Glasswing served two purposes simultaneously. First, it gave Anthropic signal on how defensive security teams actually used the model — what worked, what hallucinated, where the false positive rate sat in real workflows. Second, it gave Glasswing partners time to build the operational infrastructure to use the model responsibly before the broader security community had access.
The controlled rollout is part of Anthropic's Responsible Scaling Policy, which requires deployment safeguards to be in place before releasing models whose capabilities cross certain thresholds. Mythos crossed the ASL-3 threshold for cyber capabilities. What you are getting tomorrow is a model that passed Anthropic's internal safety review with appropriate access controls in place.
What This Means for Your Security Program
If you are a defensive security team, Mythos is the most capable tool for finding unknown vulnerabilities that has ever been made generally available. The 271-finding Firefox result is not a staged demonstration. It is what happens when you point the model at real software. Your software has vulnerabilities Mythos will find that your current tools will not.
If you are a software team shipping products with third-party dependencies, threat actors now have access to a model far more capable than anything available six months ago. The attack surface calculation has changed.
If you are a platform team running AI-powered products, Mythos introduces a capability that needs to be in your threat model. A model that can reverse engineer stripped binaries can also analyze the code your own AI systems generate. That is both an opportunity and a risk surface.
The honest framing is this: Mythos does not change whether sophisticated attackers find vulnerabilities in your software. It changes how long that takes. The asymmetry between attacker cost and defender cost just shifted again, and it shifted toward offense unless you are actively using the same capability on defense.
The Architecture Gap That Still Exists
What Mythos gives you is an extraordinarily capable model. What it does not give you is the operational infrastructure to deploy that capability in a production security program.
Finding vulnerabilities at Mythos's capability level creates the same triage and remediation problem that every AI scanning tool creates — at higher fidelity and higher volume. A model that finds ten times as many real vulnerabilities also needs ten times the operational capacity to act on them. Without a verification pipeline, severity contextualization, supply chain mapping, and automated remediation guidance, a more capable scanner creates more noise alongside more signal.
This is an architecture problem, not a model problem. Mythos is the most capable raw model available. Deploying it without surrounding infrastructure is how you end up with a 271-finding report that sits in a backlog for six months because no one has the bandwidth to triage it.
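To make the triage problem concrete, here is a minimal Python sketch of the kind of filtering step a raw findings list needs before it lands in a backlog. It is an illustration under stated assumptions, not Safeguard's pipeline and not any Mythos output format: the Finding fields, the triage function, and the 4.0 severity cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one candidate finding from a raw scanner."""
    id: str
    component: str
    severity: float   # CVSS-like score from 0.0 to 10.0 (assumed scale)
    verified: bool    # an independent check reproduced the issue
    reachable: bool   # the component is reachable in the deployed topology


def triage(findings: list[Finding], min_severity: float = 4.0) -> list[Finding]:
    """Drop unverified, low-severity, and duplicate findings, then rank the rest."""
    seen: set[str] = set()
    kept: list[Finding] = []
    for f in findings:
        if not f.verified or f.severity < min_severity or f.id in seen:
            continue
        seen.add(f.id)
        kept.append(f)
    # Reachable issues sort first, then higher severity before lower.
    return sorted(kept, key=lambda f: (not f.reachable, -f.severity))


if __name__ == "__main__":
    raw = [
        Finding("F-1", "image-decoder", 9.1, True, False),
        Finding("F-2", "auth-service", 7.5, True, True),
        Finding("F-3", "docs-site", 8.0, False, True),   # never reproduced
        Finding("F-2", "auth-service", 7.5, True, True),  # duplicate report
        Finding("F-4", "cli-tool", 2.0, True, True),      # below the cutoff
    ]
    for f in triage(raw):
        print(f.id, f.component, f.severity)
```

In this made-up input, five raw findings shrink to two that someone actually needs to act on, and the reachable one ranks ahead of the higher-scoring but unreachable one. The design choice worth noting is the ordering: reachability in your deployment is treated as more important than the raw score.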
Safeguard Supports Mythos — Natively, Today
Safeguard has been a Glasswing partner since the program launched in April. We have spent two months integrating Mythos into our Multi-Agent TAOR Deep Think AI Engine — not as a bolt-on, but as a first-class model within the agentic pipeline that handles zero-day discovery and remediation.
What that means in practice:
Zero-day discovery with Mythos's capability, not Mythos's noise. Safeguard routes Mythos's candidate findings through our multi-agent verification pipeline before they reach your security team. Each candidate goes through independent verification agents, exploitability reasoning against your actual deployment topology, and cross-reference against existing CVE data, and only then is it surfaced as a confirmed issue. You get Mythos's detection capability without the false positive burden of running a raw model.
Remediation, not just detection. Mythos tells you something is broken. Safeguard tells you how to fix it. Our remediation agents generate context-aware fix guidance — specific to your language, framework, and deployment — and can open pull requests directly. The finding-to-fix cycle that used to require a human researcher in the loop at every step runs autonomously for the vulnerability classes Mythos surfaces.
Agentic AI native. Safeguard is not a dashboard that receives model output. It is an agentic system where Mythos operates as an agent alongside verification agents, remediation agents, and supply chain intelligence agents. Mythos's reverse engineering capability integrates with our binary analysis pipeline so that vulnerabilities in compiled dependencies — not just source-available packages — are part of your continuous discovery surface.
Continuous, not point-in-time. Mythos runs against your dependency graph on every code change, not on a quarterly schedule. New dependencies, updated packages, and changed code paths are all in scope the moment they land. Zero-day discovery is a continuous property of your Safeguard deployment, not a periodic engagement.
Supply chain coverage. The vulnerability Mythos finds in a transitive dependency three levels deep in your package tree is surfaced with full supply chain context — which of your products is affected, in which environments, via which dependency path. You do not get a raw finding. You get an actionable risk item with blast radius assessment.
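The dependency-path part of supply chain context is easy to picture in code. The sketch below is a generic breadth-first search over a dependency graph, written in Python as an illustration only. It is not Safeguard's implementation, and the graph format, the dependency_paths function, and the package names are hypothetical.

```python
from collections import deque


def dependency_paths(
    graph: dict[str, list[str]], roots: list[str], target: str
) -> dict[str, list[str]]:
    """For each root product, return the shortest dependency path to target.

    graph maps a package name to the packages it depends on directly.
    Roots that never reach the target are left out of the result.
    """
    paths: dict[str, list[str]] = {}
    for root in roots:
        queue = deque([[root]])
        visited = {root}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node == target:
                paths[root] = path
                break
            for dep in graph.get(node, []):
                if dep not in visited:
                    visited.add(dep)
                    queue.append(path + [dep])
    return paths


if __name__ == "__main__":
    # Hypothetical package tree: "parser" sits three levels below "web-app".
    graph = {
        "web-app": ["http-lib", "ui-kit"],
        "http-lib": ["tls-lib"],
        "tls-lib": ["parser"],
        "batch-job": ["csv-lib"],
    }
    affected = dependency_paths(graph, ["web-app", "batch-job"], "parser")
    for product, path in affected.items():
        print(product, "->", " > ".join(path))
```

Here only web-app is affected, by way of http-lib and tls-lib, while batch-job is not. A real blast radius assessment would also need environment and deployment data, which this sketch leaves out.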
What to Expect When Mythos Ships Tomorrow
Anthropic will release Mythos through the Claude API. The model will be accessible via the same interface as Opus and Sonnet, with the usual token-based pricing. Access controls for certain capability classes — particularly the offensive security capabilities — will be governed by Anthropic's API usage policies, which have specific provisions for the ASL-3 capability set.
Expect the model to perform in line with what the Glasswing numbers suggest. The vulnerability discovery results are reproducible. The math reasoning capability is genuine. The reverse engineering capability is real and fast. If you are evaluating it against your own codebase for the first time, run it on something you understand well so you can calibrate signal against noise before running it on unknown territory.
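One simple way to do that calibration is to compare what the model reports against a list of issues you already know exist in the codebase. The Python sketch below assumes each issue can be reduced to a stable string identifier, which is a simplification. The calibrate function and the identifiers are hypothetical, not part of any Anthropic or Safeguard API.

```python
from typing import NamedTuple


class Calibration(NamedTuple):
    precision: float       # share of reported items that match known issues
    recall: float          # share of known issues that were reported
    unmatched: list[str]   # reported items not in the known set, for review


def calibrate(reported: set[str], known: set[str]) -> Calibration:
    """Compare reported finding IDs against a hand-curated set of known issues."""
    hits = reported & known
    precision = len(hits) / len(reported) if reported else 0.0
    recall = len(hits) / len(known) if known else 0.0
    return Calibration(precision, recall, sorted(reported - known))


if __name__ == "__main__":
    # Hypothetical identifiers for issues in a codebase you know well.
    known = {"sqli-login", "xss-profile", "ssrf-webhook"}
    reported = {"sqli-login", "xss-profile", "race-upload", "overflow-parser"}
    result = calibrate(reported, known)
    print(f"precision={result.precision:.2f} recall={result.recall:.2f}")
    print("needs manual review:", result.unmatched)
```

Treat the precision figure as a floor rather than a verdict. Anything in the unmatched list may be a false positive, or it may be a real issue nobody had found yet, so those items need a human look before you count them as noise.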
One honest note on the rollout: the first week of any broad model release involves calibration. Anthropic's safety systems will be tuning access controls based on observed usage patterns. Some capability classes may have additional friction in the first few days. This is the expected shape of a responsible ASL-3 rollout, not a limitation of the model.
The Moment We Have Been Building Toward
The security community has spent the last two years arguing about whether AI could actually do security work or whether it was pattern-matching on training data. Mythos ends that argument. 73 percent on tasks no model could complete last year, 271 vulnerabilities in a single codebase, reverse engineering of stripped binaries — these are not benchmark artifacts. They are the capability set that changes how security programs run.
The teams that move fastest to put Mythos's capability on defense — with the infrastructure to operationalize it, not just access it — will have a structural advantage over the teams still running quarterly scans with conventional tooling.
We have been preparing for this since April. Tomorrow is the starting gun for everyone else.
Safeguard's Mythos integration is live for all customers today. If you want to see Mythos-powered zero-day discovery running against your dependency tree before the general release lands, get in touch — we can show you a live run on your actual stack.