https://www.flyingpenguin.com/mythos-mystery-in-mozilla-numb...
And how much with Opus 4.7? 5x?
There is also a pretty big risk that anyone who is not you would leak the answer to the test. We are close to n=1 epistemics here. You’re going to have to do the research yourself.
Yes, Anthropic have said they made Opus 4.7 worse at this on purpose.
> It is entirely possible that Mythos is a different architecture or size
It has 5x the token pricing of Opus 4.7, so it's probably larger.
https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
If I understand correctly, Opus 4.7 was launched as a nerfed Mythos with some improvements over 4.6.
Anthropic launches major bumps (like 4.6 to 4.7) every 4-5 months. So by all accounts, Mythos should be released by July.
The problem reduces to: How quickly can competing models surpass Opus 4.7 and start taking over Anthropic's market share?
So yeah, huge marketing as always.
That's the one that says:
> We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis.
Or providing a map with a direction
There is a long history of high-value private vulns being rediscovered from scant details.
The American firms are now focused on marketing, to convince people not to even consider open-source or open-weight models on the grounds that they are inferior (that’s what they want you to believe).
If people actually believe the narrative, then the bankers will overprice Anthropic and get away with it.
4.6 but close.
"...After fixing the initial set of issues that Anthropic sent to us in February, we built our own harness atop our existing fuzzing infrastructure.
We began with small-scale experiments prompting the harness to look for sandbox escapes with Claude Opus 4.6. Even with this model, we identified an impressive amount of previously-unknown vulnerabilities which required complex reasoning over multiprocess browser engine code..."
So yeah, Anthropic and Mozilla are likely comparing "number of bugs found by Opus 4.6 during early experiments" vs "number of bugs found by Mythos during large-scale codebase scanning".
[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...
https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...