Meanwhile from [1]:
"Not even half-way through this #curl release cycle we are already at 11 confirmed vulnerabilities - and there are three left in the queue to assess and new reports keep arriving at a pace of more than one/day."
"The simple reason is: the (AI powered) tools are this good now. And people use these tools against curl source code.They find lots of new problems no one detected before. And none of these new ones used Mythos. Focusing on Mythos is a distraction - there are plenty of good models, and people who can figure out how to get those models and tools to find things."
Yeah, it looks like there are at least 11 security bugs missed by Mythos.
[1] https://www.linkedin.com/feed/update/urn:li:activity:7463481...
That would align with the curl feedback you linked: they aren't using Mythos but are finding bugs with other models. Presumably the expectation is that Mythos would find more bugs that the other models already in use had missed.
It's not quite apples-to-apples. It was Opus on Firefox 148, Mythos on 150. A better test of Mythos vs Opus would have been to apply Mythos to Firefox 148. Or also re-apply Opus to Firefox 150.
Do we know all the Opus+Firefox 148 bugs are fixed in Firefox 150? Do we know the number of new bugs introduced per Firefox release?
That may be parsable from their bug tracker, though I don't know if all bugs raised by Mythos are public.
I'd be particularly interested in how many of the bugs found existed in 148. Assuming most or all of them weren't newly created bugs added in 149 or 150, the comparison should still hold even though Opus and Mythos looked at different releases.
Anthropic promised us that Mythos was such an existential threat that it would compromise "every OS and browser on devices across the planet". They've held conferences and meetings with banks and govts across the world, shouting how critical this issue is.
GPT5.5 has been out for a month. Every device on earth has not been breached yet. It's very fair to criticize Anthropic's maximalist posturing when it's becoming exceedingly clear their models are fairly behind OpenAI's in capability.
In my opinion, the original commenter's statement stands, and the UK govt data point only supports it, given the equal results for Mythos and GPT.
I'd advise reading into the specifics of what happened with Firefox; the TL;DR is that a reduced-safety version of its code was scanned by Opus 4.6 (yes, Opus), which found a multitude of bugs and 4 high-severity vulns that did not escape the sandbox. The Mythos system card test describes running Mythos against the same issues Opus found to see if it could reliably replicate them and chain them together into an attack.