by OsrsNeedsf2P 1 day ago

by krisbolton 1 day ago

There is independent research out there on frontier model security capability. AI Security Institute (UK) put out their paper comparing Mythos to other frontier models in early April. They've been tracking frontier model security capability since early 2023, so it's a decent dataset. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...

by energy123 1 day ago

> Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6

by applfanboysbgon 1 day ago

Did they allocate the same number of tokens to looking with Claude 4.6? Or did they find more because they looked more, owing to a special initiative by Anthropic?

by kllrnohj 1 day ago

No, not really. Mythos found 3 CVEs, not 271.

https://www.flyingpenguin.com/mythos-mystery-in-mozilla-numb...

by simonw 1 day ago

The Mozilla team responded to that argument here: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin... - in the FAQ.

by moyix 1 day ago

I think you're confusing CVEs and vulnerabilities here? Mozilla (per their longstanding practice) grouped multiple vulnerabilities found internally under a small number of CVEs.

by properbrew 1 day ago

> over ten times more than they found in Firefox 148 with Claude Opus 4.6

And how much with Opus 4.7? 5x?

by arjie 1 day ago

The era where you could reasonably believe things published by anyone on this front is over. If you want this information, you’re going to have to attempt it yourself with the Opus API. It is entirely possible that any released model access will be heavily guardrailed against hacking attempts and Mythos is just an unrailed model. It is entirely possible that Mythos is a different architecture or size. We can’t know from the outside.

There is also a pretty big risk that anyone who is not you would leak the answer to the test. We are close to n=1 epistemics here. You’re going to have to do the research yourself.

by MallocVoidstar 22 hours ago

> It is entirely possible that any released model access will be heavily guardrailed against hacking attempts

Yes, Anthropic have said they made Opus 4.7 worse at this on purpose.

> It is entirely possible that Mythos is a different architecture or size

It has 5x the token pricing of Opus 4.7, so it's probably larger.

by parker-3461 1 day ago

Makes me wonder if Anthropic is really having issues with allocating compute (see recent deals with xAI and SpaceX). From available benchmarks, it seems like similar results should be possible with GPT-5.5 Pro or Opus 4.7 (with specifically cybersecurity-trained models).

by smoe 1 day ago

At least according to this, GPT-5.5 Cyber is on par with Mythos: they were the only two models able to finish the 32-step corporate network attack simulation.

https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...

by wiwiwq 1 day ago

Who knows, but from a valuation standpoint it’s better to signal that demand exceeds existing capacity.

by ospray 1 day ago

This report is far more positive, with a far lower false-positive rate, than I was expecting based on reports from the curl team and a few others. I guess I have just been hearing about the ten percent misses. Can anyone not employed by Anthropic who has used it vouch that it is equal to general human testers, and do you need xbow to get it there?

by kirtivr 18 hours ago

Training for Mythos finished in February 2026, and training for Opus 4.7 finished around the same time.

If I understand correctly, Opus 4.7 was launched as a nerfed Mythos with some improvements over 4.6.

Anthropic launches major bumps (like 4.6 to 4.7) every 4-5 months. So by all accounts, Mythos should be released by July.

The problem reduces to: How quickly can competing models surpass Opus 4.7 and start taking over Anthropic's market share?

by bobbycastorama 1 day ago

I've seen a blog post by a security researcher saying that he was able to find the same vulnerabilities (for Firefox, IIRC) with a ~30B-parameter LLM...

So yeah, huge marketing as always.

by Brystephor 1 day ago

Did the security researcher point the LLM at the blob of information and say "Find vulnerabilities" or was the LLM told to "determine if vulnerability X is present in this blob"? Confirmation of suspected vulnerabilities is a different problem from finding vulnerabilities.

by simonw 1 day ago

You mean this one? https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...

That's the one that says:

> We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis.

by dansquizsoft 21 hours ago

Sounds like he applied exactly the same methodology then!

by krisbolton 1 day ago

This is different though, right? He found one vulnerability (? we don't know who you're referring to; post sources for a higher-quality discussion), he already knew it was there, etc. Anthropic didn't claim that no other model can find vulnerabilities, nor that it's impossible with smaller models. They're claiming Mythos is a step-change in ability for end-to-end vulnerability discovery and exploit creation, and that other frontier models are close behind.

by nikcub 1 day ago

Finding the needle is easier when you remove the haystack.

Or when you're given a map with directions.

There is a long history of high-value private vulns being rediscovered from scant details.

by wiwiwq 1 day ago

To me it’s clear what’s going on.

The American firms are now focused on marketing to convince people not to even consider open-source or open-weight models, because those are inferior (that’s what they want you to believe).

by rhubarbtree 1 day ago

An IPO is coming, that's what is going on.

by wiwiwq 1 day ago

That’s implicit in my post.

If people actually believe the narrative, then the bankers will overprice Anthropic and get away with it.

by 0gs 1 day ago

what's weirdest to me (and i agree with you) is that it could ALSO be true that a highly competently managed, highly capitalized closed source and weights model training on tons of real-world data non-stop COULD stay ahead of open weights models, and that lead COULD grow. now, how competent (much less merciless) the frontier-blazing U.S. corporations will be able to be long-term ... i suspect they are right to be nervous and highly focused on optics, regardless of the truth :)

by pertymcpert 1 day ago

> Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6

4.6, but close.

by OsrsNeedsf2P 1 day ago

Right, but were they using the same methodology and harness? I suspect they're doing something different with the harness, e.g. with Mythos they pass each file in one at a time, whereas on 4.6 they let Claude Code run loose to find bugs. That would make a bigger difference than the model itself.

by mpyne 1 day ago

Yes, the harness they used actually existed and was in use beforehand, it wasn't developed for testing with Mythos.

by ZrArm 23 hours ago

From Mozilla post [1]:

"...After fixing the initial set of issues that Anthropic sent to us in February, we built our own harness atop our existing fuzzing infrastructure.

We began with small-scale experiments prompting the harness to look for sandbox escapes with Claude Opus 4.6. Even with this model, we identified an impressive amount of previously-unknown vulnerabilities which required complex reasoning over multiprocess browser engine code..."

So yeah, Anthropic and Mozilla are likely comparing "Amount of bugs found by Opus 4.6 during early experiments" vs "Amount of bugs found by Mythos during large-scale codebase scanning".

[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

by boston_clone 1 day ago

You would likely be quite interested in the more quantitative writeup from a real research team! It’s linked about midway into the article. Similar functionality can be reached, yes, but not always, and never with fewer tokens than Mythos requires.

https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...

by OsrsNeedsf2P 1 day ago

OK, this is actually a pretty good article, and it justifies the step-function marketing in security they talked about.

by enlightenedfool 1 day ago

Is this the God model that no one else can build? Unbelievable.
