I'd be hypothetically very curious to see hypothetical results if you ever decide to hypothetically run Mythos against the code (in Minecraft?)

by CaveTech 16 hours ago

It was found with GPT 5.5 7/10 times; it’ll be trivially found by Mythos.

by afro88 16 hours ago

That's an example of why it would be useful for someone to actually do it. A random commenter on HN is one thing. A direct comparison on a brand new app that isn't part of any training is another

by CaveTech 16 hours ago

I’m highly confident that prior exposure is irrelevant at this point. I work on vulnerability detection at a hyperscaler.

by HDBaseT 15 hours ago

That's an example of why it would be useful for someone to actually do it. A random commenter on HN is one thing. A direct comparison on a brand new app that isn't part of any training is another

by GuB-421 hours ago

Before Mythos is released to the world at large and not just to select people behind NDAs, I will treat it as its name suggests: as fiction.

Maybe it is the real deal, but in a world of overpromising and underdelivering, I prefer to be skeptical.

by enraged_camel 12 hours ago

People need to stop repeating this because it’s not true. Yes, other models can find the same vulnerabilities Mythos found… if pointed at the exact code that has each vulnerability. It does not mean they are nearly as capable when starting from scratch, or when chaining multiple (often very obscure) vulnerabilities.

by adrian_b 5 hours ago

Anthropic themselves have explained that the harness for Mythos plays a very important role in finding the vulnerabilities. The model does not start from scratch: the harness runs the model many times on each file of the code base, with different prompts that evolve depending on the results of the previous runs.

First with more generic prompts, to determine whether it is worthwhile to do a detailed analysis of that file, then with more specific prompts to identify the bugs, and eventually with a prompt that requests a confirmation that a given bug/vulnerability exists.

For a proper comparison between some other model and Mythos, you also need such a complex harness. If you just tell an LLM "find the bugs", and it does not find a vulnerability known to have been found by Mythos, that is a totally invalid comparison.

The final results provided by Mythos, like a PoC exploit or a patch, are also generated with a prompt that points to the exact code that has the vulnerability (which is supposed to exist based on the results of the previous runs).
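
To make the staged flow described above concrete, here is a minimal Python sketch. It is not Anthropic's harness: `ask_model`, the prompt wording, and the YES/NO answer convention are all hypothetical stand-ins, and a real harness would presumably iterate and adapt its prompts far more than this. It only shows the shape of "triage, then identify, then confirm" per file.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface (an assumption, not a real API):
# takes a prompt string and returns the model's text reply.
AskModel = Callable[[str], str]


@dataclass
class Finding:
    path: str
    description: str
    confirmed: bool = False


def _says_yes(reply: str) -> bool:
    """Assumed convention: the model starts its reply with YES or NO."""
    return reply.strip().upper().startswith("YES")


def scan_file(path: str, source: str, ask_model: AskModel) -> List[Finding]:
    """Run the three stages on one file: triage, identify, confirm."""
    # Stage 1: generic prompt to decide if a detailed pass is worthwhile.
    triage = ask_model(
        f"TRIAGE\nIs {path} worth a detailed security review? "
        f"Answer YES or NO.\n\n{source}"
    )
    if not _says_yes(triage):
        return []

    # Stage 2: more specific prompt asking for candidate bugs, one per line.
    listing = ask_model(
        f"IDENTIFY\nList suspected vulnerabilities in {path}, "
        f"one per line.\n\n{source}"
    )
    candidates = [line.strip() for line in listing.splitlines() if line.strip()]

    # Stage 3: one confirmation prompt per candidate, built from stage 2 output.
    findings = []
    for candidate in candidates:
        verdict = ask_model(
            f"CONFIRM\nDoes this bug really exist in {path}? "
            f"Answer YES or NO.\nBug: {candidate}\n\n{source}"
        )
        findings.append(Finding(path, candidate, _says_yes(verdict)))
    return findings


def scan_codebase(files: Dict[str, str], ask_model: AskModel) -> List[Finding]:
    """Scan every file and keep only the findings that were confirmed."""
    results: List[Finding] = []
    for path, source in files.items():
        results.extend(scan_file(path, source, ask_model))
    return [f for f in results if f.confirmed]
```

Because the model is just a callable here, the same loop can be pointed at two different models, which is the kind of like-for-like comparison argued for above.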

by loeg 4 hours ago

My take from the SCW interview is that the Mythos harness isn't all that important and the author thought it would be even less important with future models. But maybe I misremember.

by bitexploder 4 hours ago

Anthropic has a vested interest in downplaying the harness's relevance. In my experience the harness really matters. More capable models are great, but current models are enough if you put some engineering effort into the harness.

by nznzjzizixnsnsj 16 hours ago

lol what is even the point of this kind of comment? this is the ultimate "source: trust me bro" comment I have ever seen.

every model since gpt3 was claimed to be "too dangerous to release." it's too EXPENSIVE to release, and you're probably a local model with <10B parameters yourself

by Karuma 7 hours ago

That was actually GPT-2: https://www.theguardian.com/technology/2019/feb/14/elon-musk...

by bakugo 8 hours ago

The point of it is marketing for Anthropic. Nothing more, nothing less.

by DontchaKnowit 5 hours ago

Damn bro you're so cool

by tsunamifury 16 hours ago

cool.