It's a possibility, but it doesn't eliminate the possibility that it's hype. If these claims were indeed serious, they would submit it for independent analysis somewhere.
This isn't some crazy process. Defense contractors are required to submit their systems (secret sauce and all) for operational test and evaluation before they're fielded.
They have. 40 different companies that have all committed resources to patching their systems based on vulnerabilities found by Mythos. One of them, Google, is a frontier AI lab that pointedly did not say that their own models have found similar vulnerabilities.
> Defense contractors are required to submit their systems (secret sauce and all) for operational test and evaluation before they're fielded.
Does this look something like having 40 separate companies look at the outputs of the system, deciding that it’s real and they should do something about it, and committing resources to it?
At some point, “cynicism” is another word for “lalala can’t hear you”.
To which my answer is clearly, no, not even remotely. If Anthropic is outright lying about what Mythos can do, someone else will have it in a year.
In fact the security world would have to seriously consider the possibility that even if Mythos didn't exist that nation states have the equivalent in hand already. And of course, if Mythos does exist, nation states have it now. The odds that Antropic (and every other AI vendor) isn't penetrated enough by every major intelligence agency such that they have access to their choice of model approach zero.
I wonder about the overlap between people being skeptical of Mythos' capabilities, and those who are too skeptical of AI to have spent any time with it because they assume it can't be any good. If you are not aware of what frontier models routinely do, you may not realize that Mythos is just an evolution of existing capabilities, not a revolution. Even just taking a publicly-available frontier model, pointing it at a code base and telling it to "find the vulnerabilities and write exploits" produces disturbingly good results. I can see the weaknesses referenced by the Mythos numbers, especially around the actual writing of the exploits, but it's not like the current frontier models fall on their face and hallucinate wildly for this task. Most everything they produce when I try this is at least a "yeah, that's worth thinking about" rather than an instant dismissal.
I'm not that old but have been here long enough that I remember when GPT-3 was considered too dangerous to release. Now you have models 10x as good, 1/10th the size and run on 8GB VRAM.
Mythos will benefit security in the long run more than hackers, if it can do what they claim. And there's nothing that will stop an LLM like it from being released in the near term so it's very likely just resource constraints or marketing
We don't yet know if Mythos was a level shift in the capability/cost frontier, or a continued extension of the same logarithmic capability/cost curve.
unless you are an employee at anthropic and shouldn't be talking about any of this at all, there's no way to know what the model's capabilities are.
AI companies routinely claim that something is too dangerous to release (I think GPT-2 was the first case) for marketing reasons. There are at least 10 documented high profile cases.
They keep it secret because they now sell to the MIC with China and North Korea bullshit stories as well as to companies who are invested in the AI hype themselves.
And with gpt-2 the worry was mass emails a lot better and more detailed and personal, social media campaigns etc.
How many bots are deployed today on X and influencing democrazy around the globe?
Its fair to say it had an impact and LLMs still have.
The platonic ideal of how to dismiss any argument by anyone about anything.