Opus "found" 8 issues. Two of them looked like they were probably realistic but not really that big a deal in the context it operates in. It labelled one of them as minor, but the other as major, and I'm pretty sure it's wrong about it being "major" even if is correct. Four of them I'm quite confident were just wrong. 2 of them would require substantial further investigation to verify whether or not they were right or wrong. I think they're wrong, but I admit I couldn't prove it on the spot.
It tried to provide exploit code for some of them, none of the exploits would have worked without some substantial additional work, even if what they were exploits for was correct.
In practice, this isn't a huge change from the status quo. There's all kinds of ways to get lots of "things that may be vulnerabilities". The assessment is a bigger bottleneck than the suspicions. AI providing "things that may be an issue" is not useless by any means but it doesn't necessarily create a phase change in the situation.
An AI that could automatically do all that, write the exploits, and then successfully test the exploits, refine them, and turn the whole process into basically "push button, get exploit" is a total phase change in the industry. If it in fact can do that. However based on the current state-of-the-art in the AI world I don't find it very hard to believe.
It is a frequent talking point that "security by obscurity" isn't really security, but in reality, yeah, it really is. An unknown but presumably staggering number of security bugs of every shape and size are out there in the world, protected solely by the fact that no human attacker has time to look at the code. And this has worked up until this point, because the attackers have been bottlenecked on their own attention time. It's kind of just been "something everyone knows" that any nation-state level actor could get into pretty much anything they wanted if they just tried hard enough, but "nation-state level" actor attention, despite how much is spent on it, has been quite limited relative to the torrent of software coming out in the world.
Unblocking the attackers by letting them simply purchase "nation-state level actor"-levels of attention in bulk is huge. For what such money gets them, it's cheap already today and if tokens were to, say, get an order of magnitude cheaper, it would be effectively negligible for a lot of organizations.
In the long run this will probably lead to much more secure software. The transition period from this world to that is going to be total chaos.
... again, assuming their assessment of its capabilities is accurate. I haven't used it. I can't attest to that. But if it's even half as good as what they say, yes, it's a huge huge huge deal and anyone who is even remotely worried about security needs to pay attention.
Anthropic spends millions - maybe significantly more.
Then when they know where they are, they spend $20k to show how effective it is in a patch of land.
They engineered this "discovery".
What the small teams are doing is fair - it's just a scaled down version of what Anthropic already did.
Unless Anthropic makes it known exactly what model + harness/scaffolding + prompt + other engineering they did, these comparisons are pointless. Given the AI labs' general rate of doomsday predictions, who really knows?