The only practical defense is for these frontier models to generate _beneficial_ attacks that inoculate older binaries against remote exploits. I dubbed these 'antibotty' networks in a speculative paper last year, but never thought things would move this fast! https://anil.recoil.org/papers/2025-internet-ecology.pdf
System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258
Project Glasswing: Securing critical software for the AI era - https://news.ycombinator.com/item?id=47679121
I can't tell which of the current threads, if any, should be merged - they all seem significant. Anyone?
I'd love to see them go for a wasm interpreter escape, or a Firecracker escape, etc. They say that these aren't just "stack-smashing" but it's not like heap spray is a novel technique lol
> It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses.
I think this sounds more impressive than it is. KASLR has a terrible track record of preventing LPE, and LPE on Linux is incredibly common. Has anything changed here? I don't pay much attention, but KASLR was considered basically useless for preventing LPE a few years ago.
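To put rough numbers on why KASLR buys so little against a local attacker: a quick back-of-envelope sketch, using the standard x86-64 Linux defaults (a 1 GiB randomization range for the kernel image at 2 MiB alignment — illustrative figures, not measured on any particular kernel):

```python
# Back-of-envelope KASLR entropy on x86-64 Linux (illustrative defaults).
region = 1 << 30       # 1 GiB randomization range for the kernel text base
alignment = 2 << 20    # 2 MiB alignment (hugepage-backed kernel text)

slots = region // alignment          # number of possible base addresses
entropy_bits = slots.bit_length() - 1

print(slots, entropy_bits)           # 512 slots, 9 bits of entropy
```

Nine bits is nothing once a local attacker has any pointer leak or timing side channel, which is why KASLR bypasses in LPE chains are routine rather than remarkable.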
> Because these codebases are so frequently audited, almost all trivial bugs have been found and patched. What’s left is, almost by definition, the kind of bug that is challenging to find. This makes finding these bugs a good test of capabilities.
This just isn't true. Humans find new bugs in all of this software constantly.
It's all very impressive that an agent can do this stuff, to be clear, but I guess I see this as an obvious implication of "agents can explore program states very well".
edit: To be clear, I stopped about 30% of the way through. Take that as you will.
From a marketing standpoint Anthropic is showing that they're able to direct 'compute' to find vulnerabilities where human time/cost is not efficient or effective.
Project Glasswing is attempting to pay down as many of these old vulnerabilities as possible now, so the low-hanging fruit has already been picked.
The next generation of Mythos and real-world vulnerability exploits are going to be in newly committed code...
That's fine, I wouldn't argue against that. It doesn't really change things, right?
> From a marketing standpoint Anthropic is showing that they're able to direct 'compute' to find vulnerabilities where human time/cost is not efficient or effective.
Yes, they've demonstrated that.
Good morning Sir.
> Has anything changed here? I don't pay much attention but KASLR was considered basically useless for preventing LPE a few years ago.
No. It's still like this. Bonus point that there are always free KASLR leaks (prefetch side-channels).
But then, this thing is just... I don't have a word for this. Just randomly read paragraphs from the post and it's like, what?
> It is easy to turn this into a denial-of-service attack on the host, and conceivably could be used as part of an exploit chain.
So yeah, perhaps some evidence for what I'm getting at. Bug density is too low in that project; it's high enough in others. I'll be way way way more interested in that.
> But then, this thing is just... I don't have a word for this. Just randomly read paragraphs from the post and it's like, what?
I read about 30% and got bored. I suppose I should have been clearer, but my impression was pretty quickly "cool" and "not worth reading today".
I was lucky then :) Somehow I saw this first. And then the "somewhat reliably writing exploits for SpiderMonkey" part, and then the crypto libraries part. Finally I wondered why there is a Linux LPE mini-writeup and realized it's the "automatically turn a syzkaller report into a working exploit" part.
Now that I read the first few things (meh bugs in OpenBSD, FFmpeg, FreeBSD etc) they are indeed all pretty boring!
The post also points out that the model wasn't trained specifically on cybersecurity, and that it was just a side-effect – so I think there's still a lot of headroom.
It's scary, but there's also some room for cautious non-pessimism. More people than ever can cause billions of dollars of damage in attacks now [1], but the same tools can be used for defense. For that reason, I'm more optimistic about mitigations in security than in other risk areas like biosecurity.
[1]: https://www.noahlebovic.com/testing-an-autonomous-hacker/
Given that it's absolutely impossible to stop people not aligned with us (for any definition of us) from doing AI research, the most reasonable way forward is to dedicate compute resources to the frontier, and to automatically send responsible disclosures to major projects. It could in itself be a pretty reasonable product. Just as you pay for dubious security scans and publish the fact that you run them, an LLM company could offer actually expensive security reviews with a preview model, and charge accordingly.
Terrible take. You don't get to push the extinction button just because you think China will beat you to the punch.
>This is the very nature of being a human being. We summit mountains, regardless of the danger or challenge.
No, just no... We barely survived the Cold War, at times through pure luck. AI is at least as dangerous as that, if not more so. Our capabilities have far exceeded our wisdom, as you have so cleanly demonstrated.