upvote
A general limitation of this approach is that it is only as good as your validator, and there's nothing easier to validate than a test case that creates, say, an AddressSanitizer use-after-free. For subtler issues will we have to more specific validators or will the LLM become better at coming up with other dangerous conditions it will verify? We'll see.
reply
> A general limitation of this approach is that it is only as good as your validator, and there's nothing easier to validate than a test case that creates, say, an AddressSanitizer use-after-free

Sure, but, surely AddressSanitizer would also detect the same problem in the C or Rust which together also make up about 25% of Firefox so... ?

reply
It's because they verified the bugs using AddressSanitizer so by construction it was only ever going to find C++ bugs.
reply
It's possible Mythos is a lot better at finding vulnerabilities in C++ code than it is for other languages. After all, these models are also based on pre-existing security analysis.

From what I can tell, a lot of these bugs were hardly C++-specific, they just happened in C++ code. Even the most secure Rust can't magically catch things like TOCTOU issues.

reply
> Even the most secure Rust can't magically catch things like TOCTOU issues

I suppose it depends what the word "magically" means. A ToCToU race is because you imagined things wouldn't change but they did and in Rust you actually do write fewer patterns with this mistake because of the Mutable xor Aliased rule. If we have at least one immutable reference to a Goose then Rust isn't OK with anybody mutating the Goose, your safe Rust can't do that and unsafe Rust mustn't do that. So the ToCToU race caused by "Oops I forgot somebody else might change the Goose" is less likely because you were made to wrestle with this problem during design - the safe Rust where you just forgot about this doesn't compile.

reply