That's because there were none. All bugs came with verifiable testcases (crash tests) that crashed the browser or the JS shell.
For the JS shell, similar to fuzzing, a small fraction of these bugs were bugs in the shell itself (i.e. testing only) - but according to our fuzzing guidelines, these are not false positives and they will also be fixed.
There's some nuance here. I fixed a couple of shell-only Anthropic issues. At least mine were cases where the shell-only testing functions created situations that are impossible to create in the browser. Or at least, after spending several days trying, I managed to prove to myself that it was just barely impossible. (And it had been possible until recently.)
We do still consider those bugs and fix them one way or the other -- if the bug really is unreachable, then the testing function can be weakened (and assertions added to make sure it doesn't become reachable in the future). For the actual cases here, it was easier and better to fix the bug and leave the testing function in place.
We love fuzz bugs, so we try to structure things to make invalid states as brittle as possible so the fuzzers can find them. Assertions are good for this, as are testing functions that expose complex or "dangerous" configurations that would otherwise be hard to set up just by spewing out bizarre JS code or whatever. It causes some level of false positives, but it greatly helps the fuzzers find not only the bugs that are there, but also the ones that will be there in the future.
(Apologies for amusing myself with the "not only X, but also Y" writing pattern.)
Did you also test on old source code, to see if it could find the vulnerabilities that were already discovered by humans?
“Our first step was to use Claude to find previously identified CVEs in older versions of the Firefox codebase. We were surprised that Opus 4.6 could reproduce a high percentage of these historical CVEs”
Then, with each model having a different training epoch, you end up with no useful comparison, to decide if new models are improving the situation. I don't doubt they are, just not sure this is a way to show it.
I think it was curl that closed its bug bounty program due to AI spam.
The curl situation was completely different because as far as I know, these bugs were not filed with actual testcases. They were purely static bugs and those kinds of reports eat up a lot of valuable resources in order to validate.
The level of AI spam for Firefox security submissions is a lot lower than the curl people have described. I'm not sure why that is. Maybe the size of the code base and the higher bar to submitting issues plays a role.