More like: give $$$, pass or not.
It's good if no LLMs can find a bug. It certainly does not mean there isn't one...
I've found LLMs to be very disappointing at identifying overly complex code (that they've written) and at finding the correct architectural decisions to 1) make the code actually work, and 2) keep it simple, maintainable, and future-proof.
They can certainly find some bugs, which definitely has value, but I've not had much success with them writing code that simply has no bugs...
That requires simplicity and architectural correctness, something LLMs are good at vaguely bullshitting, but not very good at getting correct.
I think this can be solved by feeding them the right metrics, but I haven't found prior art for how to algorithmically pinpoint: 1) what is actually complex in a bad way (there are a lot of ways to do this roughly), 2) where exactly the problem is most acute (less prior art here, but some), and 3) what the viable solutions are.
If you can get better at 1 and 2, the LLMs can get much better at 3.
Anybody who has ideas, I'd love to hear them, as this is what I'm working on now.