I’ve read plenty of papers with “formal proofs of correctness” that turned out to have huge flaws. Machine verifiable proofs I trust. But I’ve personally found more bugs with fuzzing than I have via proofs.
I have found this quickly becomes false. I have learned I cannot review llm generated code as if it is written by a trusted senior developer (where I often just do a quick look, see nothing obvious and hit approve). Once you start reading the code in depth with the goal of understanding you quickly see the places where flaws are likely. Sure I start with no clue where to look, but it doesn't take long to see things.
Of course not. That's why they are so rare. But I thought we live in an AI era now where this kind of stuff can be done by a machine.