I've seen that when prompting it to look for concurrency issues vs saying something more like "please inspect this rigorously to look for potential issues..."
(Which isn't to say don't do it: I think this is a huge benefit you can gain from being able to refactor more quickly. Just to say that you're gonna short-term give yourself a lot more homework to make sure you don't fix things that aren't bugs, or break other things in your quest to make them more provable/testable.)
Since it's a large codebase, they go even more specific and hint that the bug is in file A, then try again with a hint that the bug is in file B, and so on.
If anyone, or anything, ever answers a question like that, you should stop asking it questions.
I was very impressed when the top three AIs all failed to find anything other than minor stylistic nitpicks in a huge blob of what to me looked like “spaghetti code” in LLVM.
Meanwhile at $dayjob the AI reviews all start with “This looks like someone’s failed attempt at…”