Yes, but even here one needs some oversight.
In my experiments with Codex (even on Extra High), a non-zero percentage of the "tests" involved opening the source code (not running it, just opening it) and regexing for a bunch of substrings.
"The AI said so ..."
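To make that failure mode concrete, here is a rough Python sketch. It is an illustration only, not actual Codex output: `SOURCE`, `add`, `substring_test`, and `behavioral_test` are all hypothetical names, and the bug in `add` is deliberate. The first check only greps the source text, the second actually runs it.

```python
import re

# Hypothetical module under test, kept as a string so both checks see the same text.
# The bug is deliberate: it subtracts instead of adding.
SOURCE = '''
def add(a, b):
    return a - b
'''


def substring_test(source: str) -> bool:
    """The pattern described above: inspect the text, never execute it."""
    patterns = [r"def add\(", r"return"]
    return all(re.search(p, source) for p in patterns)


def behavioral_test(source: str) -> bool:
    """Execute the code and check what it actually computes."""
    namespace = {}
    exec(source, namespace)
    return namespace["add"](2, 3) == 5


print(substring_test(SOURCE))   # True: the expected substrings are present
print(behavioral_test(SOURCE))  # False: add(2, 3) returns -1
```

The substring check passes on broken code because it only confirms that certain text exists, which is why a green result from that kind of "test" still needs a human to look at what was actually asserted.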