I did not expect perfect reliability, but I thought they could at least get it right on the second attempt once you pointed out the difference. No such luck: it confidently tells you that the code is now the same, when in fact it has slipped yet another subtle bug into the difference.
I don't know what kind of work these garbage-class models would be adequate for. Maybe they can masquerade as competent for a few minutes, but in the end the results simply are not right. At best they are suitable as a smarter search or autocomplete, in my opinion.
I don't think I'd be using AI to code at all if this weren't the case. (I don't want to feel stunted or stuck just from losing my internet connection.)