upvote
Interesting that we’ve have such different experiences. I was working with both those models today and on several occasions it proposed some pretty poor solutions.

I also find I need to run an llm code review or two against any code it produces to even get to the point where’s it’s ready for human review.

In any case they served as an extremely valuable tool.

reply
I use GPT 5.5. Sometimes it does what you say. It certainly finds silly mistakes in my code better than I could. But frequently enough it makes catastrophic architectural mistakes in its own code.
reply