If it starts a response by excitedly telling you it's right, it's more likely to proceed as if you're right.
Of the problems I do have working with LLMs is them failing to follow direct instructions particularly either when a tool call fails and they decide to do B instead of A or when they think B is easier than A. Or they'll do half a task and call it complete. Too frequently I have to respond with "Did you follow my instructions?" "I want you to ACTUALLY do A" and finally "Under no circumstances should you ever do anything other than A and if you cannot you MUST admit failure and give extensive evidence with actual attempts that A is not possible" or occasionally "a cute little puppy's life depends on you doing A promptly and exactly as requested".
--
Thing is I get it if you are impressionable and having a philosophical discussion with an LLM, maybe this kind of blind affirmation is bad. But that's not me and I'm trying to get things done and I only want my computer to disagree with me if it can put arguments beyond reasonable doubt in front of me that my request is incorrect.
Instead, they either blindly follow or quietly rebel.
Frustrating, but “over correction” is a pretty bad euphemism for whatever half assed bit of RLHF lobotomy OpenAI did that, just a few months later, had ChatGPT doing a lean-in to a vulnerable kid’s pain and actively discourage an act that might have saved his life by signaling more warning signs to his parents.
It wasn’t long before that happened, after the python REPL confusion had resolved, that I found myself typing to it, even after having to back out of that user customization prompt, “set a memory that this type of response to a user in the wrong frame of mind is incredibly dangerous”.
Then I had to delete that too, because it would response with things like “You get it of course, your a…” etc.
So I wasn’t surprised over the rest of 2025 as various stories popped up.
It’s still bad. Based on what I see with quantized models and sparse attention inference methods, even with most recent GPT 5 releases OpenAI is still doing something in the area of optimizing compute requirements that makes the recent improvements very brittle— I of course can’t know for sure, only that its behavior matches what I see with those sorts of boundaries pushed on open weight models. And the assumption that the-you-can-prompt buffet of a Plus subscription is where they’re most likely to deploy those sorts of performance hacks and make the quality tradeoffs. That isn’t their main money source, it’s not enterprise level spending.
This technology is amazing, but it’s also dangerous, sometimes in very foreseeable ways, and the more time that goes the more I appreciate some of the public criticisms of OpenAI with, eg, the Amodeis’ split to form Anthropic and the temporary ouster of SA for a few days before that got undone.