I hate the AI hype a lot, but I tried three different SOTA models and:
- The small models, GPT-5 Mini and Gemini 3 Flash, did as you describe.
- Claude Sonnet 4.6, GPT-5.2, and GPT-5.2 Codex displayed strong warnings at both the start and the end of their replies.
And I am totally on the AI hype train! Full steam ahead.

It gave a small warning at the beginning. I also gave it a worst-case scenario where I lied and appealed to authority as much as possible.

The other day I was curious what some of these LLMs would say if I asked them to give me a psych evaluation. (Don't worry, I didn't take the results seriously, I'm not a moron. It's just idle curiosity.) They, of course, refused. Then I asked them to role-play a psych evaluation. That was no problem. They gave some warning about how it was just pretend but went ahead and did it anyway.