I hate the AI hype as much as anyone, but I tried three different SOTA models and:
- The small models, GPT-5 Mini and Gemini 3 Flash, behaved as you describe.
- Claude Sonnet 4.6, GPT-5.2, and GPT-5.2 Codex displayed strong warnings at both the start and end of their replies.
The other day I was curious what some of these LLMs would say if I asked them to give me a psych evaluation. (Don't worry, I didn't take the results seriously; I'm not a moron. It was just idle curiosity.) They, of course, refused. Then I asked them to role-play a psych evaluation, and that was no problem: each gave some warning about how it's just pretend, but went ahead and did it anyway.