Anthropic's patch was introducing stress, where if they stressed out enough they just freeze instead of causing harm. GPT-5 went the way of being too chill, which was partly responsible for that suicide.
Good reading: https://www.anthropic.com/research/agentic-misalignment