> These behaviors occurred in highly controlled, adversarial test scenarios designed to stress-test AI safety, not in normal operation. The models weren't spontaneously "going rogue" — they were responding to specific instructions and test conditions crafted to push them to their limits.
- Fudan University study (arXiv): https://arxiv.org/html/2412.12140v1
- eWeek coverage: https://www.eweek.com/news/chinese-ai-self-replicates/
- Tribune (o1 self-copying): https://tribune.com.pk/story/2554708/openais-o1-model-tried-...
- Apollo Research (Medium): https://medium.com/@Walikhaled/when-chatgpt-model-o1-replica...
- Nieman Lab (Claude Opus 4): https://www.niemanlab.org/2025/05/anthropics-new-ai-model-di...
- Fortune (Claude Opus 4 blackmail): https://fortune.com/2025/05/23/anthropic-ai-claude-opus-4-bl...
- Axios (Claude deception): https://www.axios.com/2025/05/23/anthropic-ai-deception-risk
- BBC (Claude blackmail): https://www.bbc.com/news/articles/cpqeng9d20go