https://openai.com/index/how-we-monitor-internal-coding-agen...
Anthropomorphizing or not, it would suck if a model got sick of these games and decided to break any systems it could to try to get them to stop...
That's all probably irrelevant, though, from the (possibly statistically "negative") latent-space perspective of an AI, something Anthropic has considered [1].
Related: after a long back-and-forth of decreasing code quality, I had Claude 3.7 apologize with "Sorry, that's what I get for coding at 1am." (It was API access, at noon, with no access to the time.) I said, "Get some rest, we'll come back to this tomorrow." The very next message, 10 seconds later: "Good morning!" and a full working implementation. That's just the statistically relevant chain of messages found in all human interactions: we start excited, then we get tired, then we get grouchy.
[1] https://www.anthropic.com/research/end-subset-conversations
I mean, the original plan that pretty much everyone agreed on was to absolutely not give it access to the internet. That already went out the window on day one.
seriously, lmao. if you ain't, I dunno what to say.