The problem is degradation. It was working much better before. There are many people (one example from a well-known person [0]), including me and my circle of friends, who were working on projects around the Opus 4.6 rollout and whose workflows suddenly started to degrade like crazy. If I did not have multiple quality gates between an LLM session and production, I would have faced certain data loss and production outages, just like some famous company did. The fun part is that the same workflow that reliably passed the quality gates before suddenly failed on something trivial. I cannot pinpoint exactly what Claude changed, but the degradation is there for sure.

We are currently evaluating alternatives to have an escape hatch (Kimi, ChatGPT, Qwen, and Nemotron are the best candidates so far). The only issue with the alternatives was (before the Claude leak) how well the agentic coding tool integrates with the model and its tool use, and several improvements are already happening there, like [1]. I am hoping the gap narrows and we can move off permanently. No more "you are right, I should not have attempted to delete the production database" moments.

https://x.com/theo/status/2041111862113444221

https://x.com/_can1357/status/2021828033640911196

reply
Same as how I expect a coin to come up heads 50% of the time.
reply
If you consistently get nowhere near 50%, then surely you know you're not throwing a fair coin? What would complaining to the coin provider achieve? Switch coins.

*typo

reply
Well, I'm paying for the coin to be near 50%, and the coin's PM is listening to customers, so that's why.
reply
The coin's PM is spamming you with trivial, gaslighting corporate slop, most of it barely edited.
reply
> how you expect a stochastic model [...] is supposed to be predictable in its behavior.

I've used it often enough to know that it will almost certainly nail tasks I deem simple enough.

reply