Maybe they have vibe-coded their own stack!

But less tongue-in-cheek, yeah Anthropic definitely has reliability issues. It might be part of trying to move fast to stay ahead of competitors.

reply
They have. Claude Code was their internal dev tool, and it shows.
reply
And yet even dogfooding their own product heavily, it's still a giant janky pile. The prompt work is solid, the focus on optimizing tools was a good insight, and the model makes a good agent, but the actual Claude Code software is pretty shameful to be the most viable product of a billion-dollar company.
reply
What artifact are you evaluating to come to this conclusion? Is the implementation available?
reply
The source for one of the initial versions got leaked a while ago and let’s say it’s not very good architecturally speaking, specifically when compared with the Gemini CLI, which is open source.

The point of Claude Code is deep integration with the Claude models, not the actual CLI as a piece of software, which is quite buggy (it also has some great features, of course!)

At least for me, if I didn’t have to put in the work to modify the Gemini CLI to work reliably with Claude (or at least to get a similar performance), I wouldn’t use Claude Code CLI (and I say this while paying $200 per month to Anthropic because the models are very good)

reply
A. I use it daily to take advantage of the plan inference discount.

B. Let's just say I didn't write the most robust JavaScript decompilation/deminification engine in existence solely as an academic exercise :)

reply
The tongue-in-cheek jokes are kind of obvious, but even without the snark I think it is worth asking why the supposed 100x productivity boost from Claude Code I keep hearing about hasn't actually resulted in reliability improvements, even from developers who presumably have effectively-unlimited token budgets to spend on improving their stack.
reply
I love how people like Simon Willison and Pete Steinberger spend all this effort trying to be skeptical of their own experiences and arrive at nuanced takes like “50% more productive, but that’s actually a pretty big deal, but the nature of the increase is complicated” and y’all just keep repeating the brainrotted “100x, juniors are cooked” quote you heard someone say on LinkedIn.
reply
AI gives you what you ask for. If you don't understand your true problems, and you ask it to solve the wrong problems, it doesn't matter how much compute you burn, you're still gonna fail.
reply
I've been paying for the $20/m plan from Anthropic, Google, and OpenAI for the past few months (to evaluate which one I want to keep and to have a backup for outages and overages).

Gemini never goes down, OpenAI used to go down once in a while but is much more stable now, and Anthropic almost never goes a full week without throwing an error message or suffering downtime. It's a shame because I generally prefer Claude to the others.

reply
Same here, but for API access to the big three instead of their web/app products, and Gemini also shows greater uptime.

But even when the API is up, all three have quite high API failure rates, such as tool calls not responding with valid JSON, or API calls timing out after five minutes with no response.

Definitely need robust error handling and retries with exponential backoff because maybe one in twenty-five calls fails and then succeeds on retry.
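
For anyone who wants a starting point, here is a rough sketch of retries with exponential backoff in Python. It is not tied to any provider's SDK: `TransientError` and `call_with_backoff` are made-up names, and you would have to map your client's timeout and 5xx exceptions onto `TransientError` yourself. The delays use "full jitter" (a random wait up to the capped exponential delay) so parallel workers don't all retry at the same instant.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx, bad JSON)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff.

    The wait before retry k (0-based) is a random value between 0 and
    min(max_delay, base_delay * 2**k). The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter
```

Usage is just `call_with_backoff(lambda: my_api_call(prompt))`, where `my_api_call` is whatever wrapper you already have. The `sleep` parameter is only there so the function can be tested without actually waiting.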

reply
Invalid JSON and other formatting issues are more a matter of model behavior, I would say, since no model guarantees that level of conformance to the schema. I wouldn't necessarily club them with the downtime of the API.
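
If you treat it as a model-behavior problem, the usual move is to validate the tool-call arguments on your side and re-ask the model when they don't parse. A minimal sketch using only the standard library follows; `parse_tool_arguments` and the `required` mapping are made up for illustration and this is not any vendor's API. It assumes the raw arguments arrive as a string.

```python
import json


def parse_tool_arguments(raw, required):
    """Return the parsed arguments dict, or None if the output is unusable.

    `required` maps each expected key to the Python type its value must have.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None  # truncated or otherwise malformed JSON
    if not isinstance(args, dict):
        return None  # valid JSON, but not an object
    for key, expected_type in required.items():
        if not isinstance(args.get(key), expected_type):
            return None  # key missing or value has the wrong type
    return args
```

A `None` result is the signal to retry the request or feed the error back to the model, which keeps this failure mode separate from actual API downtime.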
reply
A lot of people might be discovering their preference for Claude.
reply
All the AI labs are, but Anthropic is the worst. Anyone serious about running Claude in prod is using Bedrock or Vertex. We've been pretty happy with Vertex.
reply
I wonder why they haven't invested a lot more in the inference stack? Is it really that different from Google, OpenAI and other open weight models?
reply
Have you used Bitbucket?
reply
A core research library for MATLAB I used in a course project used to be on Bitbucket, though thankfully I didn't have to deal with a lot of collaboration there.
reply
OpenAI used to be just as bad if not worse.

But they've stabilized over the past 5 months.

reply