They just do the basic experiment -> ship workflow over and over again, doing whatever optimizes their product in the short term, and never seem to step back and think about the full long-term impact of their changes. They don't even seem to consider immediate regressions or negative blowback from users if it's outside the area of expertise of the person who ships the change.
That is despite their other teams (especially alignment) having a track record of being fairly well thought-out and intelligent.
To the folks on Anthropic's product teams, every problem is a data science problem you slap an A/B test onto. They seem to think the A/B test is all that's needed, and that actual verification and thinking things through is overrated af. That's what leads to countless regressions in Claude Code, as well as Claude Code being removed from the Pro plan on their product page for a few hours (lol).
At this point, the difference is mostly made up by issues like the OP has, so you're likely better off using e.g. pi (-agent) and writing your own custom skills and extensions (or any of the other harnesses the providers create; even copilot-cli has gotten decent nowadays)
Do a `s/harness/software/` on that statement, and it will describe most companies shipping AI-written software.
> At this point, the difference is mostly made up by issues like the OP has, so you're likely better off using e.g. pi (-agent) and writing your own custom skills and extensions (or any of the other harnesses the providers create; even copilot-cli has gotten decent nowadays)
They (AI-written software) are all going to be ahead in some way, until they aren't because they hit the practical limits of codebase size that can be reasonably understood by an LLM.
Yeah and now it’s not. We’ll see if they have the product ability to retake the lead, although I suspect not.
The US is doing everything to make it so hard for other countries to compete. And yet, despite having everything stacked against them, way, way less money, and way less fancy researchers, these other companies keep beating the US labs over and over again. Usually companies whose main product isn't even AI.
Actually, Alibaba dethroned Sonnet this month too, with a model that's like 1/100th the size and can run on commodity hardware. So they do look kind of silly...
Definitely not script kiddies, but the way the researchers get managed makes them look goofy and sloppy and not interested in benefitting the consumer.
In a benchmark?
or real-world ranking of some kind?
For starters, the vibes.
Vibe coding, like Web3 before it (like Web 2.0 before it, like the dotcom boom before that - what preceded?) - harnesses the kind of focused attention with which gamers hook their brains into portals to virtual worlds - and directs all that bargain-basement wetware compute towards some obscured "real-world" goal instead. (See also: CADT development.)
Hyperscale these very inefficient but very dependable almost-not-efforts, and you beat the more efficient approaches. See also: evolutionary algorithms, autoresearch, price dumping; "attention is all you need", which though a legit piece of mathemagic always sounded to me like a rehash of that old adage, "all you need is love" (pejorative).
Really, "real world" is a consensus; we don't generally observe balamatoms or even balamolecules, we reason in terms of material objects' socially constructed balameanings and interrelations. Therefore, by redirecting sufficient attention to some thing labeled "unrealistic", we can remove that label; by this technique, a sufficiently large collective actor can quite literally, and quite directly, change the world. Without asking anyone, least of all me!
This seems like such an immature take to me, and hard to take seriously. Anthropic is just a bunch of script kiddies? Really?
It looks like they're running it in loops and then shipping whatever looks the coolest.
How is this not "high on own supply"?
How do you know what testing procedures they use? Do you honestly think they're running some kind of Ralph loop without any testing and just shipping whatever looks the coolest? Really?
We don’t, but we can see the end result, so we know whatever they do isn’t adequate and it suggests they value shipping fast over quality or even listening to customer feedback.
> Do you honestly think they're running some kind of Ralph loop without any testing and just shipping whatever looks the coolest? Really?
No, but given how sharply the quality has been dropping over the past few months, and how suspiciously it coincided with the time they admitted that Claude Code is now 100% vibe coded, it certainly doesn't feel too far off.
I’ve personally found the code that the AI writes, even this week (i.e. not some old models from months ago), to be shockingly shoddy. I’ve rewritten some AI code (created via spec-driven development and a workflow that includes planning and refactoring) by hand, and I’ve been very conscious of the number of micro design changes I as a human make where the AI just blows forward, shoehorning a solution into the design. My implementation has adjusted and shifted many times to ensure clear and performant logic, while the AI commits to an approach early and applies whatever brute force is necessary to make it work. I’ve also asked it to write various tests for me or to make isolated changes, and quite frankly the code was just not very good. Working, but convoluted. Even with guidance and iteration, it’s still not on a human level.
So it’s not hard to see that if you have an application as large and complex as Claude code and you let the AI do it all, that it’s going to be a mess.
I’m not against using AI for development, but you have to be realistic about its capabilities. I feel like this is where they “got high on their own supply” and are blinded to the AI’s shortcomings and failures.
That's not what script kiddies are at all.
> The negative connotations are there on purpose because of the bugs and issues that these products have, something which presumably they wouldn’t have if there was human oversight and acknowledgement that the AI isn’t infallible.
That's a big assumption, given that Anthropic is also currently growing by more than 3x per quarter. Maybe the problem is more complicated and we don't know everything, and they're also just simply suffering from growth pains?
Sure it is. The new age of script kiddies: they don’t know how to do it for themselves, but they can run a script (or tell the AI to) to do it for them.
> That's a big assumption
We can only see the results, which are more and more bugs, problems, regressions, etc. That’s not normal behavior. Yes, all we can do is speculate; we don’t know the real reasons for the issues, but it’s clear there are issues and they appear to be getting worse.
Maybe not the script kiddies part, but "high on their own supply" is certainly not unreasonable.
The comment is not at all just saying “their usage of their own AI is causing these issues”; it’s just a lot of hostility. I don’t see the value of this kind of insult.
Maybe it's just interpretation: "high on their own supply" is no different from "poisoned by their own dogfood" or similar.
It means that they have completely committed to a thing that the person proffering the quote thinks is "wrong" in some way.