This isn't a dig at anyone, I've certainly shipped my share of bad code as well. Deadlines, despite my wishes sometimes, continue to exist. Sometimes you have to ship a hack to make a customer or manager happy, and then replacing those hacks with better code just never happens.
For that matter, the first draft of nearly anything I write is usually not great. I might just be stupid, but I doubt I'm unique; when I've written nice, beautiful, optimized code, it's usually a second or third draft, because ultimately I don't think I fully understand the problem and the assumptions I am allowed to make until I've finished the first draft. Usually for my personal projects, my first dozen or so commits will be pretty messy, and then I'll have cleanup branches that I merge to make the code less terrible.
This isn't inherently bad, but a lot of the time I am simply not given time to do a second or third draft of the code, because, again, deadlines, so my initial "just get it working" draft is what ships into production. I don't love it, and I kind of dread the idea of some of the code with my name attached to it at BigCo ever getting leaked, but that's just how it is in the corporate world sometimes.
I get a junior developer or a team of developers with varying levels of experience and a lot of pressure to deliver producing crummy code, but not the very tool that's supposed to be the state-of-the-art coder.
Why not? It is subject to the same pressures, in fact it is subject to more time pressure than most corp code out there. Also, it's the model that's doing the coding, not the frontend tool.
As a user of terrible products, I only care about code quality in as much as the product is crap (Spotify I'm looking at you), or it takes forever for it to evolve/improve.
Biz people don't care about quality, but they're notoriously short sighted. Whoever nerfed Google's search is angering millions of people as we speak.
I wouldn't say that customers are indifferent, but it wouldn't be the first time that investor expectations are prioritized far above customer satisfaction.
I don't actually think it's a solved problem, I'm saying that the fact that it generates terrible code doesn't necessarily mean that it doesn't have parity with humans.
Yeah, we even have an idiom for this - "Temporary is always permanent"
But as a great man once said: Later == Never.
Absolutely. The difference is that the amount of bad code that could be generated had an upper limit on it — how fast a human can type it out. With LLMs bad code can be shat out at warp speed.
I think the better unit to commit and work with is the prompt itself, and I think that the prompt is the thing that should be PR'd at this point, because ultimately the spec is what's important.
The fundamental problem there is the code generation step is non-deterministic. You might make a two sentence change to the prompt to fix a bug and the generation introduces two more. Generate again and everything is fine. Way too much uncertainty to have confidence in that approach.
Also, people aren't actually reading through most of the code that is generated or merged, so if there's a fear of deploying buggy code generated by AI, then I assure you that's already happening. A lot.
For LLMs, I don't really know. I only have a couple of years' experience at that.
Everything depends on context. Most code written by humans is, indeed, garbage.
I think that this is the problem, actually.
It's similar to writing. Most people suck at writing so badly that the LLM/AI writing is almost always better when writing is "output".
Code is similar. Most programmers suck at programming so badly that LLM/AI production IS better than 90+% (possibly 99%+). Remember, a huge number of programmers couldn't pass FizzBuzz. So, if you demand "output", Claude is probably better than most of your (especially enterprise) programming team.
The problem is that the Claude usage flood is simply identifying the fact that things that work do so because there is a competent human somewhere in the review pipeline who has been rejecting the vast majority of "output" from your programming team. And he is now overwhelmed.
a) a pristine, good codebase that follows the best coding practices, but it is built on top of bad specs, wrong data/domain model
b) a bad codebase but it correctly models and nails the domain model for your business case
Real life example, a fintech with:
a) a great codebase but stuck with a single-entry ledger
b) a bad codebase that perfectly implements a double-entry ledger
Fair, by “perfectly implements” I meant to say that it correctly implemented the core invariant of a double entry ledger (debits = credits), not that it was 100% bug free
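To make "the core invariant" concrete, here is a minimal sketch of a ledger that refuses any entry whose debits and credits don't match. All the names (`Posting`, `Ledger`, `UnbalancedEntry`) are made up for illustration and are not from the fintech in question.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Posting:
    """One line of a journal entry (hypothetical shape)."""
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


class UnbalancedEntry(ValueError):
    """Raised when an entry's debits and credits differ."""


class Ledger:
    def __init__(self) -> None:
        self.entries: list[list[Posting]] = []

    def post(self, postings: list[Posting]) -> None:
        # The double-entry invariant: total debits must equal total credits.
        total_debit = sum((p.debit for p in postings), Decimal("0"))
        total_credit = sum((p.credit for p in postings), Decimal("0"))
        if total_debit != total_credit:
            raise UnbalancedEntry(f"debits {total_debit} != credits {total_credit}")
        self.entries.append(list(postings))

    def balance(self, account: str) -> Decimal:
        # Net debit balance for one account across all accepted entries.
        return sum(
            (p.debit - p.credit
             for entry in self.entries
             for p in entry
             if p.account == account),
            Decimal("0"),
        )
```

The rest of the codebase can be as ugly as you like; as long as every write goes through something like `post`, the books can't silently drift out of balance.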
Many super talented developers I know will say “Make it work, then make it good”. I think it’s okay to do this on a bigger scale than just the commit cycle.
Make it work, make it work right, make it work fast. In that order.
Who is to judge the "good" or "bad" anyway?
which has always been true
No accounting for taste, but part of what makes code hard for me to reason about is when it has lots of combinatorial complexity, where the number of states that can happen makes it difficult to know all the possible good and bad states that your program can be in. Combinatorial complexity is something that objectively can be expensive for any form of computer, be it a human brain or silicon. If the code is written in such a way that the number of correct and incorrect states is impossible to know, then reasoning about it becomes effectively intractable.
I do think there is code that is "objectively" difficult to work with.
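A rough sketch of the state-explosion point, assuming a made-up data-fetching widget (the flag names and the rules for what counts as "meaningful" are my own assumptions): four independent booleans give you every combination whether you want it or not, while an explicit enum only has the states you actually intend.

```python
from enum import Enum
from itertools import product

# Hypothetical flags for a data-fetching widget; not from any real codebase.
FLAGS = ("is_loading", "has_error", "has_data", "is_stale")


def all_flag_states():
    """Every combination the four independent booleans can take."""
    return [dict(zip(FLAGS, values))
            for values in product((False, True), repeat=len(FLAGS))]


class RequestState(Enum):
    IDLE = "idle"
    LOADING = "loading"
    FAILED = "failed"
    FRESH = "fresh"
    STALE = "stale"


# The only flag combinations that mean something, under these assumed rules.
MEANINGFUL = {
    RequestState.IDLE: dict(is_loading=False, has_error=False, has_data=False, is_stale=False),
    RequestState.LOADING: dict(is_loading=True, has_error=False, has_data=False, is_stale=False),
    RequestState.FAILED: dict(is_loading=False, has_error=True, has_data=False, is_stale=False),
    RequestState.FRESH: dict(is_loading=False, has_error=False, has_data=True, is_stale=False),
    RequestState.STALE: dict(is_loading=False, has_error=False, has_data=True, is_stale=True),
}


def nonsense_states():
    """Flag combinations that are representable but map to no intended state."""
    valid = list(MEANINGFUL.values())
    return [state for state in all_flag_states() if state not in valid]
```

With these four flags there are 16 representable combinations and only 5 intended ones, so 11 are states the code can be in that nobody designed for. Each extra flag doubles the total.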
If you make sure the compiler catches most issues, AI will run it, see it doesn't build and fix what needs to be fixed.
So I agree that a lot of the things that make code good, including comments and documentation, are beneficial for AI.
I don't entirely disagree that there is code that's objectively difficult to work with, but I suspect that the Venn diagram of "code that's hard for humans" and "code that's hard for computers" has much less overlap than you're suggesting.
I'm sure that these models will get better, and I agree that the overlap will be lower at that point, but I still think what I said will be true.
I mean, it seems like that has always been true to an extent, but now it may be even more true? Once you know you're sitting on a lode of gold, it's a lot easier to know how much to invest in the mine.
And some people thought they were building "disposable" code, only to see their hacks being used for decades. I'm thinking about VB but also behemoth Excel files.
I hate self-promotion but I posted my opinions on this last night https://blog.tombert.com/Posts/Technical/2026/04-April/Stop-...
The tl;dr of this is that I don't think that the code itself is what needs to be preserved; the prompt and chat are the actual important and useful things here. At some point I think it makes more sense to fine-tune the prompts to get increasingly more specific and just regenerate the code based on that spec, and store that in Git.
Generating code using a non-deterministic code generator is a bold strategy. Just gotta hope that your next pull of the code slot machine doesn’t introduce a bug or ten.
Given that, we should instead tune the prompts well enough to not leave things to chance. Write automated tests to make sure that inputs and outputs are ok, write your specs so specifically that there's no room for ambiguity. Test these things multiple times locally to make sure you're getting consistent results.
Write them by hand or generate them and check them in? You can’t escape the non-determinism inherent in LLMs. Eventually something has to be locked in place, be it the application code or the test code. So you can’t just have the LLM generate tests from a spec dynamically either.
> write your specs so specifically that there's no room for ambiguity
Using English prose, well known for its lack of ambiguity. Even extremely detailed RFCs have historically left lots of room for debate about meaning and intention. That’s the problem with not using actual code to “encode” how the system functions.
I get where you’re coming from but I think it’s a flawed idea. Less flawed than checking in vibe-coded feature changes, but still flawed.
Yes, written by hand. I think that ultimately you should know what valid inputs and outputs are and as such the tests should be written by a human in accordance with the spec.
> Less flawed than checking in vibe-coded feature changes, but still flawed.
This is what I'm trying to get at. I agree it's not perfect, but I'm arguing it's less evil than what is currently happening.
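As a sketch of what "tests written by a human in accordance with the spec" could look like: the cases are plain data that someone typed by hand, and whatever the generator produces on a given run gets checked against them. FizzBuzz stands in for the spec here, and `generated_v1` / `generated_v2` are hypothetical stand-ins for two regenerations of the same prompt.

```python
# Hand-written (input, expected output) pairs. These are the part that is
# locked in place and reviewed; the implementation is what gets regenerated.
SPEC_CASES = [
    (1, "1"),
    (3, "Fizz"),
    (5, "Buzz"),
    (7, "7"),
    (15, "FizzBuzz"),
    (30, "FizzBuzz"),
]


def failures(candidate):
    """Return (input, expected, actual) for every case the candidate gets wrong."""
    results = []
    for arg, expected in SPEC_CASES:
        actual = candidate(arg)
        if actual != expected:
            results.append((arg, expected, actual))
    return results


def generated_v1(n):
    """Stand-in for one generation that happens to be correct."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def generated_v2(n):
    """Stand-in for a later regeneration that checks 3 before 15 (a regression)."""
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    if n % 15 == 0:
        return "FizzBuzz"
    return str(n)
```

`failures(generated_v1)` comes back empty, while `failures(generated_v2)` flags the 15 and 30 cases. That doesn't remove the non-determinism, it just means a bad pull of the slot machine gets caught before it merges.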
Observability into how a foundation-model-generated product arrived at that state is significantly more important than the underlying codebase, as it's the prompt context that is the architecture.
The solution people are coming up with now is using AI for code reviews and I have to ask "why involve Git at all then?". If AI is writing the code, testing the code, reviewing the code, and merging the code, then it seems to me that we can just remove these steps and simply PR the prompts themselves.
I made a similar point 3 weeks ago. It wasn't very well received.
https://news.ycombinator.com/item?id=47411693
You don't actually need source control to be able to roll back to any particular version that was in use. A series of tarballs will let you do that.
The entire purpose of source control is to let you reason about change sets to help you make decisions about the direction that development (including bug fixes) will take.
If people are still using git but not really using it, are they doing so simply to take advantage of free resources such as github and test runners, or are they still using it because they don't want to admit to themselves that they've completely lost control?
I think this is the case, or at least close.
I think a lot of people are still convincing themselves that they are the ones "writing" it because they're the ones putting their names on the pull request.
It reminds me of a lot of early Java, where it would make you feel like you were being very productive because everything that would take you eight lines in any other language would take thirty lines across three files to do in Java. Even though you didn't really "do" anything (and indeed Netbeans or IntelliJ or Eclipse was likely generating a lot of that bootstrapping code anyway), people would act like they were doing a lot of work because of a high number of lines of code.
Java is considerably less terrible now, to a point where I actually sort of begrudgingly like writing it, but early Java (IMO before Java 21 and especially before 11) was very bad about unnecessary verbosity.
Also, the approach you described is what a number of AI for Code Review products are using under-the-hood, but human-in-the-loop is still recognized as critical.
It's the same way how written design docs and comments are significantly more valuable than uncommented and undocumented source.
I've noticed that they're often quite bad at refactoring, also.
Some business models will require “good” code, and some won’t. That’s how it is right now as well. But pretending that all business models will no longer require “good” code is like pretending that Michelin should’ve retired its list after the microwave was invented.
Research in academia seems less appropriate because that’s famously not really a business model, except maybe in the extractive sense
As far as good or bad, how food is made is irrelevant to the outcome if it's enjoyable.
Now whether this is still true with AI, or if vibe coding means bad code no longer has this long-term stability and velocity cost because AI is better than humans at working with this bad code... We don't know yet.
2. The problem with "bad code" has nothing to do with the short-term success of the product but with the ability to evolve it successfully over time. In other words, it's about long-term success, not short-term success.
3. Perhaps most importantly, Claude Code is a fairly simple product at its core, and almost all its value comes from the model, not from its own code (and the same is true on the cost side). Claude Code is relatively a low stakes product. This means that the problems caused by bad code matter less in this instance, and they're managed further by Claude Code not being at the extreme "vibey" end of the spectrum.
So AI aside, Claude Code is proof that if you pour years and many billions into a product, it can be a success even if the code in the narrow and small UI layer isn't great.
There's this definition of LLM generation + "no thorough review or testing"
And there's the more normative one: just LLM generation.[1][2][3]
"Not even looking at it" is very difficult as part of a definition. What if you look at it once? Or just glance at it? Is it now no longer vibe coding? What if I read a diff every ten commits? Or look at the code when something breaks?
At which point is it no longer vibe coding according to this narrower definition?
[1] https://www.collinsdictionary.com/dictionary/english/vibe-co...
[2] https://www.merriam-webster.com/dictionary/vibe%20coding
If you actually look at the code and understand it and you'd stand by it, then it's not vibecode. If you had an LLM shit it out in 20 minutes and you don't really know what's going on, it's vibecode. Which, to me, is not derogatory. I have a bunch of stuff I've vibecoded and a bunch of stuff that I've actually read the code and fixed it, either by hand or with LLM assistance. And ofc, all the code that was written by me prior to ChatGPT's launch.
But my point was that I don't think the development of Claude Code itself is unsupervised, hence it's not really "vibe coded".
The situation there is akin to Viaweb: Viaweb also rode a hype wave, and its code situation was awful as well (see PG's stories about fixing bugs during the customer's issue-reproduction theater).
What did Viaweb's buyer do? They rewrote the thing in C++.
If history rhymes, then a buyer of Anthropic would do something close to "rewrite it in C++" to the current Claude Code implementation.
While there are no companies with $1.5 trillion (4*$380B) of net revenue, the difference is that Anthropic is cash net-negative, has more than 4 people on staff (none of them are hungry artists like PG), and its hardware spending, I think, is astronomical. They are cash net-negative because of the hardware needed to train models.
There should be more than one company able to offer good purchase terms to Anthropic's owners.
I also think that Anthropic, just like OpenAI and most other LLM companies and company departments, rides "test set leakage," hoping the general public and investors do not understand. Their models do not generalize well, being unable to generate working code in Haskell [1] at the very least.
[1] https://haskellforall.com/2026/03/a-sufficiently-detailed-sp...
PG's Viaweb had awful code as a liability. Anthropic's Claude Code has an awful implementation (code) and produces awful code, with more liability than code written by a human.
isn't that pretty much why anthropic and openai are racing to IPO?
I do M&As at my company, as a CTO. I have seen lots of successful companies' codebases, and literally none of them are elegant. Including very profitable companies with good, loved products.
The only good code I know is in the open source domain and in the demoscene. The commercial code is mostly crap - and still makes money.
- Good code is what enables you to be able to build very complex software without an unreasonable number of bugs.
- Good code is what enables you to be responsive to changing customer needs and times. Whether you view that as valuable is another matter though. I guess it is a business decision. There have been plenty of businesses that have gone bust, though, by neglecting that.
Good code is for your own sanity, the machine does not care.
Perhaps the problem is getting multiple vibe-coders synced up when working on a large repo.
> Everybody can tell you how to do it, they never did it
> —jay-z
Not the front end
The success is undeniable, but whether this vibe-coded level of quality is acceptable for more general use cases isn't something you can infer from that.
Each one is broken, doesn’t have working error handling, and prevents you from giving them money. They all exist to insert the same record somewhere. Lost revenue, and they seem to have no idea.
Amazon's flagship iOS app has had at least three highly visible bugs, for years. They’re like thorns in my eye sockets, every time I use it. They don’t care.
These companies are working with BILLIONS of dollars in engineering resources, unlimited AI resources, and with massive revenue effects for small changes.
Sometimes the world just doesn’t make sense.
AI could play a big role here. Husky (git hook) but AI. It will score lazy engineering. Ship a lazy implementation enough times and you lose your job.
Maybe there’s a reason Netflix makes you click on the ONE user profile on the account, repeatedly, even if it feels like sheer stupidity to their users. At least it’s not costing them revenue, directly.
Amazon's iOS app not properly handling state change after checkout, for years? Probably not directly costing them millions. Only second order disengagement.
But Walmart keeps pushing a thing you don’t want, because you looked at it once? Amazon solved this. It’s not a major fix, and it’s using a valuable slot that costs them money. Walmart just doesn’t fix it.
Meta refusing to take people’s advertising dollars because ALL of their page creation pages have unhandled breaking flows in them? That’s lost money for no reason at all. And you’re telling me they don’t realize how janky it is to try to maintain four implementations of that?
Apple App Store Connect and Ads platform? Don’t get me started.
Again, all with unlimited pools of the smartest people on earth, unlimited AI, and a billion people testing for them…
Social capital just isn't given out to people that fix things in a lot of these companies, but instead those who ship a 1.0a.
On the management/product side, the inevitable issues are a problem for another quarter. On the engineering side, it's a problem for the poor shmucks who didn't get to jump to the next big thing.
Neither of those groups institutionally cares about the mess they leave in their wake, and they'd perceive such guardrails as antithetical to releasing the next broken but new, fancy feature.
I wouldn't recommend neglecting tactics if your strategy doesn't put you on the good side of a generational bubble though.
This codebase has existed for maybe 18 months, written by THE experts on agentic coding. If it is already unintelligible, that bodes poorly for how much it is possible to "accelerate" coding without taking on substantial technical debt.
i.e., the claude code codebase doesn't need to be good right now [^1] — so i don't think the assumption that this is an exemplary product / artifact of expert agentic coding actually holds up here specifically
[^1]: the startup graveyard is full of dead startups with good code
claude code, the app, is also not some radically complex concept (even if the codebase today is complicated)
but hey, that's why people do version breaking rewrites
it's easy to see how the product (claude code) could be abstracted to spec form and then a future version built from that without inheriting previous iterations tech debt
I can literally see my team's codebase becoming an unmaintainable nightmare in front of my eyes each day.
I use Copilot and Claude Code and I frequently have to throw away their massively verbose and ridiculously complex code and engage my withering brain to come up with the correct solution that is 80% less code.
I probably get to the solution in the same time when all is said and done.
Honestly what is going on. What are we doing here?
We already knew that. This is a matter of people who didn't know that or didn't want to acknowledge that thinking they now have proof that it doesn't matter for creating a crazy popular & successful product, as if it's a gotcha on those who advocate for good practices. When your goal is to create something successful that you can cash out, good practices and quality are/were never a concern. This is the basis for YAGNI, move-fast-and-break-things, and worse-is-better. We've known this since at least betamax-vs-VHS (although maybe the WiB VHS cultural knowledge is forgotten these days).
WiB doesn't mean the thing is worse, it means it does less. Claude Code interestingly does WAY more than something like Pi which is genuinely WiB.
Move Fast and Break Things comes from the assumption that if you capture a market quick enough you will then have time to fix things.
YAGNI is simply a reminder that not preparing for contingencies can result in a simpler code base since you're unlikely to use the contingencies.
The spaghetti that people are making fun of in Claude Code is none of these things except maybe Move Fast and Break Things.
Also to correct another common myth, porn was widely available on both formats and was not the cause of VHS’s success over Betamax.
A lot of dollars fix a lot of mistakes.
But we (the dev community) are kind of spoiled, because we have a lot of great developer tools that come from people passionate about their work, skilled at what they do and take pride in what they put out. I don't count myself among one of those people but I have benefited from their work throughout my career and have gotten used to it in my tooling.
All that being said Opus is hands down the best coding model for me (and I'm actively trying all of them) and I'll tolerate it as long as I can get it to do what I need, even with the warts and annoyances.
I don't wholly disagree, but personally it's still the tool I use and it's sort of fine. Perhaps not entirely for the money that's behind it, as you said, but it could be worse.
The CLI experience is pretty okay, although the auth is kinda weird (e.g. when trying to connect to AWS Bedrock). There's a permission system and sandboxing, plan mode and TODOs, decent sub-agent support, instruction files and custom skills, tool calls and LSP support and all the other stuff you'd expect. At least no weird bugs like I had with OpenCode where trying to paste multi-line content inside of a Windows Terminal session led to the tool closing and every next line getting pasted in and executed in the terminal one by one. That was weird, though I will admit that using Windows feels messed up quite often nowadays even without stuff like that.
The desktop app gives you chat and cowork and code, although it almost feels like Cowork is really close to what Code does (and for some reason Cowork didn't seem to support non-OS drives?). Either way, the desktop app helps me not juggle terminal sessions and keeps a nice history in the sidebar, has a pretty plan display, easy ways of choosing permissions and worktrees, although I will admit that it can be sluggish and for some actions there just aren't progress indicators which feels oddly broken.
I wonder what they spend most of their time working on and why the basics aren't better, though to Anthropic's credit about a month ago the desktop Code section was borderline unusable on Windows when switching between two long conversations, which now seems to take a few seconds (which is still a few seconds too long, but at least usable).
What harness would you recommend instead?
Normally some software devs should be fired for that.
The tooling can be hacky and of questionable quality yet, with such a model, things can still work out pretty well.
The moat is their training and fine-tuning for common programming languages.
It's a bit of both. Claude Code was the tool that made Anthropic's developer mindshare explode. Yes, the models are good, but before CC they were mostly just available via multiplexers like Cursor and Copilot, via the relatively expensive API.
And at first glance, none of it was about complex runtime optimizations not present in Node, it was all "standard" closure-related JS/TS memory leak debugging (which can be a nightmare).
I don't have a link at hand because threads about it were mostly on Xitter. But I'm sure there are also more accessible retros about the posts on regular websites (HN threads, too).
if you have one of the top models in a disruptive new product category where everyone else is sprinting also, sure..
Code quality only matters in maintainability to developers. IMO it's a very subjective metric
Code quality = fewer bugs long term.
Code quality = faster iteration and easier maintenance.
If things are bad enough it becomes borderline impossible to add features.
Users absolutely care about these things.
How do you measure code quality?
> Users absolutely care about these things.
No, users care about you adding new features, not about your ability to add new features or how much it cost you to add features.
After some experience, it feels to me (currently primarily a JS/TS developer) like most SPAs are riddled with memory leaks and insane memory usage. And, while it doesn't run in the browser, the same thing seems to apply to Claude CLI.
Lexical closures used in long-living abstractions, especially when leveraging reactivity and similar ideas, seem to be a recipe for memory-devouring apps, regardless of browser rendering being involved or not.
The problems metastasize because most apps never run into scenarios where it matters; a page reload or exit is always close enough on the horizon to deprioritize memory usage issues.
But as soon as there are large allocations, such as the strings involved in LLM agent orchestration, or in non-trivial other scenarios, the "just ship it" approach requires careful revision.
Refactoring shit that used to "just work" with memory leaks is not always easy, no matter whose shit it is.
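For anyone who hasn't been bitten by this, here's the retention pattern in miniature. It's sketched in Python rather than JS/TS (so it is an analogy, not a repro of anything in Claude Code), and `Transcript`, `EventBus`, and `attach` are hypothetical names: a long-lived bus holds a callback, the callback's closure holds a big object, and the big object can't be freed until the callback is unsubscribed.

```python
import gc
import weakref


class Transcript:
    """Stand-in for a large allocation, e.g. a long agent conversation."""

    def __init__(self, text):
        self.text = text


class EventBus:
    """Long-lived object that keeps every subscribed callback alive."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

        def unsubscribe():
            self._handlers.remove(handler)

        return unsubscribe


def attach(bus, transcript):
    # on_event closes over `transcript`, so as long as the bus holds
    # on_event, the transcript stays reachable too.
    def on_event(event):
        return f"{event}: {len(transcript.text)} chars"

    return bus.subscribe(on_event)


def demo():
    bus = EventBus()
    transcript = Transcript("x" * 1_000_000)
    probe = weakref.ref(transcript)  # lets us observe when it is freed

    unsubscribe = attach(bus, transcript)
    del transcript
    gc.collect()
    alive_while_subscribed = probe() is not None

    unsubscribe()
    del unsubscribe  # the handle itself also references the handler
    gc.collect()
    alive_after_cleanup = probe() is not None

    return alive_while_subscribed, alive_after_cleanup
```

Note the last step: even the returned `unsubscribe` handle keeps the handler (and therefore the transcript) alive until it is dropped, which is exactly the kind of thing that makes these leaks tedious to chase.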
You don’t have to go far on this site to find someone that doesn’t like Claude code.
If you want an example of something moronic, look at the RAM usage of Claude Code. It can use gigabytes of memory to work with a few megabytes of text.
In the current market, most people using one LLM are likely going to have a positive view of it. Very little is forcing you to stick with one you dislike aside from corporate mandates.
To be fair, their complaints are about very recent changes that break their workflow, while previously they were quite content with it.
Anthropic et al. better figure it out sooner rather than later because this game they’re all playing where they want all of us to use basically beta-release tools (very generous in some cases) to discover the “real value” of these tools while they attempt to reduce their burn with unsustainable subscription prices can’t go on forever.
It has already cost many developers months and hundreds of dollars' worth of tokens because of a bug. There will be more.
The negative emotion regex, for example, is only used for a log/telemetry metric. Sampling "wtf?" alone would probably be enough. Why would you use an agent for that?
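To show how cheap that kind of check is, here's a minimal sketch of a regex-backed counter. The pattern and the metric name are invented for illustration; this is not the regex from the leaked source.

```python
import re
from collections import Counter

# Hypothetical pattern: a handful of frustration markers, case-insensitive.
NEGATIVE_PATTERN = re.compile(r"\b(wtf|ugh|terrible|this is broken)\b", re.IGNORECASE)


def record_sentiment(message, metrics):
    """Bump a counter when a message looks frustrated; return whether it matched."""
    if NEGATIVE_PATTERN.search(message):
        metrics["negative_prompt"] += 1
        return True
    return False
```

Something like `record_sentiment("wtf? it deleted my file", Counter())` is a single regex search per message, which is about the right amount of machinery for a telemetry counter.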
I don't see how a vibe-coded app is freed from the same trade-offs that apply to a fast-moving human-coded one.
Especially since a human is still driving it, thus they will take the same shortcuts they did before: instead of a formal planning phase, they'll just yolo it with the agent. Instead of cleaning up technical debt, they want to fix specific issues that are easy to review, not touch 10 files to do a refactor that's hard to review. The highest priority issues are bugs and new integrations, not tech debt, just like it always was.
This is really just a reminder of how little upside there is to coding in the open.
Claude’s source code is fine for a 1-3 person team. It’s atrocious for a flagship product from a company valued over $380 BILLION.
Like if that’s the best ai coding can do given infinite money? Yeah, the emperor has no clothes. If it’s not the best that can be done, then what kinda clowns are running the show over there?
If they DIDN'T heavily vibe-code it they might fall behind. Speed of implementation short term might beat out long-term maintenance and iteration they'd get from quality code
They're just taking on massive tech debt
For you and me, sure - sprint as fast as we can using whatever means we can find. But when you have infinite money, hiring a solid team of traditional/acoustic/human devs is a negligible cost in money and time.
Especially if you give those devs enough agency that they can build on the product in interesting and novel ways that the ai isn’t going to suggest.
Everything is becoming slop now, and it almost always shows. I get why when you’re resource constrained. I don’t get why when you’re not.
Every dollar spent is a dollar that shareholders can't have and executives can't hope for in their bonuses
Seems like you're also under the impression that privately developed software should be immaculate if the company is worth enough billions, but you'd be wrong about that too.
Either they're massively overpaying some scrubs to underperform with the new paradigm, or they are squeezing every last drop out of vibe coding and this is the result.
It shows that you can have a garbage front end if people perceive value in your back end.
It also means that any competitor that improves on this part of the experience is going to eat your lunch.
For you, non-buggy software is important. You could also reasonably take a more business centered approach, where having some number of paying customers is an indicator of quality (you've built something people are willing to pay for!) Personally I lean towards the second camp, the bugs are annoying but there is a good sprinkling of magic in the product which overall makes it something I really enjoy using.
All that is to say, I don't think there is a straightforward definition of quality that everyone is going to agree on.
Well, if unmaintainable code gets in the way of the "sustained over time" part, then that is still a real problem.
They only seem to operate as "extract as much value as possible in a short amount of time and exit with your bag", these days
Obviously it does some fairly smart stuff under the hood, but it's not exactly comparable to a large software project.
But to your point, that doesn't mean you can't vibe code some poorly built product and sell it. But people have always been able to sell poorly built software projects. They can just do it a bit quicker now.
I don't know why people keep acting like harnesses are all the same but we know they aren't because people have swapped them out with the same models and receive vastly different results in code quality and token use.
This is similar to reckless builders in Turkey saying “wow, I can make the same building, sell for the same price, but spend way less” and then millions of people becoming victims when there is an earthquake.
This is not how responsible people should think about things in society
Getting money is 100% what it is about and Claude Code is a great product.
You're not alone in thinking that, but unfortunately I think it's a minority opinion. The only thing most people and most businesses care about is money. And frankly not even longterm, sustainable money. Most companies seem happy to extract short term profits, pay out the executives with big bonuses, then rot until they collapse
To me it said, clearly: nobody cares about your code quality other than your ability to ship interesting features.
It was incredibly eye-opening to me, I went in expecting different lessons honestly.
That was always the case. Landlords still want rent, the IRS still has figurative guns. Shipping shit code to please these folks and keep the company alive will always win over code quality, unless the system can be edited to financially incentivize code quality. The current loss function on society is literally "ship shit now and pay your taxes and rent".
The product is also a bit wonky and doesn't always provide the benefits it's hyped for. It often doesn't even produce any result for me, just keeps me waiting and waiting... and nothing happens, which is what I expect from a vibe coded app.
What? Your comment makes absolutely zero sense. Legal team forces people to use Claude Code?
And they don't need a massive legal team to declare that you can't use their software subscription with other people's software.