Eight years of wanting, three months of building with AI

(lalitm.com)

Refreshing to see an honest and balanced take on AI coding. This is what real AI-assisted coding looks like once you get past the initial wow factor of having the AI write code that executes and does what you asked.

This experience is familiar to every serious software engineer who has used AI code gen and then reviewed the output:

> But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti[14]. I didn’t understand large parts of the Python source extraction pipeline, functions were scattered in random files without a clear shape, and a few files had grown to several thousand lines. It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision,

Some people never get to the part where they review the code. They go straight to their LinkedIn or blog and start writing (or having ChatGPT write) posts about how manual coding is dead and they’re done writing code by hand forever.

Some people review the code and declare it unusable garbage, then also go to their social media and post how AI coding is completely useless and they’re not going to use it for anything.

This blog post shows the journey that anyone not in one of those two vocal minorities is going through right now: A realization that AI coding tools can be a large accelerator but you need to learn how to use them correctly in your workflow and you need to remain involved in the code. It’s not as clickbaity as the extreme takes that get posted all the time. It’s a little disappointing to read the part where they said hard work was still required. It is a realistic and balanced take on the state of AI coding, though.

reply
+1

I’ve been driving Claude as my primary coding interface the last three months at my job. Other than a different domain, I feel like I could have written this exact article.

The project I’m on started as a vibe-coded prototype that quickly got promoted to a production service we sell.

I’ve had to build the mental model after the fact, while refactoring and ripping out large chunks of nonsense or dead code.

But the product wouldn’t exist without that quick and dirty prototype, and I can use Claude as a goddamned chainsaw to clean up.

On Friday, I finally added a type checker pre-commit hook and fixed the 90 existing errors (properly, no type ignores) in ~2 hours. I tried a fully agentic approach first, and it failed miserably. Then I went through error by error with Claude: we tightened up some existing types, fixed some clunky abstractions, and got a nice, clean result.
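The comment doesn't say which language or checker was involved. Assuming Python with a mypy-style checker, here is a minimal sketch of the difference between silencing an error and fixing it: the hypothetical `email_domain` below narrows an `Optional` instead of reaching for `# type: ignore`.

```python
from typing import Optional


def find_user_email(users: dict[str, str], name: str) -> Optional[str]:
    # dict.get returns None for a missing key, so Optional[str] is the honest return type.
    return users.get(name)


def email_domain(users: dict[str, str], name: str) -> str:
    email = find_user_email(users, name)
    # Suppressing the checker here, e.g. `email.split("@")[1]  # type: ignore`,
    # would hide the None case until it crashes at runtime with AttributeError.
    # Narrowing the Optional makes the missing-user path explicit instead.
    if email is None:
        raise KeyError(f"no email on record for {name!r}")
    return email.split("@", 1)[1]
```

The fix costs two lines, but the checker now agrees with the runtime behavior instead of being told to look away.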

AI-assisted coding is amazing, but IMO for production code there’s no substitute for human review and guidance.

reply
I’ve found that LLMs will frequently do extremely silly things that no person would do to make TypeScript code pass the typechecker.
reply
You need to be very specific, and also question the output if it does something insane.
reply
My process: start by ideating and get the AI to poke holes in your reasoning, your vision, scalability, etc. Do this for a few days while taking breaks. This is all contained in one Markdown file with Mermaid diagrams and sections.

Then use the ideation to architect. Dive into the details and tell the AI exactly what your choices are: how certain methods should be called, how logging and observability should be set up, what language to use, type checking, coding style (configure ruthless linting and formatting before you write a single line of code), and what testing methodology and frameworks to use for unit, integration, and e2e tests. Specify the database and how you will handle migrations and schema changes. Pin down as much as possible so the AI is as confined as possible to how you would do it.

Then create a plan file, have the AI manage it like a task list, and implement in parts. Before starting each part, it needs to present you a plan. In it you will notice it makes mistakes, misunderstands things you maybe didn’t clarify before, or just forgets. Add those to AGENTS.md or whatever you use, make changes to the AI’s plan, tell it to update plan.md, and when you’re satisfied, proceed.

When it’s done, review the code. You will notice there is always something to fix: hardcoded variables, a SQL migration with seed data that shouldn’t actually be a migration, just generally crazy stuff.

The worst part is that the AI is always very loose on requirements. You will notice all its fields are nullable and records have little to no validation. When you report an error found while testing, it tries to solve it with a brittle async workaround, like LISTEN/NOTIFY or a callback, instead of the architecturally correct solution. These are things that are hell to debug at scale, especially if you did not write the code.
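A minimal sketch of the "everything nullable, nothing validated" problem and one stricter alternative, assuming Python dataclasses stand in for the records. The `Order` model and its rules are hypothetical, not from the comment:

```python
from dataclasses import dataclass
from typing import Optional


# The loose shape: every field optional, nothing checked, so bad data flows through silently.
@dataclass
class LooseOrder:
    customer_id: Optional[int] = None
    quantity: Optional[int] = None
    sku: Optional[str] = None


# A stricter shape: required fields have no defaults, and invariants are checked on construction.
@dataclass(frozen=True)
class Order:
    customer_id: int
    quantity: int
    sku: str

    def __post_init__(self) -> None:
        if self.customer_id <= 0:
            raise ValueError("customer_id must be positive")
        if self.quantity < 1:
            raise ValueError("quantity must be at least 1")
        if not self.sku.strip():
            raise ValueError("sku must not be blank")
```

`LooseOrder()` happily builds an empty record; `Order(1, 0, "A-1")` fails immediately, at the point where the bad value entered, rather than somewhere downstream.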

If you do this and iterate, you will gradually end up with a solid harness and you will need to review less.

Then port it to other projects.

reply
FWIW, the article mirrors my experience when I started out too, even down to the same first month of vibecoding, followed by a next project that I did exactly as he outlined.

Personally, I think it's just the natural flow when you're starting out. If he keeps going, his opinion is going to change and as he gets to know it better, he'll likely go more and more towards vibecoding again.

It's hard to say why, but you get better at it, even if it's really hard to put into words why.

reply
Given how addictive vibecoding is, I think it's very hard to be objective about the results if you are involved in the process.
reply
You can’t put it into words? Why? Perhaps you haven’t looked at it objectively?

It may actually be true. Your feeling might be right - but I strongly caution you against trusting that feeling until you can explain it. Something you can’t explain is something you don’t understand.

reply
really?

have you ever learned a skill? Like carving, singing, playing guitar, playing a video game, anything?

It's easy to get better at it without understanding why you're better at it. As a matter of fact, very very few people master the discipline enough to be able to grasp the reason for why they're actually better

Most people just come up with random shit which may or may not be related, which I abstained from.

reply
You can get better at something without understanding why, but you should be able to think about it and determine why fairly easily.

This is something everyone who cares about improving in a skill does regularly - examine their improvement, the reasons behind it, and how to add to them. That’s the basis of self-driven learning.

reply
Not really. I can obviously say something, like you learn which features the models are able to actually implement, and you learn how to phrase and approach trickier features to get the model to do what you want.

And that's not really explainable without exploring specific examples. And now we're in thousands of words of explanation territory, hence my decision to say it's hard to put it into words.

reply
I think you’re handwaving away vague, ungrounded intuition and calling it learning.

For instance, if I say “I noticed I run better in my blue shoes than my red shoes” I did not learn anything. If I examine my shoes and notice that my blue shoes have a cushioned sole, while my red shoes are flat, I can combine that with thinking about how I run and learn that cushioned soles cause less fatigue to the muscles in my feet and ankles.

The reason the difference matters is that if I don’t do the learning step, when I buy another pair of blue shoes but they’re flat-soled, I’m back to square one.

Back to the real scenario, if you hold on to your ungrounded intuition re what tricks and phrasing work without understanding why, you may find those don’t work at all on a new model version or when forced to change to a different product due to price, insolvency, etc.

reply
Agree. This is such a good balanced article. The only things that still make the insights difficult to apply to professional software development are: this was greenfield work and it was a solo project. But that’s hardly the author’s fault. It would however be fantastic to see more articles like this about how to go all in on AI tools for brownfield projects involving more than one person.

One thing I will add: I actually don’t think it’s wrong to start out building a vibe coded spaghetti mess for a project like this… provided you see it as a prototype you’re going to learn from and then throw away. A throwaway prototype is immensely useful because it helps you figure out what you want to build in the first place, before you step down a level and focus on closely guiding the agent to actually build it.

The author’s mistake was that he thought the horrible prototype would evolve into the real thing. Of course it could not. But I suspect that the author’s final results when he did start afresh and build with closer attention to architecture were much better because he had learned more about the requirements for what he wanted to build from that first attempt.

reply
This wasn't even just greenfield work, it included the exact type of work where AI arguably excels: extracting working code from an extant codebase (SQLite) as a reusable library. (It also included the type of work AI is really bad at: designing APIs sensibly.)
reply
I feel like recently HN has been seeing more takes like this one and at least slightly less of the extremist clickbaity stuff. Maybe it's a sign of maturity. (Or maybe it's just fatigue with the cycle of hyping the absolute-latest model?)
reply
It takes time for people to go through these experiences (three months, in OP's case), and LLMs have only been reasonably good for a few months (since circa Nov'25).

Previously, takes were necessarily shallower or not as insightful ("worked with caveats for me, ymmv") - there just wasn't enough data - although a few have posted fairly balanced takes (@mitsuhiko for example).

I don't think we've seen the last of hypers and doomers though.

reply
Can you point to one other post like this? Curious. Thanks
reply
I'm deeply convinced that there are two reasons we don't see real takes like this. 1) These people are quietly appreciating the 2-50% uplift you get from sanely using LLMs instead of constantly posting sycophantic or doomer shit for clout and/or VC financing. 2) The real version of LLM coding is boring and unsexy. It either involves generating slop in one shot for a POC and then restarting from scratch for the real thing, or doing extensive remediation costing far more than the initial vibe effort; or it involves generally doing the same thing we've been doing since the assembler was created, except now I don't need to remember offhand how to rig up the boilerplate for a table test harness in ${current_language}, and if I wrote a snippet with string ops and if statements and wish it used regexes and named capture groups, it's now easy to convert it, mostly accurately, to the other form instead of just sighing and moving on.
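To make the regex point concrete, here is a hypothetical example (the version-string format and both function names are invented for illustration): a parser written with string ops and `if` statements, its equivalent using a regex with named capture groups, and a small table-driven test that checks both against the same cases.

```python
import re
from typing import Optional


# Before: string ops and if statements.
def parse_version_manual(s: str) -> Optional[dict[str, int]]:
    if not s.startswith("v"):
        return None
    parts = s[1:].split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    return {"major": int(parts[0]), "minor": int(parts[1]), "patch": int(parts[2])}


# After: the same rules as one regex with named capture groups.
VERSION_RE = re.compile(r"v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)")


def parse_version_regex(s: str) -> Optional[dict[str, int]]:
    m = VERSION_RE.fullmatch(s)  # fullmatch: the whole string must match, not a prefix
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


# Table-driven test: each row is (input, expected result); both parsers must agree.
CASES = [
    ("v1.2.3", {"major": 1, "minor": 2, "patch": 3}),
    ("v10.0.42", {"major": 10, "minor": 0, "patch": 42}),
    ("1.2.3", None),
    ("v1.2", None),
    ("v1.x.3", None),
]

for raw, expected in CASES:
    for parse in (parse_version_manual, parse_version_regex):
        assert parse(raw) == expected, (parse.__name__, raw)
```

The table is the boilerplate the comment mentions: adding a case is one more tuple, and the loop checks every parser against it.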

But that's boring nerd shit and LLMs didn't change who thinks boring nerd shit is boring or cool.

reply
> because the real version of LLM coding is boring and unsexy

Some people do find it unfun, saying it deprives them of the happy "flow" of banging out code. Reaching "flow" when prompting LLMs arguably requires a somewhat deeper understanding of them as a proper technical tool, as opposed to a complete black box, or worse, a crystal ball.

reply
Software engineering is only about 20% writing code (the famous 40-20-40 split). Most people use it only for the first 40%, and very successfully (I'm in that camp). If you use it to write your code, you can theoretically maybe get a 20% time improvement initially, but you lose a lot of time later redoing or unraveling it. Not worth bothering.
reply
20% is one of those cool lies SWEs have been able to push through (like “our jobs are oh so very special we can’t really estimate them, so we’ll create an entire sub-industry within our industry to make sure everyone knows we can’t estimate”).

SWEs spend 20% of the time writing code for exactly the same reason brick-layers spend 20% of their time laying bricks

reply
There’s also just the negative association factor.

I use LLMs in my every day work. I’m also a strong critic of LLMs and absolutely loathe the hype cycle around them.

I have done some really cool things with Copilot and Claude, and I keep sharing them only within my working circle, because I simply don’t want to interact that much with people who aren’t grounded on the subject.

reply
I would be interested to hear your take on Copilot vs Claude. I have used Copilot (trial) in VS Code and I found it to mostly meet my needs. It could generate some plans and code, which I could review on the go. I found this very natural to me as I never felt 'left behind' in whatever code the AI was generating. However, most of the posts I see here are on Claude (I haven't tried it) and very few mentions of Copilot. What is your impression about them and the use cases each is strong in?
reply
It can be any number of things: from spending an hour or two just writing requirements, to giving an example of existing curated code from another project you wrote and would like to emulate, to rewriting existing apps in a different language or architecture (sort of like translating), to serving as a QA agent or reviewer for the LLM agent, or vice versa.

I kinda like how you can just use it for anything you like. I have bazillion personal projects, I can now get help with, polish up, simplify, or build UI for, and it's nice. Anything from reverse engineering, to data extraction, to playing with FPGAs, is just so much less tedious and I can focus on the fun parts.

reply
> Some people never get to the part where they review the code. They go straight to their LinkedIn or blog and start writing (or having ChatGPT write) posts about how manual coding is dead and they’re done writing code by hand forever. Some people review the code and declare it unusable garbage, then also go to their social media and post how AI coding is completely useless and they’re not going to use it for anything. This blog post shows the journey that anyone not in one of those two vocal minorities is going through right now.

What’s really happening is that you’re all of those people in the beginning. Those people are you as you go through the experience. You’re excited after seeing it do the impossible and in later instances you’re critical of the imperfections. It’s like the stages of grief, a sort of Kübler-Ross model for AI.

reply
It's actually common for human-written projects to go through an initial R&D phase where the first prototypes turn into spaghetti code and require a full rewrite. I haven't been through this myself with LLMs, but I wonder to what extent they could analyse the codebase, propose and then implement a better architecture based on the initial version.
reply
If you write that first prototype in Rust, with the idiomatic style of "Rust exploratory code" (lots of defensive .clone()ing to avoid borrowck trouble; pervasive interior mutability; gratuitous use of Rc<> or Arc<> to simplify handling of the objects' lifecycle) that can often be incrementally refactored into a proper implementation. Very hard to do in other languages where you have no fixed boilerplate marking "this is the sloppy part".
reply
Rust is a language for fast prototyping? That’s the one thing Rust is absolutely terrible at imo, and I really like the production/quality/safety aspects of Rust.
reply
It's not specialized for fast prototyping, for sure, but you can use it for that with the right boilerplate.
reply
Someone used Claude Code to generate a very simple staffing management app. The sort of thing that really wouldn't take that long to make, but why pay for any software when you can just ignore the problem, amiright? Anyway, the code that got generated was full of SQL injection issues for the most absurd sorts of things. It would have 80% of the database queries implemented through the ORM, but then the leftover stuff was raw string concat junk, for no good reason because it wasn't even doing any dynamic query or anything that the ORM couldn't do.
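The comment doesn't show the app's code or name its ORM. As a hedged illustration using only Python's built-in `sqlite3` module and a hypothetical `staff` table, this is the difference between string-concatenated SQL and a parameterized query:

```python
import sqlite3

# Hypothetical in-memory table standing in for the staffing app's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO staff (name, role) VALUES (?, ?)",
    [("Ana", "nurse"), ("Ben", "admin")],
)


def staff_by_role_unsafe(role: str) -> list:
    # String concatenation: the caller's input becomes part of the SQL text itself.
    sql = "SELECT name FROM staff WHERE role = '" + role + "' ORDER BY name"
    return conn.execute(sql).fetchall()


def staff_by_role_safe(role: str) -> list:
    # Parameter binding: the driver passes `role` as a value, never as SQL.
    return conn.execute(
        "SELECT name FROM staff WHERE role = ? ORDER BY name", (role,)
    ).fetchall()


payload = "x' OR '1'='1"
print(staff_by_role_unsafe(payload))  # [('Ana',), ('Ben',)]: every row leaks
print(staff_by_role_safe(payload))    # []: no role is literally named that
```

As the comment notes, a static query like this never needed hand-built SQL in the first place; the placeholder version is no longer than the concatenated one.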
reply
For me it’s just a matter of “does this actually save me time at all?”

If it generates the slop version in a week but it takes me 3 more weeks to clean it up, could I have just done it right the first time myself in 4 weeks instead? How much money have I wasted in tokens?

reply
A car saves you time in getting to and from the store. But if you don't learn to drive, and just hop in the car and press things, you're going to crash, and that definitely won't save you time. Cars are also more expensive than walking or a bike, yet people still buy them.
reply
I already know how to drive stick (trad coding), I don’t feel like I’m gaining much by switching to automatic transmission.
reply
> does this actually save me time at all?

Soooooo....

As one who hasn't taken the plunge yet -- I'm basically retired, but have a couple of projects I might want to use AI for -- "time" is not always fungible with, or a good proxy for, either "effort" or "motivation"

> How much money have I wasted in tokens?

This, of course, may be a legitimate concern.

> If it generates the slop version in a week but it takes me 3 more weeks to clean it up, could I have just done it right the first time myself in 4 weeks instead?

This likewise may be a legitimate concern, but sometimes the motivation for cleaning up a basically working piece of code is easier to find than the motivation for staring at a blank screen and trying to write that first function.

reply
Well, for me, the amount of time/effort as a function of my motivation has acted as a natural gatekeeper for bad ideas. Just because I can do something with AI now doesn’t necessarily mean that I should. I am also wary of trading time and effort for money straight out of my own pocket to find out, especially when I find the people I’d be giving money to so reprehensible. I don’t live somewhere where developers make a lot of money. I’m not poor by any stretch, but I’m not rich enough to waste money on slop for funsies. But I can spend a month validating a side project because I find coding as a hobby enjoyable in and of itself, and I don’t care if I throw out a few thousand lines of code after a little while when I realize I’m wasting my time.

Cleaning up agent slop code by hand is also a miserable experience and makes me hate my job. I already do it at $DAYJOB, because my boss thinks “investing” in third worlders for pennies on the dollar and just giving them a Claude subscription will be better than investing in technical excellence and leadership. The ROI on this strategy is questionable at best, at least at my current job. Code review by humans is still the bottleneck, and delivery of properly working features has not accelerated, because they require much more iteration due to slop.

Would much rather spend the time making my own artisanal tradslop instead if it’s gonna take me the same amount of time anyway - at least it’s more enjoyable.

reply
> you need to learn how to use them correctly in your workflow and you need to remain involved in the code

I completely agree that this is the case right now, but I do wonder how long it will remain the case.

reply
Without wanting to sound rude: I think the mistake people make with AI prototypes is keeping the code at all.

The AIs are more than capable of producing a mountain of docs from which to rebuild, sanely. They’re really not that capable - without a lot of human pain - of making a shit codebase good.

reply
It's a very accurate and relatable post. I think one corollary that's important to note to the anti-AI crowd is that this project, even if somewhat spaghettified, will likely take orders of magnitude less time to perfect than it would for someone to create the whole thing from scratch without AI.

I often see criticism towards projects that are AI-driven that assumes that codebase is crystalized in time, when in fact humans can keep iterating with AI on it until it is better. We don't expect an AI-less project to be perfect in 0.1.0, so why expect that from AI? I know the answer is that the marketing and Twitter/LinkedIn slop makes those claims, but it's more useful to see past the hype and investigate how to use these tools which are invariably here to stay

reply
> this project, even if somewhat spaghettified, will likely take orders of magnitude less time to perfect than it would for someone to create the whole thing from scratch without AI

That's a big leap of faith and... kinda contradicts the article as I understood it.

My experience is entirely opposite (and matches my understanding of the article): vibing from the start makes you take orders of magnitude more time to perfect. AI is a multiplier as an assistant, but a divisor as an engineer.

reply
vibing is different from... steering AI as it goes so it doesn't make fundamentally bad decisions
reply
Both of these are not really the right way to use AI to code with. There are two basic ways to code with AI that work:

1. Autocomplete. Pretty simple; you only accept auto-completes you actually want, as you manually write code.

2. Software engineering design and implementation workflow. The AI makes a plan, with tasks. It commits those plans to files. It starts sub-agents to tackle the tasks. The subagents write tests to validate the code, then write code to pass the tests. When the subagents finish their tasks, the AI agent reviews the work to check that it's accurate. Multiple passes find more bugs and fix them in a loop, until there is nothing left to fix.

I'm amazed that nobody thinks the latter is a real thing that works, when Claude fucking Code has been produced this way for like 6 months. There's tens of thousands of people using this completely vibe-coded software. It's not a hoax.

reply
#2 does not negate my steering suggestion, so I'm not sure how you can conclude nobody thinks it's a real thing that works

also Claude Code is notoriously poorly built, so I wouldn't tout it as SOTA

reply
> when Claude fucking Code has been produced this way for like 6 months

And people can look at the results (illegally) because that whole bunch of code has been leaked. Let's just say it's not looking good. These are the folks who actually made and trained Claude to begin with, they know the model more than anyone else, and the code is still absolute garbage tier by sensible human-written code quality standards.

reply
Those extreme takes are taken mostly for clicks or are exaggerated second hand so the "other side's" opinion is dumber than it is to "slam the naysayers". Most people are meh about everything, not on the extremes, so to pander to them you mock the extremes and make them seem more likely. It's just online populism.
reply
I'll take the other side of this.

Professional software engineers like many of us have a big blind spot when it comes to AI coding, and that's a fixation on code quality.

It makes sense to focus on code quality. We're not wrong. After all, we've spent our entire careers in the code. Bad code quality slows us down and makes things slow/insecure/unreliable/etc for end users.

However, code quality is becoming less and less relevant in the age of AI coding, and to ignore that is to have our heads stuck in the sand. Just because we don't like it doesn't mean it's not true.

There are two forces contributing to this: (1) more people coding smaller apps, and (2) improvements in coding models and agentic tools.

We are increasingly moving toward a world where people who aren't sophisticated programmers are "building" their own apps with a user base of just one person. In many cases, these apps are simple and effective and come without the bloat that larger software suites have subjected users to for years. The code is simple, and even when it's not, nobody will ever have to maintain it, so it doesn't matter. Some apps will be unreliable, some will get hacked, some will be slow and inefficient, and it won't matter. This trend will continue to grow.

At the same time, technology is improving, and the AI is increasingly good at designing and architecting software. We are in the very earliest months of AI actually being somewhat competent at this. It's unlikely that it will plateau and stop improving. And even when it finally does, if such a point comes, there will still be many years of improvements in tooling, as humanity's ability to make effective use of a technology always lags far behind the invention of the technology itself.

So I'm right there with you in being annoyed by all the hype and exaggerated claims. But the "truth" about AI-assisted coding is changing every year, every quarter, every month. It's only trending in one direction. And it isn't going to stop.

reply
> However, code quality is becoming less and less relevant in the age of AI coding, and to ignore that is to have our heads stuck in the sand. Just because we don't like it doesn't mean it's not true.

Strongly disagree with this thesis, and in fact I'd go completely the opposite: code quality is more important than ever thanks to AI.

LLM-assisted coding is most successful in codebases with attributes strongly associated with high code quality: predictable patterns, well-named variables, use of a type system, no global mutable state, very low mutability in general, etc.

I'm using AI on a pretty shitty legacy area of a Python codebase right now (like, literally right now, Claude is running while I type this) and it's struggling for the same reason a human would struggle. What are the columns in this DataFrame? Who knows, because the dataframe is getting mutated depending on the function calls! Oh yeah and someone thought they could be "clever" and assemble function names via strings and dynamically call them to save a few lines of code, awesome! An LLM is going to struggle deciphering this disasterpiece, same as anyone.
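The commenter's code uses pandas DataFrames; to stay dependency-free, this sketch uses plain dicts as rows, and every name in it is hypothetical. It contrasts calling functions by assembled string names with an explicit dispatch table whose functions return new rows rather than mutating their input:

```python
from typing import Callable


def normalize_price(row: dict) -> dict:
    # Returns a new dict; the caller's row is left untouched.
    return {**row, "price": round(row["price"], 2)}


def normalize_name(row: dict) -> dict:
    return {**row, "name": row["name"].strip().title()}


def apply_by_name(row: dict, field: str) -> dict:
    # The "clever" pattern: the function name is assembled at runtime, so neither a
    # reader, a type checker, nor an LLM can see which functions this might call.
    return globals()["normalize_" + field](row)


# The traceable alternative: every reachable function is listed in one place.
NORMALIZERS: dict[str, Callable[[dict], dict]] = {
    "price": normalize_price,
    "name": normalize_name,
}


def apply_normalizer(row: dict, field: str) -> dict:
    try:
        fn = NORMALIZERS[field]
    except KeyError:
        raise ValueError(
            f"unknown field {field!r}; expected one of {sorted(NORMALIZERS)}"
        ) from None
    return fn(row)
```

Both versions compute the same result, but the dispatch table turns "what can this call?" into a question answerable by reading one dict, for humans and models alike.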

Meanwhile for newer areas of the code with strict typing and a sensible architecture, Claude will usually just one-shot whatever I ask.

edit: I see most replies are saying basically the same thing here, which is an indicator.

reply
> However, code quality is becoming less and less relevant in the age of AI coding

It actually becomes more and more relevant. AI constantly needs to reread its own code and fit it into its limited context, in order to take it as a reference for writing out new stuff. This means that every single code smell, and every instance of needless code bloat, actually becomes a grievous hazard to further progress. Arguably, you should in fact be quite obsessed about refactoring and cleaning up what the AI has come up with, even more so than if you were coding purely for humans.

reply
> However, code quality is becoming less and less relevant in the age of AI coding, and to ignore that is to have our heads stuck in the sand. Just because we don't like it doesn't mean it's not true.

Strong disagree. I just watched a team spend weeks trying to make a piece of code work with AI, because the vibe-coded code was spaghetti garbage so tangled that even the AI couldn’t tell what needed to be done. It was basically playing ineffective whack-a-mole: it would fix the bug you asked about by reintroducing an old bug or introducing a new one, because no one understood what was happening. And humans couldn’t even step in as usual, because no one understood what was going on.

reply
Okay, so you observed one team that had an issue with AI code quality. What's your point?

In 1998, I'm sure there were newspaper companies who failed at transitioning online, didn't get any web traffic, had unreliable servers that crashed, etc. This says very little about what life would be like for the newspaper industry in 1999, 2000, 2005, 2010, and beyond.

reply
I'm arguing that code quality very much still matters and will only continue to matter.

AI will get better at making good, maintainable, and explainable code, because that’s what it takes to actually solve problems tractably. But saying “code quality doesn’t matter because AI” is definitely not true, both experientially and as a prediction. Will AI do a better job in the future? Sure. But that will be because its code quality improves, not because code quality becomes less important.

reply
It seems that your opinion is based on expectations for the future then, which is notoriously difficult to predict.
reply
It's not that hard to predict that obviously useful new technology is going to improve over time.

Guns, wheels, cars, ships, batteries, televisions, the internet, smartphones, airplanes, refrigeration, electric lighting, semiconductors, GPS, solar panels, antibiotics, printing presses, steam engines, radio, etc. The pattern is obvious, the forces are clear and well-studied.

If there is (1) a big gap between current capabilities and theoretical limits, (2) huge incentives for those who improve things, (3) no alternative tech that will replace or outcompete it, (4) broad social acceptance and adoption, and (5) no chance of the tech being lost or forgotten, then technological improvement is basically a guarantee.

These are all obviously true of AI coding.

reply
There's almost no point in arguing about this anymore. Neither you nor the other person are going to be convinced. We just have to wait and see if a new crop of 100x productivity AI believer companies come along and unseat all the incumbents.
reply
But hindsight is 20/20 as they say. In 2020 people predicted that Facebook Horizon would only go one direction, always improve and become as pervasive as the internet. So when you predict that the design and architecture capabilities of models will continue to improve, thus making code quality irrelevant, you sound very confident. And if in five years you are right, you will brag about it here. If not, well I for one will not track you down and rub it in your face. Peace out.
reply
You're confusing betting on a company/product vs betting on technological improvement in general.

It is absolutely the case that virtual reality technology will only get better over time. Maybe it'll take 5, or 10, or 20, or 40 years, but it's almost a certainty that we'll eventually see better AR/VR tech in the future than we have in the past.

Would you bet against that? You'd be crazy to imo.

reply
There's a kid outside the window of the place I'm staying who's been in the yard playing and talking with people online through his VR headset for like 2+ hours. He's living in the future. Whatever happens, he and his friends are going to continue to be interested in more of this.

Whether what they're using in 20 years is produced by the company formerly known as Facebook or not is a whole different question.

reply
I don't buy this at all. Code quality will always matter. Context is king with LLMs, and when you fill that context up with thousands of lines of spaghetti, the LLM will (and does) perform worse. Garbage in, garbage out, that's still the truth from my experience.

Spaghetti code is still spaghetti code. Something that should be a small change ends up touching multiple parts of the codebase. Not only does this increase costs, it just compounds the next time you need to change this feature.

I don't see why this would be a reality that anyone wants. Why would you want an agent going in circles, burning money and eventually finding the answer, if simpler code could get it there faster and cheaper?

Maybe one day it'll change. Maybe there will be a new AI technology which shakes up the whole way we do it. But if the architecture of LLMs stays as it is, I don't see why you wouldn't want to make efficient use of the context window.

reply
I didn't say that you "want" spaghetti code or that spaghetti code is good.

I said that (a) apps are getting simpler and smaller in scope and so their code quality matters less, and (b) AI is getting better at writing good code.

reply
Apps are getting bigger and more ambitious in scope as developers try to take advantage of any boost in production LLMs provide them.
reply

  > However, code quality is becoming less and less relevant in the age of AI coding, and to ignore that is to have our heads stuck in the sand. Just because we don't like it doesn't mean it's not true.

  > [...]

  > We are increasingly moving toward a world where people who aren't sophisticated programmers are "building" their own apps with a user base of just one person. In many cases, these apps are simple and effective and come without the bloat that larger software suites have subjected users to for years. The code is simple, and even when it's not, nobody will ever have to maintain it, so it doesn't matter. Some apps will be unreliable, some will get hacked, some will be slow and inefficient, and it won't matter. This trend will continue to grow.
I do agree that more and more people are going to take advantage of agentic coding to write their own tools/apps to make their lives easier. And I genuinely see it as a good thing: computers were always supposed to make our lives easier.

But I don't see how it can be used as an argument for "code quality is becoming less and less relevant".

If AI is producing 10 times more lines than are necessary to achieve the goal, that's more resources used. With RAM and SSD prices skyrocketing, I don't see that as a positive for regular users. If they need to buy a new computer to run their vibecoded app, are they really reaping the benefits?

But what's more concerning to me is: where do we draw the line?

Let's say it's fine to have a garbage vibecoded app running only on its creator's computer, even if it gobbles gigabytes of RAM and isn't secure at all. Good.

But then, if "code quality is becoming less and less relevant", does this also apply to public/professional apps?

In our modern societies we HAVE to use dozens of pieces of software every day, whether we want to or not, and whether we interact with them directly or not.

Are you okay with your power company cutting your power because their vibecoded monitoring software mistakenly thought you didn't pay your bills?

Are you okay with an autonomous car driving over your kid because its vibecoded software didn't see them?

Are you okay with cops coming to your door at 5AM because a vibecoded tool reported you as a terrorist?

Personally, I'm not.

People can produce all the trash they want on their own hardware. But I don't want my life to be ruled by software that wasn't given the quality controls it should have had.

reply
> nobody will ever have to maintain it, so it doesn't matter

I'm curious about software that's actively used but that nobody maintains. A personal anecdote is fine as well.

reply
> However, code quality is becoming less and less relevant in the age of AI coding, and to ignore that is to have our heads stuck in the sand. Just because we don't like it doesn't mean it's not true.

It's the opposite, code quality is becoming more and more relevant. Before now you could only neglect quality for so long before the time to implement any change became so long as to completely stall out a project.

That's still true, the only thing AI has changed is it's let you charge further and further into technical debt before you see the problems. But now instead of the problems being a gradual ramp up it's a cliff, the moment you hit the point where the current crop of models can't operate on it effectively any more you're completely lost.

> We are in the very earliest months of AI actually being somewhat competent at this. It's unlikely that it will plateau and stop improving.

We hit the plateau on model improvement a few years back. We've only continued to see any improvement at all because of the exponential increase of money poured into it.

> It's only trending in one direction. And it isn't going to stop.

Sure it can. When the bubble pops there will be a question: is using an agent cost effective? Even if you think it is at $200/month/user, we'll see how that holds up once the cost skyrockets after OpenAI and Anthropic run out of money to burn and their investors want some returns.

Think about it this way: If your job survived the popularity of offshoring to engineers paid 10% of your salary, why would AI tooling kill it?

reply
> That's still true, the only thing AI has changed is it's let you charge further and further into technical debt before you see the problems. But now instead of the problems being a gradual ramp up it's a cliff, the moment you hit the point where the current crop of models can't operate on it effectively any more you're completely lost.

What you're missing is that fewer and fewer projects are going to need a ton of technical depth.

I have friends who'd never written a line of code in their lives who now use multiple simple vibe-coded apps at work daily.

> We hit the plateau on model improvement a few years back. We've only continued to see any improvement at all because of the exponential increase of money poured into it.

The genie is out of the bottle. Humanity is not going to stop pouring more and more money into AI.

> Sure it can. When the bubble pops there will be a question: is using an agent cost effective? Even if you think it is at $200/month/user, we'll see how that holds up once the cost skyrockets after OpenAI and Anthropic run out of money to burn and their investors want some returns.

The AI bubble isn't going to pop. This is like saying the internet bubble is going to pop in 1999. Maybe you will be right about short term economic trends, but the underlying technology is here to stay and will only trend in one direction: better, cheaper, faster, more available, more widely adopted, etc.

reply
> What you're missing is that fewer and fewer projects are going to need a ton of technical depth.

> I have friends who'd never written a line of code in their lives who now use multiple simple vibe-coded apps at work daily.

Again, it's the opposite. A landscape of vibe-coded micro apps is a landscape of buggy, vulnerable points of failure. When you buy a product, software or hardware, you buy more than the functionality: you buy the assurance that it will work. AI does not change this. Vibe code an app to automate your lightbulbs all you like, but nobody is going to pay millions of dollars a year for vibe-coded slop apps, and the apps people do pay that much for are what keep the tech industry afloat.

> Humanity is not going to stop pouring more and more money into AI.

There's no more money to pour into it. Even if you did, we're out of GPU capacity and we're running low on the power and infrastructure to run these giant data centres, and it takes decades to bring new fabs or power plants online. It is physically impossible to continue this level of growth in AI investment. Every company that's invested into AI has done so on the promise of increased improvement, but the moment that stops being true everything shifts.

> The AI bubble isn't going to pop. This is like saying the internet bubble is going to pop in 1999.

The internet bubble did pop. What happened after is an assessment of how much the tech is actually worth, and the future we have now 26 years later bears little resemblance to the hype in 1999. What makes you think this will be different?

Once the hype fades, the long-term unsuitability for large projects becomes obvious, and token costs increase by ten or one hundred times, are businesses really going to pay thousands of dollars a month on agent subscriptions to vibe code little apps here and there?

reply
> Again it's the opposite. A landscape of vibe coded micro apps is a landscape of buggy, vulnerable, points of failure. When you buy a product, software or hardware, you do more than buy the functionality you buy the assurance it will work. AI does not change this. Vibe code an app to automate your lightbulbs all you like, but nobody is going to be paying millions of dollars a year on vibe coded slop apps and apps like that is what keeps the tech industry afloat.

This is what everyone says when technology democratizes something that was previously reserved for a small number of experts.

When the printing press was invented, scribes complained that it would lead to a flood of poorly written, untrustworthy information. And you know what? It did. And nobody cares.

When the web was new, the news media complained about the same thing. A landscape of poorly researched error-ridden microblogs with spelling mistakes and inaccurate information. And you know what? They were right. That's exactly what the internet led to. And now that's the world we live in, and 90% of those news media companies are dead or irrelevant.

And here you are continuing the tradition of discussing a new landscape of buggy, vulnerable products. And the same thing will happen and already is happening. People don't care. When you democratize technology and you give people the ability to do something useful they never could do before without having to spend years becoming an expert, they do it en masse, and they accept the tradeoffs. This has happened time and time again.

> The internet bubble did pop... the future we have now 26 years later bears little resemblance to the hype in 1999. What makes you think this will be different?

You cut out the part where I said it only popped economically, but the technology continued to improve. And the situation we have now is even better than the hype in 1999:

They predicted video on demand over the internet. They predicted the expansion of broadband. They predicted the dominance of e-commerce. They predicted incumbents being disrupted. All of this happened. Look at the most valuable companies on earth right now.

If anything, their predictions were understated. They didn't predict mobile, or social media. They thought that people would never trust SaaS because it's insecure. They didn't predict Netflix dominating Hollywood. The internet ate MORE than they thought it would.

reply
> This is what everyone says when technology democratizes something that was previously reserved for a small number of experts.

What part of renting your ability to do your job is "democratizing"? The current state of AI is the literal opposite. Same for local models that require thousands of dollars of GPUs to run.

Over the past 20 years software engineering has become something that just about anyone can do with little more than a shitty laptop, the time and effort, and an internet connection. How is a world where that ability is rented out to only those that can pay "democratic"?

> When the printing press was invented, scribes complained that it would lead to a flood of poorly written, untrustworthy information. And you know what? It did. And nobody cares.

A bad book is just a bad book. If a novel is $10 at the airport and it's complete garbage then I'm out $10 and a couple of hours. As you say, who cares. A bad vibe coded app and you've leaked your email inbox and bank account and you're out way more than $10. The risk profile from AI is way higher.

The same is even more true for businesses. The cost of a cyberattack or an outage is measured in millions of dollars. It's simple maths: the risk of a compromise far outweighs the savings from cheaper upfront software.

> You cut out the part where I said it only popped economically, but the technology continued to improve.

The improvement in AI models requires billions of dollars a year in hardware, infrastructure, and energy. Do you think that investors will continue to pour that level of investment into improving AI models for a payout that might only come ten to fifteen years down the road? Once the economic bubble pops, the models we have are the end of the road.

reply
"Thousands of dollars of GPU" as a one-time expense (not ongoing token spend) is dirt cheap if it meaningfully improves productivity for a dev. And your shitty laptop can probably run local AI that's good enough for Q&A chat.
reply
> Tests created a similar false comfort. Having 500+ tests felt reassuring, and AI made it easy to generate more. But neither humans nor AI are creative enough to foresee every edge case you’ll hit in the future; there are several times in the vibe-coding phase where I’d come up with a test case and realise the design of some component was completely wrong and needed to be totally reworked. This was a significant contributor to my lack of trust and the decision to scrap everything and start from scratch.

This is my experience. Tests are perhaps the most challenging part of working with AI.

What’s especially awful is any refactor of existing shit code that doesn’t have tests to begin with, where the feature is confusing, or is used inappropriately and unknowingly in multiple places elsewhere.

AI will write test cases checking that the logic works at all (fine), but the behavior, especially what an integration test would cover, is just not covered at all.

I don’t have a great answer to this yet, especially because this has been most painful to me in a React app, where I don’t know testing best practices. But I’ve been eyeing up behavior driven development paired with spec driven development (AI) as a potential answer here.

Curious if anyone has an approach or framework for generating good tests?

reply
The false comfort usually comes from line coverage. I had a model at 97.2% coverage, 92 tests. Ran equivalence partitioning on it (partition inputs into classes, test one from each) and found 6 real gaps: a search scope testing 1 of 5 fields, a scope checking return type but not filtering logic, a missing state machine branch. SimpleCov said the file was covered. The logical input space was not. The technique is old (ISTQB calls it specification-based testing) but the manual overhead made it impractical until recently. Agents made it possible to apply across 60+ models, which is the one thing they have changed for testing so far.
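A minimal Python sketch of equivalence partitioning, not the commenter's Ruby/SimpleCov setup: the `Contact` model, the five `SEARCH_FIELDS`, and the `search`/`buggy_search` functions are all hypothetical. A single test matching on `first_name` executes every line of either implementation, so line coverage can't tell them apart. One representative per input class (the term in each field, plus the term in no field) does.

```python
from dataclasses import dataclass


# Hypothetical model with five searchable text fields.
@dataclass
class Contact:
    first_name: str = ""
    last_name: str = ""
    email: str = ""
    company: str = ""
    notes: str = ""


SEARCH_FIELDS = ("first_name", "last_name", "email", "company", "notes")


def search(contacts, term):
    """Correct version: case-insensitive substring match on any of the five fields."""
    term = term.lower()
    return [c for c in contacts
            if any(term in getattr(c, f).lower() for f in SEARCH_FIELDS)]


def buggy_search(contacts, term):
    """Hypothetical regression: 'notes' silently dropped from the field list.
    A test that only matches on first_name still passes and covers every line."""
    term = term.lower()
    return [c for c in contacts
            if any(term in getattr(c, f).lower() for f in SEARCH_FIELDS[:4])]


def partition_cases(term):
    """One representative per equivalence class: the term appears in exactly
    one field (five classes), or in no field at all (one class)."""
    cases = [(f"match in {field}", Contact(**{field: f"x {term} y"}), True)
             for field in SEARCH_FIELDS]
    cases.append(("no match", Contact(first_name="alice", notes="cat"), False))
    return cases


def partition_failures(search_fn, term="zebra"):
    """Return the names of the classes whose expectation search_fn gets wrong."""
    failures = []
    for name, contact, should_match in partition_cases(term):
        found = contact in search_fn([contact], term)
        if found != should_match:
            failures.append(name)
    return failures


if __name__ == "__main__":
    print(partition_failures(search))        # []
    print(partition_failures(buggy_search))  # ['match in notes']
```

Each extra searchable field adds one more class, which is the kind of bookkeeping that is tedious by hand across dozens of models.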
reply
I've always thought that writing good tests (unit, integration or e2e) is harder than the actual coding by maybe an order of magnitude.

The tricky part of unit tests is coming up with creative mocks and ways to simulate various situations based on the input data, w/o touching the actual code.

For integration tests, it's massaging the test data and inputs to hit every edge case of an endpoint.

For e2e tests, it's massaging the data, finding selectors that aren't going to break every time the html is changed, and trying to winnow down to the important things to test - since exhaustive e2e tests need hours to run and are a full-time job to maintain. You want to test all the main flows, but also stuff like handling a back-end system failure - which doesn't get tested in smoke tests or normal user operations.

That's a ton of creativity for AI to handle. You pretty much have to tell it every test and how to build it.
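As a small illustration of the unit-test case, here is a Python sketch that uses the standard library's `unittest.mock` to simulate a back-end failure without touching the code under test. The `load_dashboard` handler and the client's `get_profile` method are hypothetical; the technique is `Mock.side_effect`, which makes the mocked call raise instead of returning.

```python
from unittest.mock import Mock


def load_dashboard(client, user_id):
    """Hypothetical handler: show the user's name, or degrade gracefully
    if the back-end call fails."""
    try:
        profile = client.get_profile(user_id)
    except (ConnectionError, TimeoutError):
        return {"status": "degraded", "name": None}
    return {"status": "ok", "name": profile["name"]}


def test_happy_path():
    client = Mock()
    client.get_profile.return_value = {"name": "Ada"}  # canned response
    assert load_dashboard(client, 42) == {"status": "ok", "name": "Ada"}
    client.get_profile.assert_called_once_with(42)


def test_backend_timeout():
    client = Mock()
    # An exception instance as side_effect makes the call raise it,
    # simulating a failure that smoke tests and normal use rarely hit.
    client.get_profile.side_effect = TimeoutError("backend timed out")
    assert load_dashboard(client, 42) == {"status": "degraded", "name": None}


if __name__ == "__main__":
    test_happy_path()
    test_backend_timeout()
    print("ok")
```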

reply
Use TLA+: have the AI go back and forth with you to spec out your system's behavior, then iterate on linking the TLA+ spec to the actual code that implements it.

Pull out as many pure functions as possible and exhaustively test the input and output mappings.
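A minimal Python sketch of the second suggestion, assuming the input space is small and finite: the order-workflow `State`/`Event` enums, the `SPEC` table, and `next_state` are hypothetical. Because `next_state` is pure, every (state, event) pair can be checked against the spec, which is exactly where a missing state-machine branch would surface.

```python
from enum import Enum
from itertools import product


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


class Event(Enum):
    SUBMIT = "submit"
    APPROVE = "approve"
    REJECT = "reject"
    REOPEN = "reopen"


# Hypothetical spec: every legal transition. Any pair not listed
# must leave the state unchanged.
SPEC = {
    (State.DRAFT, Event.SUBMIT): State.SUBMITTED,
    (State.SUBMITTED, Event.APPROVE): State.APPROVED,
    (State.SUBMITTED, Event.REJECT): State.REJECTED,
    (State.REJECTED, Event.REOPEN): State.DRAFT,
}


def next_state(state, event):
    """Pure transition function: no I/O, output depends only on the arguments."""
    if state is State.DRAFT and event is Event.SUBMIT:
        return State.SUBMITTED
    if state is State.SUBMITTED:
        if event is Event.APPROVE:
            return State.APPROVED
        if event is Event.REJECT:
            return State.REJECTED
    if state is State.REJECTED and event is Event.REOPEN:
        return State.DRAFT
    return state


def exhaustive_mismatches(fn):
    """Check every (state, event) pair: only 4 x 4 = 16 cases here."""
    mismatches = []
    for state, event in product(State, Event):
        expected = SPEC.get((state, event), state)
        actual = fn(state, event)
        if actual != expected:
            mismatches.append((state, event, actual, expected))
    return mismatches


if __name__ == "__main__":
    print(exhaustive_mismatches(next_state))  # []
```

An implementation that ignores every event (`lambda s, e: s`) would be flagged on exactly the four spec'd transitions.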

reply
Long term, I think the best value AI gives us is a powerful tool for gaining understanding. I think we will soon see deep understanding become the goal of LLM output. For example, the blocker on this project was the dense C code with 400 rules. Working with LLMs allowed that structure to be parsed, understood, and used to create the tool, but maybe an even more useful output would be full documentation of the rules and their interactions.

This could likely be extracted much more easily now from the new code, but imagine API docs or a mapping of the logical ruleset with interwoven commentary: other devtools could be built easily, bug analysis could be done on the structure of rules independent of code, optimizations could be determined at an architectural level, etc.

LLMs need humans to know what to build. If generating code becomes easy, codifying a flexible context or understanding becomes the goal that amplifies what can be generated without effort.

reply
Looks like there's a clear divide in people’s experiences based on how they use these new tools:

1) All-knowing oracle which is lightly prompted and develops whole applications from requirements specification to deployable artifacts. Superficial, little to no review of the code before running and committing.

2) An additional tool next to their already established toolset to be used inside or alongside their IDE. Each line gets read and reviewed. The tool needs to defend their choices and manual rework is common for anything from improving documentation to naming things all the way to architectural changes.

Obviously anything in between is viable as well. Approach 1) seems like a crazy dead end to me if you are looking to build a sustainable service or a fulfilling career.

reply
> architecture is what happens when all those local pieces interact, and you can’t get good global behaviour by stitching together locally correct components

This is a great article. I’ve been trying to see how layered AI use can bridge this gap but the current models do seem to be lacking in the ambiguous design phase. They are amazing at the local execution phase.

Part of me thinks this is a reflection of software engineering as a whole. Most people are bad at design; everyone usually gets better with repetition and experience. However, since there is never a right answer, just a spectrum of tradeoffs, it seems difficult for the current models to replicate that part of the human process.

reply
I’ve had a couple wins with AI in the design phase, where it helped me reach a conclusion that would’ve taken days of exploration, if I ever got there. Both were very long conversations explicitly about design with lots of back and forth, like whiteboarding. Both involved SQL in ClickHouse, which I’m ok but not amazing at — for example I often write queries with window functions, but my mental model of GROUP BY is still incomplete.

In one of the cases, I was searching for a way to extract a bunch of code that 5-6 queries had in common. Whatever this thing was, its parameters would have to include an array/tuple of IDs and a parameter that would change the table being selected from, neither of which is allowed in a ClickHouse parameterized view. I could write a normal view for this, but performance would’ve been atrocious given ClickHouse’s ok-but-not-great query optimizer.

I asked AI for alternatives, and to discuss the pros and cons of each. I brought up specific scenarios and asked it how it thought the code would work. I asked it to bring what it knew about SQL’s relational algebra to bear on finding an elegant solution.

It finally suggested a template (we’re using Go) to include another sql file, where the parameter is a _named relation_. It can be a CTE or a table, but it doesn’t matter as long as it has the right columns. Aside from poor tooling that doesn’t find things like typos, it’s been a huge win, much better than the duplication. And we have lots of tests that run against the real database to catch those typos.

Maybe this kind of thing exists out there already (if it does, tell me!) but I probably wouldn’t have found it.
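The commenter's version uses Go templates against ClickHouse; below is a rough Python analogue using `sqlite3` so that it runs anywhere. The fragment, the `sales` table, the `recent` CTE, and `render` are all hypothetical. What it tries to show: the shared SQL is written once and reads from whatever relation name it is handed, so a caller can pass a real table or a CTE defined in front of it, as long as the columns match. The relation name is spliced in as text (validated as a bare identifier, since bound parameters can't name tables), while the IDs stay bound parameters.

```python
import re
import sqlite3

# Shared fragment, written once. {relation} can be any table or CTE
# with (id, amount) columns; the fragment doesn't care which.
SHARED_FRAGMENT = """
SELECT id, SUM(amount) AS total
FROM {relation}
WHERE id IN ({placeholders})
GROUP BY id
ORDER BY id
"""

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render(relation, ids, prelude=""):
    """Splice a named relation into the shared fragment.

    `relation` comes from calling code, never from user input, and is
    checked to be a bare identifier. `prelude` may define it as a CTE.
    Returns (sql, params) ready for Connection.execute.
    """
    if not IDENTIFIER.fullmatch(relation):
        raise ValueError(f"not a plain identifier: {relation!r}")
    placeholders = ", ".join("?" for _ in ids)
    sql = prelude + SHARED_FRAGMENT.format(relation=relation, placeholders=placeholders)
    return sql, list(ids)


def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER, day INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(1, 10, 1), (1, 5, 2), (2, 7, 2), (3, 4, 1)])
    return conn


if __name__ == "__main__":
    conn = demo_db()
    # Caller 1: the relation is a plain table.
    sql, params = render("sales", [1, 2])
    print(conn.execute(sql, params).fetchall())  # [(1, 15), (2, 7)]
    # Caller 2: the relation is a CTE with the same columns.
    sql, params = render("recent", [1, 2],
                         prelude="WITH recent AS (SELECT id, amount FROM sales WHERE day = 2)")
    print(conn.execute(sql, params).fetchall())  # [(1, 5), (2, 7)]
```

As the commenter notes, the weak spot of this style is tooling: a typo in a spliced name only shows up when the query runs, which is why tests against a real database matter.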

reply
Note I believe this one because of the amount of elbow grease that went into it: 250 hours! Based on smaller projects I’ve done I’d say this post is a good model for what a significant AI-assisted systems programming project looks like.
reply
I had the same experience. I've been working on my project for a few months; it started out very easy, and then I lost control of the codebase and had to rewrite a lot of things. The code AI writes does not look bad, but there is something wrong about it. It just does not feel right. You still need to steer it a lot. But I am very happy that I could write quite a complex project with almost no dependencies at all. I only used Electron. I don't even use npm. It's very promising how far you can get without relying on any libraries/frameworks. You can check it out here: https://github.com/AgentWFY/AgentWFY (MIT license).
reply
I got the Syntaqlite Rust/Python extension working in WebAssembly and Pyodide a few weeks ago: https://github.com/simonw/research/tree/main/syntaqlite-pyth...

I just extended that demo to one that runs the resulting Pyodide library in a browser with a playground interface for trying it out: https://tools.simonwillison.net/syntaqlite

reply
I appreciate these kind of fact-based posts. Thank you for this.

Unfortunately, AI seems to be divisive. I hope we will find our way back eventually. I believe the lessons from this era will reverberate for a long time and all sides stand to learn something.

As for me, I can’t help but notice there is a distinct group of developers that does not get it. I know because they are my colleagues. They are good people and not unintelligent, but they are set in their ways. I can imagine management forcing them to use AI, which at the moment is not the case, because they are such laggards. Even I sometimes want to “confront” them about their entire day wasted on something even the free ChatGPT would have handled adequately in a minute or two. It’s sad to see actually.

We are not doing important things and we ourselves are not geniuses. We know that or at least I know that. I worry for the “regular” developer, the one that is of average intellect like me. Lacking some kind of (social) moat I fear many of us will not be able to ride this one out into retirement.

reply
> because they are such laggards

I am a technologist. But I am seriously concerned about the ecological consequences of training and using AI. To me, the true laggards are those who have not yet understood that climate change requires a prudent use of our resources.

I don't mind people having fun or being productive with AI. But I do mind it when AI is presented as the only way of doing things.

reply
Don't waste time thinking about the comment you replied to.

Only an AI would bother to create a throwaway account to post such a shallow comment that is mostly fearmongering to push people to use AI.

reply
This is the hardest it's ever going to be. That's been my mode for the last year. A lot of what I did in the last month was complete science fiction as little as six months ago. The scope and quality of what is possible seems to leap ahead every few weeks.

I now have several projects going in languages that I've never used: a side project in Rust, and two Go projects. I have a few decades of experience with backend development in Java, Kotlin (the last ten years), and occasionally Python, plus some limited experience with a few other languages. I know how to structure backend projects, what to look for, what needs testing, etc.

A lot of people would insist you need to review everything the AI generates. And that's very sensible. Except AI now generates code faster than I can review it. Our ability to review is now the bottleneck. And when stuff kind of works (evidenced by manual and automated testing), what's the right point to just say it's good enough? There are no easy answers here. But you do need to think about what an acceptable level of due diligence is. Vibe coding is basically the equivalent of blindly throwing something at the wall and seeing what sticks. Agentic engineering is on the opposite side of the spectrum.

I actually emphasize a lot of quality attributes in my prompts: the importance of good design, high cohesion, low coupling, SOLID principles, etc. Just asking for potential refactorings with an eye on that usually yields a few good opportunities. And then all you need to do is say "sounds good, let's do it". I get a little kick out of doing variations on silly prompts like that. "Make it so" is my favorite. Once you have a good plan, it doesn't really matter what you type.

I also ask critical questions about edge cases, testing the non-happy path, hardening, concurrency, latency, throughput, etc. If you don't, AIs tend to default to taking shortcuts, focusing only on the happy path, or hallucinating that it's all fine. But finding this out doesn't necessarily require detailed reviews. You can make the AI review code and produce detailed lists of everything that is wrong or could be improved. If there's something to be found, it will find it if you prompt it right.

There's an art to this. But I suspect that that too is going to be less work. A lot of this stuff boils down to evolving guardrails to do things right that otherwise go wrong. What if AIs start doing these things right by default? I think this is just going to get better and better.

reply
But why are you making projects in so many languages? The language is very rarely the barrier to performance, especially if you don't even understand the language.
reply
> There’s an uncomfortable parallel between using AI coding tools and playing slot machines[28]. You send a prompt, wait, and either get something great or something useless. I found myself up late at night wanting to do “just one more prompt,” constantly trying AI just to see what would happen even when I knew it probably wouldn’t work. The sunk cost fallacy kicked in too: I’d keep at it even in tasks it was clearly ill-suited for, telling myself “maybe if I phrase it differently this time.”

Oof, this hit very close to home. My workplace recently got, as a special promotion, unlimited access to a coding agent with free access to all the frontier models, for a limited period of time. I find it extremely hard to end my workday when I get into the "one more prompt" mindset, easily clocking 12-hour workdays without noticing.

reply
The description of working with AI tools really resonates with me. It's dangerous to work on my codebase when I'm tired, since I don't feel like doing it properly, so I play slots with Claude, and stay up later than I should. I usually come back later and realize the final code that gets generated is an absolute mess.

It is really good for getting up to speed with frameworks and techniques though, like they mentioned.

reply
You should take advantage of these states of cognitive exhaustion by asking Claude to document and explain the codebase to you, and checking whether it still makes sense. If there are things that you have trouble understanding in that state, make a note of them to check later whether they can be simplified.
reply
Same for me. What I liked about the article was the emphasis on the mental model. Staying up late using the slot machine isn't helping me remember the model any better.
reply
This is very close to my experience. And I agree with the conclusion; I would like to see more of this.
reply
Really great to see a realistic experience sans hype about AI tools and how they can have an impact.

> But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti...It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision...I decided to throw away everything and start from scratch

This part was interesting to me as it lines up with Fred Brooks "throw one away" philosophy: "In most projects, the first system built is barely usable. Hence plan to throw one away; you will, anyhow."

As the author's experience shows, AI tools provide a much faster way of getting to that initial throwaway version. That's their bread and butter, and it's where they shine.

Expecting AI tools to go directly to production quality is a fool's errand. This is the right way to use AI - get a quick implementation, see how it works and learn from it but then refactor and be opinionated about the design. It's similar to TDD's Red, Green, Refactor: write a failing test, get the test passing ASAP without worrying about code quality, refactor to make the code better and reliable.
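A toy Python illustration of that cycle, with a hypothetical `word_count` standing in for real code: the same checks run first against a quick "green" version written only to pass, then against a refactored one. The tests stay fixed while the implementation changes, which is what makes the refactor step safe.

```python
def check_word_count(fn):
    """'Red': these checks are written first and fail until fn exists."""
    assert fn("") == 0
    assert fn("one") == 1
    assert fn("two words") == 2
    assert fn("  padded   spacing  ") == 2


def word_count_green(text):
    """'Green': the quickest thing that passes, with no concern for quality.
    Only plain spaces count as separators."""
    count = 0
    in_word = False
    for ch in text:
        if ch != " " and not in_word:
            count += 1
            in_word = True
        elif ch == " ":
            in_word = False
    return count


def word_count(text):
    """'Refactor': same results on the tested inputs, clearer code.
    str.split() with no argument also treats tabs and newlines as separators."""
    return len(text.split())


if __name__ == "__main__":
    check_word_count(word_count_green)
    check_word_count(word_count)
    print("both versions pass")
```

The two versions disagree on untested input such as `"a\tb"`, a reminder that the tests, not the refactor, define what "the same behavior" means.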

In time, after this hype cycle has died down, we'll come to realize that this is the best way to make use of AI tools over the long run.

> When I had energy, I could write precise, well-scoped prompts and be genuinely productive. But when I was tired, my prompts became vague, the output got worse

This part also echoes my experience: when I know exactly what I want, I'm able to write more specific specifications and guide the AI output along. When I'm not as clear, the output is worse and I need to spend a lot more time figuring it out or re-prompting.

reply
It's a huge mistake to start building with Claude without mapping out a project in detail first, by hand. I built a pretty complex device orchestration server + agent recently, and before I set Claude to actually coding I had ~3000 lines of detailed design specs across 7 files that laid out how and what each part of the application would do.

I didn't have to review the code for understanding what Claude did, I reviewed it for verifying that it did what it had been told.

It's also nuts to me that he had to go back in later to build in tests and validation. The second there is an input able to be processed, you bet I have tests covering it. The second a UI is being rendered, I have Playwright taking screenshots (or gtksnapshot for my linux desktop tools).

I think people who are seeing issues at the integration phase of building complex apps are having that happen because they're not keeping the limited context in mind, and preempting those issues by telling their tools exactly how to bridge those gaps themselves.

reply
Thank you. The learning aspect of reading how AI tackles something is rewarding.

It also reduces my hesitation to get started with something I don't know the answer well enough yet. Time 'wasted' on vibe-coding felt less painful than time 'wasted' on heads-down manual coding down a rabbit hole.

reply
"Knowing where you are on these axes at any given moment is, I think, the core skill of working with AI effectively."

I like this a lot. It suggests that AI use may sometimes incentivize people to get better at metacognition rather than worse. (It won't in cases where the output is good enough and you don't care.)

reply
Does SQLite not have a Lemon-generated parser for its SQL?

When I ported pikchr (also from the SQLite project) to Go, I first ported Lemon, then the grammar, then the supporting code.

I always meant to do the same for its SQL parser, but pikchr grammar is orders of magnitude simpler.

reply
This resonates. I had a project sitting in my head for years and finally built it in about 6 weeks recently. The AI part wasn't even the hard part, honestly; it was finally committing to actually shipping instead of overthinking the architecture. The tools just made it possible to move fast enough that I didn't lose momentum and abandon it like every other time.
reply
This essay perfectly encapsulates my own experience. My biggest frustration is that the AI is astonishingly good at making awful slop which somehow works. It’s got no taste, no concern for elegance, no eagerness for the satisfyingly terse. My job has shifted from code writer to quality control officer.

Nowhere is this more obvious in my current projects than with CRUD interface building. It will go nuts building these elaborate labyrinths and I’m sitting there baffled, bemused, foolishly hoping that THIS time it would recognise that a single SQL query is all that’s needed. It knows how to write complex SQL if you insist, but it never wants to.

But even with those frustrations, damn it is a lot faster than writing it all myself.

reply
Trim your scope and define your response format prior to asking or commanding.

Most of my questions are "in one sentence respond: long rambling context and question"

reply
This resonates with my experience.

I have several open source projects that I have wanted to refactor for a decade. A week ago I sat down with Google Gemini and completely refactored three of my libraries. It has been an amazing experience.

What’s a game changer for me is the feedback loop. I can quickly validate or invalidate ideas and land on an API I would enjoy using.

reply
Did you already have good integration tests?
reply
This is a very insightful post. Thanks for taking the time to share your experience. AI is incredibly powerful, but it’s no free lunch.
reply
The author mentions a C codebase. Is AI good at coding in C now? If so, which AI systems lead in this language?

Ideally: local; offline.

Or do I have to wrestle it for 250 hours before it coughs up the dough? Last time I tried, the AI systems struggled with some of the most basic C code.

It seemed fine with Python, but then my cat can do that.

reply
C is actually one of the better-supported languages for AI assistants these days, a lot better than it was a year or two ago. The API hallucination problem has improved a lot. Models like Claude Sonnet and Qwen 2.5 Coder have much stronger recall of POSIX/stdlib now. The harder remaining challenge with C is that AI still struggles with ownership and lifetime reasoning at scale. It can write correct isolated functions but doesn't always carry the right invariants across a larger codebase, which is exactly the architecture problem the article describes.

For local/offline use, Qwen 2.5 Coder 32B is probably your strongest option if you have the VRAM (or can run it quantized). In my experience it handles C better than most other local models.

reply
Thanks Morpheus_Matrix. I'll take a look at Qwen 2.5 Coder 32B for offline C. I appreciate your guidance.

By extraordinary coincidence, I was just a moment ago part-of-the-way through re-watching The Matrix (1999) and paused it to check Hacker News. There your reply greeted me.

Wild glitch!

reply
The 8-year wait is the part that stands out. Usually the question is "why start now" not "why did it take 8 years". Curious if there was a specific moment where the tools crossed a threshold for you, or if it was more gradual.
reply
For me, the amount of tedium that comes with any new project before I can get to the "good stuff" is a blocker. It's so easy to sit down with excitement, and then 3 hours later, you're still wrestling with basic dependencies, build pipelines, base CSS, etc.
reply
Have you tried using starting templates for projects? For many platforms there are cookiecutters or other tools to jump over those.
reply
It's kind of clickbait, though. "I took 3 months and AI to build a SQLite tool" is not going to stand out. The 8-year wait gives a sense of scale or difficulty, but that's actually an illusion and does not reflect the task itself.
reply
Great write-up. As a side note (not a Googler myself and this is 100% my opinion) Lalit’s team was hiring in London, UK. If you are interested in working in low level performance tools, this might be a very cool opportunity!
reply
> Of all the ways I used AI, research had by far the highest ratio of value delivered to time spent.

Seconded!

reply
The author apparently skipped ai-assisted refactoring and auditing before moving to prod.
reply
Great write-up with provenance
reply
This article is describing a problem that is still two steps removed from where AI code becomes actually useful.

90 percent of the things users want either A) don't exist or B) are impossible to find, install and run without being deeply technical.

These things don't need to scale, and they don't need to be well designed. They are, for the most part, targeted, single-user, single-purpose artifacts. They are migration scripts between services; they are quick-and-dirty tools that make bad UIs and workflows less manual and more manageable.

These are the use cases I see people OUTSIDE the tech sphere adopting AI coding for. It is what "non-techies" are using things like open claw for. I have people who in the past would have been told "No, I will not fix your computer" talk to me excitedly about running cron jobs.

Not everything needs to be Snap-on quality; the bulk of end users are going to be happy with Harbor Freight quality because it is better than NO tools at all.

reply
> This article is describing a problem that is still two steps removed from where AI code becomes actually useful.

But it does a good job of countering the narrative you often see on LinkedIn, and to some extent on HN as well, where AI is portrayed as fully capable of developing enterprise software. If you spend any time in discussions hyping AI, you will have seen plenty of confident claims that traditional coding is dead and that AI will replace it soon. Posts like this are useful because they show a more grounded reality.

> 90 percent of the things users want either A) dont exist or B) are impossible to find, install and run without being deeply technical. These things dont need to scale, they dont need to be well designed. They are for the most part targeted, single user, single purpose, artifacts.

Yes, that is a particular niche where AI can be applied effectively. But many AI proponents go much further and argue that AI is already capable of delivering complex, production-grade systems. They say, you don't need engineers anymore. They say, you only need product owners who can write down the spec. From what I have seen, that claim does not hold up and this article supports that view.

Many users may not be interested in scalability and maintainability... But for a number of us, including the OP and myself, the real question is whether AI can handle situations where scalability, maintainability and sound design DO actually matter. The OP does a good job of understanding this.

reply
When he decided on Rust, he could have looked up an existing SQLite port; libsqlite does a pretty good job.
reply
I don't have anything resembling the problems described. Before I ask AI to create new code (except for super trivial things), I first split the application into smaller functional modules. I then design the structure of the code down to the main classes and methods and their interactions. I also try to keep the scope small. Then the AI just fills in the actual code, and I have no problems reviewing it. Sometimes I discover issues, like using arrays instead of maps leading to performance problems, but they are easily spotted.
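The "arrays instead of maps" issue is worth a concrete look because it is easy to miss in review: the code is correct, just slow. A minimal Python sketch; the `Item` record and both lookup functions are hypothetical. Scanning a list per lookup costs O(n) each, so m lookups cost O(n * m), while building a dict once makes each lookup O(1) on average.

```python
from dataclasses import dataclass


@dataclass
class Item:
    id: int
    name: str


def names_by_scan(items: list[Item], wanted: list[int]) -> list[str]:
    # List ("array") version: each lookup scans the whole list, O(n) per id,
    # so looking up m ids costs O(n * m). Assumes every wanted id is present.
    return [next(it.name for it in items if it.id == i) for i in wanted]


def names_by_map(items: list[Item], wanted: list[int]) -> list[str]:
    # Map version: build an id -> Item dict once (O(n)), then each lookup is
    # O(1) on average, for O(n + m) total. Raises KeyError on a missing id.
    by_id = {it.id: it for it in items}
    return [by_id[i].name for i in wanted]


items = [Item(1, "alpha"), Item(2, "beta"), Item(3, "gamma")]
print(names_by_scan(items, [3, 1]))  # ['gamma', 'alpha']
print(names_by_map(items, [3, 1]))   # ['gamma', 'alpha']
```

Both functions return the same result, which is exactly why the list version can slip through: only profiling or a careful review of the data structure choice reveals the difference.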
reply
A key takeaway from this article is that you, as a developer, spend as much time on refactoring as on the actual feature. You are constantly requesting code reviews, architectural assessments, consolidations, extractions, etc. Only then can you empower AI to become a force multiplier and prevent slop and spaghetti code from being created. Nice article.
reply
Unlike many claims that AI works that are clearly bogus, this actually seems quite credible, because TFA describes in detail many problems encountered, any of which could easily have led to the failure of the project if not properly addressed.

There is no doubt that when used in the right way an AI coding assistant can be very helpful, but using it in the right way does not result in the fantastic productivity-increasing factors claimed by some. TFA describes a way of using AI that seems right and it also describes the temptations of using AI wrong, which must be resisted.

More important is whether the productivity improvement is worth a subscription price. Nothing that I have seen until now convinces me about this.

On the other hand, I believe that running locally a good open-weights coding assistant, so that you do not have to worry about token price or about exceeding subscription limits in a critical moment, is very worthwhile.

Unfortunately, thieves like Altman have ensured that running locally has become much more difficult than last year, due to the huge increases in DRAM and SSD prices. In January I was forced to replace an old mini-PC, but I was forced to put only 32 GB of DDR5 in the new one, the same as in the 7-year-old mini-PC it replaced. Had I made the upgrade a few months earlier, I would have put in 96 GB, which would have made it much more useful. Fortunately, I also have older computers with 64 GB or 128 GB of DRAM, where bigger LLMs can be run.

reply
> More important is whether the productivity improvement is worth a subscription price. Nothing that I have seen until now convinces me about this. On the other hand, I believe that running locally a good open-weights coding assistant, so that you do not have to worry about token price or about exceeding subscription limits in a critical moment, is very worthwhile.

This is one thing I also wonder about. If it's a really good programming helper, making 20% of your job 5x faster, then you can compute the value. Say for a $250K SWE this looks like $40k/year roughly. You don't want to hand 100% of that value to the LLM providers or you've just broken even, so then maybe it is worth $200/mo.
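The back-of-the-envelope estimate above can be checked in a few lines. A sketch, assuming (as the comment implicitly does) that salary is a fair proxy for the value of the time saved; `annual_time_savings` is a hypothetical helper name.

```python
def annual_time_savings(salary: float, share_of_job: float, speedup: float) -> float:
    """Yearly value of time freed when `share_of_job` of the work runs `speedup`x faster.

    That slice shrinks from share_of_job to share_of_job / speedup of total time,
    so the fraction of time freed is share_of_job * (1 - 1 / speedup).
    """
    return salary * share_of_job * (1 - 1 / speedup)


saved = annual_time_savings(250_000, 0.20, 5)  # 250k * 0.20 * 0.80
subscription = 200 * 12                        # $200/month, per year
print(round(saved))                            # 40000
print(round(subscription / saved, 2))          # 0.06
```

So a 5x speedup on 20% of the job frees 16% of total time, which matches the ~$40k/year figure, and a $200/month subscription would capture roughly 6% of that estimated value.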

reply
Such a reckoning is possible when the cost of a subscription is truly predictable.

For now, there is a lot of unpredictability in the future cost of AI, whenever you do not host it yourself.

If you pay per token, it is extremely hard to predict how many tokens you will need. If you have an apparently fixed subscription, it is very hard to predict whether you will hit the limits at the most inconvenient moment, after which you will have to wait a day or so for them to be reset.

Recently, there have been a lot of stories of AI providers seemingly trying to continuously reduce the limits allowed by a subscription. There is also a lot of uncertainty about future subscription price increases, as the most important providers appear, for now, to be pricing below their costs.

Therefore, while I agree with you that when something provides definite benefits you should be able to assess whether paying for it provides a net gain for you, I do not believe that using an externally-hosted AI coding assistant qualifies for such an assessment, at least not for now.

reply
EDIT:

After I wrote the above about the unpredictable future cost of externally hosted AI coding assistants, it was confirmed by an OpenAI press release stating that existing Codex users will be migrated to token-based pricing over the following weeks.

Such events will not affect you if you use an open-weights assistant running on your own HW, where you do not have to care about token usage.

reply
You do have to care about token usage when choosing how to scale your hardware. If you do a negligible amount of AI inference for occasional simple Q&A (which is what most people do), you can get away with a very lean and cheap setup even when running very large, sophisticated models. Agentic use, with function calls and responses etc., raises the number of tokens you use over time by at least an order of magnitude.
reply
It's funny that he used Claude instead of Gemini for this. Idk if his company is happy about the free advertising for a competitor.
reply
Google owns 14% of Anthropic:

https://techcrunch.com/2025/03/11/google-has-given-anthropic...

They don't care. They want software engineers replaced by any means necessary. They know generative AI isn't a big business; that is why they slow-walk it themselves.

Replacement won't work of course, that is why marketing blog posts are needed.

reply
But they own 100% of Google, correct?
reply
article looks like a tweet turned into 30 paragraphs. hardly any taste.
reply
This is what a lot of business books are TBH
reply
Yes, how dare someone take an idea, develop it, and publish it outside the algorithm-driven rage pit. Truly terrible behavior! /s

Expanding a thought beyond 280 characters and publishing it somewhere other than the X outrage machine is something we should be encouraging.

reply