What do the relatively hands-off "it can do whole features at a time" coding systems need to function without taking up a shitload of time in reviews? Great automated test coverage, and extensive specs.
I think we're going to find there's very little time-savings to be had for most real-world software projects from heavy application of LLMs, because the time will just go into tests that wouldn't otherwise have been written, and much more detailed specs that otherwise never would have been generated. I guess the bright-side take of this is that we may end up with better-tested and better-specified software? Though so very much of the industry is used to skipping those parts, and especially the less-capable (so far as software goes) orgs that really need the help and the relative amateurs and non-software-professionals that some hope will be able to become extremely productive with these tools, that I'm not sure we'll manage to drag processes & practices to where they need to be to get the most out of LLM coding tools anyway. Especially if the benefit to companies is "you will have better tests for... about the same amount of software as you'd have written without LLMs".
We may end up stuck at "it's very-aggressive autocomplete" as far as LLMs' useful role in them, for most projects, indefinitely.
On the plus side for "AI" companies, low-code solutions are still big business even though they usually fail to deliver the benefits the buyer hopes for, so there's likely a good deal of money to be made selling companies LLM solutions that end up not really being all that great.
Code is the most precise specification we have for interfacing with computers.
So I expect over time we will see genuine performance improvements, but Amdahl's law dictates it won't be as much as some people and ceo's are expecting.
Writing tests to ensure a program is correct is the same problem as writing a correct program.
Evaluating conformance is a different category of concern from ensuring correctness. Tests are about conformance not correctness.
Ensuring correct programs is like cleaning in the sense that you can only push dirt around, you can't get rid of it.
You can push uncertainty around and but you can't eliminate it.
This is the point of Gödel's theorem. Shannon's information theory observes similar aspects for fidelity in communication.
As Douglas Adams noted: ultimately you've got to know where your towel is.
One thing I hope we'll all collectively learn from this is how grossly incompetent the elite managerial class has become. They're destroying society because they don't know what to do outside of copying each other.
It has to end.
For fairly straightforward changes it's probably a wash, but ironically enough it's often the trickier jobs where they can be beneficial as it will provide an ansatz that can be refined. It's also very good at tedious chores.
People seem to gloss over this... As a CEO if people don't function like this I'd be awake at night sweating.
Which results the software engineering issue I’m not seeing addressed by the hype: bugs cost tens to hundreds of times their coding cost to resolve if they require internal or external communication to address. Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place.
An LLM workflow that yields 10x an engineer but psychopathically lies and sabotages client facing processes/resources once a quarter is likely a NNPP (net negative producing programmer), once opportunity and volatility costs are factored in.
The math depends on importance of the software. A mistake in a typical CRUD enterprise app with 100 users has zero impact on anything. You will fix it when you have time, the important thing is that the app was delivered in a week a year ago and was solving some problem ever since. It has already made enormous profit if you compare it with today’s (yesterday’s ?) manual development that would take half a year and cost millions.
A mistake in a nuclear reactor control code would be a total different thing. Whatever time savings you made on coding are irrelevant if it allowed for a critical bug to slip through.
Between the two extremes you thus have a whole spectrum of tasks that either benefit or lose from applying coding with LLMs. And there are also more axes than this low to high failure cost, which also affect the math. For example, even non-important but large app will likely soon degrade into unmanageable state if developed with too little human intervention and you will be forced to start from scratch loosing a lot of time.
We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs give you the illusion of this.
1. I spoke to sales to find out about the customer
2. I read every line of the contract (SOW)
3. I did the initial requirements gathering over a couple of days with the client - or maybe up to 3 weeks
3. I designed every single bit of AWS architecture and code
4. I did the design review with the client
5. I led the customer acceptance testing
> We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs
I assure you the mid level developers or god forbid foreign contractors were not “experts” with 30 years of coding experience and at the time 8 years of pre LLM AWS experience. It’s been well over a decade - ironically before LLMs - that my responsibility was only for code I wrote with my own two hands
I’m not saying trusting cheap devs is a good idea either. I do think cheap devs are actually at risk here.
I didn’t blindly trust the Salesforce consultants either. I also didn’t verify every line of oSql (not a typo) they wrote.
I disagree, in the sense that an engineer who knows how to work with LLMs can produce code which only needs light review.
* Work in small increments
* Explicitly instruct the LLM to make minimal changes
* Think through possible failure modes
* Build in error-checking and validation for those failure modes
* Write tests which exercise all paths
This is a means to produce "viable" code using an LLM without close review. However, to your point, engineers able to execute this plan are likely to be pretty experienced, so it may not be economically viable.
The gains are especially notable when working in unfamiliar domains. I can glance over code and know "if this compiles and the tests succeed, it will work", even if I didn't have the knowledge to write it myself.
... Errr... Yeah, that's not a great approach, unless you are defining 'work' extremely vaguely.
I still make an effort to understand the generated code. If there’s a section I don’t get, I ask the LLM to explain it.
Most of the time it’s just API conventions and idioms I’m not yet familiar with. I have strong enough fundamentals that I generally know what I’m trying to accomplish and how it’s supposed to work and how to achieve it securely.
For example, I was writing some backend code that I knew needed a nonce check but I didn’t know what the conventions were for the framework. So I asked the LLM to add a nonce check, then scanned the docs for the code it generated.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
>When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.
If we're being honest with ourselves, it's not making devs work faster. It at best frees their time up so they feel more productive.
I'd like to think that I have this under control because the methodology of working in small increments helps me to recognize when I've gotten stuck in an eddy, but I'll have to watch out for it.
I still maintain that the LLM is saving me time overall. Besides helping in unfamiliar domains, it's also faster than me at leaf-node tasks like writing unit tests.
Yes, code produced this way will have bugs, especially of the "unknown unknown" variety — but so would the code that I would have written by hand.
I think a bigger factor contributing to unforeseen bugs is whether the LLM's code is statistically likely to be correct:
* Is this a domain that the LLM has trained on a lot? (i.e. lots of React code out there, not much in your home-grown DSL)
* Is the codebase itself easy to understand, written with best practices, and adhering to popular conventions? Code which is hard for humans to understand is also hard for an LLM to understand.
It introduces unnecessary indirection, additional abstractions, fails to re-use code. Humans do this too, but AI models can introduce this type of architectural rot much faster (because it's so fast), and humans usually notice when things start to go off the rails, whereas an AI model will just keep piling on bad code.
---
applyTo: '**'
---
By default:
Make the smallest possible change.
Do not refactor existing code unless I explicitly ask.
Under this, Claude Opus at least produces pretty reliable code with my methodology even under surprisingly challenging circumstances, and recent ChatGPTs weren't bad either (though I'm no longer using them). Less powerful LLMs struggle, though.But I would never do the same for Azure.
"Seniors will do expert review" will slowly collapse.
No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements. The minute geeks get over themselves thinking they are some type of artists, the happier they will be.
I’ve had a job that requires coding for 30 years and before ther I was hobbyist and I’ve worked for from everything from 60 person startups to BigTech.
For my last two projects (consulting) and my current project, while I led the project, got the requirements, designed the architecture from an empty AWS account (yes using IAC) and delivered it. I didn’t look at a line of code. I verified the functional and non functional requirements, wrote the hand off documentation etc.
The customer is happy, my company is happy, and I bet you not a single person will ever look at a line of code I wrote. If they do get a developer to take it over, the developer will be grateful for my detailed AGENTS.md file.
We know from experimentation that agents will change anything that isn’t nailed down. No natural language spec or test suite has ever come close to fully describing all observable behaviors of a non-trivial system.
This means that if no one is reviewing the code, agents adding features will change observable behaviors.
This gets exposed to users as churn, jank, and broken work flows.
2. Assuming that techniques that work with human developers that have severely impaired judgement but are massively faster at producing code is a bad idea.
3. There’s no way you have enough experience with maintaining code written in this way to confidently hand wave away concerns.
So many people on HN are so insulted that the people who put money in our bank accounts and in some cases stock in our brokerage accounts ever cared about their bespoke clean code, GOF patterns and they never did. LLM just made it more apparent.
It’s always been dumb for PR to be focused on for loops vs while loops instead of focusing on whether functional and non functional requirements are met
Speak for yourself. I don't hire people like you.
Even in late 2023 with the shit show of the current market, I had no issues having multiple offers within three weeks just by reaching out to my network and companies looking for people with my set of skills.
You sound like a bozo, I can sniff it through my screen.
Guess what? I also stopped caring how registers are used and counting clock cycles in my assembly language code like it’s the 80s and I’m still programming on a 1Mhz 65C02
But do you look at any of the AI output? Or is it just "it works, ship it"?
What I checked.
1. The bash shell scripts I had it write as my integration test suite
2. To make sure it wasn’t loading the files into Postgres the naive way -loading the file from S3 and doing bulk inserts instead of using the AWS extension that lets it load directly from S3. It’s the differ xe between taking 20 minutes and 20 seconds.
3. I had strict concurrency and failure recovery requirements. I made sure it was done the right way.
4. Various security, logging, log retention requirements
What I didn’t look at - a line of the code for the web admin site. I used AWS Cognito for authentication and checked to make sure that unauthorized users couldn’t use the website. Even that didn’t require looking at the code - I had automated tests that tested all of the endpoints.
I've witnessed human developers produce incredibly convoluted, slow "ETL pipelines" that took 10+ minutes to load single digit megabytes of data. It could've been reduced to a shell script that called psql \copy.