Except that he got good at his short game by the end. LLMs will get there sooner than we think.
I think LLMs are great, and I think people who can use them to get to the green in one and take it from there will soar, just like people who could identify a problem and solve it themselves did in the past.
We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks.
I do systems programming. Before AI feature development roughly went like, design, implement, test, review with some back edges and a lot of time spent in test and review.
AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though still an improvement overall.
We do not see the weeks to days improvement though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now.
What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much?
I'll stop you right there. AI is not good at systems programming, it's good at CRUD web development, which is where most people are seeing the gains.
AI has solved simple CRUD, yes, but CRUD, was easy before.
Maybe they're using AI for testing and reviewing more than you are, not just for coding?
Maybe they're using AI for testing and reviewing more than you are?
For things like web frontents/backends, though, it works beautifully. I ship things in days that would take me weeks to write by hand, and I'm very fast at writing things by hand. The AI also ships many fewer bugs than our average senior programmer, though maybe not fewer bugs than our staff programmers.
The boost is for what are glorified crud apps which it 1000x the tedious work. However, the choices it makes along the way quickly blows up without cleaning. Seniors know how to keep their workstation clean or they should.
I have an example in my line of work. Full service rewrite in a new language. Would have taken forever without AI. AI makes it easier, faster. The service has better throughput, uses less machines. Having a complete full test harness that allows us to ensure we are meeting all the functionality of the previous service is key. AND we are keeping the old service on standby because we know we don't know what might be wrong with the new one.
What's your example?
Most devs aren’t working on cutting edge, low level, mission critical systems. AI is great for that. Every company I personally know have been fast shipping features that are being used daily by millions of people for the past 7 months.
We have the same thing on my team, and we also understand the limitations of AI generated code. If you’re more or less experienced, you can easily see the “good” and “bad” sides of it. So you kinda plan it out in a way that you can “evolve AI generated software”. I wouldn’t say the same thing in 2025 January, but it’s much different times now. Things are already working.
If you're truly "managing fleets of agents" there's no way you're able to sift through the good and the bad in the output. If your AI-generated code is evolvable (which is hard to tell right now) then you're not writing it with "fleets of agents". If you are writing it with fleets of agents, I would bet it's not evolvable; you just haven't reached the breaking point yet.
Isn't this wrong? I thought engineered systems meant something designed with limits.
I have literally built and shipped multiple things that would have taken me many many months to do and I’ve done it in under a week.
Many of these are LLM heavy features where the LLM can literally self-evaluate and self-optimize. I start with a general feature, it will generate adverse, synthetic data, it will build a feature, optimize it the figure out new places to improve. 1 year ago, this would have taken an entire team months to do, now, it’s 2 or 3 days of work.
I have experienced areas where high productivity can be had without much loss in quality. So I can believe it. But it really depends on what you’re doing and I firmly believe many companies will run out of easy stuff that we can blaze through with AI fairly quickly. At least that’s where we seem to be heading
There are strengths, but if you think its writing stream of code and just using it as is, I would LOVE to compete against you.
Look at the best models from Spring 2025, and compare with now (and similarly for Springs 2024 and 2025). Armstrong and lots of others are betting that this trend will continue, and if it does, the LLMs will ship code the LLMs understand, and whether any human specifically understands any particular part will mostly not matter.
I find this particularly funny. There were more than a couple Star Trek Episodes where some alien planet depends on some advanced AI or other technology that they no longer understand, and it turns out the AI is actually slowly killing them, making them sterile, etc. (e.g. https://en.wikipedia.org/wiki/When_the_Bough_Breaks_(Star_Tr... )
Sure, Star Trek is fiction, but "humans rely on a technology that they forget how to make" is a pretty recurrent theme in human history. The FOGBANK saga was pretty recent: https://en.wikipedia.org/wiki/Fogbank
It just amazes me that people think "Sure, this AI generated code is kinda broken now, but all we need is just more AI code to fix it at some unknowable point in the future because humans won't be able to understand it!"
The problem is that executives could take the 15-20% productivity boost and be content, but they read stuff like this, get greedy, and they don't understand the risk they're taking.
If the average programmer is this bad, then there must be better-than-average programmers reviewing the code. The problem with agents is that they can produce code at a far higher volume than the average programmer.
Anyway, I don't know how well the average programmer programs, but if you commit agent-generated code without careful review, your codebase will be cooked in a year or two.
This is how I feel. It’s building things for me that work. I don’t care how it works under the hood in many cases.
Just a minute ago 5.5 looked at some human-written code of mine from last year and while it was making the changes I asked for it determined the existing code was too brittle (it was) and rewrote it better. It didn't mention this in its summary at the end, I only know because I often watch the thinking output as it goes past before it hides it all behind a pop-open.
I also find I need to run an llm code review or two against any code it produces to even get to the point where ’s it’s ready for human review.
In any case they served as an extremely valuable tool.