The more specific your requirements the closer you get to natural language not being useful anymore.
What do you use them for? For most AI users it's usually CRUD and I've never seen a web server or frontend in APL like languages.
The reason why programming is hard is because most languages force you to use a hammer when you need a screw driver. LLMs are very good at misusing hammers and most people find them useful for that reason.
If you use a sane dsl instead the natural language description of a problem is always more complex and much longer than the equivalent description in a dsl. It's also usually wrong to boot.
This is what algebra used to look like before variables: https://en.wikipedia.org/wiki/Archimedes%27s_cattle_problem#...
I don't think you will find anyone who can do better than an LLM at one shotting the prose version of the problem. Both will of course be wrong.
But I also don't think you will find an LLM that can solve the problem faster than a human with Prolog when you have to use the prose description of the problem.
Any semi-competent AI Agent can do that with a plan you've written in 5 minutes.
But in general the statement is really not true anymore, generic projects/problems have a pretty good chance that the AI can one shot a working solution from a lazily typed vague prompt.
The volume of people successfully adopting agentic engineering practices suggests this stuff isn't rocket science, but it is a learned skill and takes setup.
A year later into heavy AI coding, my experience is what you're describing should aid in being able to run 5+ agents simultaneously on a project because you know what you're doing, you set it up right, and you know how to tell agents to leverage that properly.
To give an example of where I hear this, it is indistinguishable from the things I hear from my coworkers: "You just need the right setup!" (IMO the actual difference is I need to turn off the part of my brain that cares about what the code actually does or considers edge cases at all) What I actually see, in practice, are constant bugs where nobody ever actually addresses the root cause, and instead just paves over it with a new Claude mass-edit that inevitably introduces another bug where we'll have to repeat the same process when we run into another production issue.
We end up making no actual progress, but boy do we close tickets, push PRs, and move fast and oh man do we break things. We're just doing it all in-place. But at least we're sucking ourselves off for how fast we're moving and how cutting edge we are, I guess.
I dunno, maybe I'm doing it wrong, maybe my team is all doing it wrong. But like I said the things they say are indistinguishable from the common HN comment that insists how this stuff is jet fuel for them, and I see the actual results, not just the volume of output, and there's no way we're occupying the same reality.
Translating that into code can happen directly by you, or into prompt iterations that need to result in the same/similar coded representation.
In other words, when it matters how something works and it is full of intricate details, you do not need to specify it, you just do it (eg. as an example which is probably not the best is you knowing how to avoid N+1 query performance issue — you do not need a ticket or spec to be explicit, you can just do it at no extra effort — models are probably OK at this as it is such a pervasive gotcha, but there are so many more).
You setup the environment and then you do the work. Unless you are switching employers every week, you invest in writing that stuff down so the generation is right-ish and generate validation tooling so it auto-detects the mistakes and self-repairs.
imagine you have to implement a specific algorithm for a quantum computer.
There's no value setting up AI to do the writing for you. That might be orders of magnitude harder then writing the algorithm directly.
For highly specialized one-off features, it doesn't always pay off.
On the other hand, if all you do are some generic items that AI can do well... then I'm not sure you're going to have a job long term, your prompts and automation will be useful for the new junior hires that will be specialized in using these and cost effective.
It's like talking legalese vs plain English; or formal logic vs English. Some people have the formal stuff come more naturally, and then spitting code out is not a burden.
What's your definition of "successfully"?
More LOC committed per day is probably the only one that's guaranteed when you let spicy autocomplete take the wheel.
I don't think it's at all possible to reason about the other more meaningful metrics in software development, because we simply don't have the context of what each human is working on, and as with the WYSIWYG fad of 3 decades ago, "success" is generally self-reported, by people who don't know what they don't know, and thus they don't know what spicy autocomplete is getting woefully wrong.
"But it {compiles,runs,etc}" isn't a meaningful metric when a large portion of the code in question is dynamic/loosely typed in a non-compiled language (JavaScript, Python, Ruby, PHP, etc).
But man AI is phenomenal for getting stuff out of your head and working quick.
Current AI systems are extremely serial, in that very little of the inherent parallelism of the problem is utilized. Current-gen AI systems run at most a few hundreds of thousands of operations in parallel, while for frontier models, billions of operations could be run in parallel. Or in other words, what currently takes AI 8 hours will take it barely long enough for you to perceive the delay after you release the enter key.
For a demo, play around with https://chatjimmy.ai/ , the AI chatbot of Taalas, where they etched the model into silicon in a distributed way, instead of saving it in RAM and sucking it to execution units by a straw. It's a 8B parameter model, so it's unsuitable for complex problems, but the techniques used for it will work for larger models too, and they are working to get there.
And even Taalas is very far from the limits. Modern better quality LLM chatbots operate at ~40 tokens per second. The Taalas chatbot operates at 17000 tokens/s. If you took full advantage of parallelism, you should be able to have a latency of low hundreds of clock cycles per token, or single request throughput of tens of millions of tokens per second. (With a fully pipelined model able to serve one token per clock cycle, from low hundreds of requests.) Why doesn't everyone do it like that right now? Because to do this, you need to etch your model into silicon, which on modern leading edge manufacturing is a very involved process that costs hundreds of millions+ in development and mask costs (we are not talking about single chips here, you can barely fit that 8B model into one), and will take around a year. So long as the models keep improving so much that a year-old model is considered too old to pay back the capital costs, the investment is not justified. But when it will be done, it will not just make AI faster, it will also make it much more energy-efficient per token. Most of the energy costs are caused by moving data around and loading/storing it in memory.
And I want to stress that none of the above is dependent on any kind of new developments or inventions. We know how to do it, it's held back only by the pace of model improvement and economics. When models reach a state of truly "good enough", it will happen. It feels perverse to me that people are treating this situation as "there was a per-AI period that worked like X, now we are in a post-AI period and we have figured out that it will work like Y". No. We are at the very bottom of a very steep curve, and everything will be very different when it's over.
Everything I do to interact with my computer is through an agent now.
Everything I do to interact with my computer is still the same.
See how boring you are?
Telling the agent your high level plan that you are extremely familiar with and then having the agent execute on 2000 lines of code is FASTER then having you execute on that 2000 lines of code. There is no reality where that can be physically beaten by even someone who's typing really quickly with zero pause. Physically impossible.
Less boring or not? Another way to put it... although my answer is boring, I think I'm right. He is either a liar or like many other people lacks skill in using AI... because the transition to AI is happening so fast... not many people are fully utilizing AI to it's maximum potential. Many still use IDEs, many still interact with terminal. Many people still don't use it to configure infrastructure, do database administration, deploy code... etc.
Also, haven't you ever had a situation where the prompt you started with ends up being longer than the final code diff? Perhaps a subtle bug that's hard to describe/trigger, but ended up having a simple root cause like an off-by-one error?
Also also, coding agents are infamous for generating way more code than is strictly necessary. The 2000 lines of code that the agent generated may well have been only 200 lines of code had you written it yourself.
I know that a better plan could mean fewer iterations, but again that extends the time you need to spend on that plan => the total time of the AI solution
Honestly, I'm still faster than AI cooking scrambled eggs, but definitely not faster than neither AI (or compiler) in translating stuff into code.
No way U beat an LLM on this, even on trivial ones. LLMs are better at that since at least 2024, if you haven't noticed, then you're not doing enough SQL perhaps.
But, of course it took years for people to realize they cannot outpace Visual Studio in the 90s by being very good at x86 assembly.
But during feature development? Not possible. And I consider myself a very fast developer
Bugs happen during feature development, as you say, but then Claude is in the context, and I don't need to tell it where to go, it sees the bug with failing tests, or smth similar.
BTW. One thing that helps my Claude with debugging harder problems is that I tell it to apply scientific method to debugging. Generate hypotheses, gather pros/cons evidence, write to a journal file debug-<problem>.md, design minimal experiments to debunk hypotheses.
You can add that as a skill, and sometimes it will pick it up automatically, but it works wonders just as a single sentence in the input.
besides it is a system that you query, it responds. I'm sure your dbs are not always 'right' and particularly when you as the wrong questions.
You speak as if AI development is frozen, and you ignore the poster's point:
> that gap will only increase as LLMs get more intelligent