And that prompt will basically be a 2000-page spec bible à la IBM circa 1960 (see: waterfall). Unless AI develops mind reading (and advanced mind reading at that), single-prompt creation of real, complex software products will never happen. You'll one-shot a simple non-scientific calculator, but not Excel or Vim or Nginx.

Why not? Given a proper spec, you should absolutely be able to one-shot Excel, particularly if we put it at the level of complexity of, say, Excel 1.0 for Mac.

Current models aren't capable of that, but that doesn't mean it's not possible.

The issue is not the models; the issue is that this method was tried before, and humans suck at writing down what they want. Developing in small increments that allow for feedback was the answer to this issue.

If you made models able to code from long specs, you would still be left with the hard problem of having to write them.

An interesting question for me is "can the LLMs predict what humans want?".

Like if you show the LLM a page, can the LLM review the page and then spit out a review that is close to what a human would say about the page?

Yes, my current nightmare is that I have a very long queue of specs to write and need to work with non-technical staff to help them put into words what they actually want.

Software was always that way, though.

Seems like this would be a good time to use this famous quote:

> given the sufficiently smart compiler

For those unaware, this is the analogous line used by compiler proponents. The first full compiler was created in 1957 (roughly 70 years ago), and the "sufficiently smart compiler" never happened: hand-written code from the best coders is still faster. Now, that doesn't mean compilers didn't do the job well enough; we just accepted that 90-95% of top speed was enough for almost everything.

On the LLM one-shotting point: it took 30 (40?) years for compilers to become good enough for the mass market. Caveat early adopter and investor.

Plus what pyrale said.

One-shot prompting/tooling is the only reasonable way to use an LLM, in my opinion. You should not have an LLM operating for hours, creating thousands of lines of new code that you can never review or maintain. You can actually be highly productive modifying one or two files at a time, ideally with as focused and small a context as possible, without giving the LLM full permission to pull in as much context as it can along the way, which mainly serves to maximize revenue for the developers of the harness.

The agentic engineering paradigm is just a narrative pushed by AI companies to get people to 10x their token consumption per prompt. It also plays into people's laziness and craving for dopamine, causing addict-like behavior in those who fall prey to this trend.

I disagree fundamentally.

If I do that, I'm literally slower than if I just made the change myself instead of specifying it sufficiently for the model.

I can see how a junior dev, or generally anyone who isn't particularly knowledgeable about the language or framework they're working with, may benefit from such usage, but for experienced people there is very little value in that approach.

I say this because I had to face exactly this decision this month, when Copilot introduced usage-based billing. I attempted to scale back my usage, first by switching to non-Opus models: the output essentially became throwaway, as it kept hallucinating nonexistent fields in API responses, etc. Then by scoping the changes smaller and smaller, until I ultimately gave up and reduced my usage to just generating tests.

I agree. And at work it has been producing some of the worst GUI test cases I have ever seen.

What is tested often makes no sense at all: completely implausible edge cases are tested on internals, while no tests are written for the overall application using user events.

And some things in these test cases are downright ridiculous: instead of instantiating your classes, it sets up barebones fake objects that reimplement some of the behavior of your actual class, then suppresses the resulting TypeScript errors with a forced cast or similar.

Then it proceeds to slap some test IDs on the output, stub components and dependencies more or less at random, add some assertions on those test IDs, and call it a day.

Apparently that's good enough for many colleagues to open an MR for that garbage.

That said, at home with SOTA models, I happily hand large units of work over to them, outsource much of the thinking, and get workable results. I think this is the future.

I disagree, fundamentally.

I see little value in throwing a ton of context at an LLM and waiting 10-20 minutes for a coin flip on whether or not it's going to produce junk. I'd rather do quick 60-second turns, get most of the way there, and fix the rest myself if I have to. Honestly, I'd rather just not use them at all.

Well, my point was that I'd rather spend 30 seconds doing it myself than formulate a prompt with enough context for the model to implement it within 60 seconds. Also, these numbers are unrealistic.

Everyone I've ever interacted with who claims to prompt in "seconds" actually needs several minutes to think about the solution they want the model to implement, and then twice as long to put it into a prompt that gives the model enough context to actually do it.

So the more realistic version is: "I'd rather spend 2 minutes just implementing the minor change myself than spend 1.5 minutes thinking about it, then 2.5 minutes writing the prompt, and then wait 1 minute for it to finish."

I agree with all those points, and my numbers are a little off. I really just don't want to use any of it. I'm more excited about fast FIM (fill-in-the-middle) autocomplete that works well, something like Cursor Tab without Cursor. If something can increase my WPM and take strain off my fingers, that would be nice. At this point, though, latency and accuracy are terrible.

The trick is to do something else in those 20 minutes (or, ideally, even longer).

That's the main value I've been getting out of coding agents. I have them do (comparatively) simpler tasks or explorative tasks in the background while I'm in a meeting, doing code reviews, or otherwise working on something else.
