The anecdote is compelling, but there's an interesting measurement gap. METR ran a randomized controlled trial with experienced open-source developers: they were actually 19% slower with AI assistance, but self-reported being 24% faster. A ~40-point perception gap.

Doesn't mean the tools aren't useful — it means we're probably measuring the wrong thing. "Prompt engineering" was always a dead end that obscured the deeper question: the structure an AI operates within — persistent context, feedback loops, behavioral constraints — matters more than the model or the prompts you feed it. The real intelligence might be in the harness, not the horse.
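To make "harness" concrete, here's a minimal sketch of the structure I mean (all names hypothetical, not any real agent framework): persistent context carried between steps, a feedback loop, and behavioral constraints wrapped around an otherwise opaque model call.

```python
# Minimal agent-harness sketch: the model is a black box; the harness
# supplies persistent context, records feedback, and enforces constraints.
# All names here are hypothetical illustrations, not a real API.

def run_task(model, task, memory, max_steps=5):
    """Drive a model through a task with persistent context and feedback."""
    for step in range(max_steps):
        # Persistent context: prior results are fed back in on every step.
        prompt = {"task": task, "memory": list(memory)}
        action = model(prompt)

        # Behavioral constraint: refuse actions outside an allowlist.
        if action["kind"] not in {"edit", "read", "done"}:
            memory.append(("rejected", action["kind"]))
            continue

        if action["kind"] == "done":
            return action["result"]

        # Feedback loop: record what happened so the next step can use it.
        memory.append((action["kind"], action.get("target")))
    return None

# Stub model: reads a file on the first step, then declares success.
def stub_model(prompt):
    if not prompt["memory"]:
        return {"kind": "read", "target": "app.py"}
    return {"kind": "done", "result": "patched"}

memory = []
print(run_task(stub_model, "fix the bug", memory))  # patched
print(memory)  # [('read', 'app.py')]
```

The point of the sketch: you could swap `stub_model` for any frontier model and the loop, memory, and allowlist stay the same, which is why the harness carries so much of the observed "intelligence".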

reply
Respectfully, was this comment AI generated? It has all the signs.

And scaffolding does matter a lot, but mostly because the models just got a lot better and the corresponding scaffolding for long-running tasks hasn't really caught up yet.

reply
Ha, fair call. I use Claude a lot and it's definitely rubbed off on how I write and even think (which is something to explore in itself sometime). The scaffolding point comes from building, though, not prompting. I've been doing AI-integrated dev for about a year, and the gap between "better model" and "actually useful in production" is almost entirely the surrounding architecture. You're right that the infrastructure hasn't caught up yet; that's kind of the whole problem right now. Most teams are building fancier autocomplete when the real problems are things like persistent memory and letting learned patterns earn trust over time.
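To gesture at what "earning trust over time" could look like, here's a toy sketch (entirely hypothetical, not from any real system): a learned pattern stays gated behind human review until it has succeeded enough times in a row, and any failure resets its standing.

```python
# Toy "trust ledger": a learned pattern must succeed under review
# several times before it is allowed to auto-apply.
# Hypothetical sketch, not a description of any real product.

class TrustLedger:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.scores = {}  # pattern name -> consecutive successes

    def record(self, pattern, succeeded):
        if succeeded:
            self.scores[pattern] = self.scores.get(pattern, 0) + 1
        else:
            self.scores[pattern] = 0  # any failure resets trust

    def auto_apply(self, pattern):
        """True once the pattern has earned enough consecutive successes."""
        return self.scores.get(pattern, 0) >= self.threshold

ledger = TrustLedger()
for outcome in (True, True, True):
    ledger.record("null-check-refactor", outcome)
print(ledger.auto_apply("null-check-refactor"))  # True

ledger.record("null-check-refactor", False)
print(ledger.auto_apply("null-check-refactor"))  # False
```

Consecutive-success counting is deliberately conservative; a real system would probably weight recency and severity, but the shape of the problem is the same.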
reply
It really doesn't matter how "good" these tools feel, or whatever vague metric you want: they hemorrhage cash at a rate perhaps not seen in human history. In other words, the usage you like is costing them tons of money. The bet is either that energy/compute becomes vastly cheaper in a matter of a couple of years (extremely unlikely), or that they find other ways to monetize that don't absolutely destroy the utility of their product (ads, an area we have seen Google flop in spectacularly).

And even say the latter strategy works: ads are driven by consumption. If you believe 100% of OpenAI's vision of these tools replacing huge swaths of the workforce reasonably quickly, who will be left to consume? It's all nonsense, and the numbers are nonsense if you spend any real time considering them. The fact that SoftBank is a major investor should be a dead giveaway.

reply
Indeed. Many of the posts I see on here are hilarious.

Have any of you tried reproducing an identical output, given an identical set of inputs? It simply doesn't happen. It's like a lottery.

This lack of reproducibility is a huge problem and limits how far the thing can go.

reply
> In other words, that usage you like is costing them tons of money

Evidence? I’m sure someone will argue, but I think it’s generally accepted that inference can be done profitably at this point. The cost for equivalent capability is also plummeting.

reply
I didn't think there would need to be more evidence than the fact that they say they need to spend $600 billion over 4 years against $13bn of current revenue, but here we are.

Here you go: https://www.wsj.com/livecoverage/stock-market-today-dow-sp-5...

reply
Right, but if OpenAI wanted to stop doing research and just monetize its current models, all indications are that it would be profitable. If not, various adjustments to pricing, ads, etc. could get it there. However, it has no reason to do this, and like all the other labs it is going insanely into debt to develop more models. I'm not saying it's necessarily going to work out, but they're far from the first company to prioritize growth over profitability.
reply
Nope. The only "all indications" are that they say so. They may be making a profit on API usage, but even that is very suspect: compare against how much it actually costs to rent a rack of B200s from Microsoft. And for the millions of people using Codex/Claude Code/Copilot, the $20/$30/$200 subscriptions clearly don't cover the actual cost of inference.
reply
What was the feature and what was the note?
reply
It was a modest UX update ... certainly nothing world-changing. (It's also had success with some backend performance refactors, but this particular change was all frontend.) The note was basically just a transcription of what I was asked to do, and provided no technical hints about how to go about the work. The agent figured out which codebase, application, and file to modify and made the correct edit.
reply
That's pretty neat! Thanks for elaborating.
reply
Yeah but was Cursor using Claude? What's the moat that any of these companies have that prevents me from using another LLM?
reply