Doesn't mean the tools aren't useful — it means we're probably measuring the wrong thing. "Prompt engineering" was always a dead end that obscured the deeper question: the structure an AI operates within — persistent context, feedback loops, behavioral constraints — matters more than the model or the prompts you feed it. The real intelligence might be in the harness, not the horse.
And scaffolding does matter a lot, but mostly because the models just got a lot better and the corresponding scaffolding for long-running tasks hasn't really caught up yet.
And even if, say, the latter strategy works: ads are driven by consumption. If you fully believe OpenAI's vision of these tools replacing huge swaths of the workforce reasonably quickly, who will be left to consume? It's all nonsense, and the numbers are nonsense if you spend any real time considering it. The fact that SoftBank is a major investor should be a dead giveaway.
Have any of you tried reproducing an identical output, given an identical set of inputs? It simply doesn't happen. It's like a lottery.
This lack of reproducibility is a huge problem and limits how far the thing can go.
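For what it's worth, there's a mundane reason you can see different outputs even at temperature 0 with a fixed seed: floating-point addition isn't associative, and GPU kernels don't guarantee a summation order, so the same logit computation can round differently from run to run and occasionally flip a token. A toy sketch in plain Python (not actual inference code, just the rounding effect) shows how reordering the same terms changes the result:

```python
# Floating-point addition is not associative: summing the same
# values in a different order can produce a different result.
vals = [0.1] * 10       # many small terms
big = [1e16, -1e16]     # two large terms that cancel exactly

in_order = sum(vals + big)   # small terms first, then the big pair:
                             # the ~1.0 partial sum is absorbed into
                             # 1e16 (its rounding gap is 2.0), so the
                             # final result collapses to 0.0
reordered = sum(big + vals)  # big pair cancels first, so the small
                             # terms survive (~1.0)

print(in_order, reordered)
```

In a parallel reduction the equivalent reordering happens nondeterministically across threads, which is why bit-identical reruns aren't guaranteed without deterministic kernels.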
Evidence? I’m sure someone will argue, but I think it’s generally accepted that inference can be done profitably at this point. The cost for equivalent capability is also plummeting.
Here you go: https://www.wsj.com/livecoverage/stock-market-today-dow-sp-5...