I find this somewhat puzzling. I thought things were moving quickly, but at this time last year I couldn't even get Claude (using Cursor) to spin me up a service skeleton that would compile, let alone do anything meaningful.
I know it feels like a long time somehow, but it was only between November and February that things started to actually somewhat work without significant hand-holding. Even now, it seems like we're still figuring out how to fully leverage the current models and tooling, even in organizations that have largely gotten on board.
No. The very fact they are trying to "warn" us means it's all marketing.
This has been corroborated for me on the engineering front: I can't find a single IC I respect who actually thought there was any evidence AI was going to live up to the hype. I saw a lot of people I always thought were idiots/sycophants/brown nosers go insane with AI. I never saw anyone I'd trust to help me cross a street blindfolded say more than "I may be wrong, but I'm not seeing any evidence yet".
It can be massively overhyped for its current capacity and still decimate white-collar work.
A lot of the difference of opinion is down to point of view. At my day job, LLMs will not live up to anything because the enterprise is not structured to take advantage of their strengths. That's unlikely to change within the foreseeable future.
I strongly suspect you mostly talked with people coming from just such a background, because it's hard to go beyond our own bubbles.
I've been using it to do this for 2 years now, and so have many others. The change you mention is primarily one of Overton windows, of vibes.
Ignoring instructions - whether in AGENTS.md or my prompt - is the worst of it, and it routinely happens. It just waives things that I explicitly told it to do as part of the design.
Vibe coders (in the true sense, zero oversight) claim that you just need to prompt it carefully. That's completely untrue when faced with your careful prompt being ignored.
I even have "don't overrule me without asking" in my global AGENTS.md, and it simply doesn't comply.
I try to avoid contexts over 200k, as the 1M context is where I first saw the massive decrease in reliability.
And my AGENTS.md is really short, and I said it was ignoring decisions in the prompt.
You’ve been sold something that simply doesn’t work for the purported use case (intelligence) and instead is like a stupid database of all world knowledge with the appearance of intelligence.
Useful tools at times (if you bear in mind their limitations), but not close to intelligent, independent agents.
A "stupid" database would be better, based on what I get when I ask whether all of the state of Oregon is north of New York City. Indian English has a word for it: oversmart.
You really need to look into hooks based on your coding agent. This is very much a solved problem as I demonstrate with
https://github.com/gitsense/pi-brains
I have a test repo
https://github.com/gitsense/gsc-rules-demos
that shows how you can block and warn and do other things.
You obviously can't have a "Don't make a mistake" rule though.
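To make "block and warn" concrete, here is a minimal sketch of the kind of check a hook could run before the agent executes a shell command. This is not the API of any particular coding agent or of the repos linked above; the `Rule` type, the `check_command` function, and the example rules are all hypothetical, and a real hook would need to be wired into whatever hook mechanism your agent provides.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: str  # regex matched against the proposed shell command
    action: str   # "block" or "warn"
    reason: str   # message shown to the agent or the user


# Hypothetical rules, for illustration only.
RULES = [
    Rule(r"\bgit\s+push\b.*--force", "block", "force pushes need a human"),
    Rule(r"\brm\s+-rf\b", "block", "recursive deletes are not allowed"),
    Rule(r"\bnpm\s+install\b", "warn", "new dependencies need review"),
]


def check_command(command: str, rules=RULES):
    """Return (action, reason) for the first matching rule, else ("allow", "")."""
    for rule in rules:
        if re.search(rule.pattern, command):
            return rule.action, rule.reason
    return "allow", ""
```

The point of the design is that the check is deterministic and runs outside the model, so it does not depend on the model choosing to follow the prompt. It only covers things that can be expressed as a pattern, which is why a "don't make a mistake" rule is out of reach.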
The agreed architecture is to use signing between two microservices, so that a third can orchestrate between them in a zero-trust way (and to prevent a distributed monolith). It just decides that we can trust the third and skips the signing.
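For anyone unfamiliar with the pattern, a minimal sketch of what that signing could look like, assuming a shared secret known only to the two endpoint services and never to the orchestrator. The `sign` and `verify` names and the envelope layout are hypothetical, not the actual design, and a real deployment might prefer asymmetric signatures so the receiver cannot forge messages from the sender.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign(payload: dict, key: bytes) -> dict:
    """Sender side: serialize the payload canonically and attach an HMAC tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": tag}


def verify(envelope: dict, key: bytes) -> Optional[dict]:
    """Receiver side: return the payload if the tag matches, else None."""
    expected = hmac.new(key, envelope["body"].encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, envelope["sig"]):
        return None
    return json.loads(envelope["body"])
```

The orchestrator only relays the envelope. Because it does not hold the key, any change it makes to the body causes `verify` to return `None`, which is exactly the property that is lost when the signing step is skipped.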
Basically I treat it like a junior dev. We don’t get junior devs to write code correctly by cajoling them just right, we add CI gates. It still works.
Architectural decisions are not lintable.
I'm not certain things will look too different a year from now either. We still have serious bottlenecks in terms of the focus/attention you have for both delegating agent work and being able to review it. Even if we solve the "trust what AI does" problem, these cognitive deficit issues still exist - for teams coordinating work, even users adopting new shit, etc.
As an industry we are leaning heavily into accepting "slop" as the status quo - we care more about efficiency of output right now. Slop will get better & we can become more adaptive to living with the paradox of amazing yet delicate systems generated by AI. But I feel big shifts coming in this regard, and if/when they do we may find ourselves in the dystopia of broader unemployment with worse net outcomes.
I do think the teams that ship quality with AI will do so by learning to slow down
https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing...