I'm sure that explains some of it, but I really don't think it explains most of the people who have been AI-pilled in the last nine months. There was no amount of context I could give GPT-4o that would make it a net benefit for agentic development. I tried it with quite sophisticated prompt systems and much simpler ones, with compendiums of code & business analysis and with sparser ones. It just wasted my time, yet there were still people using Cursor with that model and saying it was life-changing. I didn't have that experience until Opus 4.5 - it's possible I could have had it earlier, but that was when I happened to try it again.

by ghshephard 10 minutes ago

I think many of the people who have become "AI-pilled" (I'll include myself here) had it happen in the last 3 months. Even over the Christmas break, when the Wiggums loop got so much coverage, I still wasn't that blown away going into January/February - 50%+ of the time I'd just write the code myself. I like coding.

But - I don't know if it was April or May, but very recently - the coding harnesses paired with decent SOTA models like Opus 4.8/GPT 5.5 just started showing a lot more consistency, completeness, and sometimes downright clever behavior, to the point that they became way more useful.

Just one out of a hundred+ examples - I gave Claude Code (Opus 4.8 High) a complex task that involved Consul and Vault, but I had neglected to give it sandbox permission to download from hashicorp.com. So it created an entire test harness that simulated the behavior of both Vault and Consul, created all its test cases, verified that they passed - and when I came back 40 minutes later, said that it was all done.

Its test harness simulated the behavior of Vault/Consul so accurately that on the first try - no refactoring whatsoever - all of the protobuf/AES-GCM/API behavior (which has varied significantly between versions) worked.

This was something that would have taken me, someone super familiar with the code, tools, and APIs, a minimum of 3 solid days of work - and that would likely have involved hundreds of attempts and refactors as I unwound all the weird encryption and packaging layers. It zero-shotted a full solution without having an API to test against.

If these agents have an actual test harness, it's honestly hard to imagine what they can't do - subject only to imagination and budget at this point.

Speaking personally - something changed between January and, let's say, May. Instead of seeing these things as a mostly interesting technology demonstration in which the flaws outweighed the benefits, I now genuinely think they are the future of programming. I'm dubious that I'll write much software manually in the future, beyond what I do for personal pleasure.

by jmalicki 41 minutes ago

Which way do you think that goes? Are the ones who "get it" the ones who are captivated, or the ones who see them as incremental?