For example, "Generate me some repeatable code to ask system X for data about Y, pull out value Z, and submit it to system W."
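The kind of glue script that prompt describes is genuinely small. A minimal sketch in Python, assuming hypothetical REST endpoints standing in for "system X" and "system W" and a made-up field name for "value Z":

```python
import requests

# Hypothetical endpoints standing in for "system X" and "system W".
SYSTEM_X_URL = "https://system-x.example.com/api/records"
SYSTEM_W_URL = "https://system-w.example.com/api/submissions"

def sync_value(record_id: str) -> None:
    # Ask system X for data about Y.
    resp = requests.get(f"{SYSTEM_X_URL}/{record_id}", timeout=30)
    resp.raise_for_status()

    # Pull out value Z (the field name is an assumption).
    value_z = resp.json()["value_z"]

    # Submit it to system W.
    submit = requests.post(
        SYSTEM_W_URL,
        json={"record_id": record_id, "value": value_z},
        timeout=30,
    )
    submit.raise_for_status()

if __name__ == "__main__":
    sync_value("12345")
```

The happy path really is that short; the argument in this thread is about everything that script doesn't handle.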
I hear what you're saying, but I think it's going to be entertaining watching people go "I guess this is why we paid Bob all of that money all those years".
But it stands to reason that it would be a huge shift if a system accessible to non-technical users could mostly handle those edge cases, even when "handle" means failing silently without taking the entire thing down, or simply raising them for human intervention via a Slack message, an email, a dashboard, or something like that.
And Bob's still going to get paid a lot of money; he'll just be doing stuff that's more useful than figuring out how negative numbers should be parsed in the ETL pipeline.
That said, we should not underestimate the ability of companies to limp along with something broken and buggy, especially when they're being told there's no budget to fix it. (True even before LLMs.)
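The "negative numbers" quip is a real class of ETL edge case: source systems emit amounts as "-123.45", accounting-style "(123.45)", or trailing-minus "123.45-", and the pipeline has to decide. A minimal sketch of the pattern described above, assuming a Python pipeline, where unparseable rows get flagged to a person instead of crashing the run (the notify_humans hook and the field names are placeholders, not any particular system's API):

```python
import re

def parse_amount(raw: str) -> float | None:
    """Parse an amount that may arrive as '-123.45', '(123.45)', or '123.45-'."""
    s = raw.strip().replace(",", "")
    negative = False
    if s.startswith("(") and s.endswith(")"):  # accounting-style negative
        negative, s = True, s[1:-1]
    elif s.endswith("-"):                      # trailing-minus exports
        negative, s = True, s[:-1]
    elif s.startswith("-"):
        negative, s = True, s[1:]
    if not re.fullmatch(r"\d+(\.\d+)?", s):
        return None
    return -float(s) if negative else float(s)

def load_row(row: dict, notify_humans) -> None:
    amount = parse_amount(row["amount"])
    if amount is None:
        # Don't take the whole pipeline down; raise it for human intervention
        # via whatever channel notify_humans wraps (Slack, email, dashboard).
        notify_humans(f"Unparseable amount {row['amount']!r} in row {row.get('id')}")
        return
    # ... write the cleaned amount downstream ...
```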
Is your AI not even doing try/catch statements? What century are you in?
Agreed. I've spent the last few years building an EMR at an actual agency, and the idea that users know what they want and can articulate it to a degree that won't require ANY technical decisions is pure fantasy in my experience.
I don't think we'll get there by scaling current techniques (Dario disagrees, and he's far more qualified, albeit biased). I feel that current models are missing the critical-thinking skills you need to fully take on this role.
If Opus 4.6 had 100M context, 100x higher throughput and lower latency, and 100x cheaper $/token, we'd be much closer. We'd still need to supervise it, but it could do a whole lot more just by virtue of more I/O.
Of course, whether scaling everything by 100x is possible given current techniques is arguable in itself.
Yeah, we'll see. I didn't think they'd come this far, but they have. Though the cracks I still see seem to be more or less inherent to how LLMs work.
It's really hard to accurately assess this given how much I have at stake.
> and he's far more qualified albeit biased
Yeah, I think biased is an understatement. And he's working on a very specific product. How much can any one person really understand the entire industry or the scope of all its work? He's worked at Google and OpenAI. Not exactly examples of your standard line-of-business software building.