I don't think we'll get there by scaling current techniques (Dario disagrees, and he's far more qualified albeit biased). I feel that current models are missing the critical thinking skills you need to fully take on this role.
If Opus 4.6 had 100M context, 100x higher throughput, 100x lower latency, and 100x cheaper $/token, we'd be much closer. We'd still need to supervise it, but it could do a whole lot more just by virtue of more I/O.
Of course, whether scaling everything by 100x is possible given current techniques is arguable in itself.
Yea, we'll see. I didn't think they'd come this far, but they have. Though the cracks I still see seem to be more or less inherent to how LLMs work.
It's really hard to accurately assess this given how much I have at stake.
> and he's far more qualified albeit biased
Yea, I think biased is an understatement. And he's working on a very specific product. How much can any one person really understand the entire industry or the scope of all its work? He's worked at Google and OpenAI. Not exactly examples of your standard line-of-business software building.