This is the right question, but it's hard to answer in practice. Shipped features vary too much in complexity to compare directly. What you can actually track is cycle time: the time from ticket opened to PR merged. That's gone from ~3 days to ~6 hours on greenfield work for us.
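
For anyone who wants to track the same thing, here's a minimal sketch of the cycle-time calculation. It assumes you can export an opened timestamp and a merged timestamp per ticket. The field names (`opened_at`, `merged_at`) and the sample data are made up for illustration, and using the median rather than the mean is my choice here so that one slow ticket doesn't dominate.

```python
from datetime import datetime
from statistics import median


def cycle_time_hours(opened_at: str, merged_at: str) -> float:
    """Hours between ticket opened and PR merged (ISO 8601 timestamps)."""
    opened = datetime.fromisoformat(opened_at)
    merged = datetime.fromisoformat(merged_at)
    return (merged - opened).total_seconds() / 3600


def median_cycle_time(tickets: list[dict]) -> float:
    """Median cycle time in hours, skipping tickets with no merged PR yet."""
    durations = [
        cycle_time_hours(t["opened_at"], t["merged_at"])
        for t in tickets
        if t.get("merged_at")
    ]
    # statistics.median raises StatisticsError if nothing has merged yet.
    return median(durations)


# Hypothetical sample data, not real measurements.
tickets = [
    {"opened_at": "2025-01-06T09:00:00", "merged_at": "2025-01-06T15:00:00"},
    {"opened_at": "2025-01-07T10:00:00", "merged_at": "2025-01-07T14:30:00"},
    {"opened_at": "2025-01-08T09:00:00", "merged_at": "2025-01-11T09:00:00"},
    {"opened_at": "2025-01-09T09:00:00", "merged_at": None},  # still open
]

print(median_cycle_time(tickets))
```

Comparing this number month over month, split by greenfield vs. existing code, is probably more useful than any single absolute value.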

The catch is that token spend and quality aren't correlated the way you'd expect. Low-spend months when I'm directing carefully and reviewing every diff tend to produce better code than high-spend months where I'm letting agents run longer chains. The expensive runs generate more code, not necessarily better code.

Jensen's $250k figure only makes sense if you're running dozens of parallel agents continuously. Most engineers are doing something more like augmented pairing. The unit economics are actually pretty good at $100-200/month per person. Beyond that you're hitting diminishing returns unless you've built actual agent infrastructure to parallelize and verify the work.

We are way past this question now.