It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.), and when you tell it to actually make those changes it's pretty much done in half an hour.
Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.
I believe the outputs are a training coincidence with consequences that are opportunistic for the labs.
It doesn't estimate.
It generates tokens that read like estimates associated with the context in its training material.
What would you expect the generator to output instead?
https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...
E.g. occasionally it makes the dumbest mistakes you've ever seen and can't correct them. However, that's fairly rare, and if you know the domain really well, occasionally popping into the code and pushing it towards the correct solution takes like 20 seconds or whatever.
So the speed you can move with flash + high domain knowledge beats opus by a mile in my experience.
I tried to switch back to 4.8 for a bit when it came out; it feels so bad waiting 20 mins for a mediocre solution when I could have had everything complete - with multiple iteration cycles - in flash in like 3-5 mins.
Basically I never have to wait. Yes, I have to give it little corrections occasionally (but I know the domain really well, so that's not an issue), but it's so much faster than anything else it's kinda crazy. I love the super fast, high-involvement development cycle.
I actually enjoy using agentic development flows for the first time now, whereas with Claude I absolutely hated it. That 5 to 20 min wait after every prompt absolutely killed my desire to work at all.