upvote
I've long believed those numbers were faked by Anthropic/OpenAI to serve as a form of advertisement. The estimates are impossible to verify and their ability to do "2 days of work" in 10 minutes will presumably make the user go "Wow, I just saved SO much time!" Plus, the unnecessary text eats up the users' tokens so it helps the companies on the backend, as well.
reply
I agree with you that labs are benefiting from those outputs but I'm skeptical that labs are purposefully training the models to produce those outputs.

Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.

I believe the outputs are a training coincidence with consequences that are opportunitistic for the labs.

reply
> the estimates

It doesn't estimate.

It generates tokens that read like estimates associated with the context in its training material.

What would you expect the generator to output instead?

reply
All the models have broken estimates. They're trained heavily on jira and GitHub tasks and issues, that's why their estimates are human.
reply
All models do it. It's their training. They didn't have "a person does this in a week but an LLM could in a minute" in their training yet. They also don't have the concept of elapsed time unless you ask them how long something has taken.
reply
It repeats what it has seen in the training data. Expecting it to reason about the complexity of a task is a pipe dream. The best is to tell it not to come back with estimates, and when it does, remove them anyway.
reply