Agent mania setting in

It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.), and when you tell it to actually make those changes, it's pretty much done in half an hour.

by smith7018 1 hour ago

I've long believed those numbers were faked by Anthropic/OpenAI to serve as a form of advertisement. The estimates are impossible to verify, and the model's ability to do "2 days of work" in 10 minutes will presumably make the user go "Wow, I just saved SO much time!" Plus, the unnecessary text eats up users' tokens, so it helps the companies on the back end as well.

by leodavi 1 hour ago

I agree with you that labs are benefiting from those outputs, but I'm skeptical that labs are purposefully training the models to produce them.

Raw pre-training data includes plenty of conversations between professional builders, and some of those include estimates.

I believe the outputs are a training coincidence whose consequences the labs opportunistically benefit from.

by Terretta 23 minutes ago

> the estimates

It doesn't estimate.

It generates tokens that read like estimates associated with the context in its training material.

What would you expect the generator to output instead?

by AgentMasterRace 1 hour ago

All the models have broken estimates. They're trained heavily on Jira and GitHub tasks and issues; that's why their estimates are on a human timescale.

by dizhn 51 minutes ago

All models do it. It's their training. They didn't have "a person does this in a week but an LLM could in a minute" in their training yet. They also don't have the concept of elapsed time unless you ask them how long something has taken.

by throw1234567891 39 minutes ago

It repeats what it has seen in the training data. Expecting it to reason about the complexity of a task is a pipe dream. The best you can do is tell it not to give estimates, and when it does anyway, remove them.

by RussianCow 2 hours ago

Do you mean Flash and not Pro? I haven't tried it personally, but according to OpenRouter, the fastest DeepSeek V4 Pro providers only manage ~50 tps. That's slower than Claude Opus.

https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...

by sarjann 1 hour ago

I don't think token speed matters as much when a task needs a lot of tokens to complete. For example, in the Artificial Analysis benchmarks, DeepSeek V4 is one of the biggest token burners to get through the benchmark.

by specproc 2 hours ago

Yeah, Flash is crazy fast, but I've found its performance variable.

by binary0010 24 minutes ago

Flash is amazing if you know the domain really well.

Occasionally it makes the dumbest mistakes you've ever seen and can't correct them. However, that's fairly rare, and if you know the domain really well, popping into the code now and then to push it toward the correct solution takes like 20 seconds or whatever.

So the speed you can move at with Flash plus deep domain knowledge beats Opus by a mile, in my experience.

I tried switching back to 4.8 for a bit when it came out. It feels so bad waiting 20 minutes for a mediocre solution when I could have had everything done, with multiple iteration cycles, in Flash in like 3-5 minutes.

by flowbarai 13 minutes ago

[flagged]

by binary0010 29 minutes ago

I exclusively use DeepSeek V4 Flash now and have completely stopped using slow models like Claude.

Basically, I never have to wait. Yes, I occasionally have to give it small corrections (but I know the domain really well, so that's not an issue), but it's so much faster than anything else that it's kinda crazy. I love the super-fast, high-involvement development cycle.

For the first time, I actually enjoy using agentic development flows, whereas with Claude I absolutely hated them. That 5-to-20-minute wait after every prompt absolutely killed any desire to work at all.

by tmaly 2 hours ago

This reminds me of the Peter / Boris comments on writing loops to keep the agents busy.

by behnamoh 58 minutes ago

Same. How can DeepSeek serve V4-Pro at such high speeds despite the sanctions?