I had been saying this on HN repeatedly: people are going to use the smartest models for coding. They don't care how cheap your tokens are if they don't have the highest probability of solving your programming tasks.

And I was dead wrong. Now I mostly use DeepSeek Pro myself.

by vb-8448 4 hours ago

> people are going to use the smartest models for coding. They don't care how cheap your tokens are

I actually think that's still true and will continue to be true as long as someone else subsidizes the tokens. Once the "free money" runs out, things will get interesting.

by jstummbillig 2 hours ago

Including for DeepSeek you mean.

by Someone1234 30 minutes ago

Yes, including for DeepSeek. But while DeepSeek Pro doesn't run on other people's infrastructure, with several other Chinese models you get providers competing on price to offer them.

We'll see how it winds up, but we could see models get licensed over half a dozen+ compute vendors, and then you pick your price/offering/features favorite.

by 6AA4FD 13 hours ago

Props for making a falsifiable claim, noticing it was falsified, and owning up to it.

by weitendorf 21 hours ago

I pretty strongly feel the opposite way. Granted, I have not used DeepSeek enough to “know” its models' idiosyncrasies as well as I know Anthropic's, so there is a partial skill issue. But I just find it really hard to justify using a less powerful model while I work.

The most I’ve ever spent in a month extra on API tokens for my own work is $200, and I pay for the $200/mo Claude. I use these models quite a lot, though not idly (I usually just walk around and do other stuff until I know how I’m going to approach the next set of problems). So it costs me about $3000/year to get as much as I want of the best model available. Already that seems low enough to not be worth stressing out too much about optimizing it, because it feels like an indisputable good value, and trying to save money with a less powerful model would be optimizing for a $1000-$2000 saving at the expense of a large portion of my work taking longer or being more frustrating and iterative.

That’s not a flex or anything, I get that in other countries $3000/yr is a lot of money for a software developer and also a lot of people would perhaps rationally be better off doing X% worse at work or spending Y% more time on tasks to save $Z, if their productivity improvements didn’t translate to more salary. Otherwise if your performance has more upside I really do think that the smartest models are better with the current pricing scheme. Deepseek and the other Chinese models spend a LOT of time thinking, and tend to be much more jagged (benchmaxxed) in performance. How can dealing with that over an entire year be worth $2k?

The only situations I can think of where sacrificing my own time/performance to save on inference makes sense are batch compute (of course, $1k vs $100k is different from $30 vs $3k) or work where the tier 2 models have crossed the “good enough” threshold. But I think Opus is not even close to that threshold generally yet. As it gets smarter I, and I think most others probably, just try to do harder things faster and hit the next wall.

by chrsw 12 minutes ago

I agree. My company pays for my tokens so I use the best models I can. I'm more worried about the quality of the work and the speed of accomplishing tasks than about saving the most money on every token.

Now, if they come back and tell me I can't spend as much on tokens, I'll have to change my strategy. But everything I'm hearing so far is that we're going to be increasing our token spend this year and probably next year too. Not crazy increases, but maybe enough to keep using the latest models without being anxious about every prompt.

by 59nadir 17 hours ago

Not even SotA models are good enough to generate code (beyond functions or small, very simple modules) that I'd be happy shipping, so I've decided to just not have them do that. And given this, it has basically turned out that what's left is information gathering + analysis + design overview stuff.

I've just recently started trying out DeepSeek 4 Flash and I was very skeptical at first because I've had some really good experiences with GPT-5.{4,5}, and couldn't possibly believe that this model they charge nothing for could give me similar results, but it absolutely shreds through things and ends up giving me very good answers in almost no time. I also like that it doesn't really seem to have much personality, it's given me mostly just facts and data so far without any additions to the prompt by me.

In my own agent I also specifically prompt to remove flowery language, snark, etc., but I haven't tried it with models like GPT-5.x which I've found has too much personality and tries to make it seem like I'm talking to a human too much.

by solenoid0937 20 hours ago

I feel similarly. I'll gladly pay to use the most intelligent model I can find on the best harness I have. Sometimes this is GPT Pro, sometimes this is Opus.

I ask AI a lot of questions, not only about code but about my personal life, and I would be willing to pay very large sums to have the best quality output.

by jhonof 20 hours ago

I think that's true for now, but eventually we will reach a point where a model is good enough (we're approaching that right now with frontier models) and there will be diminishing returns. I don't need a PhD-level genius to build me an analytics dashboard, for example, so why would I pay for a model with that level of intelligence when I can (eventually) self-host a good enough model and run queries for electricity cost + hardware?

by evilduck 8 hours ago

I think we are approaching that now, with correct expectations. With frontier large models you can often one-shot tasks with vague prompts for stuff like creating CRUD APIs and dashboards around a simple data model since it's such a solved problem now. With something like Qwen3.6 27B or 35B-A3B and a Strix Halo level computer or a MBP with 32GB or more of RAM, you may need to be more explicit and stay involved and be a little more patient, but you can absolutely get work done with it or delegate tasks to it successfully.

My Framework Desktop does a lot of similar work as my Claude subscription at work (Cowork, chats) for 100W of power draw and a little patience waiting for a slow GPU with limited memory bandwidth to crunch the numbers. Agentic coding is obviously weaker but CRUD development and visualization dashboards are within reach, and I'm usually pleasantly surprised at its ability to self-manage devops.

by SoftTalker 20 hours ago

You pay $3k/year for personal use? Or out of your own pocket but for your job?

by pizzafeelsright 3 hours ago

I went from paying $100/month a few years ago to ~$5k a year now, out of pocket, for personal use to learn and grow in my position at work.

by weitendorf 20 hours ago

It's through my startup, so both I guess. Generally I find my bottleneck to be attention and focus, and the opportunity cost of not going back to work at my prior employers absolutely dwarfs the amount of money I spend on tools, so it's not hard for me to justify spending $200/mo on something I use every day that makes me more productive and generally removes bullshit from my life.

At my prior job there was still what felt like a strong enough correlation between my actual performance and my pay that I don't think I would have had a hard time justifying the expense there either; now I absolutely don't. With the current state of the models, it's baffling to me to hear about professional software developers planning their work around their $20/mo subscription's quotas.

Obviously it's more complicated than more tokens = more productive, but I see them less like SaaS and more like gasoline, where if I run out or need more to do what I'm doing, as long as I'm not being wasteful, I just buy more. Why would I waste a day walking 30 miles by foot when I can just pay $5 for gasoline and drive?

by yyhhsj0521 19 hours ago

I do that for personal use too (although $2.4k/yr for me because I only have a Claude Max subscription). Outside of my hobby projects Opus also manages my personal accounting, researches and organizes info (travel plan, what to buy and where to buy, etc), helps me reply to emails when I'm working in the kitchen, etc. I consider it well worth the price. Tbh I'm willing to pay more than what I currently do, but competition is good for the consumers.

by surgical_fire 20 hours ago

I thought the same way until I tried DeepSeek. I am genuinely impressed at how capable it is.

by KronisLV 18 hours ago

> And I was dead wrong. Now I mostly use DeepSeek Pro myself.

I've wasted over a hundred Euros re-doing work that was done badly due to the model not being up to task (Vue with TS + wrapper components around PrimeVue, needing to handle event and property passthrough and deal with the stupid Vue SFC issues, TS made this much worse than JS would be). I think it was the GLM model through Cerebras Code at the time, in addition to some GPT and Gemini models with the API pricing.

That said, DeepSeek V4 Pro is pretty good and I can totally see myself offloading some of the work, as long as a better model reviews the work and provides suggestions/tests for it.

by simplyluke 21 hours ago

The other thing that's changing is more and more CFOs are looking at the AI spend in engineering departments and hitting the brakes. Token leaderboards were cool when the spend wasn't a double-digit-percent of the entire department's budget including salaries.

by zuzululu 7 hours ago

You weren't wrong; your tasks/problems just didn't warrant a frontier model and were always solvable with a cheap Chinese model.

That doesn't invalidate the rest of us working on tough problems that demand more expensive models and are valuable enough to justify them.

by bachmeier 20 hours ago

Your comment is a slice of the reasoning underlying the "AI will take all the jobs" claim. I would constantly see references to what AI could do and how fast it was improving. Never a word about cost. We should anticipate that there will always be demand for human labor, for cheap models, for local models, and probably even frontier models.

by jwitthuhn 20 hours ago

Yeah I've also found that models are good enough that the extra spend on premium models isn't always worth it, particularly for my small personal toy projects.

A $20 claude sub goes a long way when you plan with Opus and execute with Sonnet.

by dcchambers 21 hours ago

I think two things happened:

1. The sheer number of tokens that a coding agent can use flipped the math upside down on this equation. If you use the most expensive model for everything those costs quickly become untenable, even for software companies.

2. We realized many of the coding problems we're solving aren't incredibly difficult.

by peheje 21 hours ago

I mean, hindsight is 20/20, but saying that is like saying "everyone will just use the best tools". That's not what we see in most places in the world for most types of resources.

by sergiotapia 11 hours ago

You should try Composer 2.5 within cursor. It's so fast, shockingly fast. Going back to gpt/claude is like using dial-up. And it's great for code work. So far nothing has really tripped it up backend, frontend or reporting metabase dashboard stuff. It's nuts.

by SoftTalker 20 hours ago

> CFO/CTOs might find out that deploying on an internal cluster of GPUs is far more cheaper and reliable

I think you're right especially if you're someplace that already has a data center, such as a university. Solves a lot of privacy concerns as well.

by ok123456 21 hours ago

Qwen3.6:35b is good enough for a lot of stuff.

I just used ollama with a shell script to tackle my directory of papers/literature. I converted the first 6 pages of each document to PNG, handed them off to Qwen, and told it to spit out BibTeX, including the abstract. Two days later it was done, and I didn't spend anything on "tokens."
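A rough sketch of that kind of pipeline, in Python rather than the shell script the comment describes. It assumes poppler's `pdftoppm` is installed and Ollama is listening on its default local port. The model tag is the one named above, and the prompt wording and function names are hypothetical.

```python
import base64
import glob
import json
import subprocess
import urllib.request
from pathlib import Path

# Assumptions: Ollama's default local endpoint and the model tag from the comment.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3.6:35b"
PROMPT = "Return a single BibTeX entry for this paper, including an abstract field."


def pdftoppm_command(pdf: Path, out_prefix: Path, pages: int = 6) -> list[str]:
    """Build the poppler command that renders the first `pages` pages to PNG."""
    return ["pdftoppm", "-png", "-f", "1", "-l", str(pages), str(pdf), str(out_prefix)]


def build_payload(png_blobs: list[bytes]) -> dict:
    """Build a non-streaming generate request with base64-encoded page images."""
    return {
        "model": MODEL,
        "prompt": PROMPT,
        "images": [base64.b64encode(blob).decode("ascii") for blob in png_blobs],
        "stream": False,
    }


def bibtex_for(pdf: Path, workdir: Path) -> str:
    """Render pages, send them to the local model, and return its text response."""
    prefix = workdir / pdf.stem
    subprocess.run(pdftoppm_command(pdf, prefix), check=True)
    # pdftoppm writes <prefix>-<page>.png; sorting keeps the pages in order.
    pages = sorted(workdir.glob(glob.escape(pdf.stem) + "-*.png"))
    body = json.dumps(build_payload([p.read_bytes() for p in pages])).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `bibtex_for` once per PDF in the directory and appending each result to a `.bib` file would reproduce the workflow. The output still needs a manual check, since nothing here validates the BibTeX.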

by abyssin 4 hours ago

Why PNG? Isn’t an image format more expensive to process?

by raylad 4 hours ago

Possibly a deliberate strategy by the Chinese to undermine the US AI industry, data centers, and basically everything that’s powering the economy.

Just like they did with the US steel industry in the 80s.

by mariopt 20 hours ago

I’ve been using Kimi 2.6, GLM 5.1 , Minimax 2.7 and lately deepseek. I only spend 40$ a month and I don’t see the point in paying for Opus/Codex.

Chinese models are really quite good at a lot of stuff.

by fittingopposite 9 hours ago

Which harness?

by mariopt 2 hours ago

I use opencode with all of them except Kimi. I noticed Kimi performs better with kimi-cli and also saves a bit of quota.

Z.ai does recommend using claude cli as a harness for GLM5.1, but I still get good results with opencode.

by replwoacause 11 hours ago

Anybody know what the most capable Chinese model is that can be used in production and is cheaper than US frontier models? Would that still be Deepseek? My interest is getting as close to Gpt5.5 or Opus quality as I can get, but for less $.

by reppap 15 hours ago

The problem with going for open source models is that you are betting on some third party to keep doing expensive model training and releasing it for free, forever. What do you do if DeepSeek never releases another update to the model?

by julianlam 8 hours ago

I continue to use the model I downloaded... for free?

by surgical_fire 20 hours ago

I am having some great experience with DeepSeek. In fact, it seems to perform better than Claude or Codex in my use case.

I don't see myself returning to Claude or Codex anytime soon.

by ihsw 19 hours ago

[dead]

by pants2 21 hours ago

The Chinese models are only cheap on subsidized Chinese hosting. I have yet to find a USA-hosted Chinese model with a very clear value advantage over US models.

by wg0 20 hours ago

Not true. Also, put Deepseekv4 Flash on your local machine with effort set to "high" and you'll see why many people are using that model on their own machines without paying anyone anything.

It's just that some of us didn't imagine having GPUs would be advantageous and were not gamers on the side. Those who had beefy GPUs or GPU rigs for any reason rarely need to go anywhere else.

At least I am so impressed with Deepseekv4 AFTER using Claude Opus 4.7 for a significant amount of time that I am not going anywhere but Deepseekv4.

The model is just INSANE. Things I have done with it include attempting to write a 2.5D game engine in C with full animation and map rendering layer by layer.

by pants2 20 hours ago

You'll need to spend at least $20K on a workstation that can run DS4 Flash. It would take ages to reach that much in token spend at the speeds it runs at, and if you factor electricity costs you will likely never break even vs using API.

by weitendorf 19 hours ago

There are basically two tiers of "Chinese models" in this context, the "edge" sized ones with ~30B parameters or less, and the big ~1T models that can basically only run in the datacenter.

I don't think it's as simple as saying China's hosting is subsidized, they have generally cheaper electricity and labor costs than in the US and don't have access to the top tier models, and a large internal market where the big models are the best thing they can run with what they have. So obviously they max out on their top models (which are trained with their hardware market in mind, not ours) and get the economy of scale from that, and can run generally the same hardware for less money than in the US because

The edge models are very cheap to run and can do so on inexpensive hardware. They are like 95% cheaper to run than Haiku, so the math is in their favor for certain batch workloads. Most people just run the models for themselves when they do that without making it available on openrouter or whatever, because you can just provision a gpu node and use it as needed, and it's not that expensive to run this family of models.

Is your problem that you want to call Chinese models hosted in the US because you're worried about the data handling?

by pants2 19 hours ago

I obviously don't know the full economics of the Chinese-hosted models, but estimates[1] put the cost of hardware (servers + networking) at 70-80% of the total cost. Those things aren't meaningfully cheaper in China, so serving DeepSeek at 1/3 the cost of the cheapest US provider doesn't really compute unless it's heavily subsidized or we believe that Chinese engineers are just that much better at optimization.

Edge models, yes, they can be convenient to run batch jobs locally. I still would argue there's no economic benefit over paying for models. Haiku has a bad price/perf but others in that class are significantly cheaper in hosted APIs.

Doesn't matter what I think, the reality is that the majority of enterprises (where the real $ comes from) will not consider sending their data to China.

1. https://epoch.ai/data-insights/ai-datacenter-cost-breakdown

by torginus 16 hours ago

Hardware is arbitrarily priced, with the floor being as little money as it costs to make it, and the ceiling being how much competitors are willing to pay for it - the latter is much more of the driver of current pricing in the West than in China.

In a free market, the country would not matter, but Chinese models are often running on domestic hardware which does not directly compete with Nvidia GPUs and thus they can't get away with charging as much for it.

by fittingopposite 9 hours ago

Numbers?

by ekidd 21 hours ago

The Chinese models are surprisingly cheap and performant sitting under my desk. Qwen3.6 27B is nowhere near as autonomous as Opus 4.7, but it runs in 24GB of VRAM. And it's actually great for the use cases where I'm going to carefully read and understand all the code anyway.

If you want to support a team of engineers, DeepSeek V4 Flash is antirez's current favorite. And you could support a team of engineers pretty nicely for $40-50k. Which might not make sense if you're on a Claude MAX 5x plan or the old enterprise group plan with fixed price seats. But Anthropic is switching their enterprise contracts over to token-based pricing, at which point $50k is looking pretty good.

by joshhart 10 hours ago

Fireworks will serve them for $1.74 / $0.14 / $3.48. That's input / cached input / output. https://fireworks.ai/models/deepseek-ai/deepseek-v4-pro . Call it about a third the price of Sonnet.

Not nearly as cheap as the Chinese infra but still pretty cheap.

by harsh3195 20 hours ago

You can find them on Deepinfra. Palo Alto company. Similar cheap price.

by pants2 20 hours ago

Not similar. DeepInfra[1] has DS4 Pro pricing at $1.30/$2.60 which is 3X the Deepseek[2] (Chinese) hosting at $0.435/$0.87. DeepInfra is also very slow at 37 t/s and uses an FP4 quant[3], so intelligence will be degraded slightly.

Meanwhile you could use Grok 4.3 for the same price which is smarter and 5X faster[4].

1. https://deepinfra.com/pricing

2. https://api-docs.deepseek.com/quick_start/pricing

3. https://artificialanalysis.ai/models/deepseek-v4-pro/provide...

4. https://artificialanalysis.ai/models/grok-4-3
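The "3X" figure can be checked directly from the two price pairs quoted in the comment. This is only arithmetic on those quoted numbers (USD per million tokens, input/output) and does not query either provider.

```python
# Prices in USD per million tokens (input, output), as quoted in the comment.
deepinfra = (1.30, 2.60)
deepseek = (0.435, 0.87)

input_ratio = deepinfra[0] / deepseek[0]
output_ratio = deepinfra[1] / deepseek[1]
print(f"input: {input_ratio:.2f}x, output: {output_ratio:.2f}x")
# prints "input: 2.99x, output: 2.99x"
```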

by wirybeige 18 hours ago

DS4 Pro/Flash were post-trained with QAT, so they are already quantized to FP4 for the most part. That's why the downloaded weights are much smaller than they would be at fp8 or fp16. For example, Flash is a 284B model, but its size is only ~160GB. OFC maybe DeepInfra went even further, but there is no proof of that.
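The size argument can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses only the two figures from the comment (284B parameters, ~160GB download) and ignores file-format overhead, so treat the results as approximate.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring file-format overhead."""
    return params_billion * bits_per_weight / 8


PARAMS_B = 284  # parameter count for Flash, as stated in the comment

for name, bits in [("fp16", 16), ("fp8", 8), ("fp4", 4)]:
    print(f"{name}: ~{weights_gb(PARAMS_B, bits):.0f} GB")

# Effective bits per weight implied by a ~160 GB download.
implied_bits = 160 * 8 / PARAMS_B
print(f"implied: ~{implied_bits:.1f} bits/weight")
```

This gives roughly 568 GB at fp16, 284 GB at fp8, and 142 GB at fp4. A ~160GB download works out to about 4.5 bits per weight, which would fit mostly-4-bit weights with some tensors or scaling factors kept at higher precision, though the comment does not spell out the exact layout.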

by pants2 10 hours ago

Interesting then that OpenRouter[1] tags many providers as FP8 and DeepInfra as FP4.

1. https://openrouter.ai/deepseek/deepseek-v4-pro

by __mharrison__ 21 hours ago

Odd take. I'm running them locally at my desk (DGX Spark and 128GB MBP). They work fine for 90% of what most folks do. Admittedly, they do run slower on my hw than on the cloud.

by pants2 21 hours ago

Running them locally is cool and has privacy/autonomy benefits, but you can't really make a value case for it. Guaranteed if you run the math you will never run enough inference to pay off your hardware vs buying tokens. Last time I ran the math on my MBP I'd have to run inference 24 hours a day for 5+ years to pay off the cost of my MBP, not accounting for electricity costs.

by iooi 21 hours ago

Is this because of the tok/s? Since it's pretty easy to run up a $5k bill in API usage for Claude/ChatGPT in a month.

by pants2 21 hours ago

Yes, because of the limits on tok/s, and you have to compare apples to apples, not Gemma 27B to Opus 4.7.

by hedora 20 hours ago

Assuming the local models get the job done (e.g., you adjust your workflow so that you can run the local machine 100% all the time, or whatever), then the time to payback isn't very high. MSRP for a 128GB AMD was $1400 at launch. That's 7 months of claude code subscription. If you assume a 5 year depreciation cycle, you can buy a cluster of 8 such machines and still come out ahead. (Power is a few hundred watts per machine peak -- maybe 7 machines if you include electricity.) Of course, I'm assuming non-bubble numbers. Those boxes are like $3K now. Still, a normal person would probably not buy 8 of them at once. Instead, they'd space out buying a machine every few years as the technology improves.

For me, things are getting better faster than my ability to review / trust the resulting code, so tok/sec isn't a bottleneck anymore. Instead, quality of the tokens is the bottleneck. That points to me wanting a 1TB DRAM iGPU once they're available at pre-bubble RAM pricing.
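The payback estimate above can be written out as a small calculation. The $200/month subscription price is an assumption inferred from "7 months", and the wattage and electricity price passed to the helper are hypothetical inputs, not figures from the comment.

```python
MACHINE_USD = 1400        # launch MSRP quoted in the comment
SUBSCRIPTION_USD = 200    # assumed monthly price of the subscription being compared
DEPRECIATION_MONTHS = 60  # the 5-year cycle from the comment

payback_months = MACHINE_USD / SUBSCRIPTION_USD
budget = SUBSCRIPTION_USD * DEPRECIATION_MONTHS


def machines_within_budget(avg_watts: float, usd_per_kwh: float) -> int:
    """How many machines, electricity included, fit in five years of subscription fees."""
    hours = DEPRECIATION_MONTHS / 12 * 365 * 24
    energy_usd = avg_watts / 1000 * hours * usd_per_kwh
    return int(budget // (MACHINE_USD + energy_usd))


print(f"payback: {payback_months:.0f} months, 5-year budget: ${budget}")
print("machines, ignoring electricity:", machines_within_budget(0, 0.0))
```

With electricity ignored this reproduces the 8-machine figure. The count drops as the assumed average draw or price per kWh goes up, so the "maybe 7" estimate depends heavily on duty cycle.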

by pants2 20 hours ago

You're comparing the highest tier Claude subscription to something like Qwen3.5-122B-A10B running locally, apples to oranges.

If you compare to a smarter US model like Grok 4.3, $1400 will pay for 560M output tokens, which at ~25 t/s locally using it nonstop for 8 hours a day would take two years to pay back. Not accounting for bubble prices or electricity.
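The two-year figure follows from the numbers in the comment. The per-token price below is not quoted directly. It is the rate implied by "$1400 will pay for 560M output tokens", and input tokens are ignored, as in the original estimate.

```python
HARDWARE_USD = 1400        # machine price used in the comment
USD_PER_M_OUTPUT = 2.50    # implied by "$1400 will pay for 560M output tokens"
LOCAL_TOK_PER_S = 25       # local generation speed from the comment
HOURS_PER_DAY = 8          # daily usage from the comment

tokens_bought = HARDWARE_USD / USD_PER_M_OUTPUT * 1_000_000
tokens_per_day = LOCAL_TOK_PER_S * 3600 * HOURS_PER_DAY
days = tokens_bought / tokens_per_day
print(f"{tokens_bought / 1e6:.0f}M tokens, {days:.0f} days (~{days / 365:.1f} years)")
# prints "560M tokens, 778 days (~2.1 years)"
```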

by __mharrison__ 18 hours ago

Is the goal maximum t/s?

According to openrouter, Opus 4.8 is 128 t/s. So 10x faster than my antirez/ds4.

by slopinthebag 19 hours ago

The value of not having a reliance on a third party company, and not needing an internet connection, and having total privacy: ∞

by fragmede 18 hours ago

Just have to put some numbers on privacy and autonomy. What's the fine to my company if I get hacked and leak all my customers' PII? What's the cost in productivity lost if OpenAI/Anthropic/Google decides to suspend my account for an unknown reason?

by slopinthebag 19 hours ago

Huh? They're several times cheaper than SOTA models at market rate prices.

by pants2 19 hours ago

If you are only looking at US hosting providers, models from US labs easily meet or beat models from Chinese labs on the same intelligence level. I'm not comparing DeepSeek with Opus because those are on different levels of performance.

by slopinthebag 18 hours ago

Deepseek v4 Pro on US hosting is roughly 1.5x cheaper on input and 5x cheaper on output compared to Sonnet, and that's not even a fair comparison because Deepseek is much stronger than Sonnet. It's more reasonable to compare with Opus 4.5, which is much more expensive.

by pants2 15 hours ago

Sure but you can also look at Grok 4.3, which is smarter and faster than DeepSeek at the same price point.