I think the frontier labs will need to drop their high per-token prices, at least for their low- and mid-tier models, because several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that, with the right harness, they are cost-effective alternatives.
They won't necessarily need to close the gap (at least not yet), because these models won't necessarily compete at the same token counts: at least some of them need to do far more work to solve the same problems.
But, yeah, the prices will come down one way or the other.
At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.
Not only that, China may subsidize AI, but so does the US.
My rates (before PG&E were forced to concede) were as high as 49¢/kWh, a 7x factor.
These are residential rates and not industrial ones, but I hope my point is clear.
China has very cheap power compared to the US; there's a reason why they had to ban bitcoin to get rid of the miners.
"Mean wholesale electricity prices in 2024 were lowest in SPP ($27.87/MWh), the Southeast ($29.72/MWh), and Southern California ($29.95/MWh), and highest in the Northwest ($59.98/MWh)."
https://www.ferc.gov/sites/default/files/2025-03/25_State-of...
If my math is right, divide those by 10 to get cents per kWh (e.g. $27.87/MWh is about 2.8¢/kWh).
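The divide-by-10 rule is just unit bookkeeping: 1 MWh is 1,000 kWh and $1 is 100¢. A minimal sketch applying it to the FERC figures quoted above (the dictionary keys are just labels for those regions):

```python
# Mean 2024 wholesale prices from the FERC quote above, in US dollars per MWh.
WHOLESALE_USD_PER_MWH = {
    "SPP": 27.87,
    "Southeast": 29.72,
    "Southern California": 29.95,
    "Northwest": 59.98,
}


def usd_per_mwh_to_cents_per_kwh(price_usd_per_mwh: float) -> float:
    """Convert $/MWh to ¢/kWh: multiply by 100 (¢ per $), divide by 1000 (kWh per MWh)."""
    return price_usd_per_mwh * 100 / 1000


if __name__ == "__main__":
    for region, price in WHOLESALE_USD_PER_MWH.items():
        print(f"{region}: {usd_per_mwh_to_cents_per_kwh(price):.2f} ¢/kWh")
```

The net factor is 100 / 1000 = 1/10, so the SPP figure comes out at roughly 2.79¢/kWh.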
https://www.bbc.com/news/business-58733193 https://www.cbc.ca/news/business/china-power-cuts-1.6193281
So on one end you have token revenue trending down, on the other end training costs going up for the next frontier models, and you need to pay back your 10-year debt.
Not necessarily; the bondholders could simply take a massive haircut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.
Drugs cost pennies to manufacture after they are researched and make their way through the approval pipeline. There are many generic drug manufacturers who can work off the existing formulas.
The more apt comparison is that LLMs won't be un-trained. Opus 4.8 now exists. Even if Anthropic somehow went bankrupt, that particular asset could, at the very least, be sold for proverbial pennies on the dollar to a "generic" inference provider.
If a bankrupt AI company maintains enough of a skeleton crew to consolidate and archive its intellectual property, it could be sold off to another company, but there are also timelines where it all ends up as digital dust in the wind.
Only if that skeleton crew had deep, deep pockets. If Anthropic closed its doors tomorrow because the market collectively saw that AI was not profitable, and open-sourced everything, there wouldn't be any money to train Opus 5.0... it would then fall on governments to put money into the hat (which I can't see happening unless it was Europe).
Hardware fails, and it also becomes less cost-effective to run as more power-efficient, modern hardware turns up. It requires constant investment to keep it useful and cost-efficient.
When AI pops, we'll temporarily have some extra compute capacity that will be horrendously uneconomical to run due to the high grid load and low consumer demand, before it gets shut down. There's simply no real use for it at this scale.
It’s really not obvious the infrastructure we are building for AI stuff is something that will benefit humanity over time.
Not to mention that bubbles are extremely destructive. Bezos is obviously someone who came out OK from the dotcom bubble, but we are talking about something that destroys a lot of value globally. That has real, direct consequences, not just investors losing some money. The US economy is currently only growing because of the AI bet.
Inference is much cheaper than training a new model, so running them just for inference is a completely different thing than having to price in the fact that at the moment all of these companies need to compromise between compute for inference and compute for training new models. If no new models were to be trained, and all the compute was inference only, that would change everything when it comes to the overall compute cost of AI.
The dotcom infra buildup is a bad comparison, in that it wasn't even close to being fully utilized. The infra was completely out of proportion to day-to-day usage.
If all these other data centers were anywhere near coming online, that 300 MW data center would be a rounding error, not a line item as it is right now.
So someone's signed contracts for way more and way larger data centers, someone's purchased billions in hardware for these not yet operational data centers. I'm wondering how depreciation's going to work on all these assets...
Anyhow, I'm not really sure what "max capacity" is here, nor am I really aware when they're going to be delivering the operational assets that are currently levered to their eyeballs and consuming 1/3rd of the memory made on the planet.
As far as inference vs training goes, have new models gotten radically better than old ones, or only marginally better (at the cost of 10x or more in training costs)?
Very exciting stuff.
With investing timing matters a lot.
Replace servers with regular compute.
If the AI industry collapses, it would seem the price of DDR etc. would drop dramatically, lowering demand for remote gaming.
These AI "GPUs" are worse for gaming than even the crappiest actual GPUs (with a G as in Graphics). Also, the display drivers won't support them, not officially at least.
The feature being bundled in with GamePass makes it worth it. I used to VPN home and try and run games remotely, but it was honestly a bit of a pain. Just pressing a button and having the game launch is quite nice.
You just run the models and sell the tokens. The demand will still be there, even if there's less money in chasing new frontier models.
> GPU are pretty specialized hardware, without AI a data center full of outdated graphics cards isn’t really too valuable.
AI accelerators used in DC are not really "graphic cards" any more, you ain't running gaming on it
I think the lighter 40-series cards like the L40 still have OK graphics features. But otherwise, yeah, after the Ampere generation graphics features went down the drain. The A100 and A40 cards can do graphics well, but it already makes no sense in terms of power-to-performance ratio.
I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada.
The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later.
Big AI investor tells us that investing in AI is good. Oh, the surprise!
Does that invalidate this point? Yes. Because it makes no sense. The big money is not going to R&D but to build infrastructure that will be outdated in 5 years.
Big money is going to build infrastructure which is fundamentally required for R&D. They aren't separate, they are the same thing. It sounds like you're complaining that Pfizer isn't investing in drug research, they are buying mass spectrometers and micron fidelity microscopes. Same thing!
> "[AI vendors are] paying for a fixed cost with a depreciating commodity"
That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models depreciate?
E.g. an interesting possible canary in this coal mine is that there’s been a 200% increase in the rate of new apps appearing on Apple’s App Store, but it has not been accompanied by a 200% increase in the rate at which people are buying apps.
I don't believe this aligns with the reality of any major company: unless your business is, in the literal sense, "selling code", your revenue and profit are tangential to the quantity of code you produce. Google is a good example of this: most of their revenue and profit comes from their ad network, which is disconnected from their development productivity and instead heavily reliant on network effects and time in market. If I were a new competitor with infinite AI funds to throw at whatever problem I chose, I couldn't simply capture their market by developing an exact copy of Google's ad platform. In the same way, Google can't substantially grow their ad network by coding "more" or "better"; they still need more customers and consumers to interact with their network to see any increase in revenue.
So it doesn't directly follow that a productivity increase will inherently follow an AI usage increase.
What makes YouTube YouTube is not the video player; it’s the servers that can handle petabytes of uploads a day and billions of views. Software-wise, YouTube is no different from the hundreds of porn websites coded by small European teams.
‘uber for my industry’ is not a sensible business strategy
Honestly, if you know people whose bottleneck is pure software dev, please let me know; I have a good, experienced team in Eastern Europe, and we can do wonders in product development. But coming up with sensible business ideas and executing on them in the real world is crazy hard and extremely rare.
That would be half a trillion[1] redirected to regular people just from Google Ads.
[1] snatched my number from here: https://pixis.ai/blog/2025-google-advertising-benchmarks-for...
An AI-generated man talking about his product-building journey to make a pressure washer hose that didn't need power (in the AI video it didn't even have a water supply connected!) that was going to be banned in a week because it was too powerful, so buy now.
I've seen AI slop before and scam ads before, but the combination of the two gave me some real tingly spider-sense that things are going to get worse, and that some unethical people will make a lot of money from it and so will be in no hurry to stop it.
You can't consider it in a vacuum. AI takes limited resources. So far it has driven up the cost of nearly every consumer electronic device that runs an OS, and the cost of the energy used by the entire industry and every single customer.
It's not just the cost of datacenters, it's the cost of infrastructure (which, given the current direction of the US govt, will just be paid from people's fucking taxes and bills...) and the cost of other industries turning outright unprofitable "thanks" to the demands of AI.
- most tasks do not require the latest frontier models, even if they are an order of magnitude more intelligent (we don’t actually know if that will be the case). Current Gemini Flash is cheap, fast, and pretty capable with good guidance for most tasks
- now that companies pay API costs instead of a subscription, they will be setting restrictions on token use so their budget doesn't explode (like Uber in this submission); that’s a strong incentive NOT to use expensive models and to limit their thinking budget
- there is competitive pressure from China and others who can offer very decent performance at a fraction of the token price
- the price of tokens for the frontier models is likely to go up, but the price to access older models is what depreciates! The overall price per token is going down now that we are in a new world where companies understand that token maxing is one of the stupidest concepts ever created by humankind.
This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/
The real measure should be cost per ~equivalent task result, not cost per token nor tokens per task.
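One way to make "cost per ~equivalent task result" concrete is to divide the cost of an attempt by the chance that the attempt produces an acceptable result. A minimal sketch, assuming independent retries until success (so expected attempts = 1 / success rate); every price, token count and success rate below is hypothetical, not a measurement of any real model:

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """Hypothetical per-task figures for one model."""
    name: str
    usd_per_m_input: float   # $ per million input tokens
    usd_per_m_output: float  # $ per million output tokens
    input_tokens: int        # input tokens consumed per attempt
    output_tokens: int       # output tokens produced per attempt
    success_rate: float      # fraction of attempts giving an acceptable result

    def cost_per_attempt(self) -> float:
        return (self.input_tokens * self.usd_per_m_input
                + self.output_tokens * self.usd_per_m_output) / 1_000_000

    def cost_per_success(self) -> float:
        # With independent retries, expected attempts until success is 1 / success_rate.
        return self.cost_per_attempt() / self.success_rate


# Hypothetical: the cheap model burns 3x the input and 6x the output tokens and fails more often.
frontier = ModelRun("frontier", 3.00, 15.00, 50_000, 5_000, 0.9)
cheap = ModelRun("cheap", 0.35, 0.80, 150_000, 30_000, 0.6)

if __name__ == "__main__":
    for run in (frontier, cheap):
        print(f"{run.name}: ${run.cost_per_attempt():.4f}/attempt, "
              f"${run.cost_per_success():.4f}/successful task")
```

With these made-up numbers the cheap model still wins per successful task despite using far more tokens, but the comparison flips easily if its success rate drops, which is why neither per-token price nor tokens per task alone settles it.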
I think it's only accounting depreciation.
I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?
The solder joints are notorious for failing at a high rate too.
They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
Why take risk when you can spend money and take no risk?
When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
If I, as a customer, have a use case for a machine learning model I developed a while ago, say an insect identification model that an ML researcher/engineer built for me back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?
I've seen vision researchers who wanted to train on H100s at the time being told no, wait for the T4s.
I've seen T4s running BERT models for document classification.
When there are enough Blackwells in data centers that H100s are useless for inference by your standards (I don't know if we've arrived there or not yet), there will be people who, say, want to run the Taco Bell ordering chatbot on them. There will be people who have applications that are just fine with Qwen 2.5 who will be happy renting them.
There seems to be this crazy consensus that hyperscalers are going to go into their datacenters and throw away their old GPUs. The reality is they have a ton of paying customers for them.
And there may be insect identification apps from 2019 that say "you know what? H100s have gotten cheap enough that I can use a VLLM so the user can describe where they saw the insect too", or McDonald's website support chatbot developers who say "Hey, the bigger chips have gotten cheap enough that we can upgrade our models to Qwen 2.5".
The frontier level GPUs in e.g. AWS have a huge premium. When the newer generations come out, they will be able to cut prices to a bit of a premium over the operational costs and still make a profit, and there are a ton of down-market customers who will be interested, who aren't willing to try to outbid Anthropic for Blackwells.
Chips do wear out and need to be replaced (entropy do be like that, and durability is not a primary concern for chip design), so you'll need to refresh your stock, and even if you don't need cutting-edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released, PS1s were extremely cheap. But if you're struggling to understand this effect from your experiences in the consumer market, you're actually looking at the price factor that makes antiques increase in value: at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out once general consumer demand dried up, but the remaining demand (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
If you build a 100MW data center with GPU compute, and three years later a new data center opens with the same GPU cost and the same electricity cost as yours but can do twice as much compute, you quickly lose business unless the market is so constrained that customers can't afford to be picky. The moment there's slack in the market, you'll see major migrations off providers that have the same cost but half or a quarter of the performance.
So when you see someone talking about GPUs fully depreciating in value in 1-3 years, this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
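The squeeze on the older site can be shown as cost per unit of compute delivered. A minimal sketch, assuming both sites draw their full 100MW around the clock and amortize the same yearly hardware cost; the $500M/year capex, $50/MWh tariff and "compute units" are hypothetical placeholders:

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_compute_unit(annual_capex_usd: float, power_mw: float,
                          usd_per_mwh: float, compute_units_per_year: float) -> float:
    """All-in yearly cost (amortized hardware plus electricity) per unit of compute.

    Assumes the site draws its full power budget around the clock.
    """
    energy_cost_usd = power_mw * HOURS_PER_YEAR * usd_per_mwh
    return (annual_capex_usd + energy_cost_usd) / compute_units_per_year


# Hypothetical inputs: same capex, same power, same tariff; the newer site does 2x the work.
OLD = cost_per_compute_unit(500e6, 100, 50, compute_units_per_year=1e6)
NEW = cost_per_compute_unit(500e6, 100, 50, compute_units_per_year=2e6)

if __name__ == "__main__":
    print(f"old: ${OLD:.2f}/unit, new: ${NEW:.2f}/unit, ratio {OLD / NEW:.1f}x")
```

The newer site can price anywhere between its own cost and the older site's break-even and still make money, which is exactly the migration that starts once customers have a choice.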
When they stray too close to the line ... you get Intel's 13th/14th-gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...
This was simply poor design, it took Intel ages to really figure out what went wrong and "resolve" it.
It cost them far more than it made.
Despite having no moving parts, things broke anyway, and even if they don't break, the vendor can force you to change technology just by playing with the maintenance cost of the older one, limiting or removing spare parts from the market.
https://s24.q4cdn.com/101481333/files/doc_financials/2026/q3...
"Hard Drive exabyte shipments of 199EB, up 39% YoY, with ~90% shipped to data center customers"
"Data center revenue of $2.5B, up 55% YoY, driven by strengthening cloud and enterprise demand"
And an article: https://www.seagate.com/stories/articles/the-ai-era-doesnt-r...
I know that a lot of cloud storage has tiered models, where the "expensive, but faster" tiers are SSDs, but then the slower cheaper tiers are HDDs, and the "cold storage" can be HDDs that are turned off all the way to tiers like AWS's S3 "deep archive glacier" tier being tape drives.
In the future, we might have fixed-cost GPUs, but not today.
The V100 (2017, so 9 years old) can be rented for $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the person you rent it to pins it at its 250W TDP, and let's ignore the running costs of CPU/RAM/etc. Then you draw 1/4 kWh for that compute hour. Industrial electricity prices in the US vary between 7.5 and 25 ct per kWh (depending on state, time of day, etc.), so at the cheapest rate, at 100% efficiency, assuming nothing ever breaks and the CPU consumes 0W, you earn about 14ct/h.
And remember: V100 hours are sometimes sold at 1/10th the price.
If I pick average conditions, you need to start thinking about whether it is worth renting them out at all: usually it isn't, unless you have them anyway and are just selling idle capacity.
It's barely worth it to run them in a pure "is it profitable" sense; if we also account for the opportunity cost of taking up a slot in your datacenter, it ceases to be worth it really quickly.
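The V100 arithmetic can be checked with a few lines. A minimal sketch using the rent, TDP and tariff figures from the comment, under its own simplifications (GPU power only, nothing breaks); the optional `pue` (power usage effectiveness) factor for cooling overhead is an added assumption and defaults to 1.0, i.e. no overhead:

```python
def hourly_margin_usd(rent_usd_per_h: float, gpu_watts: float,
                      usd_per_kwh: float, pue: float = 1.0) -> float:
    """Rental revenue minus GPU electricity for one hour.

    Ignores CPU/RAM power, failures, staff and floor space.
    `pue` scales GPU energy to include facility overhead; 1.0 means none.
    """
    kwh_per_hour = gpu_watts / 1000 * pue
    return rent_usd_per_h - kwh_per_hour * usd_per_kwh


if __name__ == "__main__":
    for rent in (0.165, 0.0165):        # quoted rental rate, and a tenth of it
        for tariff in (0.075, 0.25):    # cheapest and priciest US industrial $/kWh quoted
            margin = hourly_margin_usd(rent, 250, tariff)
            print(f"rent ${rent}/h at ${tariff}/kWh -> {margin * 100:+.2f} ct/h")
```

At the quoted $0.165/h the margin is about 14.6ct/h on the cheapest power and about 10ct/h on the most expensive; at a tenth of that rent the GPU loses money on electricity alone even at 7.5ct/kWh, before counting the slot's opportunity cost.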
And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law having been dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than as GPUs, but that's much less the case today.
I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!
Though those capabilities may be just a few years out; funnily enough, it's taking AI to make them potentially doable.
That's the main issue here.
These were about half the cost of a used GPU that had only been used for gaming. Going by that price, I'd say a GPU kept busy has twice as high a chance of failure after two years of use.
Not great, not terrible.
As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things, and manpower is inconvenient these days. A GPU with any sort of fault just gets dumped.
Isn’t that just more work than logging it yourself?
Automating it has been way better for me than the alternative of breaking my flow whenever I'm switching tasks to chart my time, or logging all my hours for the week in one sitting. Different strokes for different folks I suppose.
Saves like $2-3 per session. Same quality code.
Model routers allow this to happen automatically without any more work by the user.
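To illustrate the idea only (this is not role-model's protocol or any real router's API), a toy router can estimate how demanding a prompt is and pick the cheapest model whose capability tier covers it. The model names, tiers, prices and keyword heuristic below are all hypothetical; a real router would more likely use a trained classifier or a cheap LLM call to grade difficulty:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                # hypothetical model name
    tier: int                # rough capability tier; higher = more capable
    usd_per_m_output: float  # hypothetical $ per million output tokens


MODELS = [
    Model("small-cheap", tier=1, usd_per_m_output=0.8),
    Model("mid", tier=2, usd_per_m_output=4.0),
    Model("frontier", tier=3, usd_per_m_output=25.0),
]


def estimate_tier(prompt: str) -> int:
    """Toy difficulty heuristic based on keywords and prompt length."""
    text = prompt.lower()
    if any(word in text for word in ("architecture", "refactor", "concurrency")):
        return 3
    if len(text) > 500 or "debug" in text:
        return 2
    return 1


def route(prompt: str, models: list[Model] = MODELS) -> Model:
    """Return the cheapest model whose tier is at least the estimated difficulty."""
    needed = estimate_tier(prompt)
    candidates = [m for m in models if m.tier >= needed]
    return min(candidates, key=lambda m: m.usd_per_m_output)


if __name__ == "__main__":
    for p in ("rename this variable", "debug this failing test", "refactor the module"):
        print(f"{p!r} -> {route(p).name}")
```

The savings come from the fact that most prompts never reach the expensive tier; the hard part in practice is the difficulty estimate, not the selection step.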
> a shittier model
A ton of tasks don't require the most expensive frontier models, etc.
> I’m not sure why anyone does it
1. Faster solutions from the LLM, which also reduce the cost of employees waiting on the LLM
2. Avoiding things like the half-billion dollar per month bill for a single company’s LLM use recently reported in Axios
> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.
And apparently OpenAI and Anthropic think so, too - why else would they try so hard to ban them instead of outcompeting them?
This makes no sense, 99% of the people using Chinese models are using them via Western inference providers who are running them and serving them to people over OpenRouter or whatever. If anyone is stealing your data it would be an American or European inference provider. A model has no ability to send data anywhere.
China bad by default, right?
- Oh, they must have been blocked from entering the Chinese market!
But none of that is true. You could see global brands everywhere here: Tesla, Unilever, KFC, Apple, and so on.
---
Or have you ever actually done cross-border trade, or any international business collaboration? If you had, you’d definitely realize that what’s really stopping you is U.S. legislation. At least, that was the case with our former U.S. partner.
Why even bother with 'forced IP transfer' when you can just take it?
Safeguards trained into the model (i.e. that exist in the weights) can’t be removed.
There's a subreddit for people wanting to sex-talk to various models. It just so happens that the same prompt they use to 'jailbreak' SOTA models for sex talks also works if you want to have model write malware, or tell you how to design a highly illegal device.
Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.
So it’s not even about datacenters.
Here’s a Reuters article about TSMC: https://www.reuters.com/world/asia-pacific/broadcom-flags-su...
So this is actual committed contracts with all kinds of companies such as Apple, NVidia, AMD.
Also, the whole reason they can’t build data centers faster is precisely because of this.
That was because the supplies the datacentre needed were constrained - supply-constrained, not end-user demand constrained, so would be in agreement with the GP comment (and the article I read didn't imply anything about lying).
A paranoid part of me thinks that these models are all inherently biased and instructed to be pro-CCP, with specific gaps in their training data related to undesirable historic events and political ideas.
You'd be surprised how much bias exists in easily extractable information. Now imagine how much of it is introduced during training that you can't easily extract.
So this is largely a moot point. Yes, Chinese models will likely have some weird things injected into them. But so do the US models. Do I care? Not in the slightest. Models are my code monkeys, and if the code leaves my machine, I assume IP is leaked be it a Chinese model that clearly tells me they do use the data, or US models that pinky promise they don't.
Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
edit: Actually, American inference providers are cheaper for Chinese models. There's way more competition here, because the Chinese aren't idiots investing every last dollar they have into data centers for LLMs that don't make money.
Also, there is a lot of competition in China. Like a lot. You might know better than me as well, but although the biggest AI labs are based in the USA, adoption is weirdly global. To give a general sense of what's going on: you can see AI-related ads literally everywhere in Tokyo, almost all the time, on every single public screen.
Of course, they are not necessarily a viable solution for companies with security requirements etc., given it is just a single-person project, but they still serve as proof that it can be done.
For deepseek-v4-pro:
- $0.350 in, $0.003000 cache, $0.80 out https://crof.ai/pricing
- $0.435 in, $0.003625 cache, $0.87 out https://api-docs.deepseek.com/quick_start/pricing
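To see what those list prices mean for a real workload, here is a minimal session-cost sketch. It assumes the figures are USD per million tokens and that "cache" is the price for cache-hit input tokens; the session sizes (2M input tokens, 90% of them cache hits, 100k output tokens) are hypothetical, roughly what a long agentic coding session re-reading the same context might look like:

```python
# Prices as listed above, assumed to be USD per million tokens.
PRICES_USD_PER_M = {
    "crof.ai": {"input": 0.350, "cached_input": 0.003000, "output": 0.80},
    "deepseek": {"input": 0.435, "cached_input": 0.003625, "output": 0.87},
}


def session_cost(prices: dict, input_tokens: int, cached_tokens: int,
                 output_tokens: int) -> float:
    """USD cost of a session; `cached_tokens` is the cache-hit share of `input_tokens`."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * prices["input"]
            + cached_tokens * prices["cached_input"]
            + output_tokens * prices["output"]) / 1_000_000


if __name__ == "__main__":
    for provider, prices in PRICES_USD_PER_M.items():
        cost = session_cost(prices, 2_000_000, 1_800_000, 100_000)
        print(f"{provider}: ${cost:.4f}")
```

With heavy cache reuse, output tokens and the uncached remainder dominate the bill, so the near-zero cache price matters more than the headline input price.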
Deepseek shot themselves in the foot because they never intended to serve V4 Pro at $0.80 per million output tokens; that was a promotional price that was meant to expire (and still might). They intended for V4 to cost $4.00 per million, but Western inference providers drove down the price, because they can operate at negative margins to try and push competition out. I can assure you they are losing a ton of money at ~80 cents.
My point is, it's Western inference providers that are establishing the floor price of inference. They are willing to operate at a loss in order to put their competition out of business. Chinese providers are typically at or above the prices set by American/Western providers if you go looking on the Chinese internet. You aren't going to get deals from China for inference except through this one instance with Deepseek V4 Pro, which wasn't even supposed to be permanent pricing.
Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the facts.
Can you expand on this?
Just looked into it, seems like at most they have just 3.2, not 4: https://aws.amazon.com/bedrock/pricing/
Looking around their catalogue more, most of their models seem quite outdated, aside from the OpenAI and Anthropic ones (but those get more expensive). I wouldn't willingly pick Bedrock and would instead throw money at OpenRouter, that has both a bunch of providers, as well as almost any model for you to try.
Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.
I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
I have experienced enterprise codebases that have been DRY'd to the point they become ossified.
It's also possible in many of these cases to identify sub-patterns you could abstract, to create a set of tools you can compose in different ways in order to satisfy the different use cases. Instead of one function/component you make multiple, and use them together.
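A small illustration of the difference, with hypothetical names throughout: the first function is the over-DRY'd style where every caller's special case becomes another flag, and the second approach splits it into small steps that callers compose themselves:

```python
# Over-DRY'd: one function that grows a flag for every caller's special case.
def format_report(rows, *, sort_desc=False, top_n=None, currency=False):
    out = sorted(rows, key=lambda r: r["amount"], reverse=True) if sort_desc else list(rows)
    if top_n is not None:
        out = out[:top_n]
    if currency:
        return [f"{r['name']}: ${r['amount']:,.2f}" for r in out]
    return [f"{r['name']}: {r['amount']}" for r in out]


# Composable: small single-purpose steps plus a helper that chains them.
def by_amount_desc(rows):
    return sorted(rows, key=lambda r: r["amount"], reverse=True)


def first(n):
    return lambda rows: rows[:n]


def as_currency(rows):
    return [f"{r['name']}: ${r['amount']:,.2f}" for r in rows]


def pipeline(*steps):
    """Return a function that applies each step to the output of the previous one."""
    def run(rows):
        for step in steps:
            rows = step(rows)
        return rows
    return run


ROWS = [{"name": "a", "amount": 10}, {"name": "b", "amount": 1234.5}, {"name": "c", "amount": 99}]
top_two = pipeline(by_amount_desc, first(2), as_currency)

if __name__ == "__main__":
    print(top_two(ROWS))
    print(format_report(ROWS, sort_desc=True, top_n=2, currency=True))
```

Both produce the same report today, but a new use case in the composable version means adding or swapping one step instead of threading another flag through a function every caller depends on.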
All this stuff is just basic programming but I've mostly given up trying to preach about it. Most people don't care, and even if they did care they just don't have the talent to write really good code. It's rare to find a dev who does really solid work. In my experience you either do it because that's who you are, or nothing I say will make any difference.
Claude thinks, 100% of the time, that we use Laravel, even though the project is an old Lumen codebase where most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.
I also think your excuse is bad. "The code is legacy fucked so I'll just legacy fuck it some more because I can't be bothered to make an effort"
We tend to obsess over software quality when it’s the least important thing for a business. It’s just a means to an end.
- Takes weeks or months to get simple features out the door, and when they're out they're buggy as hell and the bugs never get fixed. Sound familiar?
> I’d never touch any line of code unless I absolutely have to
And this is how legacy code is made. Years of everyone "never touching anything they don't have to" leads to a giant steaming pile of shit.
> unless the business is willing to face some down time
How does a simple refactor cause downtime? I do this kind of stuff all the time and pretty much never cause any downtime. In the very rare cases that prod downtime does occur it's generally not because of some simple code refactor, and we have it back up in no time by just rolling it back. Unless it's not related to the code at all, in which case it also wasn't a refactor that caused it.
You would edit CLAUDE.md to say things like what tech the project is using, because that's the entire point of CLAUDE.md. It's literally the solution to the exact problem you're complaining about. Any information you want it to know, you put in there and then it knows it. And you can tell Claude to make or update the file for you.
I'm not one of the people telling you how smart LLMs are. I'm telling you how to use it efficiently, by not expecting it to know everything but rather provide the information that it needs in order to be a more useful tool.
Are they even making money off them now?
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
The biggest reason large models are unattainable for local applications is the lack of hardware with a large amount of unified/graphics memory (and the cost of the platforms that have it). Once the memory shortage eases and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door to slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).
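Why memory is the bottleneck follows from simple arithmetic: the weights alone take parameters times bits per weight divided by 8 bytes. A minimal sketch; the 1.2x overhead allowance for KV cache, activations and the OS is a hypothetical rule of thumb, and the model sizes and memory capacities are illustrative:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """GB (1e9 bytes) needed for the weights alone: parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead: float = 1.2) -> bool:
    """Rough check; `overhead` is a hypothetical allowance for KV cache, activations, OS."""
    return weight_memory_gb(params_billion, bits_per_weight) * overhead <= memory_gb


if __name__ == "__main__":
    for params, bits in ((70, 16), (70, 4), (1000, 4)):
        print(f"{params}B @ {bits}-bit: {weight_memory_gb(params, bits):.0f} GB of weights")
    print(fits_in_memory(70, 4, memory_gb=64))     # mid-size model, 4-bit, 64 GB machine
    print(fits_in_memory(1000, 4, memory_gb=128))  # ~1T-parameter model, 4-bit, 128 GB machine
```

Quantization buys a 4x reduction over 16-bit, but a trillion-parameter class model still needs hundreds of GB, which is why local frontier inference hinges on consumer hardware with much larger memory pools.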
At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is looking increasingly bleak, with an average catch-up time of just a few months) or market dominance through integration (in platforms and OSes, or through exclusive deals with companies or governments).
For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be hit with a lot of AI integrated into our other tooling costs, such as GitHub, the Microsoft suite and G Suite, which will use their market position to force in AI functions as a value-add to the total cost without giving the option to exclude them.
So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon, and we should get a breakdown of their cost structures, somewhat.
I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.
For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.
V3 pricing from them was right in line with what the commodity providers are charging.
Not everyone using AI is using it to code core value IP.
https://martinalderson.com/posts/no-it-doesnt-cost-anthropic...
There's no way that all AI inference providers are colluding and/or all running at a massive loss, meaning the cheap Chinese model prices must be the real cost it takes to run frontier-class models PLUS their margin.
Look at Deepseek 4 Pro. https://openrouter.ai/deepseek/deepseek-v4-pro/providers Deepseek and Baidu are subsidising prices but they probably train on inputs. I have no model training and ZDR in OpenRouter enabled, and the first provider that shows up there is Deepinfra, significantly more expensive than Deepseek. BUT much cheaper than Sonnet 4.6 and ChatGPT GPT-5.4.