So large companies are getting billed a lot more than those discount subscription plans.
Claude can be very good but enterprise pricing doesn't make sense to me.
The really cost-effective way is to give a team a $20 Cursor plan, a $20-100 Claude plan, and a $20-200 Codex plan.
I'm spending 1k on Claude enterprise easily and that's with trying to spread it on codex and cursor using pi.
What's your source for Opus being a 5T model?
> and tiny distillations from DeepSeek that perform well only in benchmarks.
I don't think you know what you're talking about. Local models aren't “distillations from Deepseek”.
And they don't perform well “only in benchmarks”, Qwen 3.6 is a very decent model (obviously it's not Opus, but it's also much faster and speed is a quality of its own).
From this paper
Elon Musk tweeted that Grok is 0.5T or 1/10th the size of Opus. https://xcancel.com/elonmusk/status/2042123561666855235#m
While this source's reliability is certainly debatable, the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/
Massive understatement. Nowadays it has become hard to find a single Musk statement that doesn't contain at least one lie.
> the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/
Thanks for the pointer. This estimation has Grok 6 times bigger than Musk claims it is, so maybe that's where the lie is.
(I'm quite skeptical about that number though. It would be quite disappointing for US tech if their flagship models had to be that much larger than the Chinese ones for such a small edge in performance. Because I don't think US labs are incompetent, I'd bet that US flagships aren't more than 2 or 3 times bigger than Chinese flagships. Otherwise it really doesn't bode well.)
Probably Elon Musk: https://eu.36kr.com/en/p/3760679047267075
Like "Full Self-Driving" from coast-to-coast by 2016?
He's lagging in the AI race despite having tons of compute available, so he's trying to build a narrative that the model isn't behind, it's just smaller than the competition.
This is a temporary phenomenon. Expect either drastic price increases or draconian throttling or both in the coming months.
These companies are operating at huge losses and have hundreds of billions in liabilities and commitments. They need to turn on the money faucet sooner rather than later.
If prices keep going up, watch for companies to exit frontier models and go to local llama.cpp instances for 6-month-ago SOTA, with the flex of being housed within the office - no more privacy leakage, no more price gouging.
To be honest, I’m not sure why a Y-Combinator backed company hasn’t come out yet flooding the market with highly capable OPAI (pronounced “Oh-pah”, as in what Greeks shout as they drink shots), which stands for “On-Prem AI”.
… yes, I just made up OPAI right now lol
If we momentarily disregard the fact that YC itself owns billions of dollars worth of OpenAI shares[1], YC would plan to find demo-day investors willing to drive down the value of frontier labs. The coöpetition among VCs and the existing web of AI investments will mean no VC will be interested in investing in local AI...until after the frontier labs IPO.
1. Thanks to the self-dea^w foresight of former YC president Sam Altman
edit: I see in other comments on this thread you think Ed Zitron is a reliable pundit so that explains everything.
And you think it is unreasonable to consider this unsustainable?
For context, ChatGPT business subscriptions give you a fixed pool of credits to use, after which you get billed a la carte at an inflated 1.75x the API rate. If you don't want to pay, you lose access to everything except the non-reasoning models for the rest of the month.
We also tried Claude Enterprise, which was unusable as people blew through their monthly limits in a matter of hours.
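To make the overage arithmetic concrete, here is a minimal sketch of the billing scheme as described above: a fixed credit pool, then pay-as-you-go at 1.75x the API rate. The function name, the credit units, and the numbers in the example are hypothetical and only illustrate the shape of the calculation, not any vendor's actual billing logic.

```python
def overage_charge(used_credits, pool_credits, api_rate_per_credit,
                   overage_multiplier=1.75):
    """Charge on top of the subscription fee for one seat in one month.

    All parameters are illustrative assumptions. Only the 1.75x
    multiplier comes from the comment above.
    """
    # Usage inside the pool is already covered by the subscription.
    extra_credits = max(0.0, used_credits - pool_credits)
    # Everything beyond the pool is billed at the inflated rate.
    return extra_credits * api_rate_per_credit * overage_multiplier


if __name__ == "__main__":
    # Hypothetical seat: 1000 pooled credits, 1500 used, $0.01 per credit.
    print(overage_charge(1500, 1000, 0.01))
```

The same usage billed directly through the API would cost `extra_credits * api_rate_per_credit`, so the multiplier is the entire premium for staying inside the subscription product.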
Looking at the pricing of 1-2T models like Kimi or DeepSeek on the open market, I'm tempted to assume that inference costs are closer to subscription pricing than to API pricing.
Especially considering that subscriptions a) distribute load over time via rate limits, and b) include a lot of users who get only a fraction of the possible value, whether they are on a personal account where they hit the rate limit on the weekend but barely use it during the week, or are corporate users who were issued an account they rarely use. Subscription prices are usually set for the average case, not the most extreme value a power user can get out of them.
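The average-case argument can be sketched as a weighted mean over user segments. Everything below is an assumption for illustration: the segment shares and per-user inference costs are made-up numbers, not data from any provider.

```python
def average_cost_per_seat(segments):
    """Weighted mean inference cost across user segments.

    segments: list of (share_of_users, monthly_cost_per_user) tuples,
    where the shares must sum to 1. All values are hypothetical.
    """
    total_share = sum(share for share, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(share * cost for share, cost in segments)


if __name__ == "__main__":
    # Made-up mix: a few power users, some regulars, many near-idle seats.
    segments = [(0.10, 100.0), (0.40, 15.0), (0.50, 2.0)]
    print(average_cost_per_seat(segments))
```

Under these invented numbers, a heavy user who costs five times the subscription price still leaves the average seat below it, which is the point being made about pricing for the average rather than the extreme.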
So just going on vibes?
While some people don't like his content, Ed Zitron shows a lot of evidence for your assumption being very wrong.
These companies are bleeding cash at ungodly rates. It's likely their API pricing is still subsidized if you look at their overall financial picture.
Related, there's a good reason those API prices keep going up a lot every new version and it's not just because the models are better.
Also, API prices going up a lot every new version is more an OpenAI thing, and even there it's a recent trend: GPT 5.0 was a big price drop compared to 4.1, and 4.1 was cheaper than 4o, which itself got a price cut at some point and is cheaper than 4. Meanwhile Anthropic's API pricing stayed stable for many versions, then got slashed to a third with the 4.2 release and has stayed at that level since.
Of course they do have to "make bank" in some way to offset the insane training costs. But whether they go for high prices or high volume, or offer some services as a loss leader to drive profits elsewhere, is somewhat orthogonal to that.
Pure speculation, about as valuable as my linked WSJ reporting, I suppose. Given that's the case, maybe you shouldn't claim so confidently that they are money incinerators.
Even very cheap mini-PCs and laptops can run any of the models run by cloud providers, albeit at a much lower speed (i.e. with the weights stored on SSDs).
Whether such a low speed is useful depends on the application. For something like a coding assistant or bug scanning, an instant response is desirable, but certainly not necessary.
Anything can also be run on a cheap computer.
The difference is in speed. A cheap computer may run a big model up to a few orders of magnitude slower than datacenter hardware, depending on whether the LLM is small enough to fit in GPU memory, small enough to fit in CPU memory, or so big that it must spill to SSDs.
Depending on the application, the tradeoff between run time and run cost may happen to favor using local hardware, despite a much slower speed.
There are plenty of applications where running a job overnight at negligible cost is preferable to getting faster results at a very high price. One example is scanning for bugs in a mature code base with a large number of different open-weights LLMs, which can achieve bug coverage similar to that of a single overpriced and unavailable SOTA LLM, e.g. Mythos.
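A rough way to put numbers on the three tiers above is the usual rule of thumb for memory-bound decoding: each generated token has to read the active weights once, so tokens per second is at most bandwidth divided by the bytes read per token. The sketch below assumes that rule and ignores compute limits, KV cache, and batching; the function names, the memory sizes, and the bandwidth figure in the example are hypothetical placeholders, not measurements.

```python
def storage_tier(model_gb, gpu_gb, ram_gb):
    """Return where the weights end up: 'gpu', 'ram', or 'ssd'."""
    if model_gb <= gpu_gb:
        return "gpu"
    if model_gb <= ram_gb:
        return "ram"
    return "ssd"  # too big for memory, weights are streamed from disk


def tokens_per_second(active_gb_per_token, bandwidth_gb_per_s):
    """Upper bound on decode speed when reading weights is the bottleneck."""
    return bandwidth_gb_per_s / active_gb_per_token


def overnight_tokens(active_gb_per_token, bandwidth_gb_per_s, hours=8):
    """Tokens an unattended job could generate at that upper bound."""
    return tokens_per_second(active_gb_per_token, bandwidth_gb_per_s) * hours * 3600


if __name__ == "__main__":
    # Hypothetical machine: 24 GB GPU, 64 GB RAM, SSD reading at 5 GB/s.
    # Hypothetical model: 400 GB of weights, 20 GB of them active per token.
    print(storage_tier(400, gpu_gb=24, ram_gb=64))
    print(tokens_per_second(20, 5))
    print(overnight_tokens(20, 5, hours=8))
```

Plugging in your own hardware numbers shows quickly whether an overnight batch is realistic for a given model, or whether the SSD tier is too slow for anything beyond very short outputs.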
Giving strong “640k is enough for anyone” vibes here.
Some might say the price wouldn't be great if you could actually process and validate it...
My hunch is that this is the source of much of the variability in outcomes upstream of HN commenters claiming extremes from "This model changes everything!" to "This [same] model is crap."
We haven't operationalized what it means to "be good at prompting," nor developed proxies/heuristics/shibboleths for assessing prompting skill. There's community skepticism over whether prompting skill even exists. Besides, even if prompting skill is real, who wants to hear, "Actually you kinda suck at prompting"?