by GodelNumbering 9 hours ago

by __jl__ 8 hours ago

This understates the cost increase. 3.5 Flash also uses more tokens. artificialanalysis.ai shows these differences in the cost to run the whole eval, which I think is a more realistic measure of pricing:

Gemini 2.5 flash (27 score): $172 (1.0x)

Gemini 2.5 pro (35 score): $649 (3.8x)

Gemini 3.0 Flash (46 score): $278 (1.6x)

Gemini 3.5 Flash (55 score): $1,552 (9.0x or 2.4x compared to 2.5 pro)

This is a massive price increase... 5.6x compared to Gemini 3.0 Flash
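The multipliers above can be re-derived from the quoted dollar figures. A minimal sketch, assuming the four totals are the full-eval costs as quoted in this comment (the dictionary and function names are illustrative only):

```python
# Total cost in USD to run the whole eval, as quoted in the comment above.
eval_cost_usd = {
    "Gemini 2.5 Flash": 172,
    "Gemini 2.5 Pro": 649,
    "Gemini 3.0 Flash": 278,
    "Gemini 3.5 Flash": 1552,
}


def cost_ratio(model: str, baseline: str) -> float:
    """Return how many times more expensive `model` is than `baseline`."""
    return eval_cost_usd[model] / eval_cost_usd[baseline]


for name in eval_cost_usd:
    print(f"{name}: {cost_ratio(name, 'Gemini 2.5 Flash'):.1f}x vs 2.5 Flash")

print(f"3.5 Flash vs 2.5 Pro: {cost_ratio('Gemini 3.5 Flash', 'Gemini 2.5 Pro'):.1f}x")
print(f"3.5 Flash vs 3.0 Flash: {cost_ratio('Gemini 3.5 Flash', 'Gemini 3.0 Flash'):.1f}x")
```

Rounded to one decimal, the ratios match the 3.8x, 1.6x, 9.0x, 2.4x and 5.6x figures in the comment.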

by doginasuit 9 hours ago

They probably never intended to keep serving cheap models. This is a natural way to introduce the squeeze, now that they have people who built services on their API. It makes a lot of sense to have an abstraction layer where the provider doesn't matter. If you are working in Kotlin, Koog is excellent.
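For readers who want a picture of what such an abstraction layer looks like, here is a rough sketch in Python rather than Kotlin. It is not Koog's API; `ChatProvider`, `EchoProvider` and `summarize` are hypothetical names, and the stand-in provider makes no network calls.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical interface that every vendor adapter would implement."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in for a real vendor client; it only echoes the prompt back."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def summarize(provider: ChatProvider, text: str) -> str:
    # Application code depends only on the interface, never on a vendor SDK.
    return provider.complete(f"Summarize: {text}")


# Swapping vendors is a one-line change at the call site.
print(summarize(EchoProvider("vendor-a"), "pricing thread"))
print(summarize(EchoProvider("vendor-b"), "pricing thread"))
```

The point of the design is that prompts, evals and business logic live behind `summarize`, so a price change at one vendor only touches the adapter class.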

by lanthissa 8 hours ago

Switching models is insanely cheap compared to token cost on anything significant. This take is so cynical it misses the reality.

by Clueed 6 hours ago

in any corporate or half compliance-relevant setting switching isn't trivial. new DPA, subprocessor notifications, TIA, procurement review, security questionnaires, plus re-running your evals because prompts don't transfer 1:1. token cost is just one of the line items.

by lanthissa 5 hours ago

No, it's really not. Even the soggiest bank has multiple API vendors atm.

by alexandre_m 4 hours ago

I agree with parent. I'm not sure where your stance is coming from.

From what I hear, most enterprise AI deployments are seat-based subscriptions with annual commitments.

by opsnooperfax 2 hours ago

50K FTE global firm. We’re still piloting ChatGPT. AI is a four-letter word and there are ridiculous ceremonies and hundreds of hours of overhead for every trivial use case.

Amusingly, Enterprise credits are more expensive than just paying a zero-commitment on-demand API fee. Personal accounts are still the best value.

by p1esk 4 hours ago

Yes, I work at a 50 person startup and even here switching from CC to codex or cursor would be non-trivial for multiple reasons - not just the annual commitment.

by opsnooperfax 2 hours ago

I think the big 3 are cartelizing and starting to ratchet up costs. GPT5.5 is not easily distinguishable from 5.1. I wouldn't be shocked if we hit the ceiling and everyone is quietly positioning for the exit.

by hnarn 8 hours ago

> now that they have people who built services on their API

People really can’t wait to be the next Zynga

by rudedogg 9 hours ago

If Google is actually getting cheaper inference than everyone else with their TPUs, this smells like trouble to me. Maybe serving LLMs at a profit is proving difficult.

Or maybe they think because their benchmarks are good they can ramp up the prices. Seems like they don’t have the market share to justify a move like that yet to me.

by tempaccount420 9 hours ago

This is not priced at inference cost.

My guess: it's the price at which they make more money than if they rent the TPUs to other companies.

The Gemini team has had trouble securing enough TPUs for their users' needs. They struggle with load and their rate limits are really bad. Maybe at a higher price, they have a better chance at getting more TPUs assigned?

by gpm 8 hours ago

The price at which they could rent out the TPUs, i.e. the market rate, is the inference cost.

Just because you are vertically integrated doesn't mean you get to discount one business unit's products to another. Doing so discounts the opportunity cost you pay and is just bad accounting.

by KoolKat23 5 hours ago

Basic business principle, you charge what people are willing to pay not what it costs.

by dash2 6 hours ago

Look up “double marginalisation”.

by HDThoreaun 7 hours ago

Depends on if you have spare capacity I think. They have minimal competition so they might be maximizing profit by charging prices higher than what clears all their supply.

by spyckie2 8 hours ago

It's probably that in 1 or 2 years local (free) models will completely take the place of cheap models, so cheap models need to move up the quality chain.

You have free local models for most tasks, $20 subscriptions for near-frontier intelligence, and API per token costs for frontier intelligence.

Flash seems to be targeting the near-frontier category.

by TurdF3rguson 7 hours ago

That might work if it wasn't for FOMO. Are you ok with only $20 of frontier usage a month?

by rohansood15 3 hours ago

Subjective, but if we compare to compute, not everyone needs the most expensive laptops or supercomputers for their work.

I think frontier models will be invaluable for scientific research, defense, financial analysis and such. But the average person probably would be reasonably well-served with a local model.

If you're in sales, customer service, product management and such - the leading open models at the 30B mark are already good enough.

by booty 8 hours ago

Prevailing wisdom is that serving LLMs at a profit is achievable... it's when you factor in the cost of training them that prices get astronomical real fast.

Open-source model inference providers (who do not have to bear the cost of training) seem able to do it at much lower prices.

https://www.together.ai/pricing

https://fireworks.ai/pricing#serverless-pricing (scroll down to headline models)

Of course, it's possible that they are burning through investor cash as well, and apples-to-apples comparisons are not possible because AFAIK Google does not mention the size/paramcount for 3.5 Flash.

But if the prevailing wisdom is true, I think it's actually encouraging. It suggests that OpenAI and Anthropic could perhaps, if they need to, achieve profitability if they slow down model development and focus on tooling etc. instead. If true that's probably good news for everybody w.r.t. preventing a bursting of this economic bubble.

...my opinions here are of course, conjecture built on top of conjecture....

by eklitzke 4 hours ago

Most of the training cost is not in the final training run, it's in all of the R&D (including salaries, equity, etc.) that it takes to get to the final training run. The actual cost of all of the TPUs (or GPUs), power, networking, storage, etc. for the final training run is significant, but it's even more expensive to have this huge R&D team doing frontier model development and using a lot of those same resources during development.

I think you're right that releasing models at a slower cadence would bring down costs to some degree, but it's not clear how much. All of these companies could significantly reduce their opex but at the risk of falling behind in terms of being at the frontier.

by HDBaseT 6 hours ago

Not to discredit you, because you are 100% correct, but a tangential note about together.ai: they seem fairly unreliable, with constant outages or higher than normal latency.

by BoorishBears 6 hours ago

This is trouble if you're not Google/OpenAI/Anthropic: they're all shifting towards pricing for the economic value of the knowledge work they're aiding.

The economic value increases non-linearly as models get more intelligent: being 10% more capable unlocks way more than 10% in downstream value.

That's trouble because the non-linear component means at some point their margins will stop being primarily defined by the cost of compute, and start being dominated by how intelligent the model is.

At that point you can expect compute prices to skyrocket and free capacity to plummet, so even if you have a model that's "good enough", you can't afford to deploy it at scale.

(and in terms of timing, I think they're all well under the curve for pricing by economic value. Everyone is talking about Uber spending millions on tokens, but how much payroll did they pay while devs scrolled their phones and waited for CC to do their job?)

by IncreasePosts 9 hours ago

Maybe the margins are just very large for Google because they predict so much demand for 3.5?

by GodelNumbering 9 hours ago

This combined with locally runnable models getting pretty good recently (e.g. Qwen 3.6) tells me that it's time to seriously consider local dev setup again

by MASNeo 9 hours ago

Besides the cost, you get control, transparency and the ability to identify small language models or LoRAs that you can serve even more cost-effectively.

by cft 8 hours ago

This should become Apple's new hardware and software play. I am hopeful about the new CEO.

by hei-lima 9 hours ago

We need another "Deepseek moment" or else it will become impossible for the regular dude to use AI. It will become something that only big companies can afford.

by SwellJoe 8 hours ago

We're having DeepSeek moments every couple of weeks.

Qwen 3.6 hit hard in the self-hosting space. It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.

The Prism Bonsai ternary model crams a tremendous amount of capability into 1.75GB.

And, DeepSeek V4 is crazy good for the price. They're charging flash model prices for their top-tier Pro model, which is competitive with the frontier of a few months ago.

The winners in the AI war will be the companies that figure out how to run them efficiently, not the ones that eke out a couple percent better performance on a benchmark while spending ten times as much on inference. The capability has to be there, but I think we're seeing that capability alone isn't a strong moat: there's enough competent competition to ensure there are always at least a few options, even at the very frontier of capability.

by Zambyte 7 hours ago

> It's incredibly capable for its size, really shaking up what's possible in 64GB or even 32GB of VRAM.

You can lower that to at least 24GB. I've been running Qwen 3.5 and 3.6 with codex on a 7900 XTX and the long horizon tasks it can handle successfully have been blowing my mind. I would seriously choose running my current local setup over the SOTA models + ecosystem of a year ago, just based on how productive I can be.

by hei-lima 4 hours ago

Gonna try it.

by trollbridge 7 hours ago

We have Qwen 3.6-35b (6) on a 5090 (32GB) and it's blowing me away. Works fine for most (not all) code generation tasks. One developer here has been extremely stubborn about adopting AI; he's finally adopted it, albeit only when it's coming from a local model like this.

DeepSeek V4 Pro likewise is insanely good for the price. I simply point it at large codebases, go get a cup of coffee or browse Hacker News, and then it's done useful work. This was simply not possible with other models without hitting budget problems.

by akulbe 6 hours ago

Any chance you'd be willing to talk further about your setup? I have 2 x 3090s in a local machine, and I'm still left with questions about how best to use stuff locally.

by sheeshkebab 4 hours ago

You can only run heavily quantized models on all 3/4/5 rtx gpus (with 32gb or less vram) - and you probably want moe versions like Qwen 35b for this to run at speed somewhat comparable to Claude. It’s still not there to be honest but getting there. Personally I mess around with llama.cpp on m5 max with 128gb - it’s a decent setup to try various medium sized things, and runs llms surprisingly well without quantization, at least the moe models.

by SwellJoe 4 hours ago

Two 3090s is 48GB, so it's possible to run the 6-bit quantization comfortably, which is fine. It doesn't start to get notably dumber until lower than that. It won't be as fast as a hosted model, but dual 3090s will be comfortably fast for interactive use with the MoE version and not terrible to use with the dense model. I run the dense model at 8 bits on my dual Radeon V620 desktop machine, which I think would be slower than two 3090s, or at least not notably faster.
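As a back-of-the-envelope check on what fits where, the weights-only footprint is roughly parameters times bits per weight divided by 8. A small sketch, assuming the 35B parameter count and the 32GB / 48GB cards mentioned in this thread; it deliberately ignores KV cache, activations and quantization metadata, so real usage will be higher:

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in decimal GB."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


PARAMS_B = 35  # Qwen 3.6 35B, as named elsewhere in the thread
VRAM_OPTIONS_GB = {"1x 5090": 32, "2x 3090": 48}  # capacities quoted in the thread

for bits in (4, 6, 8):
    size = weight_footprint_gb(PARAMS_B, bits)
    # A setup "fits" here only in the loose sense that the weights are smaller than VRAM.
    fits = [name for name, vram in VRAM_OPTIONS_GB.items() if size < vram]
    print(f"{bits}-bit: ~{size:.1f} GB of weights, fits in: {fits or 'neither'}")
```

Under these assumptions, 6-bit weights leave plenty of headroom on 48GB, while 8-bit weights no longer fit on a single 32GB card.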

by hedgehog 3 hours ago

Have you done comparisons with 4 bit and seen a noticeable difference for coding tasks?

by SwellJoe 1 hour ago

No, I've just seen benchmarks showing most models start degrading around 4-5 bits. That's not to say they become useless, just that down to about 6-bits (with careful hybrid quantizations like unsloth where some of the layers aren't quantized or are quantized at higher bit depths) the quality isn't measurably degraded, but below that there are measurable differences in performance.

People report good results from DeepSeek V4 Flash at 2 bits (the DwarfStar 4 folks are doing it, and I've tried it on my Strix Halo, but it's too slow to be usable, so I haven't bothered to figure out if it's actually smart enough to use for anything).

Anyway, it's obvious models have to degrade in terms of knowledge, at any quantization, even though it may not show up clearly on benchmarks until lower. If you halve the size of the data available, it necessarily loses information about the world.

by squidbeak 9 hours ago

Deepseek had another moment a few weeks ago. V4 isn't far behind the US frontier, and so far its flash variant seems a very reliable coder and costs a pittance.

by ai_fry_ur_brain 8 hours ago

Deepseek V4 (not flash) tripled in price too, by the way (from Deepseek). Get used to this pattern.

This is what you get for relying on the generosity of billionaires. Keep offshoring your thinking ability to a machine and let me know how competitive you are. Hint: you won't be. There's nothing special about being able to use an LLM.

by barrell 5 minutes ago

Actually, deepseek v4 was 1/3 promotional price for the first month or so. This was pretty clearly communicated. The promotions window just ended is all.

by npn 8 hours ago

Unlike other providers, Deepseek does promise that they will lower the price when their Huawei cards arrive in a few more months.

by flakiness 7 hours ago

Give me a link. Cannot wait. One PSA is that they have 75% discount right now so it is already cheaper than the full price.

by npn 6 hours ago

Weird, last time I checked it was right on the pricing page.

But even when it happens I doubt it would be as cheap as it is right now. Enjoy it while it lasts!

by ls612 8 hours ago

Anyone can host Deepseek V4 on rented GPUs and sell inference on it. Price will very quickly converge to the marginal cost of inference. This is as close to a pure commodity as it gets in the AI space so competitive market economics will put in work. Same is true for any open-weights model.

by ai_fry_ur_brain 8 hours ago

You don't understand the costs involved to run inference at scale.

Please go run some numbers. The hardware needed to run Deepseek v4 flash at 20 tps for a single session is nowhere close to what is required to run it at 50 tps for 5,000 concurrent sessions.

Imagine what it takes to be profitable when running at 150 tps for 30 cents per 1M tokens. You make less than $1k per month, and the hardware required to run that costs $10k a month to rent, with hardly any concurrent session capability.

by gpugreg 6 hours ago

> Please go run some numbers.

- DeepSeek serves DeepSeek V4 Pro at 27 tps: https://openrouter.ai/deepseek/deepseek-v4-pro

- At 27 tps per user, a B300 GPU will give you around 800 tokens per second (serving 30 users): https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...

- That's 800 * 60 * 60 generated tokens per hour, at a cost of $0.87 per 1M tokens, or $2.50 per hour.

- For input and output tokens, the math is a bit more complicated because we have to make assumptions about their ratio. Using the published values from OpenCode, we get another $2.50 for cached tokens (which are almost free for DeepSeek) and another $3.40 for input tokens (which are a lot cheaper to compute than output tokens), which gives us a total of about $8.40 per hour per B300 GPU.

- B300 GPUs can be rented for as low as $3.40 per hour, which is less than $8.40, so hosting DeepSeek V4 Pro is profitable.

You could also host it at fewer tps per user to raise the efficiency and therefore the profit even higher.
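The arithmetic in the bullets above can be replayed as a short script. This is only a sketch that takes every figure from the comment at face value (throughput, per-token price, the hourly cached and input revenue, and the rental rate); the variable names are illustrative.

```python
# All inputs below are the figures quoted in the comment, not measured values.
TOKENS_PER_SECOND = 800           # aggregate output throughput of one B300
OUTPUT_PRICE_PER_MTOK = 0.87      # USD per 1M generated tokens
CACHED_REVENUE_PER_HOUR = 2.50    # USD, estimate for cached input tokens
INPUT_REVENUE_PER_HOUR = 3.40     # USD, estimate for uncached input tokens
GPU_RENT_PER_HOUR = 3.40          # USD, lowest quoted B300 rental price

output_tokens_per_hour = TOKENS_PER_SECOND * 60 * 60
output_revenue_per_hour = output_tokens_per_hour / 1_000_000 * OUTPUT_PRICE_PER_MTOK

total_revenue_per_hour = (
    output_revenue_per_hour + CACHED_REVENUE_PER_HOUR + INPUT_REVENUE_PER_HOUR
)
margin_per_hour = total_revenue_per_hour - GPU_RENT_PER_HOUR

print(f"Output tokens per hour: {output_tokens_per_hour:,}")
print(f"Output revenue per hour: ${output_revenue_per_hour:.2f}")
print(f"Total revenue per hour: ${total_revenue_per_hour:.2f}")
print(f"Margin over GPU rent: ${margin_per_hour:.2f}")
```

On these numbers the conclusion holds with room to spare: revenue per GPU-hour is well over twice the quoted rental price, before counting any other operating costs.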

by ls612 6 hours ago

Even without assuming Blackwell inference, the $3.50/hr price is likely close to the marginal cost. The Deepseek R1 model is a little more than a third of the size of V4 and cost around $1/Mtok to serve at scale, based on deepseek's blogs last year and Hopper rental prices.

by ls612 8 hours ago

Yes it is more efficient in $/tok to run at scale than to run just for yourself. Everyone selling Deepseek V4 inference is selling an undifferentiated good. They have run the numbers on how much it costs and are competing against a dozen other outfits also selling undifferentiated open weights tokens. Whatever the dollar cost they face to rent those GPUs will be what they are able to charge in the competitive market. That is great for you and me because we can buy tokens at pretty much exactly what it costs to produce them.

by dpoloncsak 8 hours ago

Mate, why are you so mad at people upset that the price tripled? It's a fair complaint that people built services using the cheaper ones with the expectation that future models would be similarly priced. You can avoid 'offloading thinking' while still building on top of these models.

by zaptrem 7 hours ago

V4-Pro is about 2.4× total params and 1.3× active params of V3.2.

by creationcomplex 6 hours ago

You're typing as your handwriting and letter sending abilities deteriorate to dust. Writing down information as your memory capacity decays. Remembering instead of living at the pure leading edge of perception dulling your reactions.

Smh, it's all downhill from the first unadulterated neuron.

by aurareturn 8 hours ago

I think demand is too great and compute is not enough. Nothing to do with billionaires colluding to increase prices by 3x.

by boutell 6 hours ago

Actually, why should Google collude on pricing? They have deep pockets and could starve out the competition while keeping prices low, if they really wanted.

I think it is priced high because it's basically their smartest model as well as their fastest, so why shouldn't they?

You can still use earlier generations of Flash at a lower cost if you want "fast and cheap and just OK," which often makes sense. (Just checked)

I would predict they will lower this price when 3.5 High appears, but perhaps not all the way.

by xbmcuser 8 hours ago

What we need is a DeepSeek moment in hardware, i.e. China reaching parity on node size. That is the only way the latest computers, let alone the latest AI, will be available to us in the future; otherwise the profit margins will push most production to AI.

by throwa356262 8 hours ago

To be honest, China not having access to the latest hardware is exactly what has driven LLM technology forward the last 2 years.

by humanfromearth9 7 hours ago

Why?

by Weryj 7 hours ago

Because it forced them to focus on efficiency, instead of throwing more compute at the problem.

Just like in software, some of the most beautiful solutions come from constraints. Think, the optimisations that game developers implemented because of the frame budget.

by Viacol 3 hours ago

On top of that, China is also facing hardware constraints, which is pushing companies to develop better domestic chips for AI training. It'll be interesting to see how things perform once Huawei's newer hardware is fully deployed at DeepSeek.

by blackoil 1 hour ago

Open Source ASML EUV. But will wipe off trillions from US stocks so 401k may not like that.

by stared 5 hours ago

We have a "DeepSeek moment", https://github.com/antirez/ds4 (see https://news.ycombinator.com/item?id=48142108).

Or if you prefer smaller ones, Qwen3.6-35B-A3B, https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF

by segmondy 9 hours ago

You can use lots of open weight models today.

by hei-lima 8 hours ago

That's one solution to the problem. But it still needs some good computational capabilities. Either we optimize the hell out of those models, or we wait for the hardware to become good enough for them.

by Gigachad 6 hours ago

The real problem is the hardware to run them is still very expensive.

by pianopatrick 7 hours ago

Maybe we can figure out better ways to use the models that can run on cheap hardware.

by GeorgeOldfield 8 hours ago

gemini isn't even that good. just tested 3.5 on usual complex prompts to opus/chat 5.5. meh

by k8sToGo 8 hours ago

Are you really comparing flash to opus? Shouldn't you be comparing pro?

by CognitiveLens 8 hours ago

The benchmark tables in the Google announcement include Opus 4.7, and the numbers are very impressive. Caveat emptor, but it's not unreasonable to compare a new Flash to a current-gen Opus, even if some of the results confirm expectations

by bachmeier 7 hours ago

Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks.

by kmac_ 8 hours ago

Well, the first impression is that Gemini still goes off the instruction rails easier than other models, but I noticed that it tends to go back to the initial goal without holding a hand, which is a real improvement. It's really interesting that these models behave so differently.

by fnordsensei 9 hours ago

3.5 flash is listed as stable rather than preview, or am I misreading?

https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...

by GodelNumbering 9 hours ago

ah I mistakenly wrote preview

by dr_dshiv 9 hours ago

3.1 flash lite — $0.25/$1.50 — plus insanely fast.

3.1 flash lite isn’t quite as good as 3 flash preview (which is the most incredible cheap model… I really love it) — but 3.1 is half the price and the insane speed opens up different use cases.

For comparison, Opus models are $5/$25

by SwellJoe 8 hours ago

Opus 4.7 is smarter than even Gemini 3.1 Pro on nearly every metric, though. You're comparing apples to oranges. Gemini 3.1 Flash is somewhere in the neighborhood between current Haiku and Sonnet, I think? Still a better value than the Anthropic models, I guess, which are quite pricey.

Since Gemini 3.5 Flash is raising the price to $1.50/$9.00, it's priced between Haiku and Sonnet. If it outperforms Sonnet, it remains a good value, I guess. Though DeepSeek V4 Flash is much cheaper than all of them, and seemingly competitive.
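To make the per-token prices quoted in this thread concrete, here is a small sketch that prices one workload under each of them. It assumes the "$X/$Y" pairs mean USD per 1M input and output tokens respectively, and the 10M-input / 2M-output workload is purely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


# Prices as quoted in this thread.
PRICES = {
    "Gemini 3.1 Flash Lite": Price(0.25, 1.50),
    "Gemini 3.5 Flash": Price(1.50, 9.00),
    "Opus": Price(5.00, 25.00),
}


def workload_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload at the given per-million-token prices."""
    return (
        input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    ) / 1_000_000


# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
for model, price in PRICES.items():
    print(f"{model}: ${workload_cost(price, 10_000_000, 2_000_000):.2f}")
```

Note that this compares list prices only. As pointed out earlier in the thread, a model that emits more tokens for the same task costs more than its per-token price suggests.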

by WarmWash 7 hours ago

>Opus 4.7 is smarter than even Gemini 3.1 Pro on nearly every metric,

Outside of coding, claude models are pretty meh. GPT and Gemini are the workhorses of science/math/finance.

by robwwilliams 5 hours ago

Not in my fields of science: genetics and neuroscience. The combination of Opus 4.7 Adaptive used with well-structured project folders is amazingly useful.

by epolanski 5 hours ago

And even on coding, they are mostly good at generating new code.

They sure are not at thorough analysis or debugging, etc.

by OakNinja 7 hours ago

To be fair, Gemini 3.1 flash _lite_ supports structured output (guaranteed json), it’s super fast, runs circles around 2.5 flash and costs $0.25/$1.50.

I use it _a lot_ and it’s very capable if you just plan correctly. I actually almost exclusively use 3.1 flash lite and 2.5 flash lite (even cheaper) and we have 99.5% accuracy in what we do.

That said, I think we’ll see the lite/flash models and the pro models will diverge more price wise. The pro models will become more and more expensive.

by WhitneyLand 8 hours ago

Their rationale might be that its size and intelligence are growing relative to the market.

Fwiw it’s beating Claude Sonnet in most benchmarking (benchmaxxing?), yet they’ve priced it almost half off on a per token basis.

Question is are you going to persuade anyone with this argument?

Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.

by SyneRyder 8 hours ago

> Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.

A few weeks ago, Steve Yegge claimed he'd heard that Google employees are banned from using Claude & Codex.

https://x.com/Steve_Yegge/status/2046260541912707471

A number of Googlers replied to say that was totally false, including Demis Hassabis, but they were all on the DeepMind team.

https://x.com/demishassabis/status/2043867486320222333

This person here claims they left Google because of the ban, and because the ban applied outside of Google work as well:

https://x.com/mihaimaruseac/status/2046272726881693960

by myko 2 hours ago

> and because the ban applied outside of Google work as well

I think false (or hasn't filtered to everyone lol)

by dbbk 9 hours ago

I don't think they're really comparable. Seems they created the Flash-Lite tier to take the spot of the old Flash models.

by GodelNumbering 9 hours ago

No, 2.5 had both flash and flash lite.

by mlmonkey 8 hours ago

It is Google, after all ....

by photonair 9 hours ago

In general, Gemini Flash is still relatively cheap compared to the "mini" versions from the other big 2. However, I agree that newer versions seem to come with multi-x price increases (similar to the new ChatGPT), and we certainly need competition from the open source models to keep these guys in check on pricing.

by malloryerik 2 hours ago

To me this is almost like a tone-deaf naming change.

Empty Slot (new Pro as Mythos competitor?)

Old Pro -> now Flash

Old Flash -> now Flash Lite

Old Flash Lite -> now Gemma (and not served by Google)

I say "almost" because the situation is more fluid and unstable than a normal naming change. If Apple were to do this with laptops, maybe it'd be like, Air gets better and pricier and becomes Pro-level model, Neo same way becomes Air-level model, etc. But Apple's too design oriented to do something like that. Google, well...

This change has made me decide to move to a multi-provider situation like through OpenRouter for consumer-facing LLM api in a service I'm building. I just can't trust Google to not constantly rearrange everything under our feet. Doesn't mean I won't use Gemini, but it clearly means I need to have others in the mix ready to go. In fact I used to use lots of Flash Lite, which is now Gemma territory, and I can't get that served by Google anymore and don't want to run my own hardware.

But in any case, I'd compare this "Flash" model with previous "Pro" on all metrics. It's kinda like if in clothes a Small suddenly became what was a Large, or at Starbucks a Grande became the new de facto Venti. And only for the new! drinks.

And if we think this way, it's possible that prices are actually falling?

by LetsGetTechnicl 9 hours ago

Gen AI is unprofitable, especially at the insanely cheap rates they've been offering to get people in the door. So expect more increases in the future.

by roadside_picnic 8 hours ago

These companies are unprofitable (as all companies at this stage and ambition should be) but I increasingly don't see any justification for the idea that it is fundamentally unprofitable.

Inference alone is certainly profitable. I'm running models at home, for free, that are comparable in performance to paid models of a year or so ago. Even for much larger models, the costs around inference serving are clearly manageable.

Training is where the costs are, but I'm increasingly convinced those too could have costs dramatically reduced if necessary. Chinese companies like Moonshot.ai are doing fantastic work training frontier models for a fraction of the cost we're seeing from Anthropic/OpenAI.

This isn't like Uber or Doordash where the economics fundamentally don't make sense (referring to the early days of these services where rates were very cheap).

It's a compelling story that "current AI is unsustainable", but it doesn't pan out in practice for a multitude of reasons (not the least of which is that we can always fall back to what models did last year for basically free).

by ReliantGuyZ 8 hours ago

And if you can run those strong models at home for free, why would hosting them be a successful business for any of these providers?

Profitable maybe, in terms of having low costs, but why pay Google or whoever when you can do it yourself for cheaper/"free"?

by HDThoreaun 7 hours ago

If you can run your server at home for free, why would hosting it be a successful business for any of these providers?

by overrun11 6 hours ago

Arguably nothing even has to change with training for this to be sustainable. Dario has claimed that Anthropic is profitable on a per training run basis. They aren't profitable because they choose to keep investing in increasingly large training runs.

by dsdsfaa 3 hours ago

Cut the crap.

Free cash flow to the firm = EBIT(1-t) - Reinvestment, and the value of the firm's operating assets is the present value of those cash flows.

You (Anthropic) want that sky-high valuation? Accept reinvestment is part of the equation.

If they decide to stop reinvesting, then they are as good as dead.

Moreover, they clearly are not re-investing cash flows from operations. Why do you think they are continually raising money? Lmao.

by LetsGetTechnicl 8 hours ago

If it's profitable, why haven't they reported any profits? People like Ed Zitron have done the math and it just doesn't add up. I mean he just published this piece today: https://www.wheresyoured.at/ai-is-too-expensive/

by anthonypasq 7 hours ago

Amazon was unprofitable for over a decade, and they were public. There's no incentive to be profitable as a private company if you can continue to raise money.

Ed Zitron and Gary Marcus are... confused.

by mynameisash 6 hours ago

> Amazon was unprofitable for over a decade, and they were public.

Amazon was unprofitable because they poured their revenue into growth. On paper, they were in the red, but everyone - especially investors - saw what was going to happen, given their trajectory.

Is it the case that any of these AI companies are actually making a ton of money and growing accordingly? AFAICT, we've just got [a] big players like Google that can subsidize AI in the hopes of waiting everyone else out and [b] private companies raising capital in the hopes that when the market returns to rationality, they may be solvent.

by overrun11 6 hours ago

Yes that is exactly what is happening. OpenAI and Anthropic are the fastest growing companies by revenue ever and their gross profit margins are healthy.

reply

upvote

by mynameisash5 hours ago|

[-]

According to this article[0]:

> HSBC Global Investment Research projects that OpenAI still won’t be profitable by 2030, even though its consumer base will grow by that point to comprise some 44% of the world’s adult population (up from 10% in 2025). Beyond that, it will need at least another $207 billion of compute to keep up with its growth plans.

This article is from six months ago. Was HSBC wrong? Did something dramatically change in the last six months? Is OpenAI not, in fact, profitable? Or are they in fact doing well but making a huge investment (as was the case with Amazon 25ish years ago)?

I genuinely do not know, but my impression is that they're burning investment capital trying to compete with others' investment capital and Google's bottomless pockets.

[0] https://fortune.com/2025/11/26/is-openai-profitable-forecast...

reply

upvote

by LetsGetTechnicl2 hours ago|

[-]

Also OpenAI somehow having 44% of the world’s population as its customer base is a plainly absurd goal and will never happen, not in 5 years

reply

upvote

by dsdsfaa3 hours ago|

[-]

and to make matters worse, they are massively over-valued.

Whoever buys the stock at a richly priced 1tn at IPO is a bozo lmao. I know, I know, index funds will be forced to hold it, bypassing the 1 year rule. Disaster already.

reply

upvote

by LetsGetTechnicl2 hours ago|

[-]

Then why do they constantly need more and more funding from VC and Google and MS and NVIDIA? Why is it all circular dealing? Why aren’t there smaller AI startups running these smaller, “profitable” models?

reply

upvote

by timmytokyo6 hours ago|

[-]

But I've been told here -- over and over again -- that the cost of inference was going to go down as the technology matured.

The trend lines are going in the opposite direction.

reply

upvote

by goosejuice7 hours ago|

[-]

His entire brand is that the AI bubble will burst. By his account it was supposed to have burst several times by now. As with the doomers, it's not if, it's when, and they have to keep pushing back their predictions. Funny how both camps can be so confident. Alas, that's how they get eyes, ears and dollars.

That's not to say they will be or are wrong, it's just that they aren't exactly unbiased, or humble, sources.

reply

upvote

by booty8 hours ago|

[-]

Yeah, at this point I think the worst-case scenario for OpenAI/Anthropic/etc is to slow down frontier model development and focus on tooling and services, as opposed to imploding completely and bursting the economic bubble. I hope?

reply

upvote

by GaggiX9 hours ago|

[-]

If you don't need SOTA or near SOTA there are plenty of dirt cheap models, just look at Gemma 4 31B on Openrouter.

reply

upvote

by Gigachad6 hours ago|

[-]

For all of the use cases being hyped you really do, and you actually need something much better than the SOTA models to do what we are being told can be done.

The small models are useful for small things like summarizing text or search but not much else.

reply

upvote

by LetsGetTechnicl2 hours ago|

[-]

Yeah, a lot of AI hype is “look at the amazing new thing our new model can do!” Like Google at this event. But when pressed about its pricing reality, the answer is “use a worse cheaper model”?? Real convincing argument there.

reply

upvote

by ai_fry_ur_brain8 hours ago|

[-]

[flagged]

reply

upvote

by npn8 hours ago|

[-]

It is insanely profitable though, if you cut out R&D costs, plus the marketing and loss leaders. Don't let them gaslight you.

Even Anthropic, which does not own any hardware, still has a big margin providing Claude models.

reply

upvote

by LetsGetTechnicl8 hours ago|

[-]

Then why haven't they reported any profits using GAAP (generally accepted accounting principles)? They all use ARR which is easily gamed.

reply

upvote

by overrun116 hours ago|

[-]

They aren't profitable on a GAAP basis and no one claims this. This obsession over profits is misguided. These are hyper growth companies growing at a scale never seen before. It is both deliberate and uncontroversial to invest in growth rather than slowing down to produce profits.

reply

upvote

by chillfox2 hours ago|

[-]

If my retirement money is going to end up invested in these companies, either directly when they IPO or indirectly through compute providers, then I would like to see some proof that they are capable of producing profits. "Trust me bro" just ain't gonna cut it.

reply

upvote

by npn7 hours ago|

[-]

I'm not really sure, but it might be that they count hardware purchases as a loss, too.

Google has just recently upgraded their TPUs.

reply

upvote

by timmytokyo6 hours ago|

[-]

Everything is insanely profitable if you ignore the costs.

reply

upvote

by npn1 hours ago|

[-]

The premise is that if they stop training new models, it will become pure profit after 2 years, once the hardware has finished paying for itself.

It's pretty funny that everyone says this business is unsustainable, but I have yet to see anyone go bankrupt, even the pure hardware providers who are renting out A100s and B200s.

reply

upvote

by LetsGetTechnicl1 hours ago|

[-]

And AI investors and stock market boosters are just going to accept OpenAI not having anything "new" to show for all their investments? What about replacing hardware once it's been burned out from constant high usage? Is it not odd to you that so many big AI deals get announced and are never heard from again? What's the business reason for neoclouds buying GPUs from NVIDIA only for NVIDIA to then pay them to rent them back? How does this make any sense?

reply

upvote

by operatingthetan5 hours ago|

[-]

They immediately undercut their argument to the point that I'm not sure if they were being sarcastic.

reply

upvote

by ilia-a9 hours ago|

[-]

Yeah, it is a massive jump in price, hardly a "Flash" model anymore... I wonder if they'll release flash lite or something with a bit more affordable price point.

reply

upvote

by OakNinja7 hours ago|

[-]

There has been a Flash Lite tier since 2.5. The latest is currently 3.1.

reply

upvote

by irthomasthomas9 hours ago|

[-]

And they are using this to power search answers?

reply

upvote

by CooCooCaCha8 hours ago|

[-]

I bet the API pricing helps pay for search users

reply

upvote

by llm_nerd8 hours ago|

[-]

It might be temporary pricing, given that 3.5 Flash is actually superior to the existing 3.1 Pro in almost all regards. They're in a bit of a bind, as 3.1 Pro really doesn't make sense given that 3.5 Pro has been delayed a bit.

reply

upvote

by SwellJoe8 hours ago|

[-]

That's a lot. DeepSeek v4 Flash is just over a tenth the price, and DeepSeek v4 Pro is roughly the same price (currently heavily discounted, but will be $1.74).

I mean, the benchmarks for Gemini 3.5 Flash are very strong, but at those prices it has to be. I guess the time of subsidized tokens from the big guys is slowly coming to an end.

reply

upvote

by copperx5 hours ago|

[-]

They have said AI will be priced like a utility, meaning $100-300 per month or so.

reply

upvote

by dzhiurgis3 hours ago|

[-]

I use Gemini models in Junie daily. When I need accuracy I switch to Gemini 3.1 Pro Preview (why is it still in preview?), but it burns through credits, leaving me topping up $5 every day. 3.1 Flash Lite is just not accurate enough. 3 Flash is the sweet spot, just as JetBrains suggests.

Maybe I'll look at Opus again, but it was just slower, much more expensive and, worst of all, wasn't listening to your instructions.

reply

upvote

by verdverm8 hours ago|

[-]

At the same time, it is supposedly Gemini 3.1 Pro level at 3/4 the price

and far cheaper than comparable models. Gemini Pro is cheaper than Claude Sonnet (Anthropic still gets to charge a brand premium).

reply

upvote

by throwa3562628 hours ago|

[-]

Gemini 2.5 flash was the best Gemini model.

Not the most intelligent, but a perfect balance of cheap, fast and not-too-dumb.

reply

upvote

by m3kw98 hours ago|

[-]

just subscribe to the plan, cheaper

reply