
by rudedogg10 hours ago |


by tempaccount42010 hours ago|


This is not priced at inference cost.

My guess: it's the price at which they make more money than if they rent the TPUs to other companies.

The Gemini team has had trouble securing enough TPUs for their users' needs. They struggle with load and their rate limits are really bad. Maybe at a higher price, they have a better chance at getting more TPUs assigned?

by gpm8 hours ago|


The price at which they could rent out the TPUs, i.e. the market rate, is the inference cost.

Just because you are vertically integrated doesn't mean you get to discount one business unit's products to another. Doing so ignores the opportunity cost you pay and is just bad accounting.
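
To make the opportunity-cost point concrete, here is a minimal sketch. Every number and the function name `inference_margin` are hypothetical, chosen only for illustration; nothing here reflects actual TPU prices.

```python
def inference_margin(price_per_hour, internal_cost_per_hour, market_rent_per_hour):
    """Return (accounting margin, economic margin) for one TPU-hour of inference.

    The accounting margin uses the internal cost of running the chip.
    The economic margin uses the market rent the chip could have earned
    instead, i.e. the opportunity cost.
    """
    accounting_margin = price_per_hour - internal_cost_per_hour
    economic_margin = price_per_hour - market_rent_per_hour
    return accounting_margin, economic_margin


# Hypothetical figures: inference revenue of 5.0 per TPU-hour, an internal
# running cost of 2.0, and a market rental rate of 6.0.
accounting, economic = inference_margin(5.0, 2.0, 6.0)
print(accounting, economic)  # 3.0 -1.0
```

With these made-up figures the inference business looks profitable on internal cost, yet it earns less than simply renting the same hour out, which is the sense in which the market rate is the real cost.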

by KoolKat235 hours ago|


Basic business principle: you charge what people are willing to pay, not what it costs.

by dash27 hours ago|


Look up “double marginalisation”.

by HDThoreaun8 hours ago|


Depends on whether you have spare capacity, I think. They have minimal competition, so they might be maximizing profit by charging prices higher than what clears all their supply.
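
A toy illustration of that last point, assuming a linear demand curve, zero marginal cost, and a fixed capacity. All parameters and function names are hypothetical and not taken from any real pricing data.

```python
def demand(price, a=100.0, b=10.0):
    """Hypothetical linear demand: units requested at a given price."""
    return max(a - b * price, 0.0)


def revenue(price, capacity=80.0):
    """Revenue when sales are capped by available capacity."""
    return price * min(demand(price), capacity)


# Price at which demand exactly equals the 80 units of capacity.
clearing_price = (100.0 - 80.0) / 10.0
# Revenue-maximizing price for linear demand with zero marginal cost: a / (2b).
monopoly_price = 100.0 / (2 * 10.0)

print(clearing_price, revenue(clearing_price))  # 2.0 160.0
print(monopoly_price, revenue(monopoly_price))  # 5.0 250.0
```

In this sketch the seller earns more at the higher price even though 30 of the 80 units of capacity sit idle. Whether that matches the real situation depends on how much competition and spare capacity actually exist.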

by spyckie29 hours ago|


It's probably that in 1 or 2 years local (free) models will completely take the place of cheap models, so cheap models need to move up the quality chain.

You have free local models for most tasks, $20 subscriptions for near-frontier intelligence, and per-token API costs for frontier intelligence.

Flash seems to be targeting the near-frontier category.

by TurdF3rguson8 hours ago|


That might work if it weren't for FOMO. Are you OK with only $20 of frontier usage a month?

by rohansood154 hours ago|


Subjective, but if we compare to compute, not everyone needs the most expensive laptops or supercomputers for their work.

I think frontier models will be invaluable for scientific research, defense, financial analysis and such. But the average person would probably be reasonably well served by a local model.

If you're in sales, customer service, product management and such, the leading open models at the 30B mark are already good enough.

by booty8 hours ago|


Prevailing wisdom is that serving LLMs at a profit is achievable... it's when you factor in the cost of training them that prices get astronomical real fast.

Open-source model inference providers (who do not have to bear the cost of training) seem able to do it at much lower prices.

https://www.together.ai/pricing

https://fireworks.ai/pricing#serverless-pricing (scroll down to headline models)

Of course, it's possible that they are burning through investor cash as well, and apples-to-apples comparisons are not possible because AFAIK Google does not mention the size/paramcount for 3.5 Flash.

But if the prevailing wisdom is true, I think it's actually encouraging. It suggests that OpenAI and Anthropic could perhaps, if they need to, achieve profitability if they slow down model development and focus on tooling etc. instead. If true, that's probably good news for everybody w.r.t. preventing a bursting of this economic bubble.

...my opinions here are, of course, conjecture built on top of conjecture....

by eklitzke5 hours ago|


Most of the training cost is not in the final training run, it's in all of the R&D (including salaries, equity, etc.) that it takes to get to the final training run. The actual cost of all of the TPUs (or GPUs), power, networking, storage, etc. for the final training run is significant, but it's even more expensive to have this huge R&D team doing frontier model development and using a lot of those same resources during development.

I think you're right that releasing models at a slower cadence would bring down costs to some degree, but it's not clear how much. All of these companies could significantly reduce their opex but at the risk of falling behind in terms of being at the frontier.

by HDBaseT6 hours ago|


Not to discredit you, because you are 100% correct, but a tangential note about together.ai: they seem fairly unreliable, with constant outages or higher-than-normal latency.

by BoorishBears7 hours ago|


This is trouble if you're not Google/OpenAI/Anthropic: they're all shifting towards pricing for the economic value of the knowledge work they're aiding.

The economic value increases non-linearly as models get more intelligent: being 10% more capable unlocks way more than 10% in downstream value.

That's trouble because the non-linear component means at some point their margins will stop being primarily defined by the cost of compute, and start being dominated by how intelligent the model is.

At that point you can expect compute prices to skyrocket and free capacity to plummet, so even if you have a model that's "good enough", you can't afford to deploy it at scale.

(and in terms of timing, I think they're all well under the curve for pricing by economic value. Everyone is talking about Uber spending millions on tokens, but how much payroll did they pay while devs scrolled their phones and waited for CC to do their job?)

by IncreasePosts10 hours ago|


Maybe the margins are just very large for Google because they predict so much demand for 3.5?

by GodelNumbering10 hours ago|


This, combined with locally runnable models getting pretty good recently (e.g. Qwen 3.6), tells me that it's time to seriously consider a local dev setup again.

by MASNeo10 hours ago|


Besides the cost, you get control, transparency, and the ability to identify small language models or LoRAs that you can serve even more cost-effectively.

by cft9 hours ago|


This should become Apple's new hardware and software play. I am hopeful about the new CEO.