
Seconded. From the recent Alibaba Qwen conference: an all-in-one box (a "DC in a box"; I think it was called Apsara, 0.6 x 0.6 x 1.5 m) that is plug and play, has 1.5 TB of GPU RAM, and can run any open model in a fully air-gapped environment. All of that is roughly $300k, one time, and the box can do non-LLM tasks as well. Performance (throughput) is around 20k t/s. Delivery time is around 2 months. For any medium-sized company it's perhaps cheaper to just buy it once than to spend 1.5k per user on cloud.
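A back-of-envelope sketch of that comparison, in case it helps. The $300k and 1.5k figures are the ones quoted above; everything else is an assumption: that the 1.5k is in dollars and is charged per user per billing period (the period is not stated), and that power, staffing, and depreciation are ignored. The function name `break_even_periods` is made up for illustration.

```python
import math

BOX_PRICE_USD = 300_000      # one-time price quoted above
CLOUD_COST_PER_USER = 1_500  # ASSUMPTION: dollars per user per billing period


def break_even_periods(users: int,
                       box_price: float = BOX_PRICE_USD,
                       cloud_cost_per_user: float = CLOUD_COST_PER_USER) -> int:
    """Billing periods after which cumulative cloud spend reaches the box price."""
    if users <= 0:
        raise ValueError("users must be positive")
    # Cloud spend grows linearly with users * periods; the box is a flat cost.
    return math.ceil(box_price / (users * cloud_cost_per_user))


if __name__ == "__main__":
    for users in (50, 100, 200):
        print(users, break_even_periods(users))  # prints 50 4, 100 2, 200 1
```

Under these assumptions the box pays for itself once total spend reaches 200 user-periods, so the answer depends almost entirely on headcount and on what period the 1.5k actually covers.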

by jon_adler, 1 hour ago

Where can I find more information on this? A web search didn’t reveal much for me.

by dmos62, 7 hours ago

Decent vs best-money-can-buy. Further, a self-hosted LLM will be much slower.

by VBprogrammer, 7 hours ago

I think we're all past the "best-money-can-buy" stage. The most expensive models are an order of magnitude more expensive than the middle-ground ones, so you need to be selective about what you run where.

And with a bit of careful routing, there isn't a lot stopping you from sending the hard stuff to a cloud model and the average stuff to an on-prem model.
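A minimal sketch of what that routing could look like, assuming a crude difficulty score. Every name here (`CLOUD`, `ON_PREM`, `HARD_HINTS`, `estimate_difficulty`, `route`) and the 0.5 threshold are hypothetical; a real setup would probably use a classifier or task metadata rather than keyword matching.

```python
# Hypothetical backend labels; these are not real endpoints.
CLOUD = "cloud-frontier"
ON_PREM = "on-prem-open"

# Illustrative keywords that hint at a harder task (assumption, not a rule).
HARD_HINTS = ("prove", "refactor", "architecture", "debug")


def estimate_difficulty(prompt: str) -> float:
    """Crude 0..1 score: long prompts and certain keywords count as harder."""
    length_score = min(len(prompt) / 4000, 1.0)
    hint_score = 0.5 if any(w in prompt.lower() for w in HARD_HINTS) else 0.0
    return min(length_score + hint_score, 1.0)


def route(prompt: str, threshold: float = 0.5) -> str:
    """Send hard prompts to the cloud model and everything else on-prem."""
    return CLOUD if estimate_difficulty(prompt) >= threshold else ON_PREM


if __name__ == "__main__":
    print(route("Summarize this email"))
    print(route("Debug this race condition in our scheduler"))
```

The threshold is the knob for the budget trade-off: raising it keeps more traffic on the on-prem model.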

by dmos62, 7 hours ago

Only people who do pay-per-use optimize this. Most heavy users have their use covered by an employer.

by VBprogrammer, 6 hours ago

I have my use covered by my employer, but we also have budgets and limits.

by VBprogrammer, 7 hours ago

I'd think for most companies the pace of change is too high at the moment. Give it a few years and a bit of a plateau in the improvements in frontier models, and I can't see how many of these companies don't implode under the weight of competition on inference prices.