Thinking that on-prem models will be a halfway decent solution compared with what can be served out of a data center is a fool's take... One that is more common than it should be on here...

by wolttam 17 hours ago

The point is not to be as good as the multi-trillion parameter model you can host across 72 GPUs (or whatever).

I'm running a 248B model on a paltry amount of hardware and getting plenty of good use out of it.

Sure, the most demanding tasks will demand the best models (and always will). There are still less demanding tasks for other models.

I think some people are fooling themselves that coding, of all tasks, is always going to require the biggest models ever. Again, maybe some coding tasks will, but the majority of business CRUD apps probably don't. Same goes for virtually any other type of task. The biggest models are really only useful for the most complex tasks.

by sgc 16 hours ago

If you wouldn't mind, could you explain a bit what the 248B model is good for, and where it breaks down and you need something better? I hear this take often, but it is always a fleeting remark, so I have no idea what 'useful' looks like at all.

by wolttam 15 hours ago

To answer this and my sibling, it's DeepSeek V4 Flash at native FP4 quantization, on two Nvidia DGX Sparks, which is a bit of kit but still paltry relative to the data centre. ~40 TPS generation, ~2000 TPS prompt processing, which makes it feel approximately as fast as typical APIs.

I primarily use it with my own harness for coding. I'm not going to say it will compete with Opus in the most challenging domains, because it won't, but I will say that there's a reasonable likelihood that Opus is used for tasks that a model like Flash could comfortably handle at 1/100th the cost.

So far I've only seen it struggle at tasks that I myself would struggle with. Tasks that I can describe the shape of the solution for, it has a high success rate at implementing.

Useful is going to be different for everyone. I'm not working on the hardest problems, I don't need the best models.
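
To make the "feels approximately as fast as typical APIs" claim concrete, here is a back-of-the-envelope sketch in Python using the ~2000 TPS prompt processing and ~40 TPS generation figures quoted above. The function name and the example token counts are assumptions for illustration, and the model ignores network latency, batching, and caching.

```python
def estimated_response_seconds(prompt_tokens, output_tokens,
                               prefill_tps=2000.0, decode_tps=40.0):
    """Rough wall-clock estimate for one request.

    prefill_tps and decode_tps default to the figures quoted in the
    comment above; everything else here is an illustrative assumption.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_seconds = prompt_tokens / prefill_tps  # time to read the prompt
    decode_seconds = output_tokens / decode_tps    # time to write the answer
    return prefill_seconds + decode_seconds


# Hypothetical request: 8,000 tokens of context, 400 tokens of output.
print(estimated_response_seconds(8000, 400))  # 4 s prefill + 10 s decode = 14.0
```

Under these assumptions, generation speed rather than prompt processing dominates the wait for anything but very long prompts.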

by ihateolives 13 hours ago

In my experience they require much more hand-holding and more specific directions, with fewer possibilities to interpret a command in several ways. You do the planning, keep an eye on what they're producing, and they do the legwork. It's not that their knowledge of Java or PHP or what have you is lacking, it's the long-horizon planning that you have to do yourself. Technically they're good. You just have to do more thinking and more reviewing yourself. YMMV.

by rhipitr 16 hours ago

Depending on quantization, I figure they need at least a p4 and likely a p5 EC2 instance (or a similar instance from another provider) for a model with that many parameters. Maybe they are hosting on bare metal, but I imagine not. Those instance types (assuming they're not using spot) are quite expensive to run.
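
For a sense of how much quantization changes the sizing, here is a rough Python sketch of the memory needed for the weights alone of a 248B-parameter model. It assumes a flat bit width per weight and deliberately ignores KV cache, activations, and per-block quantization scales, so real requirements are higher. The function and table names are made up for illustration.

```python
# Nominal bits per weight for a few common formats (an illustrative table).
BITS_PER_WEIGHT = {"fp16": 16, "fp8": 8, "fp4": 4}


def weight_memory_gb(params_billion, fmt):
    """Approximate weights-only footprint in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 1e9


for fmt in ("fp16", "fp8", "fp4"):
    print(fmt, weight_memory_gb(248, fmt), "GB")
# fp16 496.0 GB
# fp8 248.0 GB
# fp4 124.0 GB
```

Going from 16-bit to 4-bit weights cuts the weights-only footprint by a factor of four, which is the difference between needing a multi-GPU server and fitting on much smaller hardware.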

by aerhardt 7 hours ago

It’s perfectly reasonable to believe that a law of diminishing marginal returns will kick in at some point (if it hasn’t already), and that what at one point looked like an exponential may start looking like an s-curve.

I do not see how being experienced in engineering, or having higher studies in computer science and economics should make that view less common.

by upbeat_general 17 hours ago

If we’re defining on-prem as fitting in a rack - then every frontier model can be hosted on-prem.

Now this might not be the most cost-effective (and may require a bit of extra power), but you only need a datacenter for training or cost optimization.

by johndough 11 hours ago

The recent MiMo-V2.5-Pro-UltraSpeed can be served from 8 GPUs, which is certainly within the reach of sophisticated on-prem setups. https://mimo.xiaomi.com/blog/mimo-tilert-1000tps

by WarOnPrivacy 18 hours ago

> I predict the pressures for on-prem, offline access ... will be overwhelming and one of the players will fill the need.

I'd agree except that Big AI has made sure that most of us can't afford the hardware (RAM, NVMe, etc) to run it.

by Folcon 17 hours ago

Honestly at this point I'm not sure how much that matters?

by sgrove 18 hours ago

Likely many points along the Pareto frontier.

Some will take greater risks and win (or lose); others will play it safer and slowly accumulate wins (or be obsoleted).

Never mind the threat of letting these models write code that runs your business, or operate it agentically: models trained by actors (corporate or nation-state) diametrically opposed to your interests.

Lots to take into account now, interesting time to be in business.

by bryzio 16 hours ago

Or abstract over providers, e.g. with OpenRouter; that reduces the risk vector to "all implementations have been simultaneously banned".

If a government entity bans an LLM provider due to a jailbreak concern, they can also ban an on-prem solution under the same guise. The jailbreak risk exists regardless of where it's hosted. You could defensibly argue the on-prem risk is higher, since frontier model companies can justify safety spend due to their size; it's more difficult to combat bad actors if your company is the only one using the model and you don't have economies of scale.
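
The "abstract over providers" idea can be sketched as an ordered fallback list, here in Python. This is a minimal illustration, not OpenRouter's API: every name below is hypothetical, and each provider is just a callable that takes a prompt and returns text or raises on failure.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised only when every configured provider has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for a hosted API and a self-hosted model.
def hosted_api(prompt: str) -> str:
    raise ConnectionError("access revoked")


def on_prem_model(prompt: str) -> str:
    return "local answer to: " + prompt


print(complete_with_fallback("hello", [("hosted", hosted_api),
                                       ("on_prem", on_prem_model)]))
# ('on_prem', 'local answer to: hello')
```

With this shape, the call only fails outright when every entry in the list fails, which is the "all implementations simultaneously banned" case.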

by stevarino 17 hours ago

This is ignoring the fact that the government is the foundation of society (I know some will disagree with that, but the end result is just government with more steps).

Private models in a low trust society means the government will come and seize the models. Competitive business will only be allowed through cronyism.

The better option is to opt for high trust. Yes the Gman can rip your servers apart, but they know they'll face consequences, legal and political. Laws and regulations are the answer, not locking down into smaller fiefdoms.

by senderista 16 hours ago

You get high trust through social norms, not by more "laws and regulations". Social norms can't be imposed by fiat, they arise spontaneously, often for unclear reasons. That's why they're so fragile and precious. With Trump's destruction of social norms around the presidency and the federal government generally, the US is now just another country where bribery is the cost of doing business.

by iamnothere 16 hours ago

Through social norms and through policies that ensure the public on average feels prosperous and secure.

by yogthos 18 hours ago

This is precisely why I expect that Chinese open models are going to win in the long run. The capability difference isn't dramatic in the grand scheme of things, but the fact that you can run your own is a huge selling point. Even if you rent an open model from a Chinese company, you can switch to on-prem if they decide to yank access or change terms in a way you don't like. It might be a pain, but it wouldn't be existential. On the other hand, if you become dependent on a closed model and it gets yanked, then you're in a world of hurt.

And infrastructure dominance is really the big picture here. Chinese models are going to become the standard setters because they're going to be what people are using. That means more research, more tooling, and a whole ecosystem developing around them.

And that was already starting to happen even before this fiasco with Chinese models now being the most used ones globally. https://www.indiatoday.in/amp/technology/features/story/clau...

by UncleOxidant 17 hours ago

After this action, I have no doubt that this administration will try to ban Chinese models. Of course, doing so will be futile; we'll figure out ways to get around it. But now I'm pretty sure they're going to try.

by angry_octet 5 hours ago

It is almost certain that the CCP will impose constraints on access to their models at some point too. But Trump is doing it to extort cash from Anthropic, and China will be doing it to leverage political and economic concessions.

Remember that there are degrees of banning. Slower tokens, dumber models, token caps, KYC for each model consumer, hurting specific companies that are not capitulating in a deal with a Chinese company, etc.

by yogthos 59 minutes ago

A big difference with open models is that anybody can run and tune them any way they like. The real difference in philosophy is that American companies treat the model as the product, while Chinese companies see models as infrastructure you build products on top of. You amortize the cost of deploying it at scale by sharing knowledge and iterating quickly to bring the cost down.

I see absolutely no reason why the CPC would choose to kneecap themselves the way the USG just did. Keeping open access to the models means that the whole world will be using a Chinese-based AI stack going forward. Only a government run by absolute imbeciles would do what the US did.

by yogthos 17 hours ago

I'm waiting for that to happen as well since the price difference makes it very difficult for companies like Anthropic and OpenAI to compete. And we already have precedent for this with stuff like EVs, phones, and so on. As soon as Chinese companies start making a product that's more popular, they get banned on some national security pretext.

The tricky part with banning Chinese models is that they're open. It'll be easy to ban access to service providers, but preventing people from running these models on-prem is going to be really tough. Are they going to go after Cursor, for example, given that their model is based on Kimi?

I very much agree it's going to be a futile endeavour in the end. It kind of reminds me of the time Microsoft tried to get Linux and open source banned when Linux started encroaching on the Windows server market. This is going to end the same way.

by UncleOxidant 17 hours ago

I'm going to guess they'll go after sites like Huggingface that host downloads. I suspect we'll be torrenting Chinese models in the not-too-distant future. Or we'll have to geo-spoof with a VPN to download from other countries.

by AbstractH24 16 hours ago

Why? None of the various cloud provider outages ever have.

by duped 17 hours ago

[flagged]

by hackmack1015 hours ago|

Great point. That is what all the Fortune 500 CEOs are frothing at the mouth about: having LLMs replace their payroll. So yeah, they deserve to fail.