by maherbeg 19 hours ago

by dakolli 18 hours ago

There will always be a huge gap between frontier models and open source models (unless you're very rich). This whole industry makes no sense; everyone is ignoring the unit economics. It costs 20k a month to run Kimi 2.6 at decent tok/s, and to sell those tokens at a profit you'd need your hardware costs to be less than 1k a month.

Everyone who's betting their competency on the generosity of billionaires selling tokens for 1/10-1/20th of the cost, or on a delusional future where capable OS models fit on consumer-grade hardware, is actually cooked.
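
For anyone who wants the arithmetic spelled out, here is a rough sketch in Python. It only uses the two figures quoted above (20k a month to run the model, tokens priced at 1/10 to 1/20 of cost); the function name is made up for illustration, and it ignores utilization, power, and any margin.

```python
def breakeven_hardware_cost(monthly_cost: float, discount_factor: float) -> float:
    """Monthly hardware cost at which selling tokens at 1/discount_factor
    of today's cost would exactly break even (hypothetical helper)."""
    return monthly_cost / discount_factor


MONTHLY_COST = 20_000  # figure quoted in the comment above

# Tokens priced at 1/10 and 1/20 of what they cost to serve.
for factor in (10, 20):
    target = breakeven_hardware_cost(MONTHLY_COST, factor)
    print(f"priced at 1/{factor} of cost -> hardware must cost <= {target:,.0f}/month")
```

At the 1/20 end this gives the roughly 1k a month mentioned above; at the 1/10 end it is 2k.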

by bensyverson 17 hours ago

If you look at a graph of GPU power in consumer hardware and model capability per billion parameters over time, it seems inevitable that in the next few years a "good enough" model will run on entry-level hardware.

Of course there will always be larger flagship models, but if you can count on decent on-device inference, it materially changes what you can build.

by physicsguy 17 hours ago

It also massively changes the value economics of the frontier models. In a lot of cases, you really don't need a general-purpose intelligence model either.

by bensyverson 16 hours ago

Exactly… as HN readers, we sometimes forget that a lot of people are using these tools to search for the best sunscreen, or rewrite an email.

by dakolli 17 hours ago

No offense, this is a crazy delusional statement.

by afro88 17 hours ago

No offense, this is a crazy worthless contribution to the discussion.

Why?

by dakolli 16 hours ago

Because everyone in these replies is in complete denial about the physical limits of memory and scaling in general. Y'all are literally living in an alternate reality where model capability increases with a decrease in size; it's simply not the case. There will be small focused models that perform well on very narrow tasks, yes, but you will not have "agents" capable of "building most things" running on consumer hardware until more capable (and affordable) consumer hardware exists.

by bensyverson 15 hours ago

Ah, you haven't realized that consumer hardware gets more capable over time

by adrian_b 14 hours ago

Not this year, when many vendors either offer lower memory capacities or demand higher prices for their devices.

by bensyverson 12 hours ago

Correct, the progress is not perfectly linear. But do you believe technological progress has stalled forever? If so, I'd get out of tech and start selling bomb shelters.

by dakolli 11 hours ago

Do you really think the trend of consumer hardware is heading towards more memory and better specs? Apple's most popular product this year is a laptop with 8GB of RAM.

The trend is heading in the opposite direction: fewer options for strong consumer hardware, and a move towards cloud-based products. This is a memory issue more than anything. Nvidia is done selling their GDDR7 to gamers and people with AI girlfriends.

by bensyverson 8 hours ago

Just so that I have your position straight: you actually believe that over the long term, like 10 or 20 years, the amount of RAM in a laptop is going to go down?

It's not out of the realm of possibility, but I just want to make you aware that this would be a very surprising development in computing history.

by fulafel 7 hours ago

This seems to be a different discussion than the one going on upthread about:

> in the next few years a "good enough" model will run on entry-level hardware

by wtallis 7 hours ago

Exactly. In the next few years, entry-level hardware will not be advancing beyond 16GB. And anything beyond 32GB will remain decidedly high-end.

And that's for laptops with unified memory. In the desktop space, 8GB discrete GPUs are going to be sticking around for a very long time.

by dakolli 8 hours ago

A future with less RAM is possible with more applications using computational storage with ssd/nvme.

But that's not my main argument. My main argument is that it's delusional for OP to think it's reasonable to expect that we'll soon be able to run models on consumer hardware that can build basically most things.

But I do think there will be many compromises made for consumer electronics. I don't think the powers that be are eager to give consumers all the best memory (that should be clear by now). There are 3 DDR5 DRAM manufacturers in the world that have to provide memory to all the world's militaries, governments, and datacenters/corporations. Consumers are last priority.

by iuffxguy 8 hours ago

This is about more than just the hardware evolving over time; we are also seeing big improvements in quantization and efficiency.

by dakolli 7 hours ago

There are physical limits to how much you can compress data. I'm just saying, don't sit on your hands waiting for this to happen, because it's probably not going to for another decade or more. There's no use in waiting, just write the code your fkin self and stop being lazy.

by liuliu 17 hours ago

I am not sure where this comment is coming from (possibly written without looking at this project?). This project is running a quasi-frontier model at a reasonable tps (~30) with reasonable prefill performance (~500 tps) on a high-end laptop. People are simply projecting from what they see in this project to what you can optimistically expect.

You can argue whether the projection is too optimistic or not, but this project definitely made me a little bit optimistic on that end.
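
To put those throughput numbers in wall-clock terms, here is a back-of-the-envelope sketch in Python. The ~500 tps prefill and ~30 tps decode rates are the ones quoted above; the prompt and response sizes are made-up examples, the function name is hypothetical, and it assumes prefill and decode run strictly one after the other with no other overhead.

```python
def response_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float = 500.0,  # prefill rate quoted above
    decode_tps: float = 30.0,    # generation rate quoted above
) -> float:
    """Rough end-to-end latency: time to ingest the prompt plus time to
    generate the reply (hypothetical helper, ignores all other overhead)."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical request: a 10,000-token prompt and a 600-token reply.
total = response_seconds(prompt_tokens=10_000, output_tokens=600)
print(f"~{total:.0f} s end to end")
```

Under these assumptions, long prompts are dominated by prefill and long replies by decode, so which rate matters more depends on the workload.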

by maherbeg 13 hours ago

There will always be a gap, but what's interesting is that because new models are constantly coming out, we as an industry never spend any time extracting the maximal value out of an existing model. What if there are techniques and harness workflows that could be optimized for a singular model end to end? How far can that push the state of the art?

An example is https://blog.can.ac/2026/02/12/the-harness-problem/ for just improving edits.

Or, if we could really steer these open source models using well-structured plans, could we spend more time planning in a specific way and kick off the build overnight (a la the night shift https://jamon.dev/night-shift)?

by amunozo 17 hours ago

Most tasks do not require frontier models, so as long as these models cover 95-99 per cent of the tasks, closed frontier models can be left for niche and specialized cases that are harder.

by dakolli 16 hours ago

Frontier models can hardly do the tasks I want them to, so I simply cannot buy into this notion.

by drob518 15 hours ago

For instance?

by daveguy 11 hours ago

> There will always be a huge gap between frontier models and open source models (unless you're very rich).

They said the same thing about open source chess engines.

by otabdeveloper4 17 hours ago

> a delusional future where capable OS models fit on consumer grade hardware

48 GB is enough for a capable LLM.

Doing that on consumer grade hardware is entirely possible. The bottleneck is CUDA and other intellectual property moats.
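
As a rough illustration of what the 48 GB figure buys, here is a sketch in Python that estimates how many parameters fit at different weight precisions. It assumes 1 GB = 10^9 bytes and counts weights only, so KV cache, activations, and runtime overhead are ignored and the results are upper bounds; the function name is hypothetical.

```python
def max_params_billions(memory_gb: float, bits_per_weight: int) -> float:
    """Upper bound on parameter count (in billions) whose weights alone
    fit in memory_gb, assuming 1 GB = 1e9 bytes (hypothetical helper)."""
    bytes_per_weight = bits_per_weight / 8
    # GB divided by bytes-per-parameter gives billions of parameters.
    return memory_gb / bytes_per_weight


MEMORY_GB = 48  # figure from the comment above

for bits in (16, 8, 4):
    limit = max_params_billions(MEMORY_GB, bits)
    print(f"{bits:>2}-bit weights: up to ~{limit:.0f}B parameters")
```

In practice the usable number is lower, since context length alone can take a sizeable share of memory.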
