
by kridsdale313 hours ago |

by qlm13 hours ago|

Hacker News moment

by toxik13 hours ago|

$10k is well outside my budget for frivolous computer purchases.

by zozbot2349 hours ago|

It would be plenty in-budget if the software part of local AI was a bit more full-featured than it is at present. I want stuff like SSD offload for cold expert weights and/or for saved/cached KV-context, dynamic context sizing, NPU use for prefill, distributed inference over the network, etc. etc. to all be things that just work for most users, without them having to set anything up in an overly error-prone way. The system should not just explode when someone tries to run something slightly larger; it should undergo graceful degradation and let them figure out where the reasonable limits are.

by bdangubic12 hours ago|

99.97% of HN users are nodding… :)

by hparadiz11 hours ago|

There are so many good local uses for these models that I fully expect a standard workstation 10 years from now to start at 128GB of RAM and have at least a workstation-class inference device.

by bdangubic10 hours ago|

Or, if you believe a lot of the HN crowd, we are in an AI bubble, and in 10 years inference will be dirt cheap once all of this crashes and all that hardware is sitting in data centers, so it won't make any sense to run monster workstations at home. (I work on a 128GB M4 but don't run inference; I just have too many Electron apps running at the same time...) :)

by bigiain7 hours ago|

> I work on a 128GB M4 but don't run inference; I just have too many Electron apps running at the same time.

This is somewhat depressing - needing a couple of thousand bucks' worth of RAM just to run your chat app, code/text editor, API doco tool, forum app, and note-taking app all at the same time...

by hparadiz10 hours ago|

Inference will be dirt cheap for things like coding but you'll want much more compute for architectural planning, personal assistants with persistent real time "thinking / memory", as well as real time multimedia. I could put 10 M4s to work right now and it won't be enough for what I've been cooking.

by stefs9 hours ago|

Yeah, but if you really, really wanted to and/or your livelihood depended on it, you probably could afford it.

by lpnam02011 hours ago|

Where I live, 10k USD is a little more than 3 years' worth of rent for a relatively new and convenient 2-bedroom apartment.

by gbgarbeb33 minutes ago|

$277 a month for a two bedroom is literally 6-10% of what someone in the SF Bagholder Area pays.

You're in either Africa, Southeast Asia, or South/Central America.

How do you even afford internet?

by SlavikCA13 hours ago|

I'm running it on my Intel Xeon W5 with 256GB of DDR5 and 72GB of Nvidia VRAM. I paid $7-8k for this system. It would probably cost twice as much now.

Using UD-IQ4_NL quants.

Getting 13 t/s. Using it with thinking disabled.

by GrayShade1 hours ago|

I get 20 t/s on the UD-Q6_K_XL quant, Radeon 6800 XT.

by rwmj12 hours ago|

For some reason you were being downvoted, but I enjoy hearing how people are running open-weights models at home (NOT in the cloud) and what kind of hardware they need, even if it's out of my price range.

by kylehotchkiss8 hours ago|

you have proved my point