undefined

upvote

points

by dTal21 hours ago |

upvote

by 62746720 hours ago|

[-]

"Normal people" have never bothered to host their own: photos, music, videos, documents, comunications, etc. To the point that for many their computer is essentially a thin client into someone else's server. Why would we think this same people would care about "personal" inference?

reply

upvote

by trollbridge18 hours ago|

[-]

Normal people can go open an account at DeepSeek or Xiaomi and chat away for free. Or, for that matter, a couple other models like z.ai's (GLM-5.2 isn't in the free tier, though, but neither is GPT-5.5-Pro), or Qwen, which does have 3.7-Max for free with no account on their chatbot interface.

Yes, I realise this isn't "running a local model", but it's using models that can be grabbed and run locally. For my pipelines, I feel far more confidence when I use an open model (even one like GLM-5.2 that would be expensive for me to run) since I have a backup plan if the hosted/cloud option becomes unworkable for me. If that happens to me with Opus, I have zero options.

reply

upvote

by cdata19 hours ago|

[-]

If our strategy to avoid "slavery" involves "normal people" taking the local-vs-managed choice seriously, we have already lost.

This choice is made for us. The deciding factors will be convenience and economics.

My sense is that just like Web 2.0 SaaS we are destined for servitude.

A better strategy is to play an assymetrical game IMO. Don't let your would-be master write the rules by which you play.

reply

upvote

by yeeeloit18 hours ago|

[-]

> A better strategy is to play an assymetrical game IMO. Don't let your would-be master write the rules by which you play.

What do you mean by this? Do you have an example in the given context?

reply

upvote

by At1C10 hours ago|

[-]

[dead]

reply

upvote

by 8note20 hours ago|

[-]

normal people dont really have the hardware to run local models

reply

upvote

by dTal9 hours ago|

[-]

Anyone with an M-series Apple computer can run something very competently. Mac Pro users can run 30B class models which is good enough for the vast majority of practical everyday purposes, far better than the original ChatGPT was. Anyone with a gaming computer is in a similar situation. The rest of us can still run stuff, just not as big or as fast.

reply

upvote

by sosodev19 hours ago|

[-]

They have it, we just haven’t enabled them. The smart model with a chat box is the wrong abstraction for local. Ideally we would have it built into applications as a clear and easy to use opt-in feature. Like allowing a user to index a folder on their hard drive and then search it semantically via embeddings. You could do that on fairly low end hardware these days. Like 2GB of RAM with any processor made within the last 10 years.

reply

upvote

by manithree19 hours ago|

[-]

They may not right now, but the whole point of Microsoft's Copilot+ PC standard (even though it's somewhat anemic) is to run models locally. Apple Silicon with enough unified memory is capable. Not to mention modern iPhones and Pixels have fairly capable NPUs and routinely run local models. So, we may not be to the point where most normal people have the hardware to run local models, but it is rapidly approaching.

reply

upvote

by Danox16 hours ago|

[-]

As time goes on, they’re almost certainly will be very capable local models in the long run we (general computer users) aren’t going back to the era of mainframe computing no matter how much OpenAI, Meta or Google would like us to.

reply

upvote

by dTal8 hours ago|

[-]

We aren't? Are you sure? Where is your email inbox? Where are your backups? Where are your music files? For most people the answer to all those is "someone else's computer".

reply

upvote

by trollbridge18 hours ago|

[-]

Gamers can run Qwen 3.6 quantised models now.

You would also be shocked what's possible on a 64GB Mac Studio, which isn't that unattainable.

reply

upvote

by conception19 hours ago|

[-]

Google Edge Gallery is turn key for people and on the device most people chatgpt on. Just like with most Google Stuff “edge gallery” is maybe the worst name possible for “run AI on your phone”!

reply

upvote

by theptip20 hours ago|

[-]

Why do you feel the important part _now_ is where the weights get run?

I can see this as a future battleground but access to frontier models (which you cannot run locally) seems a lot more relevant today.

reply

upvote

by dTal8 hours ago|

[-]

Because the local LLMs available today are already fantastic, and the difference between no LLM and an open weights LLM is much smaller than the gap between an open LLM and a so-called "frontier" model.

It's important that people get used to the idea that your interactions with a language model are a highly personal thing. LLMs can perceive and categorize us in ways we can't even imagine, far more violently than the simple algorithmic feeds which have already corroded public discourse so much. LLMs can control us. LLMs warp the information landscape more radically than even the internet did. Even now you are likely underestimating their role in future society.

The principles of software freedom are becoming existentially important.

reply

upvote

by itkovian_19 hours ago|

[-]

You can’t run a closed llm locally. Strange to frame the dichotomy as between local and open. One begets the other.

reply

upvote

by idiotsecant20 hours ago|

[-]

Better UX does not buy you a datacenter farm to train state of the art cutting edge models. Right now the only people who can do that are the technobility class.

reply

upvote

by dTal20 hours ago|

[-]

It does not, but it might encourage more people to care. Worrying about training is a luxury when you are starting from a baseline of "OpenAI spies upon me and controls my access". Let's focus on getting every Tom, Dick and Harry 1) on board with LLMs, because they're happening, 2) habitually using local software.

reply

upvote

by trollbridge18 hours ago|

[-]

The same used to be true of being able to program computers and compile software.

Of course the frontier will always be unattainable, but that's like pointing out that I couldn't buy my own Cray supercomputer.

reply

upvote

by azinman220 hours ago|

[-]

> We are sleepwalking into slavery.

That’s a bit hyperbolic…

reply

upvote

by MrDrMcCoy18 hours ago|

[-]

Some hyperbole is useful. The problem is real and serious, though short of the specific verbiage.

reply

upvote

by 0gs20 hours ago|

[-]

it's funny because i made this thing (called enough) that aims to make it easy for non-technical people to get up and running with local models quickly, but it is impossible to figure out how to break through the noise. every thread and comment like this breaks my heart a lil bit

reply

upvote

by dTal8 hours ago|

[-]

Link? You have to tell us if you want to break through the noise!

reply

upvote

by 0gs5 hours ago|

[-]

sure! github.com/0gsd/enough ; enough.support has some FAQs. i did post a Show HN on it and i intend to do so again sooner than later haha

reply

upvote

by double0jimb020 hours ago|

[-]

Yea, anyone who understands what makes products actually usable is opting to get paid for said skill.

reply

upvote

by bsder19 hours ago|

[-]

> we are losing that battle badly despite all the software being here now and viable, entirely because UX sucks.

Yep. I'm an old time Linux sysadmin, but I am COMPLETELY baffled as to what I can or cannot run on my 32GB R9700 with 128GB main CPU memory.

If I want something Claude or Codex like what do I use that would be useful? If I want a chat system, what do I use? Images--apparently ComfyUI for setup but after that what do I do?

I don't even mind spinning up something in the cloud for a bit, but I need to know how I'm going to get data up and down without racking up massive bandwidth charges.

I'd love to do some tinkering, but the field is moving so fast and so full of charlatans that cleaning the dross out is almost impossible.

reply

upvote

by entrope7 hours ago|

[-]

For coding, Qwen3.6-27B with MTP should fit in 32GB with almost full context length for Unsloth's 5-bit quantization. That's my preferred choice for a local coding agent on similar hardware: the quality delta compared to a MoE model is IMO worth the extra wait. (And I haven't found a model with 70B-120B parameters that works better for coding.) For general chat, maybe gpt-oss-120b? It should have more general knowledge than a 30B-class model; I've used it to suggest itineraries for trips and to review the completeness of small requests for proposals.

I don't have recommendations for images because I haven't played with those.

reply

upvote

by markhahn15 hours ago|

[-]

these days, even completely mainstream distros (Fedora here) include ollama, which leverages a wide range of hardware and range of models. (it's generally useful to install a more recent ollama, though.) there are free coding harnesses too.

reply

upvote

by dTal8 hours ago|

[-]

ollama is just a wrapper around llama.cpp, and a pretty janky one at that. You're much better off using it directly.

reply

upvote

by wmf20 hours ago|

[-]

LM Studio

reply