Qwen 3.6, which was released this month, is large but still on the smaller end. Supposedly it's at about Sonnet level when configured correctly, and it can be run on commodity hardware without buying a data center. https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_...
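
If you want to poke at something like that locally, here's a minimal sketch using llama-cpp-python; the model filename and settings are placeholders for whatever quantized GGUF you actually download, not a real release:

  from llama_cpp import Llama

  # Load a locally downloaded quantized model (placeholder filename).
  llm = Llama(
      model_path="qwen-3.6-q4_k_m.gguf",  # hypothetical file, swap in whatever quant you grabbed
      n_ctx=8192,                         # context window; bigger = more memory
      n_gpu_layers=-1,                    # offload all layers to GPU if they fit, lower this if not
  )

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "Write a binary search in Python."}],
      max_tokens=512,
  )
  print(out["choices"][0]["message"]["content"])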

Then there are the mid-sized ones, which require multiple GPUs and are comparable to GPT's latest flagships.

Then there is Kimi 2.6, a monster that is beating Opus on some benchmarks. https://www.reddit.com/r/LocalLLaMA/comments/1sr8p49/kimi_k2...

It's basically whatever you can afford. Any trash-heap laptop can run code-autocomplete models locally, no problem. The rest require some level of investment: an idle gaming PC at the low end, or a serious build at the high end.

reply
It depends on your VRAM or "unified" memory for how smart it is, and on your CPU/GPU for how quick it is.

128GB of RAM? Sure, roughly the early-to-mid GPT-4-series releases, except maybe 4o. And on an M5 Max, at about the same speed.

I wouldn't really bother under 64GB (meaning 32GB or less) except for entertainment value (chats, summaries, tasky read-only agent things).
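
Rough rule of thumb for the memory side, as a back-of-envelope sketch (the parameter counts here are illustrative, not official figures for any of the models above):

  # Back-of-envelope: weight memory ~= params (B) * bits per weight / 8,
  # plus a few GB of headroom for KV cache and runtime overhead.
  def approx_mem_gb(params_b, bits_per_weight, overhead_gb=4.0):
      return params_b * bits_per_weight / 8 + overhead_gb

  print(approx_mem_gb(32, 4))    # ~20 GB  -> 24 GB card / 32 GB unified territory
  print(approx_mem_gb(70, 4))    # ~39 GB  -> wants 48-64 GB
  print(approx_mem_gb(235, 4))   # ~122 GB -> the "128 GB, sure" territory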

reply
GLM 5.1 and DeepSeek 4 are acceptable, but between the hardware and the energy costs, depending on your use case you may as well just buy tokens. They get useless and stupid rapidly if you quantize them enough to fit on a single 16-24GB GPU.
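
To make the single-GPU point concrete, a rough sketch with an illustrative ~70B dense model (not the actual GLM/DeepSeek sizes): fitting it on 24GB forces you down to around 2 bits per weight, which is exactly the regime where quality collapses.

  # How hard you'd have to quantize an illustrative ~70B dense model to fit 24 GB of VRAM:
  params_b = 70
  for bits in (8, 4, 3, 2):
      size_gb = params_b * bits / 8
      verdict = "fits" if size_gb <= 24 else "doesn't fit"
      print(f"{bits}-bit: ~{size_gb:.0f} GB of weights -> {verdict} on a 24 GB card")
  # Only ~2-bit squeezes in, and that's where models go useless and stupid fast.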
reply