LMArena actually has a nice Pareto frontier of Elo vs. price for this

  model                        elo   $/M
  ---------------------------------------
  glm-5.1                      1538  2.60
  glm-4.7                      1440  1.41
  minimax-m2.7                 1422  0.97
  minimax-m2.1-preview         1392  0.78
  minimax-m2.5                 1386  0.77
  deepseek-v3.2-thinking       1369  0.38
  mimo-v2-flash (non-thinking) 1337  0.24
https://arena.ai/leaderboard/code?viewBy=plot&license=open-s...
reply
LMArena isn't very useful as a benchmark, however I can vouch for the fact that GLM 5.1 is astonishingly good. Several people I know who have a $100/mo Claude Code subscription are considering cancelling it and going all in on GLM, because it's finally gotten (for them) comparable to Opus 4.5/6. I don't use Opus myself, but I can definitely say that the jump from the (imvho) previous best open weight model Kimi K2.5 to this is otherworldly — and K2.5 was already a huge jump itself!
reply
qwen3.5/3.6 (30B) works well locally with opencode
reply
Mind you, a 30B model (3B active) is not going to be comparable to Opus. There are open models that are near-SOTA but they are ~750B-1T total params. That's going to require substantial infrastructure if you want to use them agentically, scaled up even further if you expect quick real-time response for at least some fraction of that work. (Your only hope of getting reasonable utilization out of local hardware in single-user or few-users scenarios is to always have something useful cranking in the background during downtime.)
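To put the infrastructure point in numbers, a back-of-the-envelope sketch (assuming ~0.5 bytes per parameter, i.e. 4-bit quantization; real quants plus KV cache and activations add overhead on top):

```python
# Approximate weight memory for near-SOTA open models at 4-bit quantization.
# Assumption: 0.5 bytes/param; KV cache and activations need more on top.
def weights_gb(total_params_billions: float, bytes_per_param: float = 0.5) -> float:
    """Weight memory in GB for a quantized model of the given size."""
    return total_params_billions * bytes_per_param  # billions of params * bytes each = GB

for params_b in (750, 1000):
    print(f"{params_b}B params @ 4-bit ≈ {weights_gb(params_b):.0f} GB of weights alone")
```

At ~80GB of HBM per H100-class card, even the 4-bit weights alone span several cards before you account for KV cache.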
reply
For a business with ten or more engineers/people-using-ai, it might still make sense to set this up. For an individual though, I can’t imagine you’d make it through to positive ROI before the hardware ages out.
reply
It's hard to tell for sure, because the local inference engines/frameworks we have today are not all that capable yet. We have barely started exploring SSD offload, saving KV caches to storage for reuse, distributed inference across multi-GPU setups or over the network, specialty hardware such as NPUs, etc. All of these can reuse fairly ordinary, run-of-the-mill hardware.
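To illustrate the KV-cache-to-storage idea, here's a toy sketch (not any real engine's API; `compute_kv_cache` is a hypothetical stand-in for a prefill pass):

```python
import hashlib, pickle, tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp())

def compute_kv_cache(prompt: str) -> list:
    # Hypothetical stand-in for a real prefill pass; a real engine
    # would produce per-layer key/value tensors here.
    return [f"kv({tok})" for tok in prompt.split()]

def cached_prefill(prompt: str) -> tuple:
    """Reuse a KV cache from storage when the same prompt recurs."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.pkl"
    if path.exists():                      # cache hit: skip the prefill entirely
        return pickle.loads(path.read_bytes()), True
    kv = compute_kv_cache(prompt)          # cache miss: run prefill, persist result
    path.write_bytes(pickle.dumps(kv))
    return kv, False

_, hit1 = cached_prefill("system prompt plus long shared context")
_, hit2 = cached_prefill("system prompt plus long shared context")
print(hit1, hit2)  # first call misses, second hits
```

The win is that a long shared prefix (system prompt, repo context) only pays the prefill cost once, even across restarts.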
reply
Since you need at least a few H100-class cards, I'd guess you need at least a few tens of coders to justify the cost.
reply
What near SOTA open models are you referring to?
reply
I'm backing up a big dataset onto tapes, so I wanted to automate it. I have an idle 64GB-VRAM setup in my basement, so I decided to experiment and tasked it with writing an LTFS implementation. LTFS is an open standard for tape filesystems, and there's an implementation in C that can be used as the baseline.

So far, over the last 2 days, Qwen 3.6 has created a functionally equivalent Go implementation that works against the flat-file backend. I'm extremely impressed.

reply
It is surprisingly competent. It's not Opus 4.6 but it works well for well structured tasks.
reply
I want to bump this more than just a +1 by recommending everyone try out OpenCode. It can still run on a Codex subscription so you aren’t in fully unfamiliar territory but unlocks a lot of options.
reply
The Codex TUI harness is also open source and you can use open models with it, so you can stay in even more familiar territory.
reply
pi-coding-agent (pi.dev) is also great. I've been using it with Gemma 4 and Qwen 3.6.
reply
The thing I dislike about OpenCode is its limited editor capabilities. It's also resource-intensive: for some reason, on a VM it chokes every 30 minutes and I have to discard all sessions, commits, etc.

I don't know if it's Bun-related, but in Task Manager it's almost always near the top of CPU usage. For me, Bun is not production-ready at all.

I wish Zed had something like BigPickle, which is free to use without limits.

reply
Is this sort of setup tenable on a consumer MBP or similar?
reply
Qwen’s 30B models run great on my MBP (M4, 48GB) but the issue I have is cooling - the fan exhaust is straight onto the screen, which I can’t help thinking will eventually degrade it, given the thermal cycling it would go through. A Mac Studio makes far more sense for local inference just for this reason alone.
reply
For a 30B model you want at least 20GB of VRAM, and a 24GB MBP can't quite allocate that much of its unified memory to the GPU by default. So you'd want at least a 32GB MBP.
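A quick sanity check on those numbers (assuming a ~4.5 bits/param quant like Q4_K_M, and macOS's default GPU wired-memory cap of roughly 75% of unified memory, tunable via the `iogpu.wired_limit_mb` sysctl):

```python
# Rough VRAM budget for a 30B model on a 24GB unified-memory Mac.
# Assumptions: ~4.5 bits/param quant; GPU can wire ~75% of RAM by default.
params = 30e9
bits_per_param = 4.5
weights_gb = params * bits_per_param / 8 / 1e9   # bits -> bytes -> GB
gpu_budget_24gb = 24 * 0.75
print(f"weights ≈ {weights_gb:.1f} GB")          # before KV cache/overhead
print(f"24GB MBP GPU budget ≈ {gpu_budget_24gb:.0f} GB")
```

The weights fit, but once you add KV cache and runtime overhead you're past the default GPU budget, which is why 32GB is the comfortable floor.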
reply
I have 24GB of VRAM available and haven't yet found a decent model or combination. The last one I tried was Qwen with Continue; I guess I need to spend more time on this.
reply
It's a MoE model so I'd assume a cheaper MBP would simply result in some experts staying on CPU? And those would still have a sizeable fraction of the unified memory bandwidth available.
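The arithmetic behind that intuition, using the 30B-total/3B-active split mentioned upthread (assuming 4-bit weights):

```python
# MoE: all experts must live somewhere, but only the active ones are
# read per token. Assumption: 4-bit quantization (0.5 bytes/param).
bytes_per_param = 0.5
total_gb  = 30e9 * bytes_per_param / 1e9   # full weights, VRAM + RAM combined
active_gb = 3e9  * bytes_per_param / 1e9   # weights actually touched per token
print(f"total {total_gb:.1f} GB, active per token ~{active_gb:.1f} GB")
```

So experts parked in system RAM still get read at system-memory bandwidth, which costs tokens/s but doesn't make the model unusable.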
reply
I haven't tried this myself yet, but you would still need enough regular (non-VRAM) RAM available for the CPU to offload to, right? This is a fully novice question; I have never tried it.
reply
Is there any model that practically compares to Sonnet 4.6 in code and vision and runs on home-grade (12G-24G) cards?
reply
I'm currently running a custom Gemma 4 26B MoE model on my 24GB M2... super fast, and it beat DeepSeek, ChatGPT, and Gemini in 3 different puzzles/code challenges I tested it on. The issue now is the low context... I can only do 2048 tokens with my VRAM... the gap is slowly closing on the frontier models
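For a sense of where the context memory goes, here's the standard KV-cache formula with hypothetical dimensions for a ~26B model (the real architecture's dims may differ):

```python
# KV cache grows linearly with context length.
# Hypothetical dims: 48 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
layers, kv_heads, head_dim, bytes_per_elem = 48, 8, 128, 2
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
ctx = 2048
print(f"{per_token // 1024} KiB/token -> {per_token * ctx / 2**20:.0f} MiB at {ctx} tokens")
```

With the weights already eating most of 24GB, each extra few thousand tokens of cache has to fit in whatever's left; quantizing the KV cache (e.g. to 8-bit) roughly halves this cost.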
reply
The Mac Minis (probably 64GB RAM) are the most cost effective.
reply
How are you running it with opencode, any tips/pointers on the setup?
reply
GLM 5.1 via an infra provider. Running a competent coding-capable model yourself isn't viable unless your standards are quite low.
reply
What infra providers are there?
reply
There's DeepInfra. There's also OpenRouter where you can find several providers.
reply
I am using GLM 5.1 and MiniMax 2.7.
reply