I'm developing software in this area right now, so I try a lot of the new models. They're not even close for coding tasks. It basically comes down to 26b parameters vs 1T parameters, plus quantisation and smaller context sizes; there's no comparison. However, for agentic work, tool calling, and text summarisation, local LLMs can be quite capable. Workloads that run as background tasks, where you're not concerned about TTFB, cold starts, tok/s, etc., are where local AI is useful.
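
For the background-task case, the wiring is simple. A minimal sketch against a local server exposing an OpenAI-compatible API (Ollama's default endpoint shown; the model tag is just a placeholder):

    # Minimal sketch: background summarisation against a local model.
    # Assumes Ollama's OpenAI-compatible endpoint; model tag is a placeholder.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    def summarise(text: str) -> str:
        resp = client.chat.completions.create(
            model="gemma4:31b",  # placeholder local model tag
            messages=[
                {"role": "system", "content": "Summarise the input in three bullet points."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content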

If you have an M-series processor, I'd recommend ditching Ollama because it's slow. We get double or triple the tok/s using omlx or vmlx, respectively, though vmlx lacks support for some models like gpt-oss.
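
For reference, the raw MLX path looks roughly like this. I haven't verified omlx/vmlx's exact APIs, so this sketch uses the mlx-lm package directly, and the model repo is a placeholder:

    # Rough MLX sketch on Apple silicon using the mlx-lm package.
    # The model repo below is a placeholder; pick any MLX-converted model.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/gemma-2-27b-it-4bit")  # placeholder repo
    text = generate(model, tokenizer, prompt="Summarise: ...", max_tokens=200)
    print(text)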

reply
Kimi K2.5 (as an example) is an open model with 1T params. I don't see a reason it has to be local for most use cases; the fact that it's open is what's important.
reply
First session with gemma4:31b looks pretty good; it may actually be up to coding tasks at gemini-3-flash levels.

You can tell gemma4 comes from gemini-3.

reply
I recently experimented with creating a Python library from scratch using Codex. After I was done, I took the PRD and task list it generated and fed them to opencode with Qwen 3.5 running locally.

Opencode was able to create the library as well. It just took about 2x longer.
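
If anyone wants to reproduce the setup without opencode, this is roughly the kind of call it makes under the hood: hand the PRD and task list to a local Qwen through an OpenAI-compatible endpoint (Ollama's default shown; the file names and model tag are placeholders, not what opencode actually uses):

    # Rough sketch of the loop: feed PRD + task list to a local Qwen
    # through an OpenAI-compatible endpoint. File names and the model
    # tag are placeholders.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    prd = Path("PRD.md").read_text()      # placeholder file name
    tasks = Path("tasks.md").read_text()  # placeholder file name

    resp = client.chat.completions.create(
        model="qwen3.5:27b",  # placeholder tag
        messages=[
            {"role": "system", "content": "You are implementing a Python library."},
            {"role": "user", "content": f"{prd}\n\n{tasks}\n\nImplement the first task."},
        ],
    )
    print(resp.choices[0].message.content)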

reply
Which version of Qwen 3.5 did you use?
reply
Which quant as well?
reply
Not at my computer right now; it was either the 27b or the 35b, not quantized.

Next week I will be trying qwopus 27b.

reply