Was the choice of such a small model driven by a desire for high tok/sec? I ask because an M4 Pro machine with 48GB can run larger models (if model intelligence is what would make it more useful).
reply
Yes, that was my goal. I also noticed a huge performance gain going from Ollama to MLX. Your mileage may vary.
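
If anyone wants to sanity-check that gain themselves, here's a minimal sketch using mlx-lm (the model repo is just an example from mlx-community, not necessarily what I ran):

    # Rough tok/sec check, assuming mlx-lm is installed (pip install mlx-lm).
    # Model repo is an example; swap in whatever you actually run.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen3-4B-4bit")

    # verbose=True prints prompt/generation tokens-per-second, which you
    # can line up against the eval rates from `ollama run <model> --verbose`.
    generate(
        model,
        tokenizer,
        prompt="Write a short function that reverses a linked list.",
        max_tokens=256,
        verbose=True,
    )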
reply
I'm using the 30B MoE model on the same spec with a 65k-token context as a sub-agent with tooling, and it absolutely writes decent code. The dense 9B, I agree, wasn't great.
reply
Thanks for saying this. There's so much nonsense out there online about local models being better than Opus 4.7 and the like. It's just not true for regular users.

I have a brand-new M5 MacBook Pro, top end with all the specs, and the local models I've tried are barely functional.

reply
What models and quantizations have you been trying? I've had great success with the larger Qwen 3.x models at 6-bit. 6-bit quantization is really the bare minimum to give local models a fair shot at agentic flows; push below that and the models get noticeably dumber from the limited bit space.
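
For anyone who wants to try 6-bit, mlx-lm can quantize the weights itself; a minimal sketch (the Hugging Face source repo is just an example, grab whichever model you want):

    # Minimal sketch: produce 6-bit MLX weights with mlx-lm's convert
    # (pip install mlx-lm). Source repo is an example; pick your own.
    from mlx_lm import convert

    convert(
        "Qwen/Qwen3-30B-A3B",       # source weights on Hugging Face
        mlx_path="qwen3-30b-6bit",  # where the quantized model lands
        quantize=True,
        q_bits=6,                   # the floor I'd use for agentic flows
    )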
reply
The main benefits of local are:

1) Control
2) Privacy
3) A transparent cost model

Cloud has tremendous value for speed, plug-and-play convenience, and performance. You have to decide how those weigh against the benefits of local, both today and, say, a year from now.

reply
How does it (the OpenRouter version) compare to ChatGPT 5.5 or Claude Opus 4.6?
reply
Good enough. It gets 60-70% of the work I need done for a lot less $ (keep in mind I'm using these for personal projects that don't generate revenue). If I were using it with hopes of making money, I think I'd just use Codex at this point.
reply