by cpburns2009 7 hours ago

Qwen3-Coder-Next works well on my 128GB Framework Desktop. It seems better at coding Python than Qwen3.5 35B-A3B, and it's not too much slower (43 tg/s compared to 55 tg/s at Q4).

27B is supposed to be really good but it's so slow I gave up on it (11-12 tg/s at Q4).

by vlowther 2 hours ago

The 8-bit MLX unsloth quant of qwen3-coder-next seems to be a local best on an MBP M5 Max with 128GB memory. With oMLX doing prompt caching I can run two in parallel doing different tasks pretty reasonably. I found that lower quants tend to lose the plot after about 170k tokens in context.

by cpburns2009 1 hour ago

That's good to know. I haven't exceeded a 120k context yet. Maybe I'll bite the bullet and try Q6 or Q8. Any of the coder-next quants larger than UD-Q4_K_XL take forever to load, especially with ROCm. I think there's some sort of autotuning or fitting going on in llama.cpp.

by UncleOxidant 4 hours ago

Agreed. Qwen3-coder-next seems like the sweet spot model on my 128GB Framework Desktop. I seem to get better coding results from it vs 27B, in addition to it running faster.