by selectodude 7 hours ago
by peder 1 hour ago | next [-]
Have you been using `omlx serve`? If so, how are you bumping up the max context size? I'm not seeing a param to go above 32k?
by vlowther 4 hours ago | prev | [-]
Same. Opencode + oMLX (0.3.4) + unsloth-Qwen3-Coder-Next-mlx-8bit on my M5 Max with 128GB is the sweet spot for me locally. The prompt decode caching keeps things coherent and fast even when contexts get north of 100k tokens.