by selectodude 7 hours ago
by peder 1 hour ago | next [-]
Have you been using `omlx serve`? If so, how are you bumping up the max context size? I'm not seeing a param to go above 32k?
by vlowther 4 hours ago | prev | [-]
Same. Opencode + oMLX (0.3.4) + unsloth-Qwen3-Coder-Next-mlx-8bit on my M5 Max with 128GB is the sweet spot for me locally. The prompt decode caching keeps things coherent and fast even when contexts get north of 100k tokens.