undefined

points

[-]

Correct, no issues because since at least a few months, llama.cpp/server exposes an Anthropic messages API at v1/messages, in addition to the OpenAI-compatible API at v1/chat/completions. Claude Code uses the former.

by selectodude7 hours ago|

prev|

[-]

I’ve jumped over to oMLX. A ton of rough edges but I think it’s the future.

by peder1 hours ago|

parent|

[-]

Have you been using `omlx serve`? If so, how are you bumping up the max context size? I'm not seeing a param to go above 32k?

by vlowther4 hours ago|

parent|

prev|

[-]

Same. Opencode + oMLX (0.3.4) + unsloth-Qwen3-Coder-Next-mlx-8bit on my M5 Max w 128GB is the sweet spot for me locally. The prompt decode caching keeps things coherent and fast even when contexts get north of 100k tokens.