Same problem with Gemma 4 + oMLX + OpenCode. Thinking and tool calling seem to be parsed fine in other clients such as Open WebUI. This really shouldn’t matter, because the client isn’t responsible for parsing the output, but it’s happening anyway.
Possibly a problem with the chat template:

https://huggingface.co/google/gemma-4-31B-it/discussions/118

Huh. Same problem, and I’m running llama.cpp. In my case, Gemma4-31B (a 4-bit quant, though) will sometimes just stop.