Which provider are you using? I have a z.ai Lite Coding Plan, and my understanding is that z.ai is on the slower side among providers, with the Lite plan getting lower priority on top of that. The API key console shows throughput dipping below 60 tok/sec, which is quite slow.
Switched the model to GLM-5.2 in the middle of a troubleshooting session (didn't even bother to reprompt, just changed it in the middle of its reasoning), gave it a few minutes, problem fixed. This is with the subscription-based allocation on OpenCode Go, where a problem like this would completely burn up my Opus allowance for the current 5-hour window or even the current week.
It's always a shock to me how opaque most other models are!
It is also pretty resilient when you inject a message while it's working: it either stays on course or gets back on track afterward, which I appreciate.
This is (unfortunately) by design. The proprietary models hide their reasoning traces so they can't be used for model distillation. Sometimes even when they do show reasoning, it isn't the model's real trace - IIRC, someone was able to demonstrate that Opus' reasoning is usually a summary made with Haiku behind the scenes.
It is under 20% of the cost of Opus on output and about 28% on input at API rates: 1.40/4.40 vs 5/25.
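As a rough sanity check on the 1.40/4.40 vs 5/25 comparison, here is a minimal sketch of the arithmetic. It assumes the quoted figures are USD per million tokens in input/output order, which the comment does not state explicitly; the function names and the 3:1 token mix are made up for illustration.

```python
# Prices as quoted in the thread, ASSUMED to be USD per million tokens
# in (input, output) order.
GLM_PRICE = (1.40, 4.40)
OPUS_PRICE = (5.00, 25.00)


def cost(price, input_tokens, output_tokens):
    """Cost for a token mix, given (input, output) prices per million tokens."""
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def glm_to_opus_ratio(input_tokens, output_tokens):
    """GLM cost as a fraction of Opus cost for the same token mix."""
    return cost(GLM_PRICE, input_tokens, output_tokens) / cost(
        OPUS_PRICE, input_tokens, output_tokens
    )


# Input-only and output-only bounds, plus a hypothetical 3:1 input:output mix.
print(f"input only:  {glm_to_opus_ratio(1_000_000, 0):.1%}")
print(f"output only: {glm_to_opus_ratio(0, 1_000_000):.1%}")
print(f"3:1 mix:     {glm_to_opus_ratio(3_000_000, 1_000_000):.1%}")
```

Under those assumptions the ratio sits between roughly 18% (all output) and 28% (all input), so whether it lands under 20% depends on how input-heavy the workload is.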
It maybe makes sense if you have z.ai's (not greatly priced) subscription plan, but at API rates it's not competitive against an OpenAI or Anthropic monthly coding subscription plan. I burned through almost $10 worth of tokens just doing an hour of work.
You get access to a whole bunch of bleeding edge open models including GLM-5.2, Kimi K2.7, DeepSeek 4 Pro, etc. Inference is run on US/SG/EU cloud providers with zero data retention policies. The $20/mo tier is very generous, in my experience.
> Where are models hosted?
> Ollama hosts models and compute resources primarily in the United States. To serve global demand, we may route to Europe and Singapore for additional capacity.
> Is my prompt or response data trained on?
> Prompt or response data is never logged or trained on.
> Who does Ollama partner with to host models?
> Ollama collaborates with NVIDIA Cloud Providers (NCPs) to host open models.
> When Ollama partners with providers, we require no logging, no training, and zero data retention policies in place.