undefined

points

[-]

Since that defaults to the q4 variant, try the q8 one:

  ollama launch claude --model gemma4:26b-a4b-it-q8_0

by pshirshov22 hours ago|

parent|

[-]

Even tried gemma4:31b and gemma4:31b with 128k context (I have 72GiB VRAM). Nothing. I'm cursed I guess. That's ollama-rocm if that matters (I had weird bugs on Vulkan, maybe gemma misbehaves on radeons somehow?..).

UPD: tried ollama-vulkan. It works, gemma4:31b-it-q8_0 with 64k context!

by alfiedotwtf14 hours ago|

parent|

[-]

The default context is 128k for the smaller Gemma 4’s and 256k for the bigger ones, so you’re cutting off context and it doesn’t know how to continue.

Bump it to native (or -c 0 may work too)

by pshirshov10 hours ago|

parent|

[-]

In that case the model descriptor on ollama.com is incorrect, because it defaults to 16k. So I have to manually change that to 64/128k. I think you are talking about maximum context size.

by trvz10 hours ago|

parent|

prev|

[-]

No, the default context in Ollama varies by the memory available: https://docs.ollama.com/context-length