- Safetensors: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...
- GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/...
Note the README in the Unsloth file list: llama.cpp is working on a PR to support the gemma4 drafters: https://github.com/ggml-org/llama.cpp/pull/23398. Also note that the PR submitter didn't see much speedup with 26B (it seems typical that MoE models don't benefit much from MTP).
https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...
(Pardon my ignorance; this stuff moves so fast)
https://point.free/blog/gemma-4-on-a-2016-xeon/
It's on a Xeon, but it could be useful for MTP on a Mac.
I do have the Qwen 3.6 (35B) MTP implementation running (in LM Studio; it doesn't need a separate drafter), along with non-MTP Gemma 4 26B, and I can see that Unsloth Studio can run the new QAT, but I can't see how you can run the assistant/drafter. Yet.
It's just a constantly changing landscape. Don't get me wrong, it's fascinating and for various reasons I am pleased I can keep up even slightly, but eeeehhh :-)
1) Gemma 4 MTP is too fresh for off-the-shelf software to use anyway
2) "you can convert them yourself" which is fine, obvs