by WhiteDawn 1 day ago

by pfheatwole 20 hours ago

Models:

- Safetensors: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...

- GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/...

Note the README in the Unsloth file list: llama.cpp has a PR in progress to support the Gemma 4 drafters: https://github.com/ggml-org/llama.cpp/pull/23398. Also note that the PR submitter didn't see much speedup with 26B (it seems typical that MoE models benefit little from MTP).
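
For anyone unsure what a drafter actually does, here is a toy sketch of greedy draft-and-verify decoding in Python. It is not llama.cpp's implementation and not Gemma's MTP head: `speculative_decode`, `greedy_decode`, `toy_target` and `toy_draft` are made-up names, and the "models" are plain arithmetic functions standing in for real networks. The point is only that the output matches ordinary greedy decoding while the big model is invoked fewer times.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the large model."""
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical drafter: agrees with the target except at every third position."""
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 11


def greedy_decode(target: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       n_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verify passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1) The drafter proposes k tokens autoregressively (assumed cheap).
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) The target checks every proposed position. A real engine does
        #    this in one batched forward pass, so we count it as one pass.
        passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            accepted.append(expected)  # always keep the target's own token
            if expected != proposal[i]:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:goal], passes


if __name__ == "__main__":
    tokens, passes = speculative_decode(toy_target, toy_draft, [1], n_new=20, k=4)
    print("same as greedy:", tokens == greedy_decode(toy_target, [1], 20))
    print("verify passes:", passes, "vs 20 single-token passes")
```

Any speedup relies on a verify pass over k positions costing about the same as a single-token pass. A plausible reason MoE models gain less is that different positions can route to different experts, so more weights get read per verify pass; that is an explanation offered as an assumption, not something measured here.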

by dist-epoch 1 day ago

Google already did

https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...

by dofm 1 day ago

This is safetensors. Is there any way to run these on a Mac paired with the MLX QAT?

(Pardon my ignorance; this stuff moves so fast)

by thangalin 21 hours ago

Did you see this?

https://point.free/blog/gemma-4-on-a-2016-xeon/

Xeon, but could be useful for MTP on Mac.

by dofm 21 hours ago

I hadn't seen this, thanks.

I do have the Qwen 3.6 (35B) MTP implementation running (in LM Studio; it doesn't need a separate drafter), along with non-MTP Gemma 4 26B, and I can see that Unsloth Studio can run the new QAT, but I can't see how you can run the assistant/drafter. Yet.

It's just a constantly changing landscape. Don't get me wrong, it's fascinating and for various reasons I am pleased I can keep up even slightly, but eeeehhh :-)

by int_19h 19 hours ago

https://huggingface.co/lmstudio-community/gemma-4-26B-A4B-it...

by dofm 13 hours ago

Yeah, that is the base QAT model. There are safetensors weights for the QAT version of the MTP drafter, but no MLX/GGUF versions of it. I think the answer is a combination of:

1) Gemma 4 MTP is too fresh for off-the-shelf software to use anyway

2) "you can convert them yourself" which is fine, obvs