For MiniMax 2.7 - there were NaNs, but it wasn't just our quants - all quant providers had the issue. We found NaNs in 38% of bartowski's quants and in 22% of ours. We identified a fix and have already applied it to ours, see https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax.... Bartowski has not fixed theirs yet, but is working on it. We always share our investigations.
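A rough sketch of how a "share of quants with NaNs" figure could be tallied. The helper names and the name -> values layout are hypothetical and only for illustration; this is not Unsloth's actual tooling, and how the per-quant values are obtained (logits, perplexity chunks, etc.) is left open.

```python
import math

def has_nan(values):
    """Return True if any value in the iterable is NaN."""
    return any(math.isnan(v) for v in values)

def nan_share(results):
    """Fraction of entries in `results` that contain at least one NaN.

    `results` maps a (hypothetical) quant name to a list of floats
    produced by evaluating that quant.
    """
    if not results:
        return 0.0
    flagged = [name for name, values in results.items() if has_nan(values)]
    return len(flagged) / len(results)

if __name__ == "__main__":
    demo = {
        "quant-a": [1.2, float("nan"), 0.7],  # made-up values
        "quant-b": [0.9, 1.1],
    }
    print(f"{nan_share(demo):.0%} of quants contain NaNs")
```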
For Qwen3.5 - we shared our 7TB of research artifacts showing which layers not to quantize. All providers' quants were suboptimal, not broken - the ssm_out and ssm_* tensors were the issue. Ours are now the best in terms of KLD and disk space - see https://www.reddit.com/r/LocalLLaMA/comments/1rgel19/new_qwe...
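For anyone unfamiliar with the metric: KLD here presumably means the KL divergence between the next-token distribution of the full-precision model and that of the quant (that reading is an assumption, the comment does not spell it out). A minimal stdlib sketch with made-up logits:

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; p is the reference, q the approximation."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

if __name__ == "__main__":
    reference_logits = [2.0, 1.0, 0.1]   # hypothetical full-precision output
    quantized_logits = [1.9, 1.1, 0.2]   # hypothetical quantized output
    kld = kl_divergence(softmax(reference_logits), softmax(quantized_logits))
    print(f"KLD: {kld:.6f}")
```

Lower is better: 0 means the quant reproduces the reference distribution exactly.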
On other fixes, we have also fixed bugs in many OSS models, including Gemma 1, Gemma 3, Llama (chat template fixes), Mistral, and many more.
It might seem these issues are caused by us, but that impression comes from us publicizing them and telling people to update. 95% of them are not related to us, but as good open source stewards, we should keep everyone updated.
1. Split metadata into shard 0 for huge models, so that a chat template fix only touches that shard (10B). However, some fixes force a recalculation of the imatrix, which means all quants have to be re-made.
2. Add HF discussion posts on each model describing what changed, and post the same on our Reddit and Twitter.
3. Hugging Face XET now de-duplicates shard downloads, so re-downloading a 100GB model should generally be much faster: it splits the 100GB into small chunks, hashes them, and only downloads the chunks that have changed.
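To illustrate the chunk-and-hash idea from point 3, here is a toy sketch. It uses fixed-size chunks and SHA-256 purely for simplicity; the real XET chunking scheme, hash function, and chunk sizes are not described above and may well differ, and the function names are made up.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the demo is readable

def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split `data` into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def chunks_to_download(local: bytes, remote: bytes,
                       chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Indices of remote chunks whose hash is not already present locally."""
    have = set(chunk_hashes(local, chunk_size))
    return [
        index
        for index, digest in enumerate(chunk_hashes(remote, chunk_size))
        if digest not in have
    ]

if __name__ == "__main__":
    old_file = b"aaaabbbbccccdddd"
    new_file = b"aaaaXXXXccccdddd"  # only the second chunk changed
    print(chunks_to_download(old_file, new_file))
```

With this toy input only one of the four chunks has to be fetched, which is why a small fix to a huge file need not mean a full re-download.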
Ideally the labs releasing the open models would work with Unsloth and the llama.cpp maintainers in advance to work out the bugs up front. That does sometimes happen, but not always.
We do get early access to nearly all models, and we do find the most pressing issues sometimes. But sadly some issues are really hard to find and diagnose :(
HF also provides SHA256 hashes, e.g. the one for https://huggingface.co/unsloth/MiniMax-M2.7-GGUF/blob/main/U... is 92986e39a0c0b5f12c2c9b6a811dad59e3317caaf1b7ad5c7f0d7d12abc4a6e8
But agreed it's probs better to place them in a table
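Checking a downloaded file against a published SHA256 can be done with the standard library alone. A minimal sketch; the file path is a placeholder and `matches_published_hash` is just an illustrative helper name:

```python
import hashlib

def sha256_of_file(path, block_size=1024 * 1024):
    """Hash a file in 1 MiB blocks so large GGUFs never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Compare the local file's SHA256 with the hex digest shown on the model page."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Usage (placeholder path; paste the hash from the file's page on HF):
# ok = matches_published_hash("model-shard.gguf", "<sha256 from the HF page>")
```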
The purist in me feels the 50GB chunks are a temporary artifact of Hugging Face's uploading requirements, and the authoritative model file should be the merged one. I am unable to articulate any practical reason why this matters.
Though chat templates seem like they need a better solution. So many issues, seems quite fragile.
For serious fixes, we sadly have to re-compute the imatrix since the activation patterns have changed. This makes the entire quant change a lot, hence you have to re-download :(
We try our best as model distributors to fix them on day 0 or 1, but 95% of issues aren't ours - as you mentioned, it's the chat template, the runtime, etc.