I just wanted to express gratitude to you guys - you do great work. However, it is a little annoying to have to redownload big models, and keeping up with the AI news and community sentiment is a full-time job. I wish there was some mechanism somewhere (on your site or Hugging Face or something) for displaying feedback or confidence in a model being "ready for general use" before kicking off 100+ GB model downloads.
reply
Hey thanks - yes agreed - for now we do:

1. We split the metadata into shard 0 for huge models, so a ~10MB download covers chat-template fixes - however, some fixes force a recalculation of the imatrix, which means all quants have to be re-made

2. We post HF discussion threads on each model describing what changed, and announce on our Reddit and Twitter

3. Hugging Face XET now supports de-duplicated downloads - it splits a 100GB model into small chunks, hashes them, and only downloads the chunks which have changed, so re-downloading 100GB models should generally be much faster
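The dedup idea in point 3 can be sketched roughly as follows. This is an illustration only - the chunk size is arbitrary and XET actually uses content-defined chunk boundaries rather than the fixed-size chunks shown here:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # illustrative only; not XET's actual chunking parameters

def chunk_hashes(data: bytes) -> list[str]:
    """Split a blob into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(data[i : i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

def chunks_to_fetch(local: bytes, remote: bytes) -> list[int]:
    """Indices of remote chunks whose hash differs from the local copy."""
    old, new = chunk_hashes(local), chunk_hashes(remote)
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]
```

With fixed-size chunks, only in-place edits dedup well; content-defined chunking (what XET uses) keeps chunk boundaries stable even when bytes are inserted or removed mid-file.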

reply
If you happen to know - is this also why LM Studio and Ollama model downloads often fail with a signature mismatch error?
reply
Ah thanks, I wasn't aware of #3, that should be a huge boon.
reply
Best policy is to just wait a couple of weeks after a major model is released. It's frustrating to have to re-download tens or hundreds of GB every few days, but the quant producers have no choice but to release early and often if they want to maintain their reputation.

Ideally the labs releasing the open models would work with Unsloth and the llama.cpp maintainers in advance to work out the bugs up front. That does sometimes happen, but not always.

reply
Yep agreed - waiting at least 1 week is a good idea :)

We do get early access to nearly all models, and we do find the most pressing issues sometimes. But sadly some issues are really hard to find and diagnose :(

reply
Please publish sha256sums of the merged GGUFs in the model descriptions. Otherwise it's hard to tell if the version we have is the latest.
reply
Yep we can do that - probably add a table. In general we post in the discussions on model pages - e.g. https://huggingface.co/unsloth/MiniMax-M2.7-GGUF/discussions...

HF also provides SHA-256 hashes - e.g. https://huggingface.co/unsloth/MiniMax-M2.7-GGUF/blob/main/U... is 92986e39a0c0b5f12c2c9b6a811dad59e3317caaf1b7ad5c7f0d7d12abc4a6e8

But agreed, it's probably better to place them in a table
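For a merged 100+ GB GGUF, you'd want to compute that hash in a streaming fashion rather than reading the whole file into memory. A minimal sketch:

```python
import hashlib

def sha256_of_file(path: str, buf_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB blocks (constant memory)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(buf_size):
            h.update(block)
    return h.hexdigest()
```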

reply
Thanks! I know about HF's chunk checksums, but HF doesn't publish (or possibly even know) the merged checksums.
reply
Oh, for multi-file models? Hmm, ok, let me check that out
reply
Why do you merge the GGUFs? The 50 GB files are more manageable (IMO) and you can verify checksums as you say.
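If the repo shipped a checksum file alongside the shards, verifying them would be a one-liner with standard tooling. The `SHA256SUMS` filename here is hypothetical - the thread's whole point is that no such file is published yet:

```shell
# Hypothetical: assumes the repo publishes a SHA256SUMS file next to the shards.
# Publisher side - generate the checksum file:
sha256sum model-*.gguf > SHA256SUMS
# Downloader side - verify every shard against it:
sha256sum -c SHA256SUMS
```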
reply
I admit it's a habit that's probably weeks out of date. Earlier engines barfed on split GGUFs, but support is a lot better now. Frontends didn't always infer the model name correctly from the first chunk's filename, but once llama.cpp added the models.ini feature, that objection went away.

The purist in me feels the 50GB chunks are a temporary artifact of Hugging Face's uploading requirements, and the authoritative model file should be the merged one. I am unable to articulate any practical reason why this matters.

reply
Just curious, the fixes are not about weights but about templates, am I right?
reply
Appreciate the work of your team very much.

Though chat templates seem to need a better solution - so many issues; the mechanism seems quite fragile.

reply
What do you think about creating a tool which can just patch the template embedded in the .gguf file instead of forcing a re-download? The whole file hash can be checked afterwards.
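In-place patching is awkward because the GGUF header is length-prefixed: changing the template string shifts every offset after it, so a patch tool would effectively have to rewrite the whole header. Reading the embedded template, though, needs only the stdlib. A minimal sketch of pulling string-valued metadata (including `tokenizer.chat_template`) out of a GGUF header - assuming the GGUF v3 layout, illustration only, not production code:

```python
import struct

# Byte widths of GGUF fixed-size scalar value types (string=8 and array=9 are variable)
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def _read_str(f):
    (n,) = struct.unpack("<Q", f.read(8))  # uint64 length prefix
    return f.read(n).decode("utf-8")

def _skip_value(f, vtype):
    if vtype == 8:        # string: uint64 length + bytes
        (n,) = struct.unpack("<Q", f.read(8))
        f.seek(n, 1)
    elif vtype == 9:      # array: element type + count + elements
        etype, count = struct.unpack("<IQ", f.read(12))
        for _ in range(count):
            _skip_value(f, etype)
    else:                 # fixed-size scalar
        f.seek(_SCALAR_SIZES[vtype], 1)

def read_string_kvs(path):
    """Return all string-valued metadata KVs from a GGUF file's header."""
    out = {}
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, _n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            key = _read_str(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            if vtype == 8:
                out[key] = _read_str(f)
            else:
                _skip_value(f, vtype)
    return out
```

For rewriting rather than reading, llama.cpp's gguf-py tooling (the gguf_new_metadata.py script, if I recall correctly) can already copy a GGUF to a new file with modified metadata without touching the tensor data.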
reply
Sadly it's not always chat template fixes :( But yes, we now split the first shard out as pure metadata (~10MB) for huge models - this includes the chat template etc. - so you only need to re-download that shard.

For serious fixes, we sadly have to re-compute the imatrix since the activation patterns have changed - this changes the entire quant substantially, hence you have to re-download :(

reply