I just wanted to express gratitude to you guys - you do great work. However, it is a little annoying to have to redownload big models, and keeping up with the AI news and community sentiment is a full-time job. I wish there was some mechanism somewhere (on your site or Hugging Face or something) for displaying feedback or confidence in a model being "ready for general use" before kicking off 100+ GB model downloads.
reply
Hey thanks - yes agreed - for now we do:

1. We split the metadata into shard 0 for huge models, so a ~10MB download covers chat-template fixes - however, some fixes force a recalculation of the imatrix, which means all quants have to be re-made

2. We post HF discussion threads on each model describing what changed, and announce on our Reddit and Twitter

3. Hugging Face XET now supports de-duplicated downloads - it splits a 100GB model into small chunks, hashes them, and only downloads the chunks which have changed, so re-downloading 100GB models should generally be much faster
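The dedup idea in point 3 can be sketched roughly as follows. This is an illustration only - the chunk size is arbitrary and XET actually uses content-defined chunk boundaries rather than the fixed-size chunks shown here:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # illustrative only; not XET's actual chunking parameters

def chunk_hashes(data: bytes) -> list[str]:
    """Split a blob into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(data[i : i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

def chunks_to_fetch(local: bytes, remote: bytes) -> list[int]:
    """Indices of remote chunks whose hash differs from the local copy."""
    old, new = chunk_hashes(local), chunk_hashes(remote)
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]
```

With fixed-size chunks, only in-place edits dedup well; content-defined chunking (what XET uses) keeps chunk boundaries stable even when bytes are inserted or removed mid-file.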

reply
If you happen to know - is this also why LM Studio and Ollama model downloads often fail with a signature mismatch error?
reply
Ah thanks, I wasn't aware of #3, that should be a huge boon.
reply
Best policy is to just wait a couple of weeks after a major model is released. It's frustrating to have to re-download tens or hundreds of GB every few days, but the quant producers have no choice but to release early and often if they want to maintain their reputation.

Ideally the labs releasing the open models would work with Unsloth and the llama.cpp maintainers in advance to work out the bugs up front. That does sometimes happen, but not always.

reply
Yep agreed - waiting at least 1 week is a good idea :)

We do get early access to nearly all models, and we do find the most pressing issues sometimes. But sadly some issues are really hard to find and diagnose :(

reply
Please publish sha256sums of the merged GGUFs in the model descriptions. Otherwise it's hard to tell if the version we have is the latest.
reply
Yep we can do that - probably add a table. In general we post in the discussions on model pages - e.g. https://huggingface.co/unsloth/MiniMax-M2.7-GGUF/discussions...

HF also provides SHA-256 hashes - e.g. https://huggingface.co/unsloth/MiniMax-M2.7-GGUF/blob/main/U... is 92986e39a0c0b5f12c2c9b6a811dad59e3317caaf1b7ad5c7f0d7d12abc4a6e8

But agreed, it's probably better to place them in a table
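For a merged 100+ GB GGUF, you'd want to compute that hash in a streaming fashion rather than reading the whole file into memory. A minimal sketch:

```python
import hashlib

def sha256_of_file(path: str, buf_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB blocks (constant memory)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(buf_size):
            h.update(block)
    return h.hexdigest()
```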

reply
Thanks! I know about HF's chunk checksums, but HF doesn't publish (or possibly even know) the merged checksums.
reply
Oh, for multi-file models? Hmm, ok, let me check that out
reply
Why do you merge the GGUFs? The 50 GB files are more manageable (IMO) and you can verify checksums as you say.
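If the repo shipped a checksum file alongside the shards, verifying them would be a one-liner with standard tooling. The `SHA256SUMS` filename here is hypothetical - the thread's whole point is that no such file is published yet:

```shell
# Hypothetical: assumes the repo publishes a SHA256SUMS file next to the shards.
# Publisher side - generate the checksum file:
sha256sum model-*.gguf > SHA256SUMS
# Downloader side - verify every shard against it:
sha256sum -c SHA256SUMS
```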
reply
I admit it's a habit that's probably weeks out of date. Earlier engines barfed on split GGUFs, but support is a lot better now. Frontends didn't always infer the model name correctly from the first chunk's filename, but once llama.cpp added the models.ini feature, that objection went away.

The purist in me feels the 50GB chunks are a temporary artifact of Hugging Face's uploading requirements, and the authoritative model file should be the merged one. I am unable to articulate any practical reason why this matters.

reply
Just curious, the fixes are not about weights but about templates, am I right?
reply
Appreciate the work of your team very much.

Though chat templates seem to need a better solution - so many issues; the mechanism seems quite fragile.

reply
What do you think about creating a tool which can just patch the template embedded in the .gguf file instead of forcing a re-download? The whole file hash can be checked afterwards.
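In-place patching is awkward because the GGUF header is length-prefixed: changing the template string shifts every offset after it, so a patch tool would effectively have to rewrite the whole header. Reading the embedded template, though, needs only the stdlib. A minimal sketch of pulling string-valued metadata (including `tokenizer.chat_template`) out of a GGUF header - assuming the GGUF v3 layout, illustration only, not production code:

```python
import struct

# Byte widths of GGUF fixed-size scalar value types (string=8 and array=9 are variable)
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def _read_str(f):
    (n,) = struct.unpack("<Q", f.read(8))  # uint64 length prefix
    return f.read(n).decode("utf-8")

def _skip_value(f, vtype):
    if vtype == 8:        # string: uint64 length + bytes
        (n,) = struct.unpack("<Q", f.read(8))
        f.seek(n, 1)
    elif vtype == 9:      # array: element type + count + elements
        etype, count = struct.unpack("<IQ", f.read(12))
        for _ in range(count):
            _skip_value(f, etype)
    else:                 # fixed-size scalar
        f.seek(_SCALAR_SIZES[vtype], 1)

def read_string_kvs(path):
    """Return all string-valued metadata KVs from a GGUF file's header."""
    out = {}
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, _n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            key = _read_str(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            if vtype == 8:
                out[key] = _read_str(f)
            else:
                _skip_value(f, vtype)
    return out
```

For rewriting rather than reading, llama.cpp's gguf-py tooling (the gguf_new_metadata.py script, if I recall correctly) can already copy a GGUF to a new file with modified metadata without touching the tensor data.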
reply
Sadly it's not always chat template fixes :( But yes, we now split the first shard out as pure metadata (~10MB) for huge models - this includes the chat template etc. - so you only need to re-download that shard.

For serious fixes, we sadly have to re-compute the imatrix since the activation patterns have changed - this changes the entire quant substantially, hence you have to re-download :(

reply