upvote
MTP models share internal state with the main model, and also refer to parameters in the model.

They are more like a single model that has two separate attention head mechanisms.

reply
I mean just like GGUFs aren't technically necessary yet are _way_ more convenient than using Safetensors and configuring the default Jinja prompt by-hand, it makes sense to bundle the draft model too. For all intents and purposes, the only people who will train a draft model are the people who train the original model
reply