by spijdar 2 hours ago

by DiabloD3 1 hour ago

MTP (multi-token prediction) models share internal state with the main model and also reference the main model's parameters.

They are more like a single model that has two separate attention head mechanisms.
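
As a rough illustration of that sharing, here is a toy sketch in plain Python. Everything in it is hypothetical (`ToyModel`, `trunk`, `mtp_block`, the sizes), there is no attention at all, and it is not how any real MTP implementation is laid out. It only shows the shape of the idea: the extra head reads the main model's hidden state and reuses its embedding table, and only a small block of weights is its own.

```python
# Toy illustration only: hypothetical names and sizes, not a real MTP implementation.
import random

VOCAB, DIM = 8, 4

def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

class ToyModel:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        def rand(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
        self.embed = rand(VOCAB, DIM)    # shared: embedding table, also reused as output projection
        self.trunk = rand(DIM, DIM)      # shared: stands in for the main model's layer stack
        self.mtp_block = rand(DIM, DIM)  # the only weights owned by the MTP head

    def hidden(self, token):
        # Internal state that both heads read from.
        return matvec(self.trunk, self.embed[token])

    def logits(self, h):
        # Output projection tied to the shared embedding table.
        return matvec(self.embed, h)

    def next_token(self, token):
        # Main head: predicts position t+1.
        return argmax(self.logits(self.hidden(token)))

    def draft_token_after_next(self, token):
        # MTP head: starts from the main model's hidden state, then applies its own block
        # to guess position t+2.
        h = self.hidden(token)
        return argmax(self.logits(matvec(self.mtp_block, h)))

    def shared_params(self):
        return VOCAB * DIM + DIM * DIM

    def mtp_only_params(self):
        return DIM * DIM

model = ToyModel()
print("next token:", model.next_token(3))
print("draft for the token after that:", model.draft_token_after_next(3))
print("shared params:", model.shared_params(), "MTP-only params:", model.mtp_only_params())
```

In this sketch the MTP head cannot run without `embed` and `trunk`, which is the sense in which it is not a standalone draft model.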

by anaisbetts 2 hours ago

I mean, just like GGUFs aren't technically necessary yet are _way_ more convenient than using Safetensors and configuring the default Jinja prompt by hand, it makes sense to bundle the draft model too. For all intents and purposes, the only people who will train a draft model are the people who train the original model.