upvote
I doubt you can do that. MTP magic happens because for texts, we have a lot of low value fixed tokens that almost always get generated in the sequence (like punctuation, function words, language keywords etc). for most important ones (the entities, the content words, variables) you still need the full model.

so there is alwasy a maximum limit for how well MTP can do.

reply