I think the idea is that you sink the pretraining costs once, and then you can distill multiple specialized models from that one pretrained model.
There used to be training methods like that, but I think they've been phased out in favor of letting small models evolve by rewriting their own training material. Surprisingly, that's actually cheaper.