by lambda 21 hours ago

by bandrami 8 hours ago

I think the idea is that you sink the pretraining cost once, and then you can distill multiple specialized models from that one base model.

by spwa4 20 hours ago

There used to be training methods like that, but I think they've been phased out in favor of letting small models evolve by rewriting their own training material. Surprisingly, that's actually cheaper.