Hacker News
by lambda 21 hours ago
bandrami 8 hours ago
I think the idea is that you sink the pretraining cost once, and then you can distill multiple specialized models from that one pretrained model.
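To make "distill" concrete, here is a rough sketch of one common form of it: a student is trained to match the teacher's temperature-softened output distribution, and the same teacher outputs are reused for several students. Everything here is an assumption for illustration (the function names, the logit values, the "code" and "chat" student labels); it only computes the loss and is not anyone's actual training setup.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical numbers: one teacher output, reused for two different students.
teacher_logits = [4.0, 1.0, 0.5]
students = {
    "code": [3.5, 1.2, 0.4],  # roughly agrees with the teacher
    "chat": [0.5, 1.0, 4.0],  # disagrees with the teacher
}
losses = {
    name: distillation_loss(teacher_logits, logits)
    for name, logits in students.items()
}
```

The teacher's logits are computed once and shared, which is the "pay for pretraining once" part; each student only pays for its own, much smaller, training run. A student that already agrees with the teacher gets a lower loss than one that does not.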
spwa4 20 hours ago
There used to be training methods like that, but I think they've been phased out in favor of letting small models evolve by rewriting their own training material. Surprisingly, that's actually cheaper.