Hacker News
points by jacquesm | 22 hours ago
by serendip-ml | 18 hours ago
The compression analogy is interesting. Another way of looking at it: fine-tuning as "knowing what to leave out" - a 3B model tuned for a narrow task, for example, doesn't need the capacity that makes a 70B model good at many things.
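One concrete mechanism for this "leave most of it out" view is parameter-efficient fine-tuning with low-rank adapters (LoRA) - not something the comment names, just an illustration. A minimal sketch, with made-up sizes: instead of learning a full d x d weight update for a narrow task, you learn two thin matrices of rank r << d, so the tuned delta only spans the few directions the task actually needs.

```python
import numpy as np

# Illustrative numbers, not from any real model config.
d, r = 4096, 8                       # hidden size, adapter rank

full_update_params = d * d           # params for a dense delta-W
lora_params = 2 * d * r              # params for A (d x r) plus B (r x d)

print(full_update_params)            # 16777216
print(lora_params)                   # 65536
print(full_update_params // lora_params)  # 256x fewer parameters

# Effective weight is W + A @ B. With B initialized to zero, the
# adapted model starts identical to the base model; training only
# fills in the low-rank directions the narrow task requires.
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)).astype(np.float32) / np.sqrt(d)
A = rng.standard_normal((d, r)).astype(np.float32) * 0.01
B = np.zeros((r, d), dtype=np.float32)
W_eff = W + A @ B
assert np.allclose(W_eff, W)         # no change before training
```

The parameter ratio is the "compression": the task-specific knowledge fits in a space hundreds of times smaller than the base weights.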