upvote
Either we will be expecting the models to compress whole wikipedia and stale on the size reduction, or focus on the reasoning capabilities. My intuition is that by forcing models to remember everything we are wasting parameter space which can be allocated for more abstract thinking.
reply
Integrating tool use into the training process should fix this.

Rather than learn about President Lincoln, the model can learn to look that info up with a search tool and use it to get better answers.

Just like a human does. I don't learn what 76x35 is... I learn that a calculator can give me that answer so I don't need to memorize it.

reply
[dead]
reply