upvote
After a quick content browse, my understanding is this is more like with a very compressed diff vector, applied to a multi billion parameter model, the models could be 'retrained' to reason (score) better on a specific topic , e.g. math was used in the paper
reply
It's the statistics equivalent of 'no one needs more than 640kb of RAM'
reply
speak for yourself!
reply
reasoning capability might just be some specific combinations of mirror neurons.

even some advanced math usually evolves applying patterns found elsewhere into new topics

reply
I agree, I don't think gradient descent is going to work in the long run for the kind of luxurious & automated communist utopia the technocrats are promising everyone.
reply