by Ifkaluva 20 hours ago |


by andai 19 hours ago |


What's the downside? Don't they stop when they hit diminishing returns?


by hgoel1 hours ago|


Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?


by Ifkaluva 17 hours ago |


You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens.
