Hacker News
by andai 19 hours ago
by hgoel 1 hour ago
Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?
by Ifkaluva 17 hours ago
You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens.