See the Chinchilla scaling laws: we have the functional form of the curve and know the constants (though they vary between fits and are domain- and model-specific):

L(N,D) ~= 1.69 + 406 / N^0.339 + 411 / D^0.285

L is the loss (pre-training test loss), D is the scale of the data (number of training tokens), and N is the number of model parameters.
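
If it helps to see the numbers, here is a rough Python sketch that just plugs values into the formula above. The constants are the ones I quoted, the function name `chinchilla_loss` is made up for illustration, and the (N, D) pairs are arbitrary example sizes, so treat the output as a ballpark rather than a prediction for any particular model.

```python
# Constants as quoted in the formula above; other fits report different values.
E, A, B = 1.69, 406.0, 411.0
ALPHA, BETA = 0.339, 0.285


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training test loss for N parameters and D training tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


if __name__ == "__main__":
    # Arbitrary example sizes (assumed for illustration only).
    for n, d in [(1e9, 20e9), (70e9, 1.4e12)]:
        print(f"N={n:.1e}, D={d:.1e}: L ~= {chinchilla_loss(n, d):.3f}")
```

Note that 1.69 is the floor: both other terms shrink toward zero as N and D grow, so under this fit the loss never drops below that constant.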

reply
You need to touch grass dude, seriously.
reply
Why deflect from the conversation and attempt to insult someone? What I’m saying is literally canonical and comes from extremely well-known literature.