[-]

See the Chinchilla scaling laws: we have the functional form of the curve and know the constants (though they change and are domain- and model-specific):

L(N,D) ~= 1.69 + 406 / N^0.339 + 411 / D^0.285

L is the loss (pre-training test loss), D is the scale of the data, and N is the number of model parameters.
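
As a rough sketch, here is what plugging numbers into that formula looks like in Python. The function name and the example model/data sizes are just illustrative choices of mine, and the constants are the ones quoted above, so they would need refitting for a different domain or model family:

```python
def chinchilla_loss(n_params: float, n_data: float) -> float:
    """Approximate loss L(N, D) from the parametric form quoted above.

    n_params is N (number of model parameters), n_data is D (scale of the data).
    The constants are copied from the formula in this comment and are assumed
    to be domain- and model-specific.
    """
    irreducible = 1.69            # constant term of the fit
    a, alpha = 406.0, 0.339       # model-size term: a / N^alpha
    b, beta = 411.0, 0.285        # data-size term:  b / D^beta
    return irreducible + a / n_params**alpha + b / n_data**beta


if __name__ == "__main__":
    # Hypothetical (N, D) pairs, chosen only to show the trend.
    for n, d in [(1e9, 2e10), (1e10, 2e11), (7e10, 1.4e12)]:
        print(f"N={n:.1e}  D={d:.1e}  L~{chinchilla_loss(n, d):.3f}")
```

The predicted loss falls as either N or D grows, but it never goes below the constant term 1.69.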

[-]

You need to touch grass dude, seriously.

[-]

Why deflect from the conversation and attempt to insult someone? What I’m saying is literally canonical and comes from extremely well-known literature.