Ultimately, you are describing a fundamental problem with induction -- Hume's problem of induction, to be specific. How can we know that anything shown empirically in the past will continue to hold? We can't. Better to investigate mechanistically:

I don't see why we would assume that we are at a plateau for RL. In many other settings, Go for instance, RL continues to scale until you hit compute limits. Some things are more easily RL'd than others, but ultimately RL largely unlocks data. We are not yet compute-, energy-, or physical-world-constrained, and you would start observing clear changes in the world around you before any of those became a true bottleneck. Regardless, the vast majority of compute today goes to inference, not training, so the compute overhang is large.
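To make the overhang point concrete, here's a toy back-of-the-envelope sketch. The 90/10 inference/training split is an assumption for illustration, not a measured figure:

```python
# Toy arithmetic for the compute-overhang point. The 90/10 split below
# is an assumed illustration, not a measured figure.
inference_share = 0.90               # assumed fraction of fleet compute on inference
training_share = 1.0 - inference_share

for moved in (0.25, 0.50):           # fraction of inference compute reallocated
    new_training = training_share + moved * inference_share
    print(f"reallocate {moved:.0%} of inference -> "
          f"training compute grows {new_training / training_share:.1f}x")
```

Under these assumptions, moving even a quarter of inference capacity to training roughly triples the training budget, which is the sense in which the overhang is "large."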

Assuming that we plateau at {insert current moment} seems wishful, and I've already had this conversation any number of times on this exact forum at every level of capability [3.5, 4, o1, o3, 4.6/5.5, mythos] from Nov 2022 onwards.

Since we're not experts, we treat it as a black box. What are the results? Is the quality of the results improving? Is the improvement accelerating or decelerating?

And the answer appears to be that the improvement is accelerating. So how could it be stopping?

https://metr.org/time-horizons/
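The METR analysis linked above measures the task "time horizon" at which models succeed 50% of the time and fits an exponential trend to it. A minimal sketch of that style of fit, with made-up placeholder numbers (not METR's data):

```python
# Sketch of a time-horizon trend fit: regress log2(50%-success task time
# horizon) on date; the slope is doublings per year. All data points are
# made-up placeholders, NOT METR's measurements.
import numpy as np

years = np.array([0.0, 1.5, 3.0, 4.0, 5.0, 5.5])             # years elapsed
horizon_min = np.array([0.1, 0.5, 4.0, 15.0, 60.0, 120.0])   # minutes

slope, intercept = np.polyfit(years, np.log2(horizon_min), 1)
print(f"{slope:.2f} doublings/year, doubling time ~{12 / slope:.1f} months")

# Accelerating vs. decelerating: fit the slope over successive windows;
# a growing slope means super-exponential improvement, a shrinking one
# means the trend is bending toward a plateau.
```

Fitting the slope over successive windows is how you'd distinguish "accelerating" from "decelerating" rather than eyeballing it.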
