This interview (https://youtu.be/oWOz2htozfI?si=qdQ0uZRoZOYeThOn) from 2 days ago with a top researcher from OpenAI directly addresses the bitter lesson argument and the role scaling has played in the history of their models.
reply
Isn't the bitter lesson basically the same as "The Unreasonable Effectiveness of Data" from 2009?
reply
not exactly. the bitter lesson is one meta-level up from "scale eats everything". reading it as just "scale wins" is a common misunderstanding that rich sutton has been fighting ever since the essay was written. in rich's own words[1], the modern summary is:

> Don’t be distracted by human knowledge, as AI has been historically.

> Instead focus on methods for creating knowledge that scale with computation, like search and learning.

so the lesson is to choose methods that scale with computation, not that blindly scaling up anything (data, params, people, whatever) works. choosing the right x axis and the right scaling laws consistently wins out in the long run, despite short-term wins from other methods.

1: https://x.com/RichardSSutton/status/2056419165502935198
