1) Do models generalize?
2) If they do, and they generalize from this, is that a win?
Chollet was one of the first “they do not generalize” evangelists. I’d be curious to hear what he thinks now, because a) most disagree with him, and b) this test seems designed to get models that can generalize better at visual long context problem solving and agency, exactly where the bleeding edge is right now for needs with agentic systems.
+ generalize being the key word.