They would score much worse on the private set than the public set. And they haven't done this for any of the other ARC-AGI benchmarks, so why would they do it for this one?

by vessenes 8 hours ago

Wrong question. I suggest:

1) Do models generalize?

2) If they do, and they generalize from this, is that a win?

Chollet was one of the first “they do not generalize” evangelists. I’d be curious to hear what he thinks now, because a) most people disagree with him, and b) this test seems designed to produce models that generalize better at visual, long-context problem solving and agency, which is exactly where the bleeding edge is right now for agentic systems.

by MadxX79 5 hours ago

Yeah, so you are agreeing that the benchmarks are useless because they don't answer those questions.

by daveguy 3 hours ago

Can AI models generalize+ to any long-context problem solving and agency, regardless of modality? I think the answer is no, and this is why they are not yet AGI.

+ generalize being the key word.