ARC-AGI-3 has a nasty combo of spatial reasoning + explore/exploit. It's basically adversarial vs current AIs.
You would need to check whether every model is missing the same 20% or a different 20%. If it's the same 20%, then either those questions are really hard, they're keyed incorrectly, or they aren't stated with enough context to actually be solvable.
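A rough sketch of that check in Python, assuming you already have, for each model, the set of question IDs it got wrong (the model names and question IDs below are made up for illustration). High pairwise overlap and a large "missed by all" set point to the "same 20%" case; you'd still have to read those questions by hand to tell hard from mis-keyed from underspecified.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: question IDs each model answered incorrectly.
# In practice you'd build these sets from each model's graded benchmark output.
wrong = {
    "model_a": {"q3", "q7", "q9", "q12"},
    "model_b": {"q3", "q7", "q12", "q15"},
    "model_c": {"q3", "q7", "q12", "q20"},
}


def jaccard(a: set, b: set) -> float:
    """Overlap of two error sets: 1.0 = identical misses, 0.0 = no misses in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def shared_misses(errors: dict) -> set:
    """Questions that every model gets wrong: candidates for bad keys or missing context."""
    return set.intersection(*errors.values())


def miss_counts(errors: dict) -> Counter:
    """How many models missed each question."""
    return Counter(q for qs in errors.values() for q in qs)


if __name__ == "__main__":
    for m1, m2 in combinations(sorted(wrong), 2):
        print(f"{m1} vs {m2}: overlap {jaccard(wrong[m1], wrong[m2]):.2f}")
    print("missed by all:", sorted(shared_misses(wrong)))
    print("miss counts:", miss_counts(wrong).most_common())
```

Jaccard is just one simple overlap measure; with many models, looking at the per-question miss counts is often more telling than any single pairwise number.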
It happens. The original (non-Pro) MMLU had a lot of wrong answers. Even simple datasets like MNIST have digits that are labeled incorrectly or drawn so badly they're not recognizable as digits anymore.
ARC-AGI score isn't correlated with anything useful.
It's also interesting because it's very, very hard for base LLMs, even if you try to "cheat" by training on millions of ARC-like problems. Reasoning LLMs show genuine improvement on this type of problem.
>can u make the progm for helps that with what in need for shpping good cheap products that will display them on screen and have me let the best one to get so that i can quickly hav it at home
And get back an automatic coupon code app like the user actually wanted.