I guess I mean that you're projecting anthropomorphization. When I see people share examples that the model answered wrong, I don't read that as them thinking it "didn't know" the answer. Rather, they're reproducing the error. The models get most simple questions right nearly every time, so showing a failure is useful data.