The second question sounds like a useless and artificial metric to judge by. The average person might miss such a “gotcha” logic quiz too, for the same reason: they expect to be asked “is it walking distance.”
No one has ever relied on anyone else’s judgment, nor an AI’s, to answer “should I bring my car to the carwash.” Same for the ol’ “how many rocks shall I eat?” that people used to trick the AI Overview.
I’m not saying anything categorically “is AGI,” but by relying on jokes like this you’re lying to yourself about what’s relevant.
In my experience, they contain more information than any human, but they are actually quite stupid. Reasoning is not something they do well at all. But even setting that aside, they cannot learn. Inference is separate from training, so they cannot pick up anything new beyond working with whatever is in their context window, and even then they can only mimic rather than extrapolate anything new.
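To make that split concrete, here's a minimal PyTorch sketch (toy model and data, purely illustrative, not how any real LLM is wired): inference under no_grad leaves the weights untouched, and learning only happens through an explicit optimizer step that deployed models never run per request.

    import torch
    import torch.nn as nn

    # Toy stand-in for a model: its weights only change if an
    # optimizer explicitly updates them.
    model = nn.Linear(8, 8)
    x = torch.randn(1, 8)
    before = model.weight.clone()

    # "Inference": all a deployed model does per request.
    # No gradients, no weight updates -- nothing is learned.
    with torch.no_grad():
        _ = model(x)
    assert torch.equal(model.weight, before)  # weights unchanged

    # "Training": learning is a separate, explicit update step.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    assert not torch.equal(model.weight, before)  # weights changed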
It's not the lack of perfection; it's the lack of reasoning and learning.
I've seen a lot of reasoning in the latest models while doing agentic coding. They are often decent at debugging and experimentation, but around 30% of the time they go down wrong paths and just add unnecessary complexity through misdiagnosis.