Ask it to count the first two hundred numbers in reverse, skipping every third number, and then check whether the result is actually in sequence.
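If you want to check the answer without eyeballing it, here's a rough sketch of a reference checker. It assumes "skipping every third number" means dropping every third number as you count down (the 3rd, 6th, 9th, ...), not dropping multiples of three; the function names are made up for illustration.

```python
def expected_countdown(n=200, skip_every=3):
    """Count from n down to 1, dropping every `skip_every`-th number counted.

    Assumption: "every third" refers to position in the countdown,
    so with n=200 the dropped values are 198, 195, 192, ...
    """
    countdown = range(n, 0, -1)
    return [
        value
        for position, value in enumerate(countdown, start=1)
        if position % skip_every != 0
    ]


def first_mismatch(answer, n=200, skip_every=3):
    """Return the index of the first wrong entry, or None if the answer matches."""
    expected = expected_countdown(n, skip_every)
    for index, (got, want) in enumerate(zip(answer, expected)):
        if got != want:
            return index
    # Same prefix but different length: the answer stopped early or ran long.
    if len(answer) != len(expected):
        return min(len(answer), len(expected))
    return None


print(expected_countdown()[:4])  # [200, 199, 197, 196]
```

Paste the model's numbers into a list and pass it to `first_mismatch`; a return value of `None` means it matched the reference under the reading above.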
Check the car wash examples on YouTube.
And this logic flow only proves that no AI is a human intelligence. It doesn't disprove the intelligence part.
The confusing items on your list can each be shown otherwise with a pretty simple test. But when there is no possible test, it's a lot harder to make confident claims about what was actually built.
Would you claim that relativity disproves aether theory? Because it doesn't, really. It says that if there is an aether, its effects on measurements always cancel out.