Although I don't think they actually "know" it. This particular trick question will be in the bank just like the seahorse emoji or how many Rs in strawberry. Did they start reasoning and generalising better or did the publishing of the "trick" and the discourse around it paper over the gap?
I wonder if in the future we will trade these AI tells like 0days, keeping them secret so they don't get patched out at the next model update.
They won’t get this specific question wrong again, but they also generalise once they have enough examples. Patching out a single failure doesn’t do it. Patch out ten equivalent ones, and the eleventh doesn’t happen.
"Well, you need your car to be at the car wash in order to wash it, right?"