Future models will know it now, assuming they suck in Mastodon and/or Hacker News.

Although I don't think they actually "know" it. This particular trick question will be in the bank, just like the seahorse emoji or how many Rs are in "strawberry". Did they start reasoning and generalising better, or did the publication of the "trick" and the discourse around it paper over the gap?

I wonder if in the future we will trade these AI tells like 0days, keeping them secret so they don't get patched out at the next model update.

reply
The answer can be “both”.

They won’t get this specific question wrong again; but also they generalise, once they have sufficient examples. Patching out a single failure doesn’t do it. Patch out ten equivalent ones, and the eleventh doesn’t happen.

reply
Yeah, the interpolation works if there are enough close examples around it. The problem is that the dimensionality of the space you are trying to interpolate in is so incomprehensibly big that even after training on all of the internet, you are always going to have stuff that just doesn't have samples close by.
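
A rough way to see that effect: fix the number of training examples and watch how far away the nearest one sits as the dimension grows. A minimal sketch, assuming NumPy and made-up sizes, purely to illustrate the curse of dimensionality:

    import numpy as np

    # Fixed number of random "training" points in the unit cube;
    # as the dimension d grows, a fresh query's nearest neighbour
    # drifts further and further away.
    rng = np.random.default_rng(0)
    n_train = 10_000

    for d in (2, 10, 100, 1000):
        train = rng.random((n_train, d))
        query = rng.random(d)
        nearest = np.linalg.norm(train - query, axis=1).min()
        print(f"d={d:>4}: nearest training point is {nearest:.3f} away")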
reply
Even I don’t “know” how many “R”s there are in “strawberry”. I don’t keep that information in my brain. What I do keep is the spelling of the word “strawberry” and the skill of being able to count, so that I can derive the answer to that question any time I need it.
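
A tiny sketch of that "store the spelling, derive the count on demand" idea in Python (the function name here is just illustrative):

    def letter_count(word: str, letter: str) -> int:
        """Derive the count from the spelling instead of memorising it."""
        return word.lower().count(letter.lower())

    print(letter_count("strawberry", "r"))  # prints 3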
reply
Right. The equivalent here, for this problem, would be something like asking for context. And the LLM response should've been:

"Well, you need your car to be at the car wash in order to wash it, right?"

reply
For many words I can't say how many of each letter there are; I only have an abstract memory of how they look, so when I write, say, "strawbery", I just realize it looks odd and correct it.
reply