undefined

points

[-]

Yeah people are always like "these are just trick questions!" as though the correct mode of use for an LLM is quizzing it on things where the answer is already available. Where LLMs have the greatest potential to steer you wrong is when you ask something where the answer is not obvious, the question might be ill-formed, or the user is incorrectly convinced that something should be possible (or easy) when it isn't. Such cases have a lot more in common with these "nonsensical riddles" than they do with any possible frontier benchmark.

This is especially obvious when viewing the reasoning trace for models like Claude, which often spends a lot of time speculating about the user's "hints" and trying to parse out the intent of the user in asking the question. Essentially, the model I use for LLMs these days is to treat them as very good "test takers" which have limited open book access to a large swathe of the internet. They are trying to ace the test by any means necessary and love to take shortcuts to get there that don't require actual "reasoning" (which burns tokens and increases the context window, decreasing accuracy overall). For example, when asked to read a full paper, focusing on the implications for some particular problem, Claude agents will try to cheat by skimming until they get to a section that feels relevant, then searching directly for some words they read in that section. They will do this even if told explicitly that they must read the whole paper. I assume this is because the vast majority of the time, for the kinds of questions that they are trained on, this sort of behavior maximizes their reward function (though I'm sure I'm getting lots of details wrong about the way frontier models are trained, I find it very unlikely that the kinds of prompts that these agents get very closely resemble data found in the wild on the internet pre-LLMs).