I’ve seen "thinking models" go off the rails trying to deduce what to do with ten items and being asked for the best of 9.
[1]: the reality of the situation is subtle internal inconsistencies in the prompt can really confuse it. It is an entertaining bug in AI pipelines, but it can end up costing you a ton of money.