Claude told me: "Walk! At 25 meters, you'd barely get the car started before you arrived. It's faster and easier on foot — plus you avoid the awkwardness of driving a dirty car just a few seconds down the road."
I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
Drive — you need the car at the car wash.
but it's so close
It is close, but you still need the car there to wash it! Drive it over, and enjoy the short 50-meter walk back if you want to stretch your legs while it's being cleaned.
I tried the "upside-down" cup question brought up in another comment in this thread, and it also nailed it:
Flip it upside down. The sealed top becomes the bottom (holding your drink), and the open bottom becomes the top you drink from.
IDK, maybe the web versions are not as good at logical reasoning as whatever they're using to power Claude Code, or you were unlucky and I was lucky?
I pay for the $100 Opus 4.6 plan... maybe that makes a difference?
For me, the litmus test for any LLM is flawless creation of complex regexes from a well-formed prompt. I don't mean trivial stuff like email validation, but rather expressions at the limits of the regex specs. Not almost there, but exactly there.
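To make "almost there" versus "exactly there" concrete, here is a rough sketch of how one might score a model-generated regex against must-match and must-reject cases. Everything in it is my own assumption for illustration: the helper name `check_regex`, the IPv4 task (a fairly tame stand-in, not something at the limits of any regex spec), and both candidate patterns.

```python
import re

def check_regex(pattern, should_match, should_not_match, flags=0):
    """Return a list of (kind, text) failures; an empty list means every case passed."""
    compiled = re.compile(pattern, flags)
    failures = []
    for text in should_match:
        # fullmatch requires the whole string to match, not just a prefix
        if compiled.fullmatch(text) is None:
            failures.append(("missed", text))
    for text in should_not_match:
        if compiled.fullmatch(text) is not None:
            failures.append(("false positive", text))
    return failures

# Hypothetical task: dotted IPv4, each octet 0-255, no leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
exact = rf"(?:{OCTET}\.){{3}}{OCTET}"

# A plausible "almost there" answer: right shape, wrong value range.
almost = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"

good = ["0.0.0.0", "192.168.1.1", "255.255.255.255"]
bad = ["256.1.1.1", "01.2.3.4", "1.2.3", "1.2.3.4.5"]

print(check_regex(exact, good, bad))
print(check_regex(almost, good, bad))
```

The point of keeping the reject list separate is that near-miss patterns usually pass every positive case and only fail on the inputs they should refuse.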
Their loss
I would question whether such a scientist should be doing science; it seems they have serious cognitive biases.
If all one uses is the free thinking model, their conclusion about its capability is perfectly valid, because nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking' model. Even the model numbers are the same. And given that the highest-capability LLMs are closed source and locked behind paywalls, there is no means to arrive at a contrary, verifiable conclusion. They are a scientist, after all.
And that's a real problem. Why pay when you think you're getting the same thing for free? No one wants yet another subscription. This unclear marking is going to lead to so many things going wrong over time; what would be the cumulative impact?
Nowhere is it clearly specified that the free model IS as capable as the paid one, either. So if you're uncertain whether it IS or IS NOT as capable, what sort of scientist assumes the answer is that it IS?
Putting the same model name/number on both the free and paid versions is the specification that performance will be the same. If a scientist has to bring to bear his science background to interpret and evaluate product markings, then society has a problem. Any reasonable person expects products with the same labels to perform similarly.
Perhaps this is why Divisions/Bureaus of Weights and Measures are widespread at the state and county levels. I wonder whether a person who brought a complaint to one of these agencies, or to a consumer protection agency, to fix this situation would be doing society a huge service.
This is true, but thinking mode shows up based on the questions asked, and some other unknown criteria. In the cases I cited, the responses were in thinking mode.