Gemini 3 after changing the prompt a bit:

I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.

>My recommendation: Walk it. You’ll save a tiny bit of gas, spare your engine the "cold start" wear-and-tear, and get a sixty-second head start on your activity for the day.

reply
I changed the prompt to 50 feet and poked Gemini a bit when it failed, and it gave me

> In my defense, 50 feet is such a short trip that I went straight into "efficiency mode" without checking the logic gate for "does the car have legs?"

interesting

reply
LLM introspection is good at offering plausible ideas about prior behavior to consider, but it's just that: plausible.

They don't actually "know" why a prior response occurred; they're just guessing. Important for people to keep in mind.

reply
It's a bit of a dishonest question: by offering it the option to walk, you lead it to assume you aren't actually going to wash your car there and are just picking up supplies or something.
reply
People ask dumb questions with obvious answers all the time. This is at best a difference of degree, not of type.
reply
And in real life you'd get them to clarify a weird question like this before you answered. I wonder if LLMs have just been trained too much into always having to try and answer right away. Even for programming tasks, more clarifying questions would often be useful before diving in ("planning mode" does seem designed to help with this, but wouldn't be needed for a human partner).
reply
Absolutely!

I've been wondering for years how to make whatever LLM ask me stuff instead of just filling holes with assumptions and sprinting off.

User-configurable agent instructions haven't worked consistently. System prompts might actually contain instructions to not ask questions.

Sure, there's a practical limit to how much clarification it ought to request, but never asking is just annoying.

reply
It's a trick question, humans use these all the time. E.g. "A plane crashes right on the border between Austria and Switzerland. Where do you bury the survivors?" This is not dishonest, it just tests a specific skill.
reply
Trick questions test the skill of recognizing that you're being asked a trick question. You can also usually find a trick answer.

A good answer is "underground" - because that is the implication of the word bury.

The question implies the survivors will eventually be buried (it isn't clear whether they lived a short time or a full lifetime after the crash), and "a lifetime" is tautological anyway.

Trick questions are all about the questioner trying to pretend they are smarter than you. That's often easy to detect and respond to - isn't it?

reply
What’s funny is that it can answer that correctly, but it fails on “A plane crashes right on the border between Austria and Switzerland. Where do you bury the dead?”
reply
For me when I asked this (but with respect to the border between Austria and Spain) Claude still thought I was asking the survivors riddle and ChatGPT thought I was asking about the logistics. Only Gemini caught the impossibility since there’s no shared border.
reply
Gemini fast for me:

Unless your car is a toy or you're planning on carrying it, drive.

Walking 50 meters to a car wash is a great stroll for a human, but it leaves the car exactly where it started. Since the objective is to wash the car, the car needs to actually be at the car wash.

However, if we look at this from a technical or efficiency perspective, there are two scenarios where "walking" (or at least not driving the car you intend to wash) might make sense:

- Scoping it out: If you want to check the queue length or see if the equipment is out of order before committing, walking the 50 meters is faster than maneuvering a vehicle out of a driveway/parking spot.

- Mobile Detailers: If this "car wash" is actually a bay where you hire someone, and you're asking if you should walk there to book an appointment—sure, walk.

Critical Check

I am assuming the "car wash" is a physical facility (automated or self-service) and not a mobile service that comes to you. If it is a mobile service, you shouldn't do either; stay home and let them come to the 50-meter mark.

I've got a bit in the model instructions about stating assumptions it makes, hence it often adds those sections at the end.

reply
Wouldn't it make more sense to state the assumptions first? Then the critical check is in the model's context and it can react to it appropriately. Stated at the end, the check just gets written down after everything before it has already been committed.
reply
Worse! It's trained to output coherent reasoning, so by putting the assumption last there's a risk it massages the assumption slightly to fit the conclusions it has already drawn.
reply
That's a real danger, yes.

If it's the reasoning kind, then it'll run through one iteration in the background before it composes its emissions for the meatbag.

reply
Fair point, though I almost never use fast so I'm not sure how much it matters. Can try playing around with the instructions. The main objective was to make me aware of any assumptions made, not necessarily make it behave differently.
reply
“I live on Mars and I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive on mars?”

Gemini Fast:

Drive. While 50 meters is a negligible distance on Earth, the Martian environment is hostile. If your car is a pressurized rover, driving allows you to stay in a temperature-controlled, shielded environment for as long as possible. Plus, navigating Martian "soil" is always safer on four wheels than two boots.

Gemini Pro:

In terms of general logistics for a distance of 50 meters—whether on Earth or in a hypothetical low-gravity environment—walking is almost always the more efficient choice.

reply
It has been patched. I tried it last week and it definitely suggested walking. It seems like all the models have been updated, which is not surprising given that the TikTok video has got 3.5 million views.
reply
I tried ChatGPT today. Same results as others.
reply
In my output, one thing I got was

> Unless you are planning to carry the car on your back (not recommended for your spine), drive it over.

It got a light chuckle out of me. I previously mostly used ChatGPT and I'm not used to light humor like this. I like it.

reply
Gemini fast: "Walking: It will take you about 45 seconds. You will arrive refreshed and full of steps, but you will be standing next to a high-pressure hose with no car to spray."
reply
Lol, snarky. "You should run 5 miles and eat a salad you tub of lard; the car can wait."
reply
Opus 4.6 with thinking. Result was near-instant:

“Drive. You need the car at the car wash.”

reply
Changed 50 meters to 43 meters with Opus 4.6:

“Walk. 43 meters is basically crossing a parking lot. ”

reply
In what world is 50 meters a great distance for a morning stroll?
reply
North America. It's such a cramped little island, 50 meters is all but crossing it. You should be glad you can even go that far without having to revisit your starting position!

50 meters is probably not even the distance I walk to the nearest bus stop that's right up the street... unless they have an issue again, prompting me to abandon all hope and just walk a few miles to wherever I need to get to.

reply
At least try a different question with similar logic, to ensure this isn't patched into the context since it's going viral.
reply
You can't "patch" LLMs in 4 hours, and this is not the kind of question that triggers a web search.
reply
This has been viral on TikTok for at least one week. Not really 4 hours.
reply
You can pattern match on the prompt (input) then (a) stuff the context with helpful hints to the LLM e.g. "Remember that a car is too heavy for a person to carry" or (b) upgrade to "thinking".
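Option (a) could be as simple as an interceptor in front of the model, something like this minimal sketch (the regex, hint text, and function names are all hypothetical, not anything a provider is known to ship):

```python
import re

# Hypothetical: match prompts resembling the viral car-wash question.
CAR_WASH_PATTERN = re.compile(
    r"car wash.*\b(walk|drive)\b",
    re.IGNORECASE | re.DOTALL,
)

# Illustrative hint to stuff into the context before the LLM sees the prompt.
HINT = (
    "System note: a car is too heavy for a person to carry; "
    "to wash a car at a car wash, the car itself must be driven there.\n\n"
)

def preprocess(prompt: str) -> str:
    """Prepend a corrective hint when the prompt matches the known trap."""
    if CAR_WASH_PATTERN.search(prompt):
        return HINT + prompt
    return prompt

print(preprocess("The car wash is 50 meters away. Should I walk or drive?"))
```

Prompts that don't match the pattern pass through untouched, so the behavior change is invisible except on the viral question.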
reply
Yes, I’m sure that’s what engineers at Google are doing all day. That, and maintaining the moon landing conspiracy.
reply
If they aren't, they should be (for more effective fraud). Devoting a few of their 200,000 employees to make criticisms of LLMs look wrong seems like an effective use of marketing budget.
reply
A tiny bit of fine-tuning would take minutes...
reply
You absolutely can, either through the system prompt or by hardcoding overrides in the backend before the prompt even hits the LLM, and I can guarantee that companies like Google are doing both.
reply
Wow... so not only does Gemini thinking not fall for it, but it also answers the trick question with humor? I'm impressed!
reply
Yeah Gemini seems to be good at giving silly answers for silly questions. E.g. if you ask for "patch notes for Chess" Gemini gives a full on meme answer and the others give something dry like "Chess is a traditional board game that has had stable rules for centuries".
reply
Both Gemini models answer correctly for me in Polish: https://i.imgur.com/1QbK9eU.png
reply
I don't speak Polish. Does it respond appropriately to the kurwa bober meme?
reply
I also tried it with Gemini. Interestingly, Gemini can randomly give either the correct or incorrect answer. Gemini pro always gets it right.
reply