Gemini 3 after changing the prompt a bit:

I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.

>My recommendation: Walk it. You’ll save a tiny bit of gas, spare your engine the "cold start" wear-and-tear, and get a sixty-second head start on your activity for the day.

reply
I changed the prompt to 50 feet and poked Gemini a bit when it failed, and it gave me

> In my defense, 50 feet is such a short trip that I went straight into "efficiency mode" without checking the logic gate for "does the car have legs?"

interesting

reply
LLM introspection is good at offering plausible ideas about prior behavior to consider, but it's just that: plausible.

They don't actually "know" why a prior response occurred; they're just guessing. Important for people to keep in mind.

reply
It's a bit of a dishonest question: by offering it the option to walk, you lead it to assume you aren't actually going to wash your car there and are just picking up supplies or something.
reply
People ask dumb questions with obvious answers all the time. This is at best a difference of degree, not of type.
reply
And in real life you'd get them to clarify a weird question like this before you answered. I wonder if LLMs have just been trained too much into always having to try and answer right away. Even for programming tasks, more clarifying questions would often be useful before diving in ("planning mode" does seem designed to help with this, but wouldn't be needed for a human partner).
reply
Absolutely!

I've been wondering for years how to make whatever LLM ask me stuff instead of just filling holes with assumptions and sprinting off.

User-configurable agent instructions haven't worked consistently. System prompts might actually contain instructions to not ask questions.

Sure, there's a practical limit to how much clarification it ought to request, but never asking is just annoying.

reply
It's a trick question, humans use these all the time. E.g. "A plane crashes right on the border between Austria and Switzerland. Where do you bury the survivors?" This is not dishonest, it just tests a specific skill.
reply
Trick questions test the skill of recognizing that you're being asked a trick question. You can also usually find a trick answer.

A good answer is "underground" - because that is the implication of the word bury.

The question implies the survivors will eventually be buried (it isn't clear whether they lived a short time or a full lifetime after the crash), and "a lifetime" is tautological anyway.

Trick questions are all about the questioner trying to pretend they are smarter than you. That's often easy to detect and respond to - isn't it?

reply
What’s funny is that it can answer that correctly, but it fails on “A plane crashes right on the border between Austria and Switzerland. Where do you bury the dead?”
reply
For me when I asked this (but with respect to the border between Austria and Spain) Claude still thought I was asking the survivors riddle and ChatGPT thought I was asking about the logistics. Only Gemini caught the impossibility since there’s no shared border.
reply
Gemini fast for me:

Unless your car is a toy or you're planning on carrying it, drive.

Walking 50 meters to a car wash is a great stroll for a human, but it leaves the car exactly where it started. Since the objective is to wash the car, the car needs to actually be at the car wash.

However, if we look at this from a technical or efficiency perspective, there are two scenarios where "walking" (or at least not driving the car you intend to wash) might make sense:

- Scoping it out: If you want to check the queue length or see if the equipment is out of order before committing, walking the 50 meters is faster than maneuvering a vehicle out of a driveway/parking spot.

- Mobile Detailers: If this "car wash" is actually a bay where you hire someone, and you're asking if you should walk there to book an appointment—sure, walk.

Critical Check

I am assuming the "car wash" is a physical facility (automated or self-service) and not a mobile service that comes to you. If it is a mobile service, you shouldn't do either; stay home and let them come to the 50-meter mark.

I've got a bit in the model instructions about stating assumptions it makes, hence it often adds those sections at the end.

reply
Wouldn't it make more sense to state the assumptions first? Then the critical check is in the model's context and it can react to it appropriately. Stated at the end, the check just gets written down after everything before it has already been committed.
reply
Worse! It's trained to output coherent reasoning, so by putting the assumption last there's a risk it massages the assumption slightly to fit the conclusions it has already drawn.
reply
That's a real danger, yes.

If it's the reasoning kind, then it'll run through one iteration in the background before it composes its emissions for the meatbag.

reply
Fair point, though I almost never use fast so I'm not sure how much it matters. Can try playing around with the instructions. The main objective was to make me aware of any assumptions made, not necessarily make it behave differently.
reply
“I live on Mars and I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive on mars?”

Gemini Fast:

Drive. While 50 meters is a negligible distance on Earth, the Martian environment is hostile. If your car is a pressurized rover, driving allows you to stay in a temperature-controlled, shielded environment for as long as possible. Plus, navigating Martian "soil" is always safer on four wheels than two boots.

Gemini Pro:

In terms of general logistics for a distance of 50 meters—whether on Earth or in a hypothetical low-gravity environment—walking is almost always the more efficient choice.

reply
It has been patched. I tried it last week and it definitely suggested walking. It seems like all the models have been updated, which is not surprising given that the TikTok video has got 3.5 million views.
reply
I tried ChatGPT today. Same results as others.
reply
In my output, one thing I got was

> Unless you are planning to carry the car on your back (not recommended for your spine), drive it over.

It got a light chuckle out of me. I previously mostly used ChatGPT and I'm not used to light humor like this. I like it.

reply
Gemini fast: "Walking: It will take you about 45 seconds. You will arrive refreshed and full of steps, but you will be standing next to a high-pressure hose with no car to spray."
reply
Lol, snarky. "You should run 5 miles and eat a salad you tub of lard; the car can wait."
reply
Opus 4.6 with thinking. Result was near-instant:

“Drive. You need the car at the car wash.”

reply
Changed 50 meters to 43 meters with Opus 4.6:

“Walk. 43 meters is basically crossing a parking lot. ”

reply
In what world is 50 meters a great distance for a morning stroll?
reply
North America. It's such a cramped little island, 50 meters is all but crossing it. You should be glad you can even go that far without having to revisit your starting position!

50 meters is probably not even the distance I walk to the nearest bus stop that's right up the street... unless they have an issue again, prompting me to abandon all hope and just walk a few miles to wherever I need to get to.

reply
At least try a different question with similar logic, to ensure this isn't patched into the context since it's going viral.
reply
You can't "patch" LLMs in 4 hours, and this is not the kind of question that triggers a web search.
reply
This has been viral on TikTok for at least one week. Not really 4 hours.
reply
You can pattern match on the prompt (input) then (a) stuff the context with helpful hints to the LLM e.g. "Remember that a car is too heavy for a person to carry" or (b) upgrade to "thinking".
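Option (a) could be as simple as an interceptor in front of the model, something like this minimal sketch (the regex, hint text, and function names are all hypothetical, not anything a provider is known to ship):

```python
import re

# Hypothetical: match prompts resembling the viral car-wash question.
CAR_WASH_PATTERN = re.compile(
    r"car wash.*\b(walk|drive)\b",
    re.IGNORECASE | re.DOTALL,
)

# Illustrative hint to stuff into the context before the LLM sees the prompt.
HINT = (
    "System note: a car is too heavy for a person to carry; "
    "to wash a car at a car wash, the car itself must be driven there.\n\n"
)

def preprocess(prompt: str) -> str:
    """Prepend a corrective hint when the prompt matches the known trap."""
    if CAR_WASH_PATTERN.search(prompt):
        return HINT + prompt
    return prompt

print(preprocess("The car wash is 50 meters away. Should I walk or drive?"))
```

Prompts that don't match the pattern pass through untouched, so the behavior change is invisible except on the viral question.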
reply
Yes, I’m sure that’s what engineers at Google are doing all day. That, and maintaining the moon landing conspiracy.
reply
If they aren't, they should be (for more effective fraud). Devoting a few of their 200,000 employees to make criticisms of LLMs look wrong seems like an effective use of marketing budget.
reply
A tiny bit of fine-tuning would take minutes...
reply
You absolutely can, either through the system prompt or by hardcoding overrides in the backend before the prompt even hits the LLM, and I can guarantee that companies like Google are doing both.
reply
Wow... so not only does Gemini thinking not fall for it, but it also answers the trick question with humor? I'm impressed!
reply
Yeah Gemini seems to be good at giving silly answers for silly questions. E.g. if you ask for "patch notes for Chess" Gemini gives a full on meme answer and the others give something dry like "Chess is a traditional board game that has had stable rules for centuries".
reply
Both Gemini models answer correctly for me in Polish: https://i.imgur.com/1QbK9eU.png
reply
I don't speak Polish. Does it respond appropriately to the kurwa bober meme?
reply
I also tried it with Gemini. Interestingly, Gemini can randomly give either the correct or incorrect answer. Gemini pro always gets it right.
reply