"What is the volume of 1 mole of Argon, where T = 400 K and p = 10 GPa?" Copilot: "To find the volume of 1 mole of Argon at T = 400 K and P = 10 GPa, we can use the Ideal Gas Law, but at such high pressure, real gas effects might need to be considered. Still, let's start with the ideal case: PV=nRT"
> you really don't need to worry about teaching a human to push back on bad questions
A popular physics textbook likewise presented solid Argon as an ideal-gas-law problem. Copilot's half-baked caution is more than the authors, reviewers, and instructors/TAs/students seemingly managed across many years and multiple editions. Though to be fair, if the question is prefaced with "Here is a problem from Chapter 7: Ideal Gas Law.", Copilot is similarly mindless.
Asked explicitly "What is the phase state of ...", it does respond "solid". But, as with humans, determining that isn't a step in its solution process. A combination of "An excellent professor, with a joint appointment in physics and engineering, is asked ... What would be a careful reply?" followed by "Try harder." was finally sufficient.
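For scale, here is the plug-n-chug arithmetic Copilot starts from, as a minimal Python sketch (the constants are standard; the solid-argon comparison figure below is my approximate ambient-pressure value, not from the thread):

```python
# Ideal gas law sanity check: V = nRT/p for 1 mol of argon at 400 K, 10 GPa.
R = 8.314                    # gas constant, J/(mol*K)
n, T, p = 1.0, 400.0, 10e9   # amount (mol), temperature (K), pressure (Pa)

V = n * R * T / p            # volume in m^3
print(f"V = {V:.2e} m^3 = {V * 1e6:.2f} cm^3")   # V = 3.33e-07 m^3 = 0.33 cm^3
```

About a third of a cubic centimetre for a whole mole of argon, versus roughly 22-25 cm^3/mol for solid argon at ambient pressure (if I have that figure right). The "ideal gas" comes out dozens of times denser than the solid, which is exactly the red flag the question should trigger.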
> you rarely get exams where the correct answer is to explain in detail why the question doesn't make sense
Oh, if only that were commonplace. Aspiring to transferable understanding. Maybe someday? Perhaps in China? Has anyone seen this done?
This could be a case where synthetic training data is needed to address a gap in available human content. But if graders are looking for plug-n-chug... I suppose a chatbot could ethically provide both the mindless answer and the caveat.
I wouldn't even give them credit for cases where there's a lot of good training data. My go-to test is sports trivia and statistics. AI systems fail miserably at that [1], despite the wide availability of good clean data and text about it. If sports is such a blind spot for AIs, I can't help but wonder what else they're confidently wrong about.
Since then, it tends to break up its longer answers to me into a section of "objective analysis" and then the rest.
This thing already exists? The UK, the Soviet Union, and the USA have all designed them.
Of course, it is also not unheard of for a question to be impossible because of an error by the test writer, which can easily be cleared up. Still, it is probably best not to set deliberately impossible questions: once students know they exist, they will start hunting for reasons to declare every question impossible.
I was reminded of your comment this morning when I asked ChatGPT how to create a path mask in Rhino Grasshopper:
Me: what is a path mask that will get 1;1;0;0;* and also anything lower (like 1;0;5;10 or 0;20;1;15) ?
ChatGPT: Short answer: No single path mask can do that. Here's why: (very long answer)
Me: are you sure I can't use greater than, less than in the masks?
ChatGPT: Yes — **I am absolutely sure:** **Grasshopper path masks do *NOT* support greater-than or less-than comparisons.** Official sources and detailed confirmation: (sources and stuff)
...so I think your priors may need updating, at least as far as "never" goes. And I especially like that ChatGPT hit me with not just bold, not just italics, but bold italics on that NOT. Seems like a fairly assertive disagreement to me.
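For what it's worth, the thing I was asking for really is a comparison, not a pattern, which may be why no single mask exists. A rough sketch of the predicate in plain Python (outside Grasshopper; parse_path and wanted are made-up names, not Grasshopper API):

```python
# Hypothetical filter a wildcard mask can't express: match 1;1;0;0;*
# OR any path that sorts lexicographically at or below 1;1;0;0.
def parse_path(s: str) -> tuple[int, ...]:
    return tuple(int(x) for x in s.strip("{}").split(";"))  # "{1;1;0;0}" -> (1, 1, 0, 0)

def wanted(path: str, bound: str = "1;1;0;0") -> bool:
    p, b = parse_path(path), parse_path(bound)
    return p[:len(b)] == b or p <= b  # prefix match (the "*") or lower path

print(wanted("1;1;0;0;7"))   # True  -- matches 1;1;0;0;*
print(wanted("1;0;5;10"))    # True  -- lower (0 < 1 in the second slot)
print(wanted("0;20;1;15"))   # True  -- lower (leading 0)
print(wanted("1;2;0;0"))     # False -- higher in the second slot
```

Whether "lower" should mean lexicographic order on the branch indices is itself an assumption about what I wanted; the point is just that it's a comparison, and wildcard masks don't encode comparisons.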
I'd rather the AI push back and ask clarifying questions than spit out a valid-looking response that is not valid and could never be valid.
I was going to write something up about this topic, but it is surprisingly difficult. I don't have any concrete examples jumping to mind either, but really, think how many questions could honestly be answered with "it depends" - like when my kid asked me how much milk a person should drink in a day. It depends: ask a vegan, a Hindu, a doctor, and a dairy farmer. Which answer is correct? The kid is really good at asking simple questions that absolutely do not have simple answers, when my goal is to convey as much context and correct information as possible.
Furthermore, just because an answer appears in context more often in the training data doesn't mean it is (more) correct; asserting that it is would be fallacious.
So we get to the point, again, where creative output is being commoditized, I guess - which explains their reasoning for your final paragraph.
I do (and I may get publicly shamed and shunned for admitting I do such a thing): figuring out how to fix parenthesis matching errors in Clojure code that it has generated.
One coding agent I've used is so bad at this that it falls back to rewriting entire functions and will not recognise that it is probably never going to fix the problem. It just keeps burning rainforest trying one stupid approach after another.
Yes, I realise that this is not a philosophical question, even though it is philosophically repugnant (and objectively so). I am being facetious and trying to work through the PTSD I acquired from the above exercise.
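For the record, the check the agent keeps failing is purely mechanical, which is what makes the flailing so maddening. A minimal sketch, in Python rather than Clojure, that naively ignores strings, comments, and character literals (a real reader, e.g. clojure.tools.reader, has to respect those):

```python
# Report the first unbalanced delimiter in Clojure-ish source text.
PAIRS = {")": "(", "]": "[", "}": "{"}

def first_mismatch(src: str):
    stack = []          # (opener, line, col) for each still-open delimiter
    line, col = 1, 0
    for ch in src:
        col += 1
        if ch == "\n":
            line, col = line + 1, 0
        elif ch in "([{":
            stack.append((ch, line, col))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                return f"unexpected '{ch}' at {line}:{col}"
            stack.pop()
    if stack:
        ch, l, c = stack[-1]  # innermost opener left unclosed
        return f"unclosed '{ch}' opened at {l}:{c}"
    return None

print(first_mismatch("(defn f [x]\n  (inc x)"))  # unclosed '(' opened at 1:1
```

An agent that ran something like this before each rewrite would at least know where the imbalance is, instead of regenerating whole functions on vibes.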