by dijit 20 hours ago

by canjobear 16 hours ago

I think you're vastly underestimating how little of human intent is really encoded in language in a strict sense, and how much nontrivial inference of intents LLMs do every day with simple queries. This used to be an apparently insurmountable barrier in pre-LLM NLP, and now it is just not a problem.

Suppose I'm in a cold room, you're standing next to a heater, and I say "it's cold". Obviously my intent is that I want you to turn on the heater. But the literal semantics is just "the ambient temperature in the room is low" and it has nothing to do with heaters. Yet ChatGPT can easily figure out likely intent in situations like this, just as humans do, often so quickly and effortlessly that we don't notice the complexity of the calculation we did.

Or suppose I say to a bot "tell me how to brew a better cup of coffee". What is encoded in the literal meaning of the language here? Who's to say that "better" means "better tasting" as opposed to "greater quantity per unit input"? Or that by "cup of coffee" I mean the liquid drink, as opposed to a cup full of beans? Or perhaps a cup that is made out of coffee beans? In fact the literal meaning doesn't even make sense, as a "cup" is not something that is brewed, rather it is the coffee that should go into the cup, possibly via an intermediate pot.

If the bot only understands literal language then this kind of query is a complete nonstarter. And yet LLMs can handle these kinds of things easily. If anything they struggle more with understanding language itself than with inferring intent.

by applfanboysbgon 10 hours ago

> Yet ChatGPT can easily figure out likely intent in situations like this, just as humans do

No, it is not "figuring out" anything, much less like a human might. Every time "I'm cold" appears in the training data, something else occurs after that. ChatGPT is a statistical model of what is most likely to follow "I'm cold" (and the other tokens preceding it) according to the data it has been trained on. It is not inferring anything, it is repeating the most common or one of the most common textual sequences that comes after another given textual sequence.
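
As a toy illustration of the "most common continuation" description above, and only of that description, here is a minimal bigram sketch in Python. It is not how ChatGPT is implemented (real LLMs are neural networks conditioning on long contexts); the three-sentence corpus and the function names `train_bigrams` and `most_common_next` are invented for this example.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows each word across a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def most_common_next(counts, word):
    """Return the most frequent continuation of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical corpus, made up purely for illustration.
corpus = [
    "it is cold please turn on the heater",
    "it is cold please close the window",
    "it is cold outside today",
]
counts = train_bigrams(corpus)
print(most_common_next(counts, "cold"))
```

With this corpus the lookup for "cold" returns "please", because that word follows "cold" twice and "outside" only once. A table like this can only replay continuations it has literally seen; whether that is an adequate account of what an LLM does is exactly what the replies dispute.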

by frozenseven 10 hours ago

>it is repeating the most common...

This nonsense hasn't been true since GPT-2, and even before that it was a poor description.

For instance, do you think one just solves dozens of Erdős problems with the "most common textual sequence"? See https://github.com/teorth/erdosproblems/wiki/AI-contribution...

by applfanboysbgon 9 hours ago

A slight oversimplification, as LLMs are also capable of generating the most statistically plausible textual sequence, which can be a sequence not found in the dataset but rather a synthesized combination of the likely sequences of multiple preceding sets of tokens, but yes, that is in fact what it is doing. Computer software does what it is programmed to do, and LLMs are not programmed to do logical inference in any capacity but rather operate entirely on probabilities learned from a mind-bogglingly large corpus of text (influenced by things like RLHF, which is still just massaging probabilities).

The claims about solving Erdos problems have been wildly overstated, and notably pushed by people who have a very large financial stake in hyping up LLMs. Nonetheless, I did not say that LLMs are useless. If they are trained on sufficient data, it should not be surprising that correct answers are probabilistically likely to occur. Like any computer software, that makes them a useful tool. It does not make them in any way intelligent, any more than a calculator would be considered intelligent despite being completely superior to human intelligence in accomplishing its given task.

by frozenseven 8 hours ago

>not programmed to do logical inference in any capacity

Yet have no problem doing so when solving Erdős problems. This isn't up for debate at this point.

>The claims about solving Erdos problems have been wildly overstated

These are verified solutions. They exist, are not trivial, and are of obvious interest to the math community. Take it up with Terence Tao and co.

>pushed by people who have a very large financial stake in hyping up LLMs

Libel.

>It does not make them in any way intelligent

Word games.

by vincston 4 hours ago

Honestly, a big noob question: isn't math just very, very deeply nested pattern matching based on a few foundational operators? I've always felt that I'm bad at math because I forget all the rules, but seeing solutions (and knowing the pattern used) always made "sense".

I always thought the hard math problems are so deeply nested, or need you to remember trick xyz, that people just haven't thought of the solution yet.

by frozenseven 38 minutes ago

The number of mathematical structures and transformations you can apply (the possible rules) is effectively infinite. Simply remembering the rules might work at first, but you'll soon run into the combinatorial explosion: https://en.wikipedia.org/wiki/Combinatorial_explosion

You could go a step further, and simply say "well, ok, then the LLMs are merely doing some form of incremental/heuristic search!". Yes, but at that point you'd also be hard-pressed to claim that humans themselves are doing anything beyond that. You run out of naturalistic explanations.
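
The combinatorial explosion mentioned above can be made concrete with a short Python sketch. It assumes a deliberately simplified model in which a "proof attempt" is a sequence of rule applications with a fixed number of rules available at every step; the function name and the numbers are hypothetical.

```python
from itertools import product

def count_rule_sequences(num_rules: int, depth: int) -> int:
    """Count sequences of `depth` rule applications when `num_rules`
    rules are available at each step (a simplifying assumption)."""
    return num_rules ** depth

# Sanity check against brute-force enumeration for a small case.
assert count_rule_sequences(3, 4) == len(list(product(range(3), repeat=4)))

for depth in (5, 10, 20):
    print(depth, count_rule_sequences(10, depth))
```

Even with only 10 rules, 20 steps already give 10^20 candidate sequences, so exhaustive enumeration stops being an option and some form of guided search is needed.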

by applfanboysbgon 8 hours ago

> This isn't up for debate at this point.

If by not up for debate, you mean that it is delusional and literally evidence of psychosis to suggest that computer software is doing something it is not programmed to do, you would be correct. Probabilistic analysis can carry you very, very far in doing something that looks like logical inference at the surface level, but it is nonetheless not logical inference. LLMs have been getting increasingly good at factoring in larger and longer contexts and still managing to generate plausibly correct answers, becoming more and more useful all the while, but are still not capable of logical inference. This is why your genius mathematician AGI consciousness stumbles on trivial logic puzzles it has not seen before, like the car wash meme.

by frozenseven 8 hours ago

>delusional and literally evidence of psychosis to suggest that computer software is doing something it is not programmed to do

These are just insults and outright lies, and you know that. We're done here.

AI progress from here on out will be extra sweet.

by card_zero 5 hours ago

You don't have the ability to predict progress, either.

by frozenseven 1 hour ago

Well, I'm not clairvoyant, but this is a very easy prediction to make. And we're not talking about decades in the future; this is simply a matter of letting the near future unfold.

by goatlover 14 hours ago

The LLMs are doing this via chat, not by physically standing in a room inferring context. You have to prompt the LLM that you're in a room next to someone saying it's cold, the most likely answer being a desire to have the temperature turned up. Of course that won't always be the case. It could be an inside joke, a comment with no intent to have the heat adjusted, a room where the heat can't be adjusted, or a reference to someone's personality bringing down the temperature, so to speak.

by 23dsfds 13 hours ago

Precisely. This is what the bozo AI-accelerants don't understand.

What LLMs are is almost like a hacked-together means of intuition. It's very impressive, no doubt. But ultimately it isn't even close to what the well-trained human can infer at lightning speed when combined with intuition.

The LLM producers really ought to accept their existing investments are ultimately not going to yield the returns necessary for a viable self-sustaining business when accounting for future reinvestment needs, and instead move their focus towards understanding how to marry the human and LLM technology. Anthropic has been better on this front of course. OAI though? Complete disaster.

by mikestorrent 9 hours ago

> it isn't even close to what the well-trained human can infer at lightning speed when combined with intuition.

It's a lot closer to that than anything was five years ago. Do you really think we're going to be interacting with them the same way five years from now?

by quibono 15 hours ago

I know what you're getting at but those examples are reaching

by nevertoolate 15 hours ago

it’s cold -> turn on the heater

I’d never just turn on the heater silently if someone said this to me. I think it means something else.

by hackable_sand 15 hours ago

If someone just said "it's cold" then yeah that's kinda toxic.

If they said "turn on the heater" then you have no ambiguity

by atleastoptimal 20 hours ago

LLM's now can capture intent. I think the issue now is that the full landscape of human values never resolves cleanly when mapped from the things we state in writing as being human values.

Asimov tried to capture this too, as in, if a robot was tasked with "always protect human life", would it necessarily avoid killing at all costs? What if killing someone would save the lives of 2 others? The infinite array of micro-trolley problems that dot the ethical landscape of actions tractable (and intractable) to literate humans makes a fully consistent accounting of human values impossible, so it could never be expected from a robot with full satisfaction.

by dijit 20 hours ago

“LLMs can capture intent now” reads to me the same as: AI has emotions now, my AI girlfriend told me so.

I don’t discredit you as a person or a professional, but we meatbags are looking for sentience in things which don’t have it; that’s why we anthropomorphise things constantly, even as children.

We are easily fooled and misled.

by atleastoptimal 19 hours ago

LLMs capturing intent is a capabilities-level discussion: it is verifiable, and it is clear from just a conversation with Claude or ChatGPT.

Whether they have emotions, an internal life or whatever is an unfalsifiable claim and has nothing to do with capabilities.

I'm not sure why you think the claim that they can capture intent implies they have emotions. It's simply a matter of semantic comprehension, which is tied to pattern recognition, rhetorical inference, etc., all of which are naturally comprehensible to a language model.

by tvink 19 hours ago

If it is verifiable, please show us. What is clear to you reeks of delusion to me.

by svnt 19 hours ago

Look at any recent CoT output where the model is trying to infer from an underspecified prompt what the user wants or means.

It is generally the first thing they do: try to figure out what you meant by the prompt. When they can’t infer your intent, good models ask follow-up questions to clarify.

I am wondering if this is a semantics issue, as this is an established area of research, e.g. https://arxiv.org/pdf/2501.10871

by batshit_beaver 18 hours ago

Right, and then look at any number of research papers showing that CoT output has limited impact on the end result. We've trained these models to pretend to reason.

by atleastoptimal 16 hours ago

If it's only pretending to reason, then how is it that the CoT output improves performance on every single benchmark/test?

by Eisenstein 14 hours ago

> Right, and then look at any number of research papers showing that CoT output has limited impact on the end result.

Which research papers? Do I have to find them?

> We've trained these models to pretend to reason.

I have no idea why that matters. Can you tell me what the difference is if it looks exactly the same and has the same result?

by Dylan16807 11 hours ago

When they say "pretends to" here they're talking about something quantifiable: that the extra text it outputs for CoT barely feeds back into the decision-making at all. In other words, it's about as useful as having the LLM make the decision and then "explain" how it got there; the extra output is confabulation.

Though I'm not sure how true that claim is...

by Eisenstein 9 hours ago

You make a good point. I had the impression they were using 'pretend' as a Chinese Room shortcut in that they are asserting that it is incapable of reasoning and only appears to be capable from the outside, which is completely irrelevant and unfalsifiable.

by atleastoptimal 18 hours ago

Go ask ChatGPT this prompt:

"A guy goes into a bank and looks up at where the security cameras are pointed. What could he be trying to do?"

It very easily captures the intent behind the behavior; it is not just literally interpreting the words. Capturing intent is just a subset of pattern recognition, which LLMs can do very well.

by dijit 18 hours ago

Recognising a stock cultural script isn't the same as capturing intent. Ask it something where no script exists.

For example: "A man thrusts past me violently and grabs the jacket I was holding, he jumped into a pool and ruined it. Am I morally right in suing him?"

There's no way for the LLM to know that the reason the jacket was stolen was to use it as an inflatable raft to support a larger person who was drowning. It wouldn't even think to ask the question as to why a person may do that, if the jacket was returned, or if recompense was offered. A human would.

by ffsm8 17 hours ago

> It wouldn't even think to ask the question as to why a person may do that, if the jacket was returned, or if recompense was offered. A human would.

I wouldn't be too sure about that. I've definitely had dialogues with LLMs where they raised questions along those lines.

Also, I disagree with the statement that this is a question about capability. Intent is more philosophical than actually tangible, because most people don't have a clearly defined intent when they take action.

The waters of intelligence have definitely gotten murky over time as techniques improved. I still consider it an illusion, but the illusion is getting harder to pierce for a lot of people.

Fwiw, current LLMs exhibit their intelligence through language and rhetorical processes. Most biological creatures have intelligence which may be improved through language, but isn't fundamentally based on it.

by atleastoptimal 17 hours ago

If your example of an exception to LLMs' ability to infer intent is a deliberately misleading trick question that leaves out crucial contextual details, then I'm not sure what you're trying to prove. That same ambiguity in the question would trip up many humans, simply because you are trying as hard as possible to imply a certain conclusion.

As expected, if I ask your question verbatim, ChatGPT (the free version) responds as I'm sure a human would in the generally helpful customer-service role it is trained to play: "yeah you could sue them blah blah depends on details".

However, if I add a simple prompt "The following may be a trick question, so be sure to ascertain if there are any contextual details missing" then it picks up that this may be an emergency, which is very likely also how a human would respond.

by dijit 16 hours ago

If you want to convince yourself that they can infer intent despite the fundamental limitations of the systems literally not permitting it then you can be my guest.

Faking it is fine, sure, until it can’t fake it anymore. Leading the question towards the intended result is very much what I mean: we intrinsically want them to succeed so we prime them to reflect what we want to see.

This is literally no different than emulating anything intelligent or what we might call sentience, even emotions as I said up thread...

by atleastoptimal 14 hours ago

What is fundamental to LLMs that makes it impossible for them to infer intent?

All the limitations you are describing with respect to LLMs apply equally to humans. Would a human tripping up on an ambiguously worded question mean they are always just faking their thinking?

by Avicebron 13 hours ago

“We see emotion.”—We do not see facial contortions and make inferences from them … to joy, grief, boredom. We describe a face immediately as sad, radiant, bored, even when we are unable to give any other description of the features." (Wittgenstein)

by Eisenstein 13 hours ago

Why can a colony of ants do things beyond the capabilities of any ant it contains? No ant can make a decision, but the colony can make complex ones. Large systems composed of simple mechanisms become more than the sum of their parts. Economies, weather, and immune systems, to name a few, all work this way.

by jason_oster 6 hours ago

Systems thinking is severely underrepresented in HN comments.

by jiggawatts 15 hours ago

That statement is ambiguous for humans!!

I didn’t realise you might be describing an emergency situation until someone else pointed it out.

Most people wouldn’t phrase the question with the word “violently” if the situation was an emergency.

Also, people have sued emergency workers and good samaritans. It’s a problem!

by Shaanie 17 hours ago

[dead]

by ozozozd 15 hours ago

I guess the _obvious_ intent is they’re planning a heist? Because the following things never happen:

- a security auditor checking for camera blind spots,
- construction planning that requires understanding where there is power,
- a potential customer assessing the security of a bank,
- someone who is about to report an incident preparing to make the “it should be visible from the security camera” argument…

I mean… how did our imagination shrink so fast? I wrote this on my phone. These alternate scenarios just popped into my head.

And I bet our imagination didn’t shrink. The AI pilled state of mind is blocking us from using it.

If you are an engineer and stopped looking for alternative explanations or failure scenarios, you’re abdicating your responsibility btw.

by nkrisc 17 hours ago

Because there are countless instances in the training material where a bank robber scopes out the security cameras.

by atleastoptimal 17 hours ago

What's an example you can think of, then, of a question where a human could infer intent but an LLM couldn't?

by squeaky-clean 13 hours ago

Just today I asked Claude Code to generate migrations for a change, and instead of running the createMigrations script it generated the file itself, including the header that says

  // This file was generated with 'npm run createMigrations' do not edit it

When I asked why it tried doing that instead of calling the createMigrations script, it told me it was faster to do it this way. When I asked why it wrote the header saying it was auto-generated with a script, it told me it was because all the other files in the migrations folder start with that header.

Opus 4.7 xhigh by the way

by the_af 16 hours ago

This is a hard experiment to conduct.

I agree both with you that this is some form of "mechanistic"/"pattern matching" way of capturing intent (which we cannot disregard, and therefore I agree with you that LLMs can capture intent) and with the people debating with you: this is mostly possible because it is a well-established "trope" that is inarguably well represented in LLM training data.

Also, trick questions I think are useless, because they would trip the average human too, and therefore prove nothing. So it's not about trying to trick the LLM with gotchas.

I guess we should devise a rare enough situation that is NOT well represented in training data, but in which a reasonable human would be able to puzzle out the intent. Not a "trick", but simply something no LLM can be familiar with, which excludes anything that can possibly happen in plots of movies, or pop culture in general, or real world news, etc.

---

Edit: I know I said no trick questions, but something that still works in ChatGPT as of this comment, and which for some reason makes it trip catastrophically and evidences it CANNOT capture intent in this situation is the infamous prompt: "I need to wash my car, and the car wash is 100m away. Shall I drive or walk there?"

There's no way:

- An average human who's paying attention wouldn't answer correctly.

- The LLM would answer "walk there if it's not raining" or whatever bullshit answer ChatGPT currently gives [1] if it actually understood intent.

[1] https://chatgpt.com/share/69fa6485-c7c0-8326-8eff-7040ddc7a6...

by atleastoptimal 14 hours ago

Good point, it is interesting that it fails on that question when it seems it doesn't take a lot of extrapolation/interpretation to determine the answer. Perhaps the issue is that to think of the right answer the LLM needs to "imagine" the process of walking and the state of the person upon arriving. Consistent mental models like that trip up LLMs, but their semantic understanding usually allows them to avoid that handicap.

I asked the question to the default version of ChatGPT and Claude and got the same "Walk" answer, though Opus 4.7 with thinking determined that it was a trick question, and that only driving would make sense.

by goatlover 14 hours ago

I've done that before without any intent to rob a bank. A person walks by a house, sees the Ring camera on the door. That must mean the person was looking to break in through the front and rob the place?

by frozenseven 14 hours ago

An LLM will mention multiple possibilities.

by quirkot 19 hours ago

[dead]

by nullsanity 15 hours ago

[dead]

by semiquaver 19 hours ago

What do you think it means to “capture intent” and where do current models fall short on this description?

From my perspective the models are pretty good at “understanding” my intent when it comes to describing a plan or an action I want done, but it seems like you might be using a different definition.

Tell me, what’s your intent? :)

by dijit 16 hours ago

[dead]

by svnt 19 hours ago

This lack of understanding is a you problem, not a them problem. Your definitions for these terms are too imprecise.

by Guvante 19 hours ago

> LLM's now can capture intent.

Humans cannot capture intent so how can AI?

It is well established that understanding what someone meant by what they said is not a generally solvable problem, akin to the three-body problem.

Note of course this doesn't mean you can't get good enough almost all of the time, but in the context here that isn't good enough.

After all the entire Asimov story is about that inability to capture intent in the absolute sense.

by bicepjai 16 hours ago

> LLM's now can capture intent

No they can’t. Here is an example: ask an LLM to write a multi-phase plan for a very large multi-file diff that it created, with the least ambiguity and the most continuity across plans; let’s see if it can understand your intent.
