What does “understand” mean?

Turing complete systems can be built out of matrix multiplications, out of attention, out of key/value lookups. The Chinese room is Turing complete. By claiming it cannot understand things because it is built out of components computing devices can be built out of, we are claiming no computer can because no computer can. This is a very bold claim indeed, and also we’re assuming the conclusion! The claim is no more convincing than “brains cannot understand things because they are made out of neurons”. The system may or may not have some particular properties, but we have to do more work than just gesturing at the components the system is made of when making claims about it; the alternative is, at best, a world where we prove too much and conclude that humans, too, are not conscious.

For starters, we need to pin down the terms under discussion enough that they don’t just mean whatever we need them to in the moment.

reply
>What does “understand” mean?

Exactly.

reply
There's a growing body of evidence that most of what the brain does is constantly predicting the world around us - look into the predictive brain hypothesis if you're interested.

As Ilya Sutskever has pointed out, if you read a mystery novel up until the reveal of the culprit, and then fill in "The killer was _____", don't you need to understand the novel to accurately predict the next word?

reply
The understanding is inside of the system, in LLMs and in the Chinese Room. I agree with Daniel Dennett that it's preposterous to say that Chinese is not understood in any meaningful sense in the Chinese Room scenario -- it's just that the understanding has been hidden away in the background of the scenario.

Language is tremendously complicated. "Time flies like an arrow, but fruit flies like a banana." "Hard hats must be worn on site; dogs must be carried on escalators", etc. Predicting the next token requires understanding, full stop.

> if the rules are followed, no understanding is necessary.

The rules are the understanding.

(Note that understanding != consciousness)

reply
Right, it's an illusion of understanding. There is some sort of symbolic understanding, but that is completely due to the fact that the training data was made by humans who actually do understand, can interact with the world, and can write their thoughts down so that the LLM can insert some sort of reference to "basketball" and "Michael Jordan" in their embeddings or whatever.
reply
However, it’s disingenuous to say the inference is on the next token, because it actually isn’t: it happens in the model’s parameter space, across a set of nonlinear activation functions, and is then effectively projected into the token. The idea that it’s merely predictive of the token isn’t actually the case; it really is a much more complex and more semantic relationship that ends in the series of tokens through the attention mechanism.
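To make "projected into the token" concrete, here is a minimal sketch of that last step. Everything in it is an assumption for illustration: the four-word vocabulary, the hand-picked weights, and the names `W_HIDDEN`, `W_UNEMBED` and `next_token_distribution` are hypothetical, and a real model has many layers plus attention rather than one tiny nonlinear layer.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    # Plain matrix-vector product: one dot product per row.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Hypothetical toy vocabulary and weights, chosen by hand (not learned).
VOCAB = ["the", "killer", "was", "butler"]
W_HIDDEN = [[0.5, -1.0, 0.25],
            [1.5, 0.5, -0.5]]   # 3-dim state -> 2-dim hidden
W_UNEMBED = [[0.1, 0.2],
             [0.0, -0.3],
             [0.4, 0.1],
             [1.0, 0.8]]        # 2-dim hidden -> one logit per vocab entry

def next_token_distribution(state):
    # The work happens in a continuous vector space, through a nonlinearity...
    hidden = [math.tanh(h) for h in matvec(W_HIDDEN, state)]
    # ...and only the final projection turns it into scores over tokens.
    return softmax(matvec(W_UNEMBED, hidden))

probs = next_token_distribution([1.0, 0.2, -0.4])
best = VOCAB[probs.index(max(probs))]
```

The token only appears at the very end, as the most probable entry of `probs`; everything before that is arithmetic on vectors.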

The article also asserts that the model replays everything over and over again to create each character one at a time, as some way of demonstrating the autoregressive self-attention mechanism, but that’s really not accurate at all, and it trivializes what is going on.

I am not asserting LLMs are aware or conscious; that’s on the surface profoundly absurd. And I do understand your point that the fact that it emits, in words, something that seems to speak to us gives it an air of humanity that isn’t real. However, there is a very real emergent reality: our language alone appears to embed a form of thought and understanding, latent in how we use language to communicate, that is in fact coming through the model. It is not regurgitating its corpus and pattern matching, because the patterns you input and it emits are not where the inference is operating; it’s within this enormous vector space, through these complex nonlinear activation functions with learned residuals, not in the language corpus.

It is not conscious or aware. It is something else, not human. But if you cannot see it as amazing you have lost the capacity to dream.

reply
What if to become really good at predicting you must have some of what we call understanding?
reply
You are oversimplifying. They do produce one word per cycle. But they can also have context buffers carrying up to two million tokens, which is most definitely larger than your measly human short-term memory context buffers.

You, of course, wouldn't notice if your only experience of LLMs was chatting with the cheapest, smallest, least capable LLMs that you get through ChatGPT, or Google search.

It becomes pretty obvious when you use a coding AI on a daily basis. It is the context buffer in which the magic occurs, not the tokens that get spit out one at a time.
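A rough sketch of what I mean, with a toy stand-in for the model. All of it is assumed for illustration: `toy_model`, `generate`, `SUSPECTS` and `ENDING` are hypothetical names, and the "model" is a few hand-written rules, not an LLM. The point is only the shape of the loop: one token comes out per step, but every step reads the whole context buffer.

```python
SUSPECTS = {"butler", "gardener"}
ENDING = ["the", "killer", "was"]

def toy_model(context):
    """Hypothetical stand-in for a forward pass: sees the WHOLE context."""
    # Find how much of ENDING has already been written at the tail.
    for k in range(len(ENDING), 0, -1):
        if context[-k:] == ENDING[:k]:
            if k == len(ENDING):
                # The answer depends on something far back in the buffer.
                return next((t for t in context if t in SUSPECTS), "unknown")
            return ENDING[k]
    return ENDING[0]

def generate(prompt, steps):
    context = list(prompt)             # the context buffer
    for _ in range(steps):
        token = toy_model(context)     # one token per cycle...
        context.append(token)          # ...fed back in for the next cycle
    return context[len(prompt):]

story_a = generate(["gardener", "hid", "a", "knife", "."], 4)
story_b = generate(["butler", "hid", "a", "knife", "."], 4)
```

Both runs emit tokens one at a time, yet the final word differs because it is determined by the first word of the prompt, several steps back in the buffer.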

Every day, I watch my coding AI develop plans, search the web a half dozen times for documentation, grep through my entire codebase looking for pieces of related code and context, analyze relevant source code across multiple files, spit out an initial plan for implementing the fix before starting to execute it, run requests through some sort of advanced mathematics tool (they are EXTREMELY good at graduate-level calculus and linear algebra), implement fixes that extend across half a dozen files in 2 different computer languages (TypeScript and C++), run trial compiles and fix coding errors in its output, sometimes developing sub-plans to deal with compile errors. I've seen it get halfway through a fix and revise its initial plan mid-flight as it encounters something in existing source code.

Not vibe coding, to be clear. Targeted use of a coding tool by a professional senior software developer with decades of experience, and a fair bit of expertise with the limits of what sort of problems my coding AI can and cannot do. Every line of code reviewed. Sometimes it needs additional prompts, telling it how it mis-implemented something, or specifying more carefully what I actually want but didn't properly express in the initial request.

All the time maintaining that context across multiple requests, so that I don't have to restate requests from scratch.

A particularly interesting revision: "You have misread the equation (13) on page 112 of 'Spice, the Manual 2nd ed.'. It should be ....". (It had previously identified the textbook as a source I was using, from comments in source, in a preceding request, and had actually already read the cited pages in the PDF file, which it had found online). And I had actually asked it to implement equation (13), which was, in fact, badly typeset. The error it had made was defensible, if not the best reading of the equation.

"You are correct. Let me fix that." (producing updates to the implementation of the equation in code, AND to the code that implements the symbolically-differentiated version of that equation 60 lines later, which is not explicitly given in the text). The text says "take the Lagrangian of equations (11), (12) and (13)" or something like that.

ALL of that information gets carried in context buffers, even though it's generating code one word at a time. The bulk of the magic occurs in the context buffer (which, for my coding AI, is I think about 250,000 tokens), not in spitting out words one at a time.

I think it's pretty safe to assume that my coding AI is working out of context buffers that carry plans and research results consisting of tens or hundreds of thousands of arranged tokens through the multiple steps of the implementation and later revision. None of that would be possible if it were simply working one token ahead.

I kind of suspect that a lot of activity occurs in the first few words of its response. "Let me examine your current source code and develop a plan. Ok. I can see on line 131 where you want me to implement the equation.". (An opportunity to perform about 27 updates of the context buffer). And in the sometimes hundreds of lines of output it generates as it talks itself through what it needs to do.

reply
I use coding agents every day. They're useful. It hasn't changed my mind on what they are.
reply