Turing complete systems can be built out of matrix multiplications, out of attention, out of key/value lookups. The Chinese room is Turing complete. By claiming it cannot understand things because it is built out of components computing devices can be built out of, we are claiming no computer can because no computer can. This is a very bold claim indeed, and also we’re assuming the conclusion! The claim is no more convincing than “brains cannot understand things because they are made out of neurons”. The system may or may not have some particular properties, but we have to do more work than just gesturing at the components the system is made of when making claims about it; the alternative is, at best, a world where we prove too much and conclude that humans, too, are not conscious.
For starters, we need to pin down the terms under discussion enough that they don’t just mean whatever we need them to in the moment.
Exactly.
As Ilya Sutskever has pointed out, if you read a mystery novel up until the reveal of the culprit, and then fill in "The killer was _____", don't you need to understand the novel to accurately predict the next word?
Language is tremendously complicated. "Time flies like an arrow, but fruit flies like a banana." "Hard hats must be worn on site; dogs must be carried on escalators", etc. Predicting the next token requires understanding, full stop.
> if the rules are followed, no understanding is necessary.
The rules are the understanding.
(Note that understanding != consciousness)
The article also asserts that the model replays everything over and over again to create each character one at a time, as a way of demonstrating the autoregressive self-attention mechanism. That's really not accurate at all, and it trivializes what is going on.
I am not asserting LLMs are aware or conscious; on the surface that's profoundly absurd. And I do understand your point that the fact it emits words that seem to speak to us gives it an air of humanity that isn't real. However, there is a very real emergent reality: our language alone appears to embed a form of thought and understanding, latent in how we use language to communicate, and that is in fact coming through the model. It is not regurgitating its corpus and pattern matching, because the patterns you input and the patterns it emits are not where the inference is operating. Inference happens within this enormous vector space, through complex nonlinear activation functions with learned residuals, not in the language corpus.
It is not conscious or aware. It is something else, not human. But if you cannot see it as amazing, you have lost the capacity to dream.
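On the "replays everything over and over" point: a minimal sketch of why that picture can mislead, assuming a single attention head and key/value caching as commonly used in transformer inference. This is my illustration, not something the article or the comment spells out. All names here (`CachedAttention`, `last_output_recompute`, the random weights) are hypothetical. It only checks that attending from the newest position with cached keys and values gives the same result as recomputing every position from scratch.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention from one query over all stored keys/values."""
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def last_output_recompute(xs, Wq, Wk, Wv):
    """The 'replay everything' version: rebuild all keys/values each step."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


class CachedAttention:
    """Incremental version: keys/values of earlier positions are kept."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []

    def step(self, x):
        # Only the newest position is projected; older ones come from the cache.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        return attend(matvec(self.Wq, x), self.keys, self.values)


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4  # toy embedding size (assumption)
Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
xs = rand_matrix(6, d)  # six toy "token embeddings"

cached = CachedAttention(Wq, Wk, Wv)
for t, x in enumerate(xs, start=1):
    incremental = cached.step(x)
    full = last_output_recompute(xs[:t], Wq, Wk, Wv)
    assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
               for a, b in zip(incremental, full))
print("cached and recomputed outputs match at every step")
```

The point of the sketch is narrow: each new token still conditions on everything before it, but that conditioning runs through stored internal state rather than a literal re-read of the text.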
You, of course, wouldn't notice if your only experience of LLMs was chatting with the cheapest, smallest, least capable LLMs that you get through ChatGPT, or Google search.
It becomes pretty obvious when you use a coding AI on a daily basis. It is the context buffer in which the magic occurs, not the tokens that get spit out one at a time.
Every day, I watch my coding AI develop plans, search the web a half dozen times for documentation, grep through my entire codebase looking for pieces of related code and context, analyze relevant source code across multiple files, spit out an initial plan for implementing the fix before starting to execute it, run requests through some sort of advanced mathematics tool (they are EXTREMELY good at graduate-level calculus and linear algebra), implement fixes that extend across half a dozen files in 2 different computer languages (TypeScript and C++), run trial compiles and fix coding errors in its output, sometimes developing sub-plans to deal with compile errors. I've seen it get halfway through a fix and revise its initial plan mid-flight as it encounters something in existing source code.
Not vibe coding, to be clear. Targeted use of a coding tool by a professional senior software developer with decades of experience, and a fair bit of expertise with the limits of what sort of problems my coding AI can and cannot handle. Every line of code reviewed. Sometimes it needs additional prompts, telling it how it mis-implemented something, or specifying more carefully what I actually want but didn't properly express in the initial request.
All the time maintaining that context across multiple requests, so that I don't have to restate requests from scratch.
A particularly interesting revision: "You have misread equation (13) on page 112 of 'Spice, the Manual 2nd ed.'. It should be ....". (It had previously identified the textbook as a source I was using, from comments in the source, in a preceding request, and had actually already read the cited pages in the PDF file, which it had found online.) And I had actually asked it to implement equation (13), which was, in fact, badly typeset. The error it had made was defensible, if not the best reading of the equation.
"You are correct. Let me fix that." (producing updates to the implementation of the equation in code, AND code that implements the symbolically-differentiated version of that equation 60 lines later, which is not explicitly given in the text). The text says "take the Lagrangian of equations (11), (12) and (13)" or something like that.
ALL of that information gets carried in context buffers, even though it's generating code one word at a time. The bulk of the magic occurs in the context buffer, which for my coding AI is, I think, about 250,000 tokens, not in spitting out words one at a time.
I think it's pretty safe to say that my coding AI is working out of context buffers that may carry plans and research results consisting of tens or hundreds of thousands of arranged tokens through the multiple steps of the implementation and later revision. None of that would be possible if it were simply working one token ahead.
I kind of suspect that a lot of activity occurs in the first few words of its response. "Let me examine your current source code and develop a plan. Ok. I can see on line 131 where you want me to implement the equation.". (An opportunity to perform about 27 updates of the context buffer). And in the sometimes hundreds of lines of output it generates as it talks itself through what it needs to do.
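To make the "updates of the context buffer" idea concrete, here's a toy sketch of the generation loop, under loud assumptions: whitespace-separated words stand in for tokens (real tokenizers split differently, so the count would differ), and `make_scripted_model` is a hypothetical stub that replays a fixed reply rather than a real model. What it shows is the shape of the loop: every emitted token is appended to the context, and every later step is conditioned on that whole context.

```python
from typing import Callable, List, Optional, Tuple

# The quoted opening of the reply, split on whitespace as a crude token proxy.
REPLY = ("Let me examine your current source code and develop a plan. Ok. "
         "I can see on line 131 where you want me to implement the equation."
         ).split()


def make_scripted_model(prompt_len: int) -> Callable[[List[str]], Optional[str]]:
    """Hypothetical stand-in for a model: sees the full context and returns
    the next word of a fixed reply, or None when the reply is finished."""
    def next_token(context: List[str]) -> Optional[str]:
        emitted = len(context) - prompt_len
        return REPLY[emitted] if emitted < len(REPLY) else None
    return next_token


def generate(prompt: List[str],
             next_token: Callable[[List[str]], Optional[str]]
             ) -> Tuple[List[str], int]:
    context = list(prompt)
    updates = 0
    while True:
        tok = next_token(context)  # conditioned on everything so far
        if tok is None:
            break
        context.append(tok)        # the output becomes input for later steps
        updates += 1
    return context, updates


prompt = "Please implement equation (13).".split()  # made-up prompt
context, updates = generate(prompt, make_scripted_model(len(prompt)))
print(updates)       # 26 with whitespace splitting
print(len(context))  # 30: 4 prompt words plus 26 generated words
```

With whitespace splitting the quoted opening comes to 26 updates, in the same ballpark as the "about 27" above. The one-token-at-a-time output is just the visible edge of a loop whose working state is the whole growing context.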