Tokens are the most basic input unit of an LLM. But tokens don't generally correspond to words or letters; rather, they are sub-word sequences. So "strawberry" might be broken up into two tokens, 'straw' and 'berry'. An LLM has trouble distinguishing "sub-token" features like specific letter sequences because it doesn't see letter sequences, just each token as a single atomic unit. 'straw' and 'r' are two distinct tokens, but an LLM is entirely blind to the fact that 'straw' has one 'r' in it.
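To make this concrete, here is a toy sketch in Python. The vocabulary, the token IDs, and the greedy longest-match rule are all hypothetical, chosen only for illustration; real tokenizers (typically byte-pair encoding) learn their vocabularies from data and may split "strawberry" differently. What it shows is that the model's input is just a list of integer IDs, and nothing in those integers records how many r's the underlying text contains.

```python
import string

# Hypothetical toy vocabulary: two sub-word tokens plus single lowercase
# letters as a fallback. Real LLM vocabularies are learned and far larger.
TOY_VOCAB = {"straw": 1000, "berry": 1001}
TOY_VOCAB.update({ch: i for i, ch in enumerate(string.ascii_lowercase)})


def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Split text into token IDs by greedy longest match against vocab."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest piece starting at position i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


print(tokenize("strawberry", TOY_VOCAB))  # [1000, 1001]
print(tokenize("r", TOY_VOCAB))           # [17]
```

In this toy setup the model would receive [1000, 1001]. The letter 'r' exists in the vocabulary as ID 17, but nothing in the input itself links 17 to 1000 or 1001; that relationship could only be picked up indirectly from training data.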
As an analogy, I might ask you to identify the relative activations of each of the three cone types on your retina as I present some solid-color image to your eyes. But of course you can't do this; you simply do not have cognitive access to that information. Individual color experiences are your basic vision tokens.
The widespread mistake people keep making is assuming that the development of intelligence in LLMs should follow the same trajectory that human intelligence takes as it develops into adult levels of intelligence. On that assumption, a deficiency in some capacity we take for granted in humans becomes an indictment of LLM intelligence. But this is specious. LLMs are entirely alien; their developmental paths do not and should not look anything like ours. Your intuition from human intelligence just works against understanding the potential for intelligence in LLMs.
To be fair, almost everyone who claims LLMs are conscious tends to claim that they are conscious in exactly the way that humans are, to the point of stating that human brains are also just complex next-token prediction machines with a random seed. It's basically religious argument on both sides.
I have seen people say "you're a next-token prediction machine", but only in the way one might say "you're a cup of old lard": not actually meaning it literally.
I have seen people interpret a challenge to show that they are not next-token prediction machines as a claim that they are, but this is almost always an argument meant to show that certainty is difficult in this area.
People like Hinton have declared that they believe LLMs to be conscious, but clearly indicate that they do not mean conscious just like us.
And it seems obvious to me that language behavior does differ significantly between humans and LLMs, based on the frequency and nature of failure states. LLMs routinely hallucinate, get "AI strokes", or become obsessed with not talking about goblins, etc. This isn't typical language behavior for humans unless they have severe neurological or psychological impairment.
People tend not to "spew words out without thinking" and certainly not all the time by default - we call that glossolalia and (outside of some fringe Christian sects) it's considered a "bug" not a "feature" of the human brain. Human language by default always has intent behind it, even if that intent isn't readily apparent to the speaker. People can recite by rote memory, but that isn't blind token prediction, it's the neurological equivalent of muscle memory. People can have conversations then forget about them because their attention was focused elsewhere, but that doesn't indicate that they were simply "spewing words out without thinking" at the time.
People imagine details all the time. Eyewitness testimony is notoriously untrustworthy.
Our brains seem wired to confidently fill in gaps. We all have a literal blind spot we aren’t aware of because our brains convincingly lie to us and fill in the gap.
I don’t know what an “AI stroke” is, but I’ve definitely seen human beings in good health be in the middle of talking and suddenly forget what they were going to say.
> People tend not to "spew words out without thinking" and certainly not all the time by default - we call that glossolalia and (outside of some fringe Christian sects) it's considered a "bug" not a "feature" of the human brain.
Glossolalia is spouting gibberish, not comprehensible speech.
Kind of weird that you speak so confidently when you apparently don’t know the difference between stream of consciousness and “speaking in tongues”. Almost like you’re AI hallucinating.
This sounds like a description of a child who has not learned to read yet. If you ask a child who is not aware of the alphabet or of "words" how many r's are in strawberry, you'd get a nonsense answer too. So what you're really pointing out is that LLMs have not been trained on "the English language": how words are constructed and what they are composed of. That they operate on tokens that don't correspond to words or letters is irrelevant as an answer to why they can't count the letters in a word. I don't know how many r's are in strawberry because of how I understand the word "strawberry"; I know it because I know how to spell strawberry. The LLM needs to be trained on this the same way someone learning to read would be trained on it. No one should be surprised that an LLM can't "read", any more than anyone should be surprised that a child can't "read".
This interpretation takes things too far away from how LLMs are constituted and so misses important explanatory power. The issue of counting letters in a word isn't about an ability to spell; it's about the nature of one's perception. We perceive words as sequences of individual letters. LLMs do not. I can ask you to tell me how many r's are in some nonsense letter sequence and you're fully capable of doing that. LLMs do not see sequences of letters, so they are intrinsically at a disadvantage for this kind of question. But this says nothing about their capacity for intelligence, any more than not naturally being able to distinguish the frequencies of photons hitting your retina says anything about human intelligence.
I disagree with this pretty strongly, because I don't think you're correct that I don't have the ability to distinguish frequencies of photons hitting my retina. We have a lot of tools that can determine the frequency of light and I can use those on any source of light that I wish to measure that may hit my retinas.
If you ask an LLM how many Rs are in strawberry, it wouldn't think like this. It would confidently state that there are two Rs. Even though it "knows" that it can write a python script to count the number of Rs in strawberry, it doesn't do that. Why not? Is it maybe because it isn't intelligent? Yeah, you can get an LLM to count the number of Rs in strawberry by writing a python script, but that's a use of your intelligence, not the LLM's.
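For reference, the script in question is trivial once the word is available as a string of characters. A minimal sketch (the function name `count_letter` is hypothetical):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a letter in a word."""
    # str.count works on characters, which a tokenized model never sees directly.
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # 3
```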
Counting letters in a word seems to have little to do with understanding the word. Young kids can’t spell or count well at all but no one says that means they can’t understand.