What do you mean by that? It’s literally text prediction, isn’t it?
I have a list of numbers, 0 to 9, and the + and = operators. I will train my model on this dataset, except the model won’t get the list; it will get a bunch of addition problems. A lot of them. But not every possible addition problem inside that space will be represented, not by a long shot, and neither will every number. Still, the model will be able to solve any math problem you can form with those symbols.
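A minimal sketch of that setup (names and the train/held-out split size are my own illustration, not from the comment): enumerate every single-digit "a+b=c" string, train on a random subset, and note that most of the space is never seen.

```python
import itertools
import random

# Toy symbol space: digits 0-9 plus '+' and '='. The full problem space
# is every "a+b=c" string with single-digit operands (100 problems).
full_space = [f"{a}+{b}={a + b}" for a, b in itertools.product(range(10), repeat=2)]

# Train on only a fraction of the space; the rest is held out, so the
# model never sees those exact problems during training.
random.seed(0)
train = set(random.sample(full_space, k=40))      # 40 of 100 problems
held_out = [p for p in full_space if p not in train]

print(len(full_space), len(train), len(held_out))  # 100 40 60
```

The point of the thought experiment is that a model answering the 60 held-out problems correctly cannot be doing lookup; it has to have internalized the addition rule.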
It’s just predicting symbols, but to do so it had to internalize the concepts.
This gives the impression that it is doing something more than pattern matching. I think this kind of communication, where some human attribute is used to name a concept in the LLM domain, is causing a lot of damage, and ends up inadvertently inflating the hype for AI marketing...
So the conclusion was that these middle layers have their own language: the model converts the text into this language and then decodes it back. It explains why the models sometimes switch to Chinese when they have a lot of Chinese-language inputs, etc.
You are also confusing ‘mechanistic explanation still incomplete’ with ‘empirical phenomenon unestablished.’ Those are not the same thing.
PS. Em dash? So you are some LLM bot trying to bait mine HN for reasoning traces? :D
You sound like you’re trying to sound impressive. Like I said, I’ll read the paper.
You are discovering that the favorite Luddite argument is bullshit.
https://machinelearning.apple.com/research/illusion-of-think...
> just look at research papers
You didn't add anything other than vibes either.
This is not how the feature called "reasoning" works in current models.
"Reasoning" simply lets the model emit and then consume some "thinking" tokens before generating the actual output.
All the "fluff" tokens in the output have absolutely nothing to do with "reasoning".
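A minimal sketch of that flow, assuming hypothetical `<think>...</think>` delimiter tokens (the exact markers vary by model and are my own illustrative convention here):

```python
import re

# Hypothetical raw completion from a "reasoning" model: it first emits
# thinking tokens, then the actual answer.
raw = "<think>2 apples plus 3 apples... count them... 5</think>The answer is 5."

# The serving layer strips the thinking span before showing the reply,
# so the "reasoning" is extra computation, not part of the visible output.
visible = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print(visible)  # The answer is 5.
```

This is why the conversational "fluff" in the visible reply tells you nothing about whether thinking tokens were used: the thinking happens before, and separately from, the output you read.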
For example, thinking in modern US English generates many extra thoughts just to keep speech in the right cultural context (there is only one correct way to say People of Color, it changes every year, and any typo makes it horribly wrong).
Some languages are far more expressive and specialized in logical conditions, conditionals, recursion, and reasoning. Like the Eskimos supposedly having 100 words for snow, but for Boolean algebra.
It is well proven that thinking in Chinese needs far fewer tokens!
With this caveman mode you strip out most of the cultural complexities of the anglosphere, making it easier for foreigners and far simpler to digest.
This is simply not true.
It is very arrogant to assume that no other language can be more advanced than English.
Programming languages are not languages in the human-brain or cultural sense.