It isn’t because humans and current LLMs have radically different architectures
LLMs: training and inference are two separate processes; weights are modifiable during training, static/fixed/read-only at runtime
Humans: training and inference are integrated and run together; weights are dynamic, continuously updated in response to new experiences
You can scale current LLM architectures as far as you want, it will never compete with humans because it architecturally lacks their dynamism
Actually scaling to humans is going to require fundamentally new architectures-which some people are working on, but it isn’t clear if any of them have succeeded yet
True, but we have RAG to offset that.
> it architecturally lacks their dynamism
We'll get there eventually. Keep in mind that the brain is now about 300k years into fine-tuning itself as this species classified as homo sapiens. LLMs haven't even been around for 5 years yet.
In practice that doesn’t always work… I’ve seen cases where (a) the answer is in the RAG but the model can’t find it because it didn’t use the right search terms-embeddings and vector search reduces the incidence of that but cannot eliminate it; (b) the model decided not to use the search tool because it thought the answer was so obvious that tool use was unnecessary; (c) model doubts, rejects, or forgets the tool call results because they contradict the weights; (d) contradictions between data in weights and data in RAG produce contradictory or ineloquent output; (e) the data in the RAG is overly diffuse and the tool fails to surface enough of it to produce the kind of synthesis of it all which you’d get if the same info was in the weights
This is especially the case when the facts have changed radically since the model was trained, e.g. “who is the Supreme Leader of Iran?”
> We'll get there eventually. Keep in mind that the brain is now about 300k years into fine-tuning itself as this species classified as homo sapiens. LLMs haven't even been around for 5 years yet.
We probably will eventually-but I doubt we’ll get there purely by scaling existing approaches-more likely, novel ideas nobody has even thought of yet will prove essential, and a human-level AI model will have radical architectural differences from the current generation
AlphaZero isn’t a LLM. There are Feed Forward networks, recurrent networks, convolutional networks, transformer networks, generative adversarial networks.
Brains have many different regions each with different architectures. None of them work like LLMs. Not even our language centres are structured or trained anything like LLMs.
Language came after conceptual modeling of the world around us. We're surrounded by social species with theory of mind and even the ability to recognise themselves and communicate with each other, but none of them have language. Even the communications faculties they have operate in completely different parts of their brains than ours with completely different structure. Actually we still have those parts of the brain too.
Conceptual representation and modeling came first, then language came along to communicate those concepts. LLMs are the other way around, linguistic tokens come first and they just stream out more of them.
This is why Noam Chomsky was adamant that what LLMs are actually doing in terms of architecture and function has nothing to do with language. At first I thought he must be wrong, he mustn't know how these things work, but the more I dug into it the more I realised he was right. He did know, and he was analysing this as a linguist with a deep understanding of the cognitive processes of language.
To say that brains are language models you have to ditch completely what the term language model actually means in AI research.
That's irrelevant though, since all the above are still prediction machines based on weights.
If you're ok with the brain being that, then you just changed the architecture (from LLM-like), not the concept.
An LLM is a specific neural architectural structure and training process. Brains are also neural networks, but they are otherwise nothing at all like LLMs and don't function the ways LLMs do architecturally other than being neural networks.
We do not have all the answers or a complete understanding of everything.