The brain definitely stores things, and retrieval and processing are key to the behaviour that comes out the other end, but whether it's "memory" in the sense this article tries to define, I'm not sure. The article makes a point of citing instances where /lack/ of a memory is a sign of the brain doing something different from an LLM, but from all of my reading and understanding, the brain is pretty happy to "make up" a "memory".
A distinction between semantic (facts/concepts) and episodic (specific experiences) declarative memories has been fairly well established since at least the 1970s. That the latter is required to construct the former has also long been posited, with reasonable evidence [1]. Similarly, there's a slightly more recent distinction between "recollecting" (i.e., similar to the author's "I can remember the event of learning this") and "knowing" (i.e., "I know this but don't remember why"), with differences in hypothesized recall mechanisms [3].
[1] https://www.science.org/doi/full/10.1126/science.277.5324.33... or many other reviews by Eichenbaum, Squire, Milner, etc
Once we begin to disengage from the arbitrariness inherent in these metaphors, and rely on what actually generates memories (action-neural-spatial-syntax), we can study what's really happening in the allocortex's distribution of cues between sense/emotion into memory.
Until then we will simply be trapped in falsely segregated ideas of episodic/semantic.
I can recall the experience of getting in on a cold morning, pumping the throttle pedal three times to activate the semi-automatic choke, starting it, and getting out to clear frost off the window while it warmed up a little. The tactile feeling and squeak of the throttle linkage, the sound of the starter motor, the hollow sound of the door closing, and the noxious exhaust from the cold start (which I haven't smelled in 30 years). I remember how my little plastic window scraper sounded when scraping the glass, and even how the defrost vents made two regions which were always easier to scrape. But, I cannot really remember a specific episode of this on a certain date or leading to a particular trip.
On the other hand, I do have an episodic memory of my final trip in this truck: sliding off an icy road, rolling over, and sledding down a steep slope. I remember the ruptured, snow-filled windshield, and the sound of the engine idling upside-down. I remember the slow-motion way the whole crash unfolded, and the clothes I was wearing as I crawled back to the roadway.
Ironically, I have more emotional context with the generic cold-start memory. It brings with it some vignette of teenage eagerness and pride in having this vehicle and the freedom it represented. The crash is more dissociated, a movie I was watching from within myself. I can meta-remember that I was very distressed afterward, but those feelings are not connected with the memory during recall.
This is very interesting to me. I have temporal lobe epilepsy. My episodic memory is quite poor. However, I believe I'm fairly good at learning new facts (i.e. semantic memory). Perhaps my belief is an illusion, or I'm really only learning facts when my episodic memory is less impaired (which happens; it varies from hour to hour). It's difficult for me to tell of course.
Humans can generally tell whether they know something or not, and I'd agree with the article that this is because we tend to remember how we know things, and also assign different levels of confidence according to source. Personal experience trumps watching someone else, which trumps hearing or being taught it from a reliable source, which trumps having read something on Twitter or some graffiti on a bathroom stall. To the LLM all text is just statistics, and it has no personal experience to lean on to self-check and say "hmm, I can't recall ever learning that - I'm drawing blanks".
Frankly it's silly to compare LLMs (Transformers) and brains. An LLM was only ever meant to be a linguistics model, not a brain or cognitive architecture. I think people get confused because it spits out human text, so people anthropomorphize it and start thinking it's got some human-like capabilities under the hood when it is in fact - surprise surprise - just a pass-thru stack of Transformer layers. A language model.
* Continuously updates its state based on sensory data
* Retrieves/gathers information that correlates strongly with historic sensory input
* Is able to associate propositions with specific instances of historic sensory input
* Uses the above two points to verify/validate its belief in said propositions
Describing how memories "feel" may confuse the matter, I agree. But I don't think we should be quick to dismiss the argument.
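To make those four bullet points concrete, here's a rough toy sketch (purely my own illustration; the names and structure are made up, not from the article or any real system) of a store that keeps "facts" tied to the episodes they came from, so that a "wait, do I actually remember learning this?" check is even possible:

    from dataclasses import dataclass, field

    @dataclass
    class Episode:
        # one specific chunk of "sensory" input: when, from where, what
        timestamp: float
        source: str       # e.g. "first-hand", "taught", "read on twitter"
        content: str

    @dataclass
    class MemoryStore:
        episodes: list = field(default_factory=list)
        beliefs: dict = field(default_factory=dict)  # proposition -> episode indices

        def ingest(self, episode, propositions):
            # point 1: continuously update state from incoming data
            self.episodes.append(episode)
            idx = len(self.episodes) - 1
            for p in propositions:
                # point 3: associate propositions with specific episodes
                self.beliefs.setdefault(p, []).append(idx)

        def recall(self, cue):
            # point 2: retrieve episodes that correlate with the cue
            # (naive substring match standing in for real similarity)
            return [e for e in self.episodes if cue in e.content]

        def how_do_i_know(self, proposition):
            # point 4: validate a belief by tracing it back to episodes
            idxs = self.beliefs.get(proposition, [])
            return [self.episodes[i] for i in idxs] or None  # None = "drawing blanks"

A plain LLM has nothing like the how_do_i_know path: its "beliefs" are baked into weights with the supporting episodes thrown away.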
It's pretty obvious that an LLM not knowing what it does or does not know is a major part of it hallucinating, while humans do generally know the limits of their own knowledge.
See https://gwern.net/doc/cs/algorithm/information/compression/1... from 1999.
Answering questions in the Turing test (What are roses?) seems to require the same type of real-world knowledge that people use in predicting characters in a stream of natural language text (Roses are ___?), or equivalently, estimating L(x) [the probability of x when written by a human] for compression.
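(Side note on the "estimating L(x) ... for compression" wording: that's just the standard equivalence between prediction and compression. A model that assigns probability p to some text can, driven through an arithmetic coder, store it in roughly -log2(p) bits, so a better next-token predictor is literally a better compressor. A toy sketch, with made-up probabilities:)

    import math

    # hypothetical next-token probabilities a model might assign to "Roses are ___"
    token_probs = {"red": 0.6, "flowers": 0.2, "plants": 0.1, "blue": 0.05}

    def ideal_code_length_bits(tokens, probs):
        # an arithmetic coder driven by these predictions needs about
        # -log2(p) bits per token, so better prediction = smaller file
        return sum(-math.log2(probs[t]) for t in tokens)

    print(ideal_code_length_bits(["red"], token_probs))   # ~0.74 bits
    print(ideal_code_length_bits(["blue"], token_probs))  # ~4.32 bits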
Perhaps in 1999 it seemed reasonable to think that passing the Turing Test, or maximally compressing/predicting human text, makes for a good AI/AGI test, but I'd say we now know better, and, more to the point, that does not appear to have been the motivation for designing the Transformer, or the other language models that preceded it.
The recent history leading to the Transformer was the development of first RNN- and then LSTM-based language models, then the addition of attention, with the primary practical application being machine translation (but more generally any sequence-to-sequence mapping task). The motivation for the Transformer was to build a more efficient and scalable language model by using parallel rather than sequential (RNN/LSTM) processing, to take advantage of GPU/TPU acceleration.
The conceptual design of what would become the Transformer came from Google employee Jakob Uszkoreit, who has been interviewed about this - we don't need to guess the motivation. There were two key ideas, originating from the way linguists use syntax trees to represent the hierarchical/grammatical structure of a sentence.
1) Language is as much parallel as sequential, as can be seen by multiple independent branches of the syntax tree, which only join together at the next level up the tree
2) Language is hierarchical, as indicated by the multiple levels of a branching syntax tree
Put together, these two considerations suggest processing the entire sentence in parallel, taking advantage of GPU parallelism (rather than sequentially like an LSTM), and having multiple layers of such parallel processing to handle the sentence hierarchically. This eventually led to the stack of parallel-processing Transformer layers, which did retain the successful idea of attention (hence the paper title "Attention is all you need [not RNNs/LSTMs]").
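For anyone who hasn't looked inside one: the "whole sentence in parallel" part is essentially a few matrix multiplications over all token positions at once. Here's a bare-bones, single-head self-attention sketch (my own simplification in numpy, ignoring multi-head attention, masking, positional encodings, etc.); stacking several such layers is what gives the hierarchical part:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model) -- every token of the sentence at once,
        # no position-by-position recurrence as in an RNN/LSTM
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])         # all-pairs token interactions
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
        return weights @ V                              # new representation per token

    # toy example: a 5-token "sentence" with 8-dim embeddings
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)          # (5, 8), one parallel pass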
As far as the functional capability of this new architecture goes, the initial goal was just to be as good as the LSTM + attention language models it aimed to replace (but more efficient to train and scale). The first realization of the "parallel + hierarchical" ideas by Uszkoreit was actually less capable than its predecessors, but then another Google employee, Noam Shazeer, got involved and eventually (after a process of experimentation and ablation) arrived at the Transformer design, which did perform well on the language modelling task.
Even at this stage, nobody was saying "if we scale this up it'll be AGI-like". It took multiple steps of scaling, from Google's early Muppet-themed BERT (following their LSTM-based ELMo), to OpenAI's GPT-1, GPT-2 and GPT-3, for there to be a growing realization of how good a next-word predictor, with corresponding capabilities, this architecture was when scaled up. You can read the early GPT papers and see the growing level of realization - they were not expecting it to be this capable.
Note also that when Shazeer left Google, disappointed that they were not making better use of his Transformer baby, he did not go off and form an AGI company - he went and created Character.ai making fantasy-themed ChatBots (similar to Google having experimented with ChatBot use, then abandoning it, since without OpenAI's innovation of RLHF Transformer-based ChatBots were unpredictable and a corporate liability).
It does not store things in the way records of any sort do, but it does have some store-and-recall mechanism that works.
To be fair, LLMs do this too - I just got ChatGPT to recite Ode to Autumn.
By what mechanism do you feel I "remember" last week?