If I'm reading you right, your opinion is essentially: "If building bigger and bigger statistical next word predictors won't lead to artificial general intelligence, we will never see artificial general intelligence"

I don't know, maybe AGI is possible but there's more to intelligence than statistical next word prediction?

reply
It's not a statistical next word predictor.

'Predicting the next word' is the learning mechanism of the LLM, and it leads to a latent space which can encode higher-level concepts.

Basically, an LLM 'understands' as much as it needs to in order to respond in a reasonable way.

An LLM doesn't predict German text or Chinese text. It predicts the concept and then has a language layer outputting tokens.

And it's not just LLMs which are progressing fast: voice synthesis and voice understanding have jumped significantly, as have motion detection, skeleton movement, virtual world generation (see Nvidia's way of generating virtual worlds for their car training), protein folding, etc.

reply
I'm sorry, but the input to a model is a sequence of tokens and the output is a probability distribution over the next token. It's a very very very fancy next token predictor, but that is fundamentally what it is. I'm making the argument that this paradigm might not give rise to a general intelligence no matter how much you scale it.
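
To make concrete what I mean, here's a rough sketch of the inference loop (the model callable and its logits output are stand-ins for any transformer implementation, not a real library's API):

    import numpy as np

    def generate(model, tokens, n_new):
        # model(tokens) is assumed to return one unnormalized score (logit)
        # per vocabulary entry for the next token; that is the whole interface.
        tokens = list(tokens)
        for _ in range(n_new):
            logits = model(tokens)                   # one forward pass
            probs = np.exp(logits - np.max(logits))
            probs /= probs.sum()                     # probability distribution over the next token
            tokens.append(int(np.random.choice(len(probs), p=probs)))  # sample, feed it back in
        return tokens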
reply
> It's a very very very fancy next token predictor

Yes, and unless you are prepared to rebut the argument with evidence of the supernatural, that's all there is, period. That's all we are.

So tired of the thought-terminating "stochastic parrot" argument.

reply
Do LLMs even learn? The companies that build them build new models based partly on the conversations the older models have had with people, but do they incorporate knowledge into their neural nets as they go along?

Can an LLM decide, without prompting or api calls, to text someone or go read about something or do anything at all except for waiting for the next prompt?

Do LLMs have any conceptual understanding of anything they output? Do they even have a mechanism for conceptual understanding?

LLMs are incredibly useful and I'm having a lot of fun working with them, but they are a long way from some kind of general intelligence, at least as far as I understand it.

reply
"Do LLMs even learn?"

They have already learned a lot more than any of us ever will. In addition to this, you have a prompt, and you can teach it things in the prompt. If you give it examples of how it should parse things, it becomes better at doing it.
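
For example, a made-up prompt just to illustrate the idea:

    # Hypothetical few-shot prompt: the solved examples are the 'teaching';
    # the model only has to continue the pattern on the last line.
    prompt = (
        "Extract the date as YYYY-MM-DD.\n"
        "Input: meeting on the 3rd of March 2024 -> 2024-03-03\n"
        "Input: deadline is Jan 5, 2025 -> 2025-01-05\n"
        "Input: ship it by the 12th of August 2023 -> "
    )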

I would say yes they learn.

"Can an LLM decide" I would argue that you frame that wrong. If a LLM is the same thing as the pure language part of our brain, than the agent harness and the stuff around it, would be another part of our brain. I find it valid to use the LLM with triggers around it.

Nonetheless, we probably can also design an architecture which has a loop built in.

"Do LLMs have any conceptual understanding" Thats what a LLM has in their latent space. Basically to be able to predict the next token in such a compressed space, they 'invent' higher meaning in that space. You can ask a LLM about it actually.

Yeah, for AGI we are not there yet, and we do not know what it will look like.

reply
Yes, to all of your questions. You need to use a recent LLM in an agentic harness. Tell it to take notes, and it will.

After a bit of further refinement, we'll start to call that process "learning." Eventually the question of who owns the notes, who gets to update them, and how, will become a huge, huge deal.
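
A crude sketch of what I mean by "take notes" in a harness; call_llm and the file name are placeholders, not any particular product:

    # Hypothetical agent step: the notes file is the persistent memory,
    # and call_llm stands in for whatever chat API you use. The model is
    # asked to return both a reply and the updated notes it wants to keep.
    NOTES_PATH = "notes.md"

    def agent_step(call_llm, user_message):
        try:
            with open(NOTES_PATH) as f:
                notes = f.read()
        except FileNotFoundError:
            notes = ""
        reply, updated_notes = call_llm(notes=notes, message=user_message)
        with open(NOTES_PATH, "w") as f:
            f.write(updated_notes)   # whatever the model decided to remember
        return reply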

reply
I'm not sure why you think you know that the human brain works through predicting the next token.

It's not supernatural, I believe that an artificial intelligence is possible because I believe human intelligence is just a clever arrangement of matter performing computation, but I would never be presumptuous enough to claim to know exactly how that mechanism works.

My opinion is that human intelligence might be what's essentially a fancy next token predictor, or it might work in some completely different way; I don't know. Your claim is that human intelligence is a next token predictor. It seems like the burden of proof is on you.

reply
> Your claim is that human intelligence is a next token predictor.

Literally it is, at least in many of its forms.

You accepted CamperBob2’s text as input and then you generated text as output. Unless you are positing that this behavior cannot prove your own general intelligence, it seems plain that “next token generator” is sufficient for AGI. (Whether the current LLM architecture is sufficient is a slightly different question.)

reply
Before I start typing, I think abstractly about the topic and decide on what I shall write in response. Due to the linear nature of time, typing necessarily happens one word at a time, but I am never producing a probability distribution of words (at least not in a way that my conscious self can determine), I consider an entire idea and then decide what tokens to enter into the computer in order to communicate the idea to you.

And while I am typing, and while I am thinking before I type, I experience an array of non-textual sensory input, and my whole experience of self is to a significant extent non-lingual. Sometimes, I experience an inner monologue, sometimes I think thoughts which aren't expressed in language such as the structure of the data flow in a computer program, sometimes I don't think and just experience feelings like a kiss or the sun on my skin or the euphoria of a piece of music which hits just right. These experiences shape who I am and how I think.

When I solve difficult programming problems or other difficult problems, I build abstract structures in my mind which represent the relevant information and consider things like how data flows, which parts impact which other parts, what the constraints are, etc., without language coming into play at all. This process seems completely detached from words. In contrast, for a language model, there is no thinking outside of producing words.

It seems self-evident to me that at least parts of the human experience fundamentally can not be reduced to next token prediction. Further, it seems plausible to me that some of these aspects may be necessary for what we consider general intelligence.

Therefore, my position is: it is plausible that next token prediction won't give rise to general intelligence, and I do not find your argument convincing.

reply
But an LLM shows similar effects.

COCONUT, PCCoT, PLaT and co. are directly linked to 'thinking in latent space'. Yann LeCun is working on this too; we have JEPA now.

Also, how do you describe or explain how an LLM generates the next token when it should add a feature to an existing code base? In my opinion it has structures which allow it to create a temporary model of that code.

For sure an LLM lacks the emotional component. But something we humans do indicates to me that we are a lot closer to LLMs than we want to be: if you have a weird body feeling (stress, hot flashes, anger, etc.), your 'text area/LLM/speech area' also tries to make sense of it. It's not always very good at doing so. That emotional body feeling is not well aligned with it, and it takes time to either understand or ignore these types of inputs to the text area/LLM/speech part of our brain.

I'm open to looking back in 5 years and saying 'man, that was a wild ride but no AGI', but given the current quality of LLMs and all the other architectures and types of models and money etc. being thrown at AGI, for now I don't see a ceiling at all. I only see crazy, unprecedented progress.

reply
I don't understand what part of what I said you disagree with.
reply
You state how you think and plan and have thoughts on how to do things etc., and I assumed you mentioned your way of thinking because you assume an LLM is not doing any of it.

I then showed counterexamples.

reply
I don't think you showed counterexamples? Or can you link me to a paper which describes a language model thinking without predicting tokens?
reply
My second sentence references all these papers:

"COCONUT, PCCoT, PLaT and co are directly linked to 'thinking in latent space'. yann lecun is working on this too, we have JEPA now."

reply
And it does this thinking without producing tokens?
reply
Yes.

Btw, just because you have to do something to the LLM to trigger the flow of information through the model doesn't mean it can't think. It only means that we have to build an architecture around the model, or build it into the model's base architecture, to enable more thinking.

We do not know how the brain's architecture is set up for this. We could have sub-agents, or we could be a Mixture of Experts type of 'model'.

There is also work going on in combining multimodal inputs and diffusion models, which look completely different from an output point of view, etc.

If you look at how an LLM does math: Anthropic showed in a blog article that they found structures for estimating numbers similar to how a brain does it.

Another experiment someone ran was to clone layers and just add each clone beneath the original layer. This improved certain tasks. My assumption here is that it lengthens and strengthens a kind of thinking structure.
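
Roughly, that experiment looks something like this; a sketch in PyTorch, assuming the model exposes its transformer blocks as an nn.ModuleList (the function name is made up):

    import copy
    import torch.nn as nn

    def clone_each_layer(blocks: nn.ModuleList) -> nn.ModuleList:
        # Insert an identical copy right after every block, doubling the depth
        # while reusing the weights; the deeper model is then evaluated or
        # fine-tuned on the tasks of interest.
        doubled = []
        for block in blocks:
            doubled.append(block)
            doubled.append(copy.deepcopy(block))
        return nn.ModuleList(doubled)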

But because plain LLM usage is still so good and still returns relevant improvements, I think a whole field of thinking in this regard is still quite unexplored.

reply
If you ask a model to multiply 322423324 by 8675309232 without using tools, it's interesting to think about how it does it. Where are the intermediate results being maintained?

"In context" is the obvious answer... but if you view the chain of thought from a reasoning model, it may have little or nothing to do with arriving at the correct answer. It may even be complete nonsense. The model is working with tokens in context, but internally the transformer is maintaining some state with those tokens that seems to be independent of the superficial meanings of the tokens. That is profoundly weird, and to me, it makes it difficult to draw a line in the sand between what LLMs can do and what human brains can do.

reply
> I am never producing a probability distribution of words (at least not in a way that my conscious self can determine)

Inability to introspect your own word selections does not mean it’s meaningfully different from what an LLM does. There is plenty of evidence that humans do a lot of things that are not driven by conscious choice and we rationalize it after the fact.

> I consider an entire idea and then decide what tokens to enter into the computer in order to communicate the idea to you.

And how is that different? You are not so subtly implying that an LLM can’t consider an idea but you haven’t established this as fact. i.e. You are starting with the assumption that an LLM cannot possibly think and therefore cannot be intelligent, but this is just begging the question.

> sometimes I don't think and just experience feelings like a kiss or the sun on my skin or the euphoria of a piece of music which hits just right. These experiences shape who I am and how I think.

You cannot spin experience as intelligence. LLMs have the experience of reading the entire internet, something you cannot conceive of. Certainly your experiences shape who you are. This is a different axis from intelligence, though.

> This process seems completely detached from words. In contrast, for a language model, there is no thinking outside of producing words.

Both sides of this claim seem dubious. The second half in particular seems to be founded on nothing. Again, you are asserting with no support that there is no thinking going on.

> It seems self-evident to me that at least parts of the human experience fundamentally can not be reduced to next token prediction. Further, it seems plausible to me that some of these aspects may be necessary for what we consider general intelligence.

I don’t think anyone sane is claiming an LLM can have a human experience. But it is not clear that a human experience is necessary for intelligence.

reply
> Inability to introspect your own word selections does not mean it’s meaningfully different from what an LLM does. There is plenty of evidence that humans do a lot of things that are not driven by conscious choice and we rationalize it after the fact.

This is correct and also completely irrelevant. I am describing what I experience, and describing how my experience seems very different to next token prediction. I therefore conclude that it's plausible that there is more involved than something which can be reduced to next token prediction.

> And how is that different? You are not so subtly implying that an LLM can’t consider an idea but you haven’t established this as fact. i.e. You are starting with the assumption that an LLM cannot possibly think and therefore cannot be intelligent, but this is just begging the question.

Language models can't think outside of producing tokens. There is nothing going on within an LLM when it's not producing tokens. The only thing it does is take in tokens as input and produce a token probability distribution as output. It seems plausible that this is not enough for general intelligence.

> You cannot spin experience as intelligence.

Correct, but I can point out that the only generally intelligent beings we know of have these sorts of experiences. Given that we know next to nothing about how a human's general intelligence works, it seems plausible that experience might play a part.

> LLMs have the experience of reading the entire internet, something you cannot conceive of.

I don't know that LLMs have an experience. But correct, I cannot conceive of what it feels like to have read and remembered the entire Internet. I am also a general intelligence and an LLM is not, so there's that.

> Certainly your experiences shape who you are. This is a different axis from intelligence, though.

I don't know enough about what makes up general intelligence to make this claim. I don't think you do either.

> Both sides of this claim seem dubious. The second half in particular seems to be founded on nothing. Again, you are asserting with no support that there is no thinking going on.

I'm telling you how these technologies work. When a language model isn't performing inference, it is not doing anything. A language model is a function which takes a token stream as input and produces a token probability distribution as output. By definition, there is no thinking outside of producing words. The function isn't running.

> I don’t think anyone sane is claiming an LLM can have a human experience. But it is not clear that a human experience is necessary for intelligence.

I 100% agree. It is not clear whether a human experience is necessary for intelligence. It is plausible that something approximating a human-like experience is necessary for intelligence. It is also plausible that something approximating human-like experience is completely unnecessary and you can make an AGI without such experiences.

It's plausible that next token prediction is sufficient for AGI. It's also plausible that it isn't.

reply
> I don't know enough about what makes up general intelligence to make this claim. I don't think you do either.

This is the fundamental issue. No one seems capable of defining general intelligence. Ten years ago most scientists would probably have agreed that The Turing Test was sufficient but the goalposts shifted when ChatGPT passed that.

If it’s not clear what AGI even means, it’s hard to say whether an LLM can achieve it, because it devolves into pointing out that an LLM is not a human.

reply
> Ten years ago most scientists would probably have agreed that The Turing Test was sufficient but the goalposts shifted when ChatGPT passed that.

The popularity of, and lack of consensus on, the Chinese room thought experiment kind of implies that this is wrong? I don't think many scientists (or, more relevantly, philosophers of mind) would, even 10 years ago, have said, "if a computer is able to fool a human into thinking it's a human, then the computer must possess a general intelligence".

Even Turing's perspective was, from what I understand, that we must avoid treating something that might be sentient as a machine. He proposed that if a computer is able to act convincingly human, we ought to treat it as if it is a human, not because it must be a conscious being but because it might be.

reply
Perhaps I am wrong or overstating the belief that the Turing test would be sufficient. My recollection is that it was well regarded as a meaningful if not conclusive test.

> the Chinese room thought experiment

This is an interesting thought experiment but I think the “computers don’t understand” interpretation relies on magical thinking.

The notion that “systemic” understanding is not real is purely begging the question. It also ignores that a human is also a system.

reply
> I'm telling you how these technologies work. When a language model isn't performing inference, it is not doing anything. A language model is a function which takes a token stream as input and produces a token probability distribution as output. By definition, there is no thinking outside of producing words. The function isn't running.

If what you are saying is true, then LLMs wouldn't be able to handle out-of-distribution math problems without resorting to tool use. Yet they can. When you ask a current-generation model to multiply some 8-digit numbers, and forbid it from using tools or writing a script, it will almost certainly give you the right answer. That includes local models that can't possibly cheat. LLMs are stochastic, but they are not parrots.

At the risk of sounding like an LLM myself, whatever process makes this possible is not simply next-token prediction in the pejorative sense you're applying to it. It can't be. The tokens in a transformer network are evidently not just words in a Markov chain but a substrate for reasoning. The model is generalizing processes it learned, somehow, in the course of merely being trained to predict the next token.

Mechanically, yes, next-token prediction is what it's doing, but that turns out to be a much more powerful mechanism than it appeared at first. My position is that our brains likely employ similar mechanism(s), albeit through very different means.

It is scarcely believable that this abstraction process is limited to keeping track of intermediate results in math problems. The implications should give the stochastic-parrot crowd some serious cognitive dissonance, but...

(Edit: it occurs to me that you are really arguing that the continuous versus discrete nature of human thinking is what's important here. If so, that sounds like a motte-and-bailey thing that doesn't move the needle on the argument that originally kicked off the subthread.)

(Edit 2, again due to rate-limiting: it does sound like you've fallen back to a continuous-versus-discrete argument, and that's not something I've personally thought much about or read much about. I stand by my point that the ability to do arithmetic without external tools is sufficient to dispense with the stochastic-parrot school of thought, and that's all I set out to argue here.)

reply
> If what you are saying is true, then LLMs wouldn't be able to handle out-of-distribution math problems without resorting to tool use. Yet they can. When you ask a current-generation model to multiply some 8-digit numbers, and forbid it from using tools or writing a script, it will almost certainly give you the right answer. That includes local models that can't possibly cheat. LLMs are stochastic, but they are not parrots.

Okay, what do you think language models are doing when they're not producing token probability distributions? What processes do you think are going on when the function which predicts a token isn't running?

> At the risk of sounding like an LLM myself, whatever process makes this possible is not simply next-token prediction in the pejorative sense you're applying to it.

I don't know what pejorative sense you're implying here. I am, to the best of my ability, describing how the language model works. I genuinely believe that a language model is, in essence, a function which takes in a sequence of tokens and produces a token probability distribution as an output. If this is incorrect, please, correct me.

reply
> Okay, what do you think language models are doing when they're not producing token probability distributions? What processes do you think are going on when the function which predicts a token isn't running?

What are you doing when you are not outputting tokens? You have a thought, evaluate it, refine it, repeat.

You’re not wrong that the basic building block is just “next token prediction”, but clearly the emergent behaviors exceed our intuition about what this process can achieve. We’re seeing novel proofs come out of these. Will this lead to AGI? That’s still TBD.

> I genuinely believe that a language model is, in essence, a function which takes in a sequence of tokens and produces a token probability distribution as an output. If this is incorrect, please, correct me.

The pejorative is that you imply this is a shallow and unthinking process. As I said earlier, you are literally a token generator on HN. You read someone’s comment, do some kind of processing, and output some tokens of your own.

reply
> What are you doing when you are not outputting tokens? You have a thought, evaluate it, refine it, repeat.

I mean I do think sometimes even when not typing?

> Will this lead to AGI? That’s still TBD.

This is literally what I have been saying this whole time.

Since we agree, I will consider this conversation concluded.

reply
He’s a time waster.

I bet the guy has never contributed a novel thought that could be argued as moving something of magnitude forward. If that is the case he ought to stop writing as if he were capable of doing so - and therefore has no understanding of what true intelligence is.

reply
> I consider an entire idea and then decide what tokens to enter into the computer in order to communicate the idea to you.

This overestimates introspective access.

The brain is very good at producing a coherent story after the fact. Touch the hot stove and your hand moves before the conscious thought of "too hot" arrives. The hot message hits your spinal cord and you move before it reaches your brain. Your conscious mind fills in the rest afterwards.

I don't think that means that conscious thought is fake. But it does make me skeptical of the claim that we first possess a complete idea and only then does it serialize into words. A lot of the "idea" may be assembled during the act of expression, with consciousness narrating the process as if it had the whole thing in advance.

With writing, as in this comment, there's also a lot of backtracking and rewording that LLMs don't have the ability to do, so there's that.

reply
> Before I start typing, I think abstractly about the topic

Before you start typing, an fMRI machine can tell you which finger you'll lift first, before you know it yourself.

We are not special. Consciousness is literally a continuous hallucination that we make up to explain what we do and what we think, after the fact. A machine can be trained to behave identically, but it's not clear if that's the best way forward or not.

Edit due to rate limiting: to answer your question, the substrate your mind uses to drive this process can be considered an array of tokens that, themselves, can be considered 'words.'

It's hard to link sources -- what am I supposed to do, send you to Chomsky and other authorities who have predicted none of what's happening and who clearly understand even less?

reply
> (Edit: to answer your question, the substrate your mind uses to drive this process can be considered an array of tokens that, themselves, can be considered 'words.')

This seems like a factual claim. Can you link a source?

(Also why respond in the form of an edit?)

reply
What's your argument? An fMRI can tell which finger I will lift first before that information makes its way to my consciousness, ergo next word prediction is sufficient for general intelligence? Do you hear yourself?
reply
The statement is that your perception of your own cognition isn’t necessarily reality. That isn’t a statement that token prediction is sufficient for general intelligence. It’s a statement that your subjective experience is misleading you.
reply
> It's not a statistical next word predictor.

it absolutely is a next word predictor

reply
LLM proponents believe that these higher level encodings in latent space do in fact match the real world concepts described by our language(s).

However, a much simpler explanation for what we see with LLMs is that instead the higher level encodings in latent space match only the patterns of our language(s), and no deeper encoding/understanding is present.

It's Plato's Cave - the shadows on the wall are all an LLM ever sees, and somehow it is expected to derive the real reality behind them.

reply
Could be, yes, for sure, but I think it would be very naive, given the current state of progress we are in, to downplay what is happening.

At least the Mythos model, with its 10 trillion parameters, might indicate that the scaling law is valid. It's a little bit unfortunate that we still don't know much more about that model.

reply
> And if you look at Boston Dynamics, Unitree and Generalist's progress on robotics

Their progress is almost nought. Humanoids are stupid creations that are not good at anything in the real world. I'll give it to the machine dogs, at least they can reach corners we cannot.

reply
I found their demonstration at CES this year very spectacular: https://www.youtube.com/watch?v=YIhzUnvi7Fw

I can also recommend looking at Generalist: https://www.youtube.com/@Generalist_AI

reply
> Their progress is almost nought.

How can you say the advancements since Honda's ASIMO robot amount to "almost nought"?

reply
Not sure if you're being sincere or sarcastic but some of us have lived through several AI winters now. And the fact that such a phenomenon exists is because of this terrible amount of hype the topic gets whenever any progress is made.
reply
Which ones? At least in the last 4 years, there was no AI winter.
reply
The late 70s, again in the late 80s. See wikipedia.

https://en.wikipedia.org/wiki/AI_winter

reply
Yeah, and if you look at the blocking factors at that time (data, compute), these types of limits are currently nonexistent.

There is a difference to be acknowledged: in the 70s/80s the whole world didn't suddenly start to shift to AI, right?

So why do so many smart and/or rich people push this? Hype? Yeah, sure, but hype was there for crypto too.

I bet it's an underlying understanding plus the right time with the right components: massive capital for playing this game long enough to see through the required initial investment, the internet for fast data sharing, massive compute for the amount of data and processing you need, real-life business-relevant results (it already disrupts jobs), etc.

reply
History started well before 4 years ago
reply
Yeah, but this AI wave has nothing to do with how we got to the AI winters of the 70s and 80s.

The necessary amount of compute, interconnect (the internet), money, researchers, etc. wasn't available at that time.

And back then we did not invest anywhere near the amount of money, compute and brain power that we are investing right now. This is unprecedented.

reply
Ah, the youth...

"The new economy" also didn't have anything to do with the previous one. Turns out that it crashed just as well.

reply
I'm not an economist, but at least as a European, I currently do see a huge restructuring going on: a shift away from the USA to China. But I never voiced an opinion about that.

I have followed ML/AI/AGI for a decade now, though, and have read a lot about neural networks, LLMs, etc. across a broad spectrum.

My prediction regarding crypto/blockchain came true too.

We will see how it plays out. I'm open to both, but I think it would be naive to ignore what's going on, and it's way too soon to assume there is an AI winter coming.

We still want to see what Mythos can do, and a distilled version of it.

reply
> Progress is huge and fast

Is it? We have already scaled up the data input and LLMs in general; the only thing making them advance at all right now is adding processing power.

reply
Same thing happened with self-driving cars. Oh and cryptocurrencies.
reply
Self-driving never had the amount of compute, research adoption and money that the current overall AI push has. It's not comparable.

Crypto was flawed from the beginning, and lots of people didn't understand it properly. Not even the fact that a blockchain can't secure a transaction about something outside of the blockchain.

reply
The LLMs are flawed, and lots of people don't understand them properly.
reply
People are researching how to make LLMs more stable, and from a statistical point of view we are already down to 10% (progress is being made here).

LLMs don't have to be perfect; they just need to be as good as humans and cheaper or easier to manage.

reply
> Self-driving never had the amount of compute, research adoption and money that the current overall AI push has. It's not comparable.

$100+ billion in R&D and it's not comparable... hmm

reply
> Self-driving never had the amount of compute, research adoption and money that the current overall AI push has.

And yet they don't do a really good job with pretty much anything, save for software development, on which people still seem pretty split as far as it being helpful goes. That's before we even factor in the cost.

reply
I find them very helpful. I use Gemini regularly for multiple things.

I also believe that whatever code researchers and other non-software-engineers wrote before coding agents was similarly shitty, but took them a lot longer to write.

Do you know how many researchers need to do some data analysis and hack code together because they never learned programming? So, so many. If they know how to verify their data (which they already needed to know how to do), an LLM already helps them.

There is also plenty of other code where perfection doesn't matter. Non-SaaS software exists.

For security experts, we just saw what's happening. The creator of curl mentioned online that the newest AI reports of security issues are real, and the number of security gaps found is real and a lot of work.

Image generation is very good, and you can already see it everywhere today: from cheap restaurants using it, to invitations, WhatsApp messages, social media, advertising.

I have a work colleague who has been in the field for 6 years and has a degree, and he is so underqualified that if you gave me his salary as tokens today, I wouldn't think twice about replacing him.

reply
I don't particularly care about coding and didn't weigh in on it. There is no dispute that people debate if it is effective at that. You can take that debate up with them, not me.
reply
Companies are starting this year with an agentic layer. We will see how this affects broader areas.
reply
Yeah, and every year before this there was another poster telling me the next model iteration would be enough.
reply
The problem here is the adoption curve. Right now it might feel to you that it's not worth it or not happening, as it might for most people.

Then suddenly one model update moves it from 80% to 85%, and now 30% of the market wants to use it.

By then it might already be too late to act: to use it to your advantage, be a valuable expert, or decide things long-term based on the new state of affairs.

reply