You do understand that the mechanism through which an autoregressive transformer works (predicting one token at a time) is completely unrelated to how a model with that architecture behaves or how it's trained, right? You can have both:

- An LLM that works through completely different mechanisms, like predicting masked words, predicting the previous word, or predicting several words at a time.

- An ordinary, traditional program, like a calculator, encoded as an autoregressive transformer that calculates its output one token at a time (compiled neural networks) [1][2].
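The second bullet can be sketched concretely. This is only a toy illustration (not the actual RASP/Tracr compilation pipeline, and the function names are made up here): any ordinary, exact function can be exposed through an autoregressive, one-token-at-a-time interface without changing what it computes.

```python
# Toy sketch: an exact calculator behind an autoregressive interface.
# The token loop is just plumbing; the behavior is the calculator's.

def add_program(prompt: str) -> str:
    # An ordinary, exact program: parse "a+b" and add the integers.
    a, b = prompt.split("+")
    return str(int(a) + int(b))

def next_token(prompt: str, generated: str) -> str:
    # Deterministically emit the answer one character ("token") at a time.
    answer = add_program(prompt)
    return answer[len(generated)] if len(generated) < len(answer) else "<eos>"

def generate(prompt: str) -> str:
    # Standard autoregressive decoding loop: append tokens until end-of-sequence.
    out = ""
    while (tok := next_token(prompt, out)) != "<eos>":
        out += tok
    return out

print(generate("123+456"))  # 579
```

The point of the toy: "it outputs one token at a time" describes the interface, not the behavior behind it.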

So saying "it predicts the next word" is a nothing-burger. That a program calculates its output one token at a time tells you nothing about its behavior.

[1] https://arxiv.org/pdf/2106.06981

[2] https://wengsyx.github.io/NC/static/paper_iclr.pdf

reply
> So saying "it predicts the next word" is a nothing-burger. That a program calculates its output one token at a time tells you nothing about its behavior.

Well it does - it tells me it is utterly unreliable, because it does not understand anything. It just goes on, churning out a pile of tokens that, placed one after another, kind of look like coherent sentences but make no sense, like "you should absolutely go on foot to the car wash". A completely logical culmination of Bill Gates' idiotic "Content is King" proclamation from nearly 30 years ago.

reply
No, you can't know that the output of a program is unreliable just from the fact that it outputs one word at a time. I already told you that you can perfectly compile an ordinary program, like a calculator, into the weights of an autoregressive transformer (this comes from lines of work like RASP, ALTA, and Tracr). And I don't mean that in the sense of "approximating the output of a calculator with 99.999% accuracy"; I mean it in the sense of "it deterministically gives exactly the same output as a calculator, 100% of the time, for all possible inputs".
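The "exact, not approximate" distinction can be checked mechanically for a toy case. A sketch, under the assumption (from the compiled-network papers above) that the token-by-token generator realizes the same function as the reference program; here that is made trivially true by construction, which is exactly the situation compilation aims for:

```python
# Toy check of the "exact, not approximate" point: a token-at-a-time
# generator built from an exact adder agrees with the adder on every
# tested input, deterministically.

def calculator(a: int, b: int) -> str:
    # Reference program: exact integer addition.
    return str(a + b)

def autoregressive_calculator(a: int, b: int) -> str:
    # Emit the same answer one character ("token") per decoding step.
    target = calculator(a, b)
    out = ""
    while len(out) < len(target):
        out += target[len(out)]   # one token per step
    return out

# Exhaustive agreement on a test range, including negative results like "-12".
assert all(
    autoregressive_calculator(a, b) == calculator(a, b)
    for a in range(-50, 51)
    for b in range(-50, 51)
)
print("exact agreement on all tested inputs")
```

Unreliability in practice comes from what the weights encode, not from the decoding loop itself.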
reply
> No, you can't know that the output of a program is unreliable just from the fact that it outputs one word at a time

Yes I can, and it shows every time the "smart" LLMs suggest we walk to the car wash, or claim that 1.9 < 1.11, etc...

reply