>LLMs are language models; something being a transformer or a next-state predictor does not make it a language model. You can also have e.g. convolutional language models or LSTM-based language models. This is a basic point that anyone with any proper understanding of these models would know.

'Language Model' has no inherent meaning beyond 'predicts natural language sequences'. You are trying to make it mean more than that. You can certainly make something you'd call a language model with convolution or LSTMs, but that's just a semantics game. In practice, they would not work like transformers and would in fact perform much worse than them with the same compute budget.
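To make the minimal definition concrete, here's a sketch (mine, not from either commenter): a "language model" in the bare sense above is anything that assigns probabilities to the next token given context. No transformer, no LSTM, no convolution, just bigram counts:

```python
# A minimal "language model": predicts the next token given the previous one,
# using nothing but bigram counts. Illustrative toy, not a practical model.
from collections import Counter, defaultdict

def train_bigram_lm(tokens):
    """Count next-token occurrences for each token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Normalize counts into a next-token probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

tokens = "the cat sat on the mat".split()
lm = train_bigram_lm(tokens)
print(next_token_probs(lm, "the"))  # {'cat': 0.5, 'mat': 0.5}
```

By the bare definition, even this qualifies; the architecture question (transformer vs. LSTM vs. convolution) is entirely separate, which is the point.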

>Even if you disagree with these semantics, the major LLMs today are primarily trained on natural language.

The major LLMs today are trained on trillions of tokens of text (much of which has nothing to do with language beyond being the medium of communication), plus millions of images and millions of hours of audio.

The problem, as I tried to explain, is that you're packing more meaning into 'Language Model' than you should. Being trained on text does not mean all the model's responses are modelled via language, as you seem to imply. Even for a model trained on text, only the first and last few layers of an LLM concern language.

reply
You clearly have no idea about the basics of what you are talking about (like almost all people who can't grasp the simple distinction between transformer architectures and LLMs generally) and are ignoring most of what I am saying.

I see no value in engaging further.

reply
>You clearly have no idea about the basics of what you are talking about (like almost all people who can't grasp the simple distinction between transformer architectures and LLMs generally)

Yeah, if your words are anything to go by, I'm not the one who doesn't understand the distinction between transformers and other possible LM architectures. But sure, feel free to do whatever you want regardless.

reply