undefined

points

[-]

>It sounds like you're looking for something more than the simple reality that the math is what's going on.

Train a tiny transformer on addition pairs (i.e i.e '38393 + 79628 = 118021') and it will learn an algorithm for addition to minimize next token error. This is not immediately obvious. You won't be able to just look at the matrix multiplications and see what addition implementation it subscribes to but we know this from tedious interpretability research on the features of the model. See, this addition transformer is an example of a model we do understand.

So those inscrutable matrix multiplications do have underlying meaning and multiple interpretability papers have alluded as much, even if we don't understand it 99% of the time.

I'm very fine with simply saying 'LLMs understand Language' and calling it a day. I don't care for Searle's Chinese Room either. What I'm not going to tell you is that we understand how LLMs understand language.