The output is "just tokens"; the "position encodings" and "context" are inputs to the LLM function, not outputs. The information a token can carry is bounded by the entropy of that token. A highly predictable token (given the context) simply can't communicate much of anything new.
Again: if a tiny language model or even a basic Markov model would also predict the same token, it's a safe bet that the token doesn't encode any useful thinking when the big model spits it out.
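To make the entropy bound concrete, here's a minimal sketch (plain Shannon entropy, nothing model-specific; the example distributions are made up) comparing the information budget of a sharply peaked next-token distribution with a flat one:

```python
import math

def surprisal_bits(p: float) -> float:
    """Information content (in bits) of observing an outcome with probability p."""
    return -math.log2(p)

def entropy_bits(dist: dict[str, float]) -> float:
    """Shannon entropy of a next-token distribution: the average surprisal,
    i.e. an upper bound on how much info sampling one token can carry."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Near-deterministic prediction: the sampled token carries almost nothing.
peaked = {"the": 0.99, "a": 0.005, "an": 0.005}

# Flat prediction over 8 candidates: up to log2(8) = 3 bits per token.
flat = {f"tok{i}": 1 / 8 for i in range(8)}

print(f"surprisal of a p=0.99 token: {surprisal_bits(0.99):.3f} bits")
print(f"entropy of peaked dist:      {entropy_bits(peaked):.3f} bits")
print(f"entropy of flat dist:        {entropy_bits(flat):.3f} bits")
```

The peaked distribution's entropy works out to under a tenth of a bit, which is the sense in which a token any tiny model would also predict can't be smuggling extra reasoning.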