What you describe does give lossless data compression, but it uses the LLM in a very different way from applications like chat or coding assistance.

In those applications, your queries aim to extract information from the training data set, but they may return hallucinated content instead of correct content.

If you use an LLM only to estimate the probabilities of the tokens in an input data stream, and then use those estimates to encode the input data, you do not care which tokens the LLM would have predicted, because its predictions are never emitted as output. The worst effect of any wrong predictions by the LLM is a slightly worse data compression ratio than the optimum.
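To make that concrete, here is a minimal sketch of an arithmetic coder driven by a predictor. Everything in it is an assumption for illustration: `good_model` and `bad_model` are hypothetical fixed-probability stand-ins for an LLM, the alphabet is two symbols, and exact fractions are used instead of the fixed-precision arithmetic a real coder would use. The point is only that the decoder recovers the input exactly with either model, and that the worse model just costs more bits.

```python
from fractions import Fraction
from math import ceil

ALPHABET = "ab"  # toy alphabet; a real system would use the LLM's token set

def good_model(context):
    # Hypothetical predictor that matches the data: 'a' is very likely.
    return {"a": Fraction(9, 10), "b": Fraction(1, 10)}

def bad_model(context):
    # Hypothetical predictor that is badly wrong about the data.
    return {"a": Fraction(1, 10), "b": Fraction(9, 10)}

def encode(text, model):
    """Narrow [low, low + width) by each symbol's predicted probability."""
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(text):
        probs = model(text[:i])
        for s in ALPHABET:
            if s == sym:
                width *= probs[s]
                break
            low += width * probs[s]  # skip the slices of earlier symbols
    # Smallest k such that a k-bit binary fraction fits in the interval.
    k = 0
    while Fraction(1, 2 ** k) > width:
        k += 1
    code = ceil(low * 2 ** k)
    return code, k  # k is the length of the compressed output in bits

def decode(code, k, n, model):
    """Replay the same predictions and pick the slice containing the code."""
    value = Fraction(code, 2 ** k)
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(n):
        probs = model(out)
        for s in ALPHABET:
            sub = width * probs[s]
            if low <= value < low + sub:
                out += s
                width = sub
                break
            low += sub
    return out

text = "aaaaaaaaab" * 3
for model in (good_model, bad_model):
    code, bits = encode(text, model)
    restored = decode(code, bits, len(text), model)
    print(model.__name__, "bits:", bits, "lossless:", restored == text)
```

The decoder needs the same model and the symbol count, which this sketch passes in directly. Note that a model must give every symbol a nonzero probability, otherwise that symbol cannot be encoded at all.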

When people say that LLMs do lossy data compression, they are referring to the compression from the training data set to the sequences of output tokens that the model generates.