In the latter applications, you do queries which aim to extract information from the training data set, but which may return hallucinated content instead of correct content.
If you use an LLM just to provide an estimation for the frequencies of tokens in an input data stream, and then you use the estimated frequencies to encode the input data, then you do not care about which were the tokens predicted by the LLM, because they are not used. The worst effect of any wrong predictions by the LLM is a slightly worse data compression ratio than the optimum.
When it is said that LLMs do a lossy data compression, that refers to the compression from the training data set to sequences of output tokens.