This is literally one of the most knowledgeable people on the topic. I think you are the one who hasn’t peeled back enough layers to connect with what they are saying.
If you say so.
> the author has nothing to do with the original comment
Except for the part of the comment that assumed the author had no idea how any of this works, had only used LLMs through an API, and had never run a local model, you mean?
Not really. LLMs give you a distribution over possible next tokens, and you are free to sample from that distribution however you want. There is no need to hack the RNG or anything: you can simply take a greedy approach and always output the most likely token, in which case the LLM becomes (mathematically) deterministic. This is equivalent to setting the temperature to 0.
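To make that concrete, here is a minimal sketch in plain NumPy with a toy logits vector standing in for a real model's output; the helper name `sample_next_token` is just illustrative, not any library's API:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick a next-token id from raw logits.

    temperature == 0 -> greedy: always return the argmax (deterministic).
    temperature > 0  -> softmax-with-temperature sampling (stochastic).
    """
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(logits))            # deterministic path, no RNG involved
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())        # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy distribution over a 5-token vocabulary
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(sample_next_token(logits, temperature=0))    # always 0: same input, same output
print(sample_next_token(logits, temperature=1.0))  # varies from run to run
```

The model itself only produces the logits; everything after that, greedy or sampled, is the caller's choice.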