Conceptually, setting temperature > 0 doesn't introduce any non-determinism. If your sampler is seeded, it will always choose the same next token. Higher temperature only flattens the probability distribution over tokens.
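A minimal sketch of that claim, assuming the logits are identical on every run (fixed, hypothetical values here rather than a real model's output). Temperature rescales the logits before the softmax, and the only randomness comes from a dedicated seeded RNG, so two runs with the same seed pick the same tokens even at T = 1.5. The helper names `softmax_with_temperature` and `sample_tokens` are illustrative, not from any library.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    # Dividing logits by T > 1 flattens the distribution; T < 1 sharpens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits, temperature, seed, n):
    # A dedicated, seeded RNG is the only source of randomness in sampling.
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return [rng.choices(range(len(logits)), weights=probs, k=1)[0] for _ in range(n)]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical fixed logits
run_a = sample_tokens(logits, temperature=1.5, seed=42, n=10)
run_b = sample_tokens(logits, temperature=1.5, seed=42, n=10)
print(run_a == run_b)  # True
```

The sketch only holds if the logits themselves are bit-identical across runs; if they drift between runs, no seed restores determinism.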
The point of the blog post is that even with supposedly deterministic (greedy) sampling, non-determinism creeps in. This in turn has disastrous effects in very real experiments.
My point is that greedy sampling is neither sufficient nor necessary for deterministic inference.