Isn’t your (b) only because of the addition of a random seed?
LLM inference can be implemented so that the random seed is the only source of nondeterminism, but that's not common. In practice it's more efficient and easier to write kernels whose exact results depend on how many other prompts are being processed in parallel (i.e. the batch size), because floating-point addition isn't associative and the order of the reductions changes with the batch. See https://thinkingmachines.ai/blog/defeating-nondeterminism-in... for a pretty extensive exploration.
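To make that concrete, here's a toy sketch in plain Python (not real kernel code). It assumes the only thing that changes between runs is how a reduction gets split into chunks, standing in for a kernel picking a different strategy at a different batch size. `sequential_sum` and `chunked_sum` are made-up names for illustration, and the explicit loop is there so the result doesn't depend on how the built-in `sum` is implemented.

```python
def sequential_sum(values):
    """Add floats strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Reduce fixed-size chunks first, then add the partial sums.

    Hypothetical stand-in for a kernel whose reduction order
    changes with the amount of work it is given.
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Same inputs, same "seed", different reduction order.
values = [1e20, 1.0, -1e20, 1.0]

one_chunk = chunked_sum(values, 4)   # ((1e20 + 1) - 1e20) + 1
two_chunks = chunked_sum(values, 2)  # (1e20 + 1) + (-1e20 + 1)

print(one_chunk, two_chunks, one_chunk == two_chunks)
```

The two results differ even though nothing random is involved, and neither matches the exact answer of 2. In a real model the differences are tiny, but they can be enough to flip which token comes out on top, and after that the outputs diverge.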