undefined

points

[-]

LLM inference can be implemented in a way where nondeterminism depends only on the random seed, but that's not common. It ends up being more efficient/easier to implement kernels whose exact results depend on how many other prompts are being processed in parallel. See https://thinkingmachines.ai/blog/defeating-nondeterminism-in... for a pretty extensive exploration.