> "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs.

But this isn't a fundamental property of LLMs; it's just an implementation detail. It's pretty clear that if you evaluate the matrix multiplications deterministically and always pick the highest-probability token (greedy decoding), you get a deterministic LLM.
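
A minimal sketch of that argument, assuming a toy stand-in for the model: `toy_logits` is a hypothetical pure function invented for illustration, not a real LLM forward pass. If the forward pass is a deterministic function of the context and decoding always takes the argmax, repeated runs produce the same tokens.

```python
import math

def toy_logits(context, vocab_size=5):
    # Hypothetical stand-in for a forward pass: a pure function of the context,
    # so the same context always yields the same logits.
    seed = sum(context) + 1
    return [math.sin(0.7 * seed * (i + 1)) for i in range(vocab_size)]

def greedy_decode(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_logits(tokens)
        # Greedy decoding: always take the highest-scoring token, no sampling.
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens

first_run = greedy_decode([1, 2], steps=8)
second_run = greedy_decode([1, 2], steps=8)
```

Both runs return the same list because nothing in the loop depends on random state or evaluation order.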

by vbarrielle, 1 day ago

It may be an implementation detail, but in practice, if the only way to get a deterministic output is to run on the CPU, then it's not going to be usable.

by 317070, 1 day ago

Actually, Google's TPUs are also deterministic!

by Dylan16807, 1 day ago

You can tell GPUs what order to do math instructions in.

by EvgeniyZh, 1 day ago

You don't have to sample uniformly among the maxima. You could take the lowest index of all maxima. But yeah, the main source of randomness is non-deterministic matmul, and temperature has no effect on that.
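
The lowest-index rule can be sketched as follows; `argmax_lowest_index` is a hypothetical helper name used only for illustration. The strict `>` comparison means a later element replaces the current best only if it is strictly larger, so ties always resolve to the earliest index.

```python
def argmax_lowest_index(logits):
    # Return the index of the maximum; on ties, keep the lowest index.
    best = 0
    for i, value in enumerate(logits):
        if value > logits[best]:  # strict: equal values never displace the earlier index
            best = i
    return best

tied = [0.1, 0.9, 0.9, 0.3]
choice = argmax_lowest_index(tied)  # indices 1 and 2 tie; index 1 is chosen
```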

by DougBTX, 1 day ago

> GPUs put the associativity of the sums in matrix multiplications in arbitrary order

That’s user-controlled too, not an inherent property of GPUs:

https://docs.pytorch.org/docs/2.12/generated/torch.use_deter...
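
For context on why the order of the sums matters at all: floating-point addition is not associative, so the same numbers added in a different order can give a different result. A small sketch in plain Python with double precision (not GPU code; the values are picked for illustration):

```python
values = [1e16, 1.0, -1e16]

# Left to right: 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost.
left_to_right = (values[0] + values[1]) + values[2]

# Reordered: the large terms cancel first, so the 1.0 survives.
reordered = (values[0] + values[2]) + values[1]

def sum_in_order(xs, order):
    # Accumulate xs in an explicitly fixed order.
    total = 0.0
    for i in order:
        total += xs[i]
    return total

fixed_a = sum_in_order(values, [0, 1, 2])
fixed_b = sum_in_order(values, [0, 1, 2])
```

Here `left_to_right` is `0.0` and `reordered` is `1.0`, while the two fixed-order sums agree with each other. Pinning the reduction order is what makes the result reproducible.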

by vbarrielle, 1 day ago

The matrix multiplication is only deterministic for sparse-dense products under these settings:

> torch.bmm() when called on sparse-dense CUDA tensors

And it's not listed under the operations that raise an exception otherwise, so I'm not sure the docs promise that dense-dense matrix-matrix products are deterministic.

by DougBTX, 1 day ago

Oh, thanks, that’s interesting, I thought it covered that too!