By default CUDA isn't deterministic because of thread scheduling.

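The main difference comes from the order in which reductions are rounded: floating point addition isn't associative, so summing the same values in a different order can round differently.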

It does make a difference, but a small one, unless you have an unstable floating point algorithm. And if you have an unstable floating point algorithm on a GPU at low precision, you were doomed from the start.
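
To make the rounding-order point concrete, here's a rough sketch in plain Python, not CUDA. It emulates float32 accumulation with `struct`, and the helper names `to_f32` and `sum_f32` are made up for illustration. The values are picked to exaggerate the effect; in a well-conditioned reduction the discrepancy would usually show up only in the last few bits.

```python
import struct


def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Sum left to right, rounding the accumulator to float32 after each add."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return acc


# Same three values, two different orders, as if a scheduler had
# delivered the partial results in a different sequence.
order_a = [1e8, 1.0, -1e8]   # the 1.0 is absorbed: float32 spacing near 1e8 is 8
order_b = [1e8, -1e8, 1.0]   # the big terms cancel first, so the 1.0 survives

print(sum_f32(order_a))  # 0.0
print(sum_f32(order_b))  # 1.0
```

Neither answer is a bug in the hardware; both are correctly rounded at every step, just in a different order.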
