upvote
The blog post is about LLM non-determinism in the context of serving at scale (variable batch size). The page you link is only about run-to-run determinism implicitly assuming a fixed batch size.
reply