Model performance consistency is important not because you want inference determinism (which you can actually get by setting temperature to zero and applying a static seed). The "another axis of non-determinism" can be illustrated by the question "if I move from OpenRouter to Bedrock, will gpt-5.5 perform the same?", to which the answer is no, at least not necessarily.
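A minimal sketch of what "pinning" inference looks like, assuming an OpenAI-style chat completions request shape (parameter support and seed behavior vary by provider, and seed is typically best-effort even where accepted):

```python
# Hypothetical request payload illustrating determinism knobs.
# Names follow the OpenAI-style chat completions API; whether a given
# provider honors them (especially seed) is provider-dependent.
request = {
    "model": "gpt-5.5",  # model name from the comment above
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "temperature": 0,    # greedy decoding: removes sampling randomness
    "seed": 42,          # static seed: pins remaining tie-breaking, best-effort
}

print(request["temperature"], request["seed"])
```

Even with both knobs set, output can still drift across providers: different serving stacks, quantization, or batching can change logits before sampling ever happens, which is exactly the cross-provider axis described above.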
This matters because workflows that used to work on one platform might degrade or outright break on another, even with the same model, which you have to account for when deciding which provider to use.
For Anthropic, it can vary by model and over time. For Opus 4.7, Bedrock is the clear winner in TPS by a wide margin: https://artificialanalysis.ai/models/claude-opus-4-7/provide...