42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
For GLM 4.7, that works out to 143 * 32B = 4576B parameters served per second, and for Llama 3.3 we get 70 * 70B = 4900B, which makes sense: dense models are easier to optimize, so the effective throughput comes out similar. Taking GLM's figure as the baseline gives a lower bound of 4576B / 42 ≈ 109B active parameters for Opus 4.6. (This assumes all three models use the same number of bits per parameter and run on the same hardware.) In reality, I'd say Opus is roughly 2x to 3x the price of the top Chinese models to serve.
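The arithmetic above can be sketched in a few lines (tps figures are from the OpenRouter pages linked above; the same-bits-per-parameter and same-hardware assumption is as stated):

```python
# Back-of-envelope lower bound on Opus 4.6's active parameter count,
# assuming all three models use the same bits per parameter and run
# on comparable hardware.

glm_tps, glm_active_b = 143, 32      # GLM 4.7: 32B active params (MoE)
llama_tps, llama_params_b = 70, 70   # Llama 3.3 70B (dense)
opus_tps = 42                        # Claude Opus 4.6

# Effective serving throughput, in billions of parameters per second
glm_throughput = glm_tps * glm_active_b        # 4576 B params/s
llama_throughput = llama_tps * llama_params_b  # 4900 B params/s

# Lower bound on Opus 4.6 active parameters, using GLM's throughput
opus_active_lower = glm_throughput / opus_tps
print(round(opus_active_lower))  # → 109
```

Using Llama's 4900B figure instead would give 4900 / 42 ≈ 117B, so the estimate is not very sensitive to which model you take as the baseline.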