> For those of us a bit crazy, we are running KimiK2.6, GLM5.1
Yes, those can compare to Opus, but you can't run those unquantized for less than $400k in hardware.
A single maxed-out M3 can run Kimi 2.6 at Q2, though that's with heavily degraded perplexity.
2x M3s with RDMA can run a lossless Kimi 2.6 at Q4, but CPU-only you would get okayish decode and horrible (over a minute) TTFT, which wouldn't be a great _interactive_ experience.
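Rough back-of-envelope for the sizing above, as a sketch rather than a benchmark. Everything numeric here is an assumption for illustration: roughly 1T total parameters, 512 GB of unified memory per maxed-out machine, approximate effective bits per weight for each quant, and a made-up prompt length and prefill speed for the TTFT line.

```python
import math

def weight_gb(params_billion, bits_per_weight):
    # Weights only: params * bits / 8 bytes, in GB (10^9 bytes).
    # Ignores KV cache, activations, and OS overhead.
    return params_billion * bits_per_weight / 8

def machines_needed(size_gb, mem_per_machine_gb):
    # Smallest number of machines whose combined memory holds the weights.
    return math.ceil(size_gb / mem_per_machine_gb)

def ttft_seconds(prompt_tokens, prefill_tokens_per_s):
    # Time to first token is dominated by prefill over the whole prompt.
    return prompt_tokens / prefill_tokens_per_s

PARAMS_B = 1000  # assumption: ~1T total parameters
MEM_GB = 512     # assumption: memory of one maxed-out machine

# Effective bits per weight are rough assumptions, not exact quant specs.
for label, bits in [("Q2", 2.5), ("Q4", 4.5), ("16-bit", 16.0)]:
    size = weight_gb(PARAMS_B, bits)
    print(f"{label}: ~{size:.0f} GB of weights, "
          f"{machines_needed(size, MEM_GB)} machine(s)")

# Hypothetical: 8k-token prompt at 100 tokens/s prefill.
print(f"TTFT: ~{ttft_seconds(8000, 100):.0f} s")
```

Under those assumptions Q2 fits on one machine and Q4 spills onto a second, and a long prompt at slow prefill lands past the one-minute mark, which is the part that hurts interactive use.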
If you believe what you read here, the gap is closing fast.