I'm also looking forward to the Cerebras Kimi K2.6 release, which should be even better at 1000 tps. It is hard to overstate how important speed is for programming. Instead of waiting a few minutes for a task to finish, you get the result almost instantly, and you don't have to context switch away from whatever else you were working on.
I hope they will make it available to regular customers.
Cerebras also seems to be killing off their regular APIs: they're deprecating models, and GLM is still stuck on GLM 4.7, a whole 2 versions behind.
Thanks for the tip, looks fire.