[-]

85% of the compute for the final model came from them, not from the base Kimi model.

[-]

That just means it cost a lot.

Does it perform meaningfully better than the Kimi model given all that extra compute? And proportionally to the amount spent?