85% of the compute behind the final model came from them, not from the base Kimi model.
Does it perform meaningfully better than the Kimi model given all that extra compute? And is the improvement proportional to the amount spent?