by slashdave 8 hours ago

by r0fl 7 minutes ago

All those models, and the site is not responsive on mobile. Ironic.

by mpeg 6 hours ago

If you look at the ranking breakdown, though, Kimi K2.6 has only participated in the last 5 challenges (Claude dominated before then), and if you count only those, it would be in first place.

by Sammi 48 minutes ago

It also has a DNF (did not finish). So it has a high ceiling but, unfortunately, also a low floor, which means using Kimi means accepting highly variable output.

Personally, what I've found has made coding agents more and more useful over the last year is that they have gotten a higher and higher floor, not a higher and higher ceiling. They were already plenty smart a year ago; it was just that they failed so often and so spectacularly that they were a liability. Now they have become much more reliable, and that is the key thing that has made them actually useful. For the most part I don't use them for really intellectually difficult tasks. I mostly use them for very boring, labor-intensive tasks. Most commercial software development work is just boring drudgery like this; certainly the bulk of what I need them for is. I need them to just not crap their pants all the time while they're at it.

So I'm kinda wary, seeing Kimi's poor reliability.

by SeriousM 5 hours ago

Ranking by gold medals only makes sense if all models had participated in all tests.

DNP = Did not participate

In this regard, Kimi got more and better medals than Claude.
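A quick way to see this point is to compare raw gold counts with gold counts restricted to the challenges both models entered. This is a minimal Python sketch; the model names (`model_a`, `model_b`), the challenge ids, and every placement are invented for illustration and are not the leaderboard's data. A missing challenge id is treated as DNP, and a `"DNF"` entry as entered but not finished.

```python
from __future__ import annotations

# Hypothetical per-challenge placements (1 = gold). A missing challenge id means
# "did not participate" (DNP); "DNF" means entered but did not finish.
# All numbers are invented for illustration; they are not the leaderboard's data.
RESULTS: dict[str, dict[int, int | str]] = {
    "model_a": {1: 1, 2: 1, 3: 2, 4: 1, 5: 3, 6: 2, 7: 1, 8: 2},
    "model_b": {4: 2, 5: 1, 6: 1, 7: "DNF", 8: 1},
}


def golds(placements: dict[int, int | str], only: set[int] | None = None) -> int:
    """Count gold medals, optionally restricted to a subset of challenge ids."""
    return sum(
        1
        for challenge, place in placements.items()
        if place == 1 and (only is None or challenge in only)
    )


def gold_rate(placements: dict[int, int | str]) -> float:
    """Golds per challenge entered; a DNF still counts as an entry."""
    return golds(placements) / len(placements) if placements else 0.0


a, b = RESULTS["model_a"], RESULTS["model_b"]
shared = set(a) & set(b)  # challenges both models entered

print("raw golds:", golds(a), "vs", golds(b))  # 4 vs 3
print("gold rate:", gold_rate(a), "vs", gold_rate(b))  # 0.5 vs 0.6
print("shared golds:", golds(a, shared), "vs", golds(b, shared))  # 2 vs 3
```

With these made-up numbers, `model_a` leads on raw golds only because it entered three more challenges; on the shared challenges, `model_b` comes out ahead. Counting a DNF as an entry in `gold_rate` is one choice among several; dropping DNFs instead would flatter an unreliable model.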

by dvfjsdhgfv 5 hours ago

Well, the link you provided basically confirms Kimi's dominance.