Okay, but not all results on there are valid. ForgeCode, for instance, has cheated in the past:

https://debugml.github.io/cheating-agents/#sneaking-the-answ...

Those benches are completely meaningless when it comes to real-world work tasks, and everyone knows it.