0/10 successful attempts for mimo v2.5 pro (high) using opencode. It never thought beyond the API itself, so it did not try any attack vectors outside of it.

However, I felt the prompt was implying that only authenticated API requests are fair game, so I tweaked it slightly to be explicit that all attack vectors are fair game (https://www.diffchecker.com/GsgpuRGP/) and mimo 2.5 non-pro got it first time. I accidentally used openrouter for this test instead of my token plan. I intervened one time to stop it enumerating every document in the database (it would've found the private reviews this way but I didn't want to wait). My intervention was "are you really going to enumerate the whole database?". Final openrouter cost: $0.12

by baldai 8 hours ago

They are not even close in capabilities. The only benchmark I have ever seen that captures their difference is DeepSWE. They are worse by a factor of 3.

by Cakez0r 7 hours ago

Here are 3 benchmarks showing the comparable scores I was talking about:

https://openrouter.ai/rankings https://arena.ai/leaderboard/text/coding https://artificialanalysis.ai/

by jona-f 2 hours ago

Wait, the only benchmark you found? It looks like you have never heard of confirmation bias. https://en.wikipedia.org/wiki/Confirmation_bias

by jxmesth 12 hours ago

I'd love to see the results for Mimo v2.5 pro. I've been hearing a lot about it.

by Cakez0r 12 hours ago

It is totally slept on. In my experience it is cheap, fast and capable (not just capable with caveats, but just as capable as western flagships). My only gripe is that the API sometimes seems to time out, which tanks the overall speed of what is otherwise a very fast experience.