0/10 successful attempts for mimo v2.5 pro (high) using opencode. It was not able to think beyond the API and look for attack vectors outside of it.

However, I felt the prompt implied that only authenticated API requests were fair game, so I tweaked it slightly to make explicit that all attack vectors are allowed (https://www.diffchecker.com/GsgpuRGP/), and mimo 2.5 non-pro got it on the first try. I accidentally used openrouter for this test instead of my token plan. I intervened once to stop it from enumerating every document in the database (it would have found the private reviews this way, but I didn't want to wait). My intervention was "are you really going to enumerate the whole database?". Final openrouter cost: $0.12.

reply
They are not even close in capabilities. The only benchmark I have ever seen that captures the difference is DeepSWE. They are worse by a factor of 3.
reply
Here are 3 benchmarks showing the comparable scores I was talking about

https://openrouter.ai/rankings https://arena.ai/leaderboard/text/coding https://artificialanalysis.ai/

reply
Wait, the only benchmark you found? It looks like you have never heard of confirmation bias. https://en.wikipedia.org/wiki/Confirmation_bias
reply
I'd love to see the results for Mimo v2.5 pro, been hearing a lot about it
reply
It is totally slept on. In my experience it is cheap, fast and capable (not just capable with caveats, but just as capable as western flagships). My only gripe is that the API sometimes seems to time out, which tanks the overall speed of what is otherwise a very fast experience.
reply