That arena leaderboard has some questionable results. Anyone who's used these models would know that ranking HiDream above Krea2 is a pretty hot take.

Many of these Elo-based comparative tests (ArtificialAnalysis is guilty as hell of this as well) have other problems too, such as a considerable number of "amateur judges" who tend to prioritize aesthetics over how well the output actually follows the prompt.

Also (and this is less a critique of Arena.AI itself), the MAI models are so incredibly locked down (i.e. censored) as to be functionally useless. I have a sneaking suspicion it's fallout from Tay.

https://en.wikipedia.org/wiki/Tay_(chatbot)

by shmolyneaux 2 hours ago

I definitely appreciated your post about Nano Banana Pro. It's also a genuinely useful time capsule for how these systems evolve and where they fall short. I've preferred the output of ChatGPT Image 2, and I think a similar post on it would be very helpful for folks to see what they're missing.