upvote
> Wild differences in ELO compared to tfa's graph

Because those are two different, completely independent Elos... the one you linked is for LMArena, not Codeforces.

reply
> Very impressed by unsloth's team releasing the GGUF so quickly, if that's like the qwen 3.5, I'll wait a few more days in case they make a major update.

Same here. I can't wait until mlx-community releases MLX optimized versions of these models as well, but happily running the GGUFs in the meantime!

Edit: And looks like some of them are up!

reply
You're conflating lmarena ELO scores.

Qwen actually has a higher ELO there. The top Pareto frontier open models are:

  model                        |elo  |price
  qwen3.5-397b-a17b            |1449 |$1.85
  glm-4.7                      |1443 | 1.41
  deepseek-v3.2-exp-thinking   |1425 | 0.38
  deepseek-v3.2                |1424 | 0.35
  mimo-v2-flash (non-thinking) |1393 | 0.24
  gemma-3-27b-it               |1365 | 0.14
  gemma-3-12b-it               |1341 | 0.11
  gpt-oss-20b                  |1318 | 0.09
  gemma-3n-e4b-it              |1318 | 0.03
https://arena.ai/leaderboard/text?viewBy=plot

What Gemma seems to have done is dominate the extreme cheap end of the market. Which IMO is probably the most important and overlooked segment

reply
That Pareto plot doesn't seem include the Gemma 4 models anywhere (not just not at the frontier), likely because pricing wasn't available when the chart was generated. At least, I can't find the Gemma 4 models there. So, not particularly relevant until it is updated for the models released today.
reply
the benchmarks showing the "old" Chinese qwen models performing basically on par with this fancy new release kinda has me thinking the google models are DOA no? what am I missing?
reply