by kpw94 5 hours ago

by coder543 5 hours ago

> Wild differences in ELO compared to tfa's graph

Because those are two different, completely independent Elos... the one you linked is for LMArena, not Codeforces.

by nateb2022 4 hours ago

> Very impressed by unsloth's team releasing the GGUF so quickly, if that's like the qwen 3.5, I'll wait a few more days in case they make a major update.

Same here. I can't wait until mlx-community releases MLX optimized versions of these models as well, but happily running the GGUFs in the meantime!

Edit: And looks like some of them are up!

by culi 3 hours ago

You're conflating lmarena ELO scores.

Qwen actually has a higher ELO there. The top Pareto frontier open models are:

  model                        | elo  | price
  qwen3.5-397b-a17b            | 1449 | $1.85
  glm-4.7                      | 1443 | $1.41
  deepseek-v3.2-exp-thinking   | 1425 | $0.38
  deepseek-v3.2                | 1424 | $0.35
  mimo-v2-flash (non-thinking) | 1393 | $0.24
  gemma-3-27b-it               | 1365 | $0.14
  gemma-3-12b-it               | 1341 | $0.11
  gpt-oss-20b                  | 1318 | $0.09
  gemma-3n-e4b-it              | 1318 | $0.03

https://arena.ai/leaderboard/text?viewBy=plot

What Gemma seems to have done is dominate the extreme cheap end of the market, which IMO is probably the most important and overlooked segment.
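For anyone unsure what "Pareto frontier" means here: a model is on the frontier if no other model is at least as good on both Elo and price and strictly better on one. Below is a minimal sketch of that check, using only the rows from the table above. The names `Model`, `dominates`, and `pareto_frontier` are made up for illustration, and this is not necessarily how the leaderboard computes its plot.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    elo: int
    price: float  # price column as listed in the table above


# Rows copied from the table above (rounded Elo values).
MODELS = [
    Model("qwen3.5-397b-a17b", 1449, 1.85),
    Model("glm-4.7", 1443, 1.41),
    Model("deepseek-v3.2-exp-thinking", 1425, 0.38),
    Model("deepseek-v3.2", 1424, 0.35),
    Model("mimo-v2-flash (non-thinking)", 1393, 0.24),
    Model("gemma-3-27b-it", 1365, 0.14),
    Model("gemma-3-12b-it", 1341, 0.11),
    Model("gpt-oss-20b", 1318, 0.09),
    Model("gemma-3n-e4b-it", 1318, 0.03),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.elo >= b.elo and a.price <= b.price
    strictly_better = a.elo > b.elo or a.price < b.price
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep every model that no other model dominates (O(n^2), fine for a short list)."""
    return [m for m in models if not any(dominates(other, m) for other in models)]


frontier = pareto_frontier(MODELS)
for m in frontier:
    print(f"{m.name:30} {m.elo}  ${m.price:.2f}")
```

One caveat: with the rounded scores in the table, gpt-oss-20b ties gemma-3n-e4b-it on Elo at a higher price, so this strict check drops it. The leaderboard plot presumably works from unrounded scores, which would explain why both appear there.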

by coder543 1 hour ago

That Pareto plot doesn't seem to include the Gemma 4 models anywhere (not just at the frontier), likely because pricing wasn't available when the chart was generated. At least, I can't find the Gemma 4 models there. So it's not particularly relevant until it is updated for the models released today.

by gigatexal 4 hours ago

The benchmarks showing the "old" Chinese Qwen models performing basically on par with this fancy new release kinda have me thinking the Google models are DOA, no? What am I missing?