on the Will It Mythos benchmark, small models are punching way above their weight(s)

gemma4-26B (#7)

qwen-3.6-27B (#9)

https://news.ycombinator.com/item?id=48640196

I've tried running qwen 3.6 locally, and it felt like LLMs a year ago: you can get them to do some stuff, but the tasks have to be very small and you have to course-correct them a lot, to the point that it's hard to say it's any faster than doing it all yourself.

Certainly the gap is closing, but I feel it still makes more sense to pay pennies to run the full-sized open models hosted on much better hardware.

I had qwen36moe revamp my PhD thesis with a rewrite using JAX. Gave it access to my old code and helped it a few times when it got stuck or didn't quite understand.

Overall I was very impressed with its open box reimplementation. I remain of the mind they are widely underrated.
