upvote
yes, with one line change. grab the second code block in the article, that's the test harness rigged up to send all 80 questions and both turns through whatever model you want. find MODEL_ID = "google/gemma-4-E2B-it" and swap it to your huggingface id. run it. we'd love for people to keep testing different models on this. if you run qwen through it let us know what you find, post the results here.

We may beat you to it and we will share if we do lol

reply
It's almost like Qwen 3.5 9B is 4 times larger.
reply
and that 4x difference allows you to use CPUs and much cheaper hardware to achieve the same level of outcome... for free
reply