Or maybe the author has been running heavily quantized small models all this time: the Gemma 4 GGUF he's using is Q4 and only 16 GB. In my experience, quants that aggressive tend to perform noticeably worse.
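For a rough sense of why a Q4 GGUF would land around 16 GB, here's a back-of-envelope sketch (the parameter count is an assumption, since the comment doesn't say which Gemma size is being run):

```python
# Rough estimate of on-disk size for a Q4-ish GGUF quant.
# Assumptions: ~27B parameters, ~4.8 effective bits/weight
# (Q4_K-style quants average slightly above 4 bits because of block scales).
params = 27e9
bits_per_weight = 4.8
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~16.2 GB, in the ballpark of the 16 GB file mentioned above
```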
I'd also recommend that anyone with a GB10 device try out the spark-vllm-docker setup, and check the Nvidia GB10 forums for the recently released optimised Qwen 3.5 122B A10B setup: 50 tok/s is quite impressive for a decent local model!
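If you want to sanity-check that throughput number yourself, something like this works against the OpenAI-compatible endpoint a vLLM server exposes. The port and model name below are assumptions, so adjust them to whatever your container reports:

```python
# Quick tokens/sec sanity check against a locally running vLLM server.
import time
from openai import OpenAI

# vLLM's OpenAI-compatible API usually listens on port 8000; api_key is ignored locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

start = time.time()
resp = client.chat.completions.create(
    model="qwen3.5-122b-a10b",  # hypothetical name; use the model id your server actually serves
    messages=[{"role": "user", "content": "Summarise the GB10 architecture in 200 words."}],
    max_tokens=512,
)
elapsed = time.time() - start

tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```

Note this measures end-to-end latency including prompt processing, so it will read a bit lower than the pure decode rate people usually quote.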