Or maybe the author has been running heavily quantized small models all this time: the Gemma 4 GGUF he's using is Q4 and only 16 GB. In my experience, quants that aggressive tend to perform noticeably worse.
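For a rough sense of why a Q4 GGUF would land around 16 GB, here's a back-of-envelope sketch (the parameter count is an assumption, since the comment doesn't say which Gemma size is being run):

```python
# Rough estimate of on-disk size for a Q4-ish GGUF quant.
# Assumptions: ~27B parameters, ~4.8 effective bits/weight
# (Q4_K-style quants average slightly above 4 bits because of block scales).
params = 27e9
bits_per_weight = 4.8
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~16.2 GB, in the ballpark of the 16 GB file mentioned above
```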
I'd also recommend that anyone with a GB10 device try out the spark-vllm-docker setup, and check the Nvidia GB10 forums for the recently released optimised Qwen 3.5 122B A10B setup: 50 tok/s is quite impressive for a decent local model!
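If you want to sanity-check that throughput number yourself, something like this works against the OpenAI-compatible endpoint a vLLM server exposes. The port and model name below are assumptions, so adjust them to whatever your container reports:

```python
# Quick tokens/sec sanity check against a locally running vLLM server.
import time
from openai import OpenAI

# vLLM's OpenAI-compatible API usually listens on port 8000; api_key is ignored locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

start = time.time()
resp = client.chat.completions.create(
    model="qwen3.5-122b-a10b",  # hypothetical name; use the model id your server actually serves
    messages=[{"role": "user", "content": "Summarise the GB10 architecture in 200 words."}],
    max_tokens=512,
)
elapsed = time.time() - start

tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```

Note this measures end-to-end latency including prompt processing, so it will read a bit lower than the pure decode rate people usually quote.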