Tbf the Spark's usefulness isn't for inference IMO. Its memory bandwidth is too low for that.

But on the other hand, running Qwen 3.5 122B A10B locally on it using ~110GB of memory, getting 50 tok/s generation and quite excellent prefill… I couldn't do that on many other machines at this price point.
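The ~50 tok/s on a large-total, small-active MoE is roughly what a bandwidth estimate predicts, since decode only has to read the active parameters per token. A minimal sketch; the function and all numbers are my assumptions (~10B active params, a ~4-bit quant, and ~273 GB/s, which is approximately the Spark's LPDDR5X bandwidth):

```python
# Back-of-envelope decode throughput for a MoE model: each generated
# token only needs the *active* parameters read from memory, so a
# large-total / small-active MoE can decode fast on modest bandwidth.
def est_tok_per_s(active_params_b: float, bytes_per_param: float, mem_bw_gbs: float) -> float:
    """Upper bound on tokens/s: memory bandwidth / bytes read per token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return mem_bw_gbs * 1e9 / bytes_per_token

# Assumed numbers: 10B active params, 0.5 bytes/param (4-bit-ish quant),
# 273 GB/s memory bandwidth.
print(f"{est_tok_per_s(10, 0.5, 273):.0f} tok/s")  # → 55 tok/s, an upper bound
```

This is a ceiling, not a prediction; real decode speed lands somewhat below it, but it shows why the active-parameter count, not the 122B total, is what the bandwidth has to feed.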

For me this has been awesome for learning CUDA, for fine-tuning models (until I get one close to what I want, then it's off to an H100 cluster or similar), and for a bit of inference on the side.

There are a number of DGX benchmarks for these recent gemma-4 / qwen-3.6 models on the NVIDIA forum, e.g.: https://forums.developer.nvidia.com/t/qwen-qwen3-6-35b-a3b-a...