Tbf the Spark's usefulness isn't for inference IMO. Its memory bandwidth is too low for that.

But on the other hand, running Qwen 3.5 122B A10B locally on it using ~110GB of memory, getting 50 tok/s generation and quite excellent prefill… I couldn't do that on many other machines at this price point.
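The ~50 tok/s on a large-total, small-active MoE is roughly what a bandwidth estimate predicts, since decode only has to read the active parameters per token. A minimal sketch; the function and all numbers are my assumptions (~10B active params, a ~4-bit quant, and ~273 GB/s, which is approximately the Spark's LPDDR5X bandwidth):

```python
# Back-of-envelope decode throughput for a MoE model: each generated
# token only needs the *active* parameters read from memory, so a
# large-total / small-active MoE can decode fast on modest bandwidth.
def est_tok_per_s(active_params_b: float, bytes_per_param: float, mem_bw_gbs: float) -> float:
    """Upper bound on tokens/s: memory bandwidth / bytes read per token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return mem_bw_gbs * 1e9 / bytes_per_token

# Assumed numbers: 10B active params, 0.5 bytes/param (4-bit-ish quant),
# 273 GB/s memory bandwidth.
print(f"{est_tok_per_s(10, 0.5, 273):.0f} tok/s")  # → 55 tok/s, an upper bound
```

This is a ceiling, not a prediction; real decode speed lands somewhat below it, but it shows why the active-parameter count, not the 122B total, is what the bandwidth has to feed.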

For me this has been awesome for learning CUDA, for fine-tuning models (until I get one close to what I want, then it's off to an H100 cluster or similar), and for a bit of inference on the side.

There are a number of DGX benchmarks for these recent gemma-4 / qwen-3.6 models on the NVIDIA forum, e.g.: https://forums.developer.nvidia.com/t/qwen-qwen3-6-35b-a3b-a...