This is a "dense" model, so it will be much slower on Macs. Concretely, on an M4 Pro with a Q6 GGUF, I got ~9 tok/s. For comparison, 35-A3B ran at ~70 tok/s for me (that was at Q4 with MLX, so not a fair comparison).
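As a rough back-of-the-envelope for the gap: decoding is usually limited by memory bandwidth, and a dense model has to read all of its weights for every token, while an MoE like 35-A3B (reading "A3B" as ~3B active parameters) only reads the active ones. A minimal sketch, where the function name, the 27B dense size, the bits-per-weight figures, and the 273 GB/s bandwidth are all assumptions picked for illustration, not measurements:

```python
def decode_tok_per_s_ceiling(active_params_b, bits_per_weight, mem_bandwidth_gb_s):
    """Rough upper bound on decode speed.

    Assumes every active weight is read from memory once per generated token
    and that nothing else (KV cache, compute, overhead) costs anything.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return mem_bandwidth_gb_s / gb_read_per_token


# Hypothetical inputs, for illustration only.
BANDWIDTH_GB_S = 273.0  # assumed unified-memory bandwidth of the machine

# Assumed dense model: 27B parameters, ~6.5 bits per weight for a Q6-style quant.
dense_ceiling = decode_tok_per_s_ceiling(27, 6.5, BANDWIDTH_GB_S)

# Assumed MoE: ~3B active parameters, ~4.5 bits per weight for a Q4-style quant.
moe_ceiling = decode_tok_per_s_ceiling(3, 4.5, BANDWIDTH_GB_S)

print(f"dense ceiling: {dense_ceiling:.1f} tok/s")
print(f"MoE ceiling:   {moe_ceiling:.1f} tok/s")
```

Real-world numbers land below these ceilings, but the ratio between the two is the point: the fewer bytes read per token, the faster the decode.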
In general, dedicated GPUs tend to do better with these kinds of "dense" models, though this becomes harder to judge when the GPU doesn't have enough VRAM to keep the model fully resident. For this model, I'd expect you to be fine with >=24GB of VRAM, e.g. an NVIDIA {3,4,5}090-type card.
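To sanity-check whether a given quant stays fully resident, something like the following works as a first pass. It is only a sketch: the function name, the 27B parameter count, the bits-per-weight values, and the 2 GB allowance for KV cache and runtime overhead are all assumptions, so plug in the real file size and context length for your setup.

```python
def fits_in_vram(n_params_b, bits_per_weight, vram_gb, overhead_gb=2.0):
    """Return True if the quantized weights plus a fixed allowance fit in VRAM.

    overhead_gb is an assumed flat allowance for KV cache and runtime buffers;
    long contexts can need considerably more.
    """
    weights_gb = n_params_b * bits_per_weight / 8
    return weights_gb + overhead_gb <= vram_gb


# Hypothetical 27B dense model at a Q4-style quant (~4.5 bits per weight).
print(fits_in_vram(27, 4.5, vram_gb=24))  # 24 GB card
print(fits_in_vram(27, 4.5, vram_gb=16))  # 16 GB card
```

If it doesn't fit, layers spill to system RAM and speed drops sharply, which is the hard-to-judge case.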