You could run a 4-bit quant, which is 16-17GB, but you'd need either a smallish context or a quantized KV cache. Something like TurboQuant or RotorQuant might help.
32GB is the lower bound for comfortably running a model this size. I'd even say 64GB is right-sized, because a 256k context is nice to have for agentic workflows, and that won't fit on a 32GB card without heavy quantization (though I haven't tried TurboQuant or RotorQuant, so I don't know how much they cut memory use for context).
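For a rough sense of where the memory goes, here's a back-of-envelope sketch. The parameter count and the layer/head numbers are made up for illustration (picked so the 4-bit weights land near the 16GB figure above); they are not the real config of any particular model, and models with hybrid or sliding-window attention will need less KV cache than this.

```python
GIB = 1024 ** 3

def weight_bytes(n_params, bits_per_weight):
    """Memory for the weights alone, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bits_per_elem):
    """Memory for a full-attention KV cache.

    The leading 2 is for one K and one V tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bits_per_elem / 8

# Hypothetical model shape, for illustration only.
N_PARAMS = 32e9
N_LAYERS = 64
N_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT = 256 * 1024  # "256k" context

if __name__ == "__main__":
    print(f"4-bit weights: {weight_bytes(N_PARAMS, 4) / 1e9:.1f} GB")
    for bits in (16, 8, 4):
        kv = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, bits)
        print(f"KV cache at {bits}-bit, 256k tokens: {kv / GIB:.1f} GiB")
```

Under these assumed numbers the full-precision KV cache at 256k dwarfs the weights, which is why the context length ends up driving the card size more than the quant level does.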
You could also put some of the model into system RAM, but that undercuts your argument that a 3090 will outperform a Mac Mini or Mac Studio. If part of a dense model is in system RAM, it absolutely will not outperform a recent unified-memory device.
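Rough sketch of why the split hurts, assuming decode is purely memory-bandwidth-bound and every weight of a dense model gets read once per token. 936 GB/s is the 3090's published memory bandwidth; the system RAM and unified-memory figures are placeholder assumptions, so plug in your own hardware.

```python
def decode_tokens_per_sec(model_gb, frac_in_vram, vram_gbps, ram_gbps):
    """Upper-bound decode speed when weights are split across two memory pools.

    Time per token is the time to stream the VRAM-resident part plus the time
    to stream the system-RAM-resident part. Compute and transfer overhead are
    ignored, so real numbers will be lower.
    """
    seconds_per_token = (
        model_gb * frac_in_vram / vram_gbps
        + model_gb * (1.0 - frac_in_vram) / ram_gbps
    )
    return 1.0 / seconds_per_token

MODEL_GB = 17.0      # 4-bit quant size from the comment above
VRAM_GBPS = 936.0    # RTX 3090 spec
RAM_GBPS = 80.0      # assumption: dual-channel desktop DDR5
UNIFIED_GBPS = 400.0 # assumption: a mid-tier unified-memory machine

if __name__ == "__main__":
    all_vram = decode_tokens_per_sec(MODEL_GB, 1.0, VRAM_GBPS, RAM_GBPS)
    split = decode_tokens_per_sec(MODEL_GB, 0.75, VRAM_GBPS, RAM_GBPS)
    unified = decode_tokens_per_sec(MODEL_GB, 1.0, UNIFIED_GBPS, UNIFIED_GBPS)
    print(f"all in VRAM:        {all_vram:.1f} tok/s")
    print(f"25% in system RAM:  {split:.1f} tok/s")
    print(f"unified memory:     {unified:.1f} tok/s")
```

With these placeholder numbers, even a quarter of the weights in system RAM drags the 3090 below the unified-memory box, because the slow pool dominates the per-token time.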
An AMD AI Pro R9700 32GB brand new is $1350 right now.
After some tweaking, I had it running faster than the models the 3090 could run, and it could obviously run with higher context limits and bigger models due to the extra VRAM.
But man, I have never purchased a computer which is more expensive than a decent family car.
https://www.microcenter.com/product/709071/pny-nvidia-rtx-pr...
I know you probably weren't referring to this type of memory in your post, but IMO it might be worth avoiding this term in the future unless you're referring to HBM, the standard.
Also, while memory bandwidth is important, it isn’t the only consideration. Apple’s architecture has memory bandwidth equal to a mid-range consumer GPU, but its GPU speed is much, much worse than, say, a 5080 or 5090. This translates into e.g. much slower time to first token on Mac systems compared to dedicated GPUs.
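To put a rough shape on the time-to-first-token point: prompt processing is mostly compute-bound, and a common approximation is about 2 FLOPs per parameter per prompt token (this ignores the attention term, which grows with context). The parameter count, prompt length, and throughput figures below are hypothetical placeholders, not measurements of any Mac or NVIDIA card.

```python
def prefill_seconds(n_params, prompt_tokens, peak_flops, utilization=1.0):
    """Approximate time to process the prompt before the first token appears.

    Uses the ~2 * params * tokens FLOP estimate for a dense model's forward
    pass and divides by the throughput actually achieved.
    """
    total_flops = 2.0 * n_params * prompt_tokens
    return total_flops / (peak_flops * utilization)

if __name__ == "__main__":
    n_params = 32e9         # hypothetical dense model
    prompt_tokens = 8000    # hypothetical agentic prompt
    for label, flops in (("slower GPU", 30e12), ("faster GPU", 200e12)):
        t = prefill_seconds(n_params, prompt_tokens, flops, utilization=0.5)
        print(f"{label}: ~{t:.1f} s to first token")
```

Two devices with the same memory bandwidth would generate tokens at a similar rate once decoding starts, but the one with less compute waits proportionally longer on every long prompt.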