It’s currently unsupported in llama.cpp, and vLLM doesn’t support GPU+CPU MoE, so unless all of you have an array of DGX Sparks in your bedroom, what’s the secret sauce?!
i don't comprehend why people are in such disbelief at how much better this stuff runs on a Mac Studio than on NVIDIA hardware with 1/5th the VRAM. look, what can i say? NVIDIA is a bigger rip-off than Apple is!