I can run Qwen 3.6 35B on my gaming PC at around 50 tok/s. Other than a tiny bit of extra power cost per month, it's free, since it's hardware I already owned from years ago.
I'm not really sure why Qwen 3.6 35B is so expensive on OpenRouter; the price seems abnormally high for the hardware it takes to run it.
I'm trying to go the same route, but I have a 5070 Ti with only 16 GB of VRAM (I bought it for gaming) and I'm not sure how to run anything decent on it. I have 64 GB of RAM, if that matters.
The main thing in LM Studio (or whatever software you use, assuming it's fairly up to date and exposes the toggles) is to offload MoE layers to the CPU and to use K/V cache quantization at Q8_0 or Q4_0.
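To give a feel for why K/V cache quantization matters, here is a rough back-of-the-envelope sketch. The per-element sizes follow ggml's block formats (Q8_0 packs 32 values into 34 bytes, Q4_0 packs 32 values into 18 bytes). The model shape in the example is a made-up placeholder, not the real dimensions of any particular model, and the formula assumes every layer keeps a full K/V cache, which may overestimate for models with other attention types.

```python
# Rough K/V cache size estimate for different cache quantization types.
# The model dimensions used in the example are hypothetical placeholders.

# Bytes per stored value:
#   f16  : 2 bytes per value
#   q8_0 : 32 values in 34 bytes (32 int8 values + one f16 scale)
#   q4_0 : 32 values in 18 bytes (16 bytes of 4-bit values + one f16 scale)
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, cache_type):
    """Approximate combined size of the K and V caches, in bytes."""
    # Factor of 2 covers one K vector and one V vector per layer per token.
    elements_per_token = 2 * n_layers * n_kv_heads * head_dim
    return elements_per_token * context_len * BYTES_PER_ELEMENT[cache_type]


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(
            n_layers=40, n_kv_heads=8, head_dim=128,
            context_len=65536, cache_type=cache_type,
        )
        print(f"{cache_type}: {size / 2**30:.2f} GiB")
```

Whatever the real dimensions are, the ratios hold: Q8_0 takes a bit over half the space of F16, and Q4_0 a bit over a quarter, which is what frees up VRAM for a longer context or more layers on the GPU.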
Since you have more VRAM than I do, you could probably get away with an MoE offload of around 15-20, so that some of the MoE weights stay on the GPU.
Just make sure GPU offload is turned all the way up. I use a 64k context size, although with 16 GB of VRAM you can probably go higher.
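If you end up using llama.cpp directly instead of LM Studio, the same settings might look roughly like the sketch below. This is only my understanding of the `llama-server` options and flag names can change between versions, so check `llama-server --help` first. The model path and the helper name `build_server_args` are placeholders I made up for illustration.

```python
# Sketch: build a llama.cpp `llama-server` command line that mirrors the
# settings described above. Flag names are assumptions based on my reading
# of llama.cpp's options; verify them with `llama-server --help`.
import shlex


def build_server_args(model_path, n_cpu_moe, ctx_size=65536, cache_type="q8_0"):
    """Return the argument list for a hypothetical llama-server launch."""
    return [
        "llama-server",
        "--model", model_path,
        # A value larger than the model's layer count means "GPU offload all the way up".
        "--n-gpu-layers", "999",
        # Keep the MoE expert weights of the first N layers on the CPU.
        "--n-cpu-moe", str(n_cpu_moe),
        "--ctx-size", str(ctx_size),
        "--cache-type-k", cache_type,
        # A quantized V cache generally needs flash attention to be enabled.
        "--cache-type-v", cache_type,
    ]


if __name__ == "__main__":
    # "model.gguf" is a placeholder path, not a real file name.
    args = build_server_args("model.gguf", n_cpu_moe=18)
    print(shlex.join(args))
```

The function only builds and prints the command; it does not launch anything, so it is safe to run and adjust before pasting the result into a terminal.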
You can find the best performance spot by playing with MoE offload until you find the number that gives the highest tok/s on your hardware.
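That trial-and-error loop can be sketched as a small helper. The `measure_tok_per_s` callback is hypothetical: it stands in for however you measure speed (reading the tok/s that your software reports after a test prompt, for example) and should return `None` when a setting fails to load, such as when it runs out of VRAM. The numbers in the example are fake placeholders, not real benchmark results.

```python
def find_best_moe_offload(candidates, measure_tok_per_s):
    """Try each MoE offload value and return (best_value, results).

    results maps each candidate that loaded successfully to its tok/s.
    """
    results = {}
    for n in candidates:
        speed = measure_tok_per_s(n)
        if speed is not None:  # None means the setting failed (e.g. out of memory)
            results[n] = speed
    if not results:
        raise ValueError("no candidate setting produced a measurement")
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Fake measurements standing in for real runs; replace with your own.
    fake_speeds = {10: None, 15: 44.5, 20: 48.2, 25: 41.7}
    best, results = find_best_moe_offload(sorted(fake_speeds), fake_speeds.get)
    for n, speed in results.items():
        print(f"MoE offload {n}: {speed:.1f} tok/s")
    print(f"best setting: {best}")
```

Keeping the measurement as a callback means the same loop works whether you read the numbers off by hand or script the benchmark.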
https://www.reddit.com/r/LocalLLaMA/comments/1t9eo83/running...