You can try to offload the experts on CPU with llama.cpp (--cpu-moe) and that should give you quite the extra context space, at a lower token generation speed.
All that said you could probably squeeze it onto a 36GB Mac. A lot of people run this size model on 24GB GPUs, at 4-5 bits per weight quantization and maybe with reduced context size.
I wonder though, do Macs have swap, coupled unused experts be offloaded to swap?
[0] https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF?show_fil...
Output after I exit the llama-server command:
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - MTL0 (Apple M3 Pro) | 28753 = 14607 + (14145 = 6262 + 4553 + 3329) + 0 |
llama_memory_breakdown_print: | - Host | 2779 = 666 + 0 + 2112 |