2x RTX 3090 (with an NVLink bridge, though it didn't seem to matter much for inference):
Qwen 3.6 27b int4: Concurrency 1: 68 tok/s output; Concurrency 32: 363 tok/s output; Prompt processing speed: 1520 tok/s
Qwen 3.6 35ba3b int4: Concurrency 1: 150 tok/s output; Concurrency 32: 1083 tok/s output; Prompt processing speed: 4324 tok/s
MacBook Pro M3, 36 GB RAM:
Qwen 3.6 27b int4: Concurrency 1: 18 tok/s output. I didn't measure the other metrics, and it was a slightly different benchmark.
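
For anyone who wants to see what the concurrency numbers mean per request, here is a rough sketch of the arithmetic on the 2x3090 results. It assumes the concurrency 32 figures are aggregate output throughput summed across all 32 requests, which the numbers above don't state explicitly. The `summarize` helper and the dictionary layout are made up for illustration.

```python
# 2x3090 output throughput in tok/s, copied from the results above.
# Assumption: the concurrency-32 figure is the total across all 32 requests.
results = {
    "Qwen 3.6 27b int4": {"c1": 68, "c32": 363},
    "Qwen 3.6 35ba3b int4": {"c1": 150, "c32": 1083},
}
CONCURRENCY = 32


def summarize(c1, c32, concurrency=CONCURRENCY):
    """Return per-request speed under load and aggregate scaling vs. a single request."""
    return {
        "per_request": c32 / concurrency,  # tok/s each request sees at this concurrency
        "scaling": c32 / c1,               # total throughput gain over concurrency 1
    }


for name, r in results.items():
    s = summarize(r["c1"], r["c32"])
    print(
        f"{name}: {s['per_request']:.1f} tok/s per request, "
        f"{s['scaling']:.1f}x aggregate vs. concurrency 1"
    )
```

Under that assumption, the 27b model works out to roughly 11 tok/s per request at concurrency 32 (about 5.3x the single-request total), and the 35ba3b model to roughly 34 tok/s per request (about 7.2x).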