undefined

points

[-]

I benchmarked mine for a deep research workload I was running. Concurrency 1 is the speed you'd get if you're chatting with one agent,

2x3090 (has an nvlink bridge though it didn't seem to matter hugely for inference)

Qwen 3.6 27b int4: Concurrency 1: 68 tok/s output Concurrency 32: 363 tok/s output Prompt processing speed: 1520 tok/s

Qwen 3.6 35ba3b int4: Concurrency 1: 150 tok/s output Concurrency 32: 1083 tok/s output Prompt processing speed: 4324 tok/s

Macbook Pro m3 36gb RAM: Qwen 3.6 27b int4: Concurrency 1: 18 tok/s output didn't measure the other metrics and it was a slightly different benchmark.