$ llama-server --version
version: 8851 (e365e658f)
$ llama-batched-bench -hf unsloth/Qwen3.6-27B-GGUF:IQ4_XS -npp 1000,2000,4000,8000,16000,32000 -ntg 128 -npl 1 -c 34000
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 1000 | 128 | 1 | 1128 | 1.529 | 654.11 | 3.470 | 36.89 | 4.999 | 225.67 |
| 2000 | 128 | 1 | 2128 | 3.064 | 652.75 | 3.498 | 36.59 | 6.562 | 324.30 |
| 4000 | 128 | 1 | 4128 | 6.180 | 647.29 | 3.535 | 36.21 | 9.715 | 424.92 |
| 8000 | 128 | 1 | 8128 | 12.477 | 641.16 | 3.582 | 35.73 | 16.059 | 506.12 |
| 16000 | 128 | 1 | 16128 | 25.849 | 618.98 | 3.667 | 34.91 | 29.516 | 546.42 |
| 32000 | 128 | 1 | 32128 | 57.201 | 559.43 | 3.825 | 33.47 | 61.026 | 526.47 |

| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 1000 | 128 | 1 | 1128 | 0.684 | 1462.61 | 2.869 | 44.61 | 3.553 | 317.47 |
| 2000 | 128 | 1 | 2128 | 1.390 | 1438.84 | 2.868 | 44.64 | 4.258 | 499.80 |
| 4000 | 128 | 1 | 4128 | 2.791 | 1433.18 | 2.886 | 44.35 | 5.677 | 727.11 |
| 8000 | 128 | 1 | 8128 | 5.646 | 1416.98 | 2.922 | 43.80 | 8.568 | 948.65 |
| 16000 | 128 | 1 | 16128 | 11.851 | 1350.10 | 3.007 | 42.57 | 14.857 | 1085.51 |
| 32000 | 128 | 1 | 32128 | 25.855 | 1237.66 | 3.168 | 40.40 | 29.024 | 1106.96 |
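For anyone unfamiliar with the column layout: my reading of the `llama-batched-bench` tables (an assumption from the numbers, not from the tool's docs) is that `S_PP = PP / T_PP`, `S_TG = TG * B / T_TG`, and the overall throughput `S = (PP + TG) * B / T` with `T = T_PP + T_TG`. A quick sketch checking that against two rows of the first table:

```python
# Sanity-check of the benchmark columns against two rows of the
# first table above (assumed formulas, verified only by rounding).
rows = [
    # (PP, TG, B, T_PP, T_TG)
    (1000, 128, 1, 1.529, 3.470),
    (32000, 128, 1, 57.201, 3.825),
]
for pp, tg, b, t_pp, t_tg in rows:
    t = t_pp + t_tg                      # total wall time T
    s_pp = pp * b / t_pp                 # prompt-processing tok/s
    s_tg = tg * b / t_tg                 # token-generation tok/s
    s = (pp + tg) * b / t                # overall tok/s
    print(f"S_PP={s_pp:7.2f}  S_TG={s_tg:5.2f}  S={s:6.2f}")
```

The printed values match the `S_PP t/s`, `S_TG t/s`, and `S t/s` columns to rounding, which is why the overall `S t/s` keeps climbing with prompt length until prefill slowdown overtakes it around 16k–32k.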
Edit: The model gets stuck in infinite loops at this quantization level. I've also tried the Q5_K_M quantization (fits up to a 51968-token context), which seems more robust.

$ llama-batched-bench -dev ROCm1 -hf unsloth/Qwen3.6-27B-GGUF:IQ4_XS -npp 1000,2000,4000,8000,16000,32000 -ntg 128 -npl 1 -c 34000
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 1000 | 128 | 1 | 1128 | 1.034 | 966.90 | 4.851 | 26.39 | 5.885 | 191.67 |
| 2000 | 128 | 1 | 2128 | 2.104 | 950.38 | 4.853 | 26.38 | 6.957 | 305.86 |
| 4000 | 128 | 1 | 4128 | 4.269 | 937.00 | 4.876 | 26.25 | 9.145 | 451.40 |
| 8000 | 128 | 1 | 8128 | 8.962 | 892.69 | 4.912 | 26.06 | 13.873 | 585.88 |
| 16000 | 128 | 1 | 16128 | 19.673 | 813.31 | 4.996 | 25.62 | 24.669 | 653.78 |
| 32000 | 128 | 1 | 32128 | 46.304 | 691.09 | 5.122 | 24.99 | 51.426 | 624.75 |

llama-* version 8889 with ROCm support; nightly ROCm.
$ llama.cpp/build/bin/llama-batched-bench -hf unsloth/Qwen3.6-27B-GGUF:UD-Q8_K_XL -npp 1000,2000,4000,8000,16000,32000 -ntg 128 -npl 1 -c 34000
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 1000 | 128 | 1 | 1128 | 2.776 | 360.22 | 20.192 | 6.34 | 22.968 | 49.11 |
| 2000 | 128 | 1 | 2128 | 5.778 | 346.12 | 20.211 | 6.33 | 25.990 | 81.88 |
| 4000 | 128 | 1 | 4128 | 11.723 | 341.22 | 20.291 | 6.31 | 32.013 | 128.95 |
| 8000 | 128 | 1 | 8128 | 24.223 | 330.26 | 20.399 | 6.27 | 44.622 | 182.15 |
| 16000 | 128 | 1 | 16128 | 52.521 | 304.64 | 20.669 | 6.19 | 73.190 | 220.36 |
| 32000 | 128 | 1 | 32128 | 120.333 | 265.93 | 21.244 | 6.03 | 141.577 | 226.93 |
More directly comparable to the results posted by genpfault (IQ4_XS):

$ llama.cpp/build/bin/llama-batched-bench -hf unsloth/Qwen3.6-27B-GGUF:IQ4_XS -npp 1000,2000,4000,8000,16000,32000 -ntg 128 -npl 1 -c 34000
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 1000 | 128 | 1 | 1128 | 2.543 | 393.23 | 9.829 | 13.02 | 12.372 | 91.17 |
| 2000 | 128 | 1 | 2128 | 5.400 | 370.36 | 9.891 | 12.94 | 15.291 | 139.17 |
| 4000 | 128 | 1 | 4128 | 10.950 | 365.30 | 9.972 | 12.84 | 20.922 | 197.31 |
| 8000 | 128 | 1 | 8128 | 22.762 | 351.46 | 10.118 | 12.65 | 32.880 | 247.20 |
| 16000 | 128 | 1 | 16128 | 49.386 | 323.98 | 10.387 | 12.32 | 59.773 | 269.82 |
| 32000 | 128 | 1 | 32128 | 114.218 | 280.16 | 10.950 | 11.69 | 125.169 | 256.68 |

$ llama-batched-bench -dev Vulkan2 -hf unsloth/Qwen3.6-27B-GGUF:IQ4_XS -npp 1000,2000,4000,8000,16000,32000 -ntg 128 -npl 1 -c 34000
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 1000 | 128 | 1 | 1128 | 3.288 | 304.15 | 9.873 | 12.96 | 13.161 | 85.71 |
| 2000 | 128 | 1 | 2128 | 6.415 | 311.79 | 9.883 | 12.95 | 16.297 | 130.57 |
| 4000 | 128 | 1 | 4128 | 13.113 | 305.04 | 9.979 | 12.83 | 23.092 | 178.76 |
| 8000 | 128 | 1 | 8128 | 27.491 | 291.01 | 10.155 | 12.61 | 37.645 | 215.91 |
| 16000 | 128 | 1 | 16128 | 59.079 | 270.83 | 10.476 | 12.22 | 69.555 | 231.87 |
| 32000 | 128 | 1 | 32128 | 148.625 | 215.31 | 11.084 | 11.55 | 159.709 | 201.17 |

M2 Ultra, Q8_0
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 512 | 128 | 1 | 640 | 1.307 | 391.69 | 6.209 | 20.61 | 7.516 | 85.15 |
| 1024 | 128 | 1 | 1152 | 2.534 | 404.16 | 6.227 | 20.56 | 8.760 | 131.50 |
| 2048 | 128 | 1 | 2176 | 5.029 | 407.26 | 6.229 | 20.55 | 11.258 | 193.29 |
| 4096 | 128 | 1 | 4224 | 10.176 | 402.52 | 6.278 | 20.39 | 16.454 | 256.72 |
| 8192 | 128 | 1 | 8320 | 20.784 | 394.14 | 6.376 | 20.08 | 27.160 | 306.33 |
| 16384 | 128 | 1 | 16512 | 43.513 | 376.53 | 6.532 | 19.59 | 50.046 | 329.94 |
| 32768 | 128 | 1 | 32896 | 99.137 | 330.53 | 7.081 | 18.08 | 106.218 | 309.70 |
DGX Spark, Q8_0

| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 512 | 128 | 1 | 640 | 0.881 | 580.98 | 16.122 | 7.94 | 17.003 | 37.64 |
| 1024 | 128 | 1 | 1152 | 1.749 | 585.43 | 16.131 | 7.93 | 17.880 | 64.43 |
| 2048 | 128 | 1 | 2176 | 3.486 | 587.54 | 16.169 | 7.92 | 19.655 | 110.71 |
| 4096 | 128 | 1 | 4224 | 7.018 | 583.64 | 16.245 | 7.88 | 23.263 | 181.58 |
| 8192 | 128 | 1 | 8320 | 14.189 | 577.33 | 16.427 | 7.79 | 30.617 | 271.75 |
| 16384 | 128 | 1 | 16512 | 29.015 | 564.68 | 16.749 | 7.64 | 45.763 | 360.81 |
| 32768 | 128 | 1 | 32896 | 60.413 | 542.40 | 17.359 | 7.37 | 77.772 | 422.98 |