by magicalhippo 14 hours ago

by ycui7 5 hours ago

You can get 120 TPS (144 peak) with Qwen3.6-27B on an RTX PRO 6000 using AutoRound with MTP (multi-token prediction) enabled. That's faster than Sonnet API calls.

An RTX 5090 gets maybe 100 TPS with MTP.