Hacker News
by magicalhippo, 14 hours ago
by ycui7, 5 hours ago
You can get 120 TPS (144 peak) with Qwen3.6-27B on an RTX PRO 6000 with AutoRound when MTP is enabled. That's faster than Sonnet API calls.
A 5090 gets maybe 100 TPS with MTP.