Hacker News
by dd8601fn, 5 hours ago
airstrike, 4 hours ago
I suspect they quantize them, reduce thinking budgets, batch more requests, or all of the above.
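To make the quantization part concrete, here is a rough sketch of symmetric int8 weight quantization. It is only an illustration of the general technique, not a claim about what any provider actually does; the function names `quantize_int8` and `dequantize` and the sample weights are made up for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (int values, scale)."""
    max_abs = max(abs(w) for w in weights)
    # One float scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to show the rounding loss.
weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

Each weight now takes one byte instead of four (for float32), at the cost of a rounding error of up to half the scale per weight. Small values such as 0.003 round to zero here, which is the kind of loss that could show up as degraded output.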
lwarfield, 2 hours ago (reply)
There's also lowering the number of experts you run in MoE models.
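For anyone unfamiliar with the knob, a minimal sketch of top-k expert routing, assuming a toy router and scalar "experts". All names here (`moe_forward`, `router_logits`, the lambda experts) are hypothetical and only illustrate the idea; real MoE layers route per token over learned networks.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, k):
    """Run only the k highest-scoring experts and mix their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize the gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    output = sum(g * experts[i](x) for g, i in zip(gates, top))
    return output, top


# Toy experts standing in for feed-forward blocks (assumption for illustration).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_logits = [2.0, 1.0, 0.5, -1.0]

out_k2, used_k2 = moe_forward(3.0, router_logits, experts, k=2)
out_k1, used_k1 = moe_forward(3.0, router_logits, experts, k=1)
```

Dropping `k` from 2 to 1 halves the expert compute per token, but the output changes because the second expert's contribution is simply discarded.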