upvote
I don't know what you mean by "quantization". I guess you're asking to explain how they measure "performance", but that kind of thing often won't neatly fit in a title.

I do think the phrasing is weird, though. The performance may be improving, but it isn't the thing getting "faster" (e.g. responses to queries might get "faster"). And the dollars aren't getting cheaper; the performance is. "Performance per dollar" is a rate; it is not getting faster or cheaper. It should just say "Performance per dollar is increasing".

reply
They're asking what precision the model parameters were quantized/rounded to, look up "LLM quantization" to learn more.

The old headline was something like "We served GLM5.2 on AMD MI355X at 2626 tok/s/node". They think it's bad to advertise performance numbers like that without specifying how much you quantized the model (Since you can 'cheat' more performance by quantizing more, and not mentioning the reduced output quality)

reply
Its MXFP4
reply
And to use the heading "Why this matters".
reply
A nice filter is checking for the `.ai` in the end. It is very likely slop if you see that. Slop meaning low-effort/clickbait/shallow/useless/scam etc.
reply
triggered the grifters
reply