undefined

points

[-]

I don't know what you mean by "quantization". I guess you're asking to explain how they measure "performance", but that kind of thing often won't neatly fit in a title.

I do think the phrasing is weird, though. The performance may be improving, but it isn't the thing getting "faster" (e.g. responses to queries might get "faster"). And the dollars aren't getting cheaper; the performance is. "Performance per dollar" is a rate; it is not getting faster or cheaper. It should just say "Performance per dollar is increasing".

by hmry34 minutes ago|

parent|

[-]

They're asking what precision the model parameters were quantized/rounded to, look up "LLM quantization" to learn more.

The old headline was something like "We served GLM5.2 on AMD MI355X at 2626 tok/s/node". They think it's bad to advertise performance numbers like that without specifying how much you quantized the model (Since you can 'cheat' more performance by quantizing more, and not mentioning the reduced output quality)

by ahmadyan16 hours ago|

prev|

[-]

Its MXFP4

by IshKebab9 hours ago|

prev|

[-]

And to use the heading "Why this matters".

by ozgrakkurt9 hours ago|

prev|

[-]

A nice filter is checking for the `.ai` in the end. It is very likely slop if you see that. Slop meaning low-effort/clickbait/shallow/useless/scam etc.

by 484849497 hours ago|

parent|

[-]

triggered the grifters