To add to the sibling "good is relative" comment: it also depends on what you're running, not just on your own tolerance for what counts as good. E.g. with a MoE, decode is faster, so the prompt-processing delay is more noticeable for the same model size in RAM.
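
A rough sketch of that arithmetic, with made-up numbers. The function name and every throughput figure here are hypothetical, not measurements of any real model or machine. It assumes prefill speed stays the same and only decode speed changes:

```python
def ttft_share(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Fraction of total response time spent waiting for the first token."""
    ttft = prompt_tokens / prefill_tps        # prompt-processing delay, seconds
    decode_time = output_tokens / decode_tps  # time spent generating output, seconds
    return ttft / (ttft + decode_time)

# Hypothetical numbers: same prompt and prefill speed, only decode speed differs.
slow_decode = ttft_share(4000, 500, prefill_tps=200, decode_tps=10)
fast_decode = ttft_share(4000, 500, prefill_tps=200, decode_tps=40)

print(f"TTFT share with slow decode: {slow_decode:.0%}")
print(f"TTFT share with fast decode: {fast_decode:.0%}")
```

The wait before the first token is the same in both cases, but it takes up a larger share of the total when decode is fast, which is why it stands out more.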

by convenwis10 hours ago|

Good is relative, but time to first token was clearly the biggest limitation.

by brookst1 hours ago|

Yeah, TTFT was terrible. I don’t think it’s unreasonable to benchmark the most-improved metric.