by GeekyBear 8 hours ago

by danudey 8 hours ago

So throughput was already good, but TTFT was the metric that needed more improvement?

by brookst 1 minute ago

Yeah, TTFT was terrible. I don’t think it’s unreasonable to benchmark the most-improved metric.

by zamadatix 7 hours ago

To add to the sibling "good is relative" comment: it also depends on what you're running, not just on your own tolerance for what counts as good. E.g. with an MoE model, only a fraction of the weights is active per generated token, so decode is much faster than for a dense model of the same size in RAM, and the prompt-processing delay becomes correspondingly more noticeable.
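
A rough back-of-the-envelope sketch of that point, assuming a simplified model where TTFT is just prompt tokens divided by prefill speed and the rest of the response is output tokens divided by decode speed. The function name `latency_breakdown` and every number in it are hypothetical, chosen only to illustrate the effect; prefill speed is held fixed so only the decode speedup varies.

```python
def latency_breakdown(prompt_tokens: int, output_tokens: int,
                      prefill_tok_per_s: float, decode_tok_per_s: float):
    """Return (ttft_s, decode_s, ttft_share) for a single request.

    Simplified model (assumption): TTFT = prompt processing time only,
    ignoring scheduling, sampling, and other overheads.
    """
    ttft = prompt_tokens / prefill_tok_per_s
    decode = output_tokens / decode_tok_per_s
    return ttft, decode, ttft / (ttft + decode)

# Hypothetical numbers: same prompt and prefill speed, but the MoE decodes 4x faster.
dense = latency_breakdown(8000, 500, prefill_tok_per_s=400, decode_tok_per_s=10)
moe = latency_breakdown(8000, 500, prefill_tok_per_s=400, decode_tok_per_s=40)

for name, (ttft, decode, share) in [("dense", dense), ("moe", moe)]:
    print(f"{name}: TTFT {ttft:.1f}s, decode {decode:.1f}s, TTFT share {share:.0%}")
# dense: TTFT 20.0s, decode 50.0s, TTFT share 29%
# moe: TTFT 20.0s, decode 12.5s, TTFT share 62%
```

The absolute wait for the first token is identical in both cases; it just goes from under a third of the total time to well over half once decode gets faster.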

by convenwis 8 hours ago

Good is relative, but time to first token was clearly the biggest limitation.