The local inference space is leaning toward MoE models, and a lot of them have decent tokens/second but horrible time to first token (TTFT).
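To make the distinction concrete, here's a rough sketch of how you could measure the two numbers separately from any streaming token iterator. The `measure_stream` helper and its `clock` parameter are made up for illustration and aren't tied to any particular runtime; the point is only that TTFT covers everything before the first token, while tokens/second is counted after it.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, decode_tokens_per_second, token_count).

    Hypothetical helper: `tokens` is any iterable that yields tokens as the
    model produces them. `clock` is injectable so the timing can be faked.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # first token arrived: prompt processing is done
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")

    ttft = first - start
    decode_time = last - first
    # Decode rate only counts tokens generated after the first one.
    rate = (count - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, rate, count
```

With something like this, a run that waits 2 seconds before the first token and then emits one token every 0.1 seconds would report a TTFT of 2 s alongside roughly 10 tokens/second, which is the "fast once it starts, slow to start" pattern described above.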