That prefill number isn't right. M4 Max hits 200-300 tokens/s: https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...
The M5 Studio is gonna sell like hotcakes.
Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test prompt processing (PP).
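To put rough numbers on why a 30-token prompt undersells prefill, here's a quick sketch. It assumes the timing includes some fixed per-request overhead; the 0.1 s overhead and the 250 tokens/s peak are made-up illustrative values, not measurements from any machine, and `measured_prefill_tps` is just a hypothetical helper name.

```python
def measured_prefill_tps(prompt_tokens: int, peak_tps: float, overhead_s: float) -> float:
    """Apparent prefill tokens/s when a fixed overhead is included in the timing."""
    elapsed = overhead_s + prompt_tokens / peak_tps
    return prompt_tokens / elapsed


# Assumed values for illustration only (not benchmark results).
PEAK_TPS = 250.0    # hypothetical steady-state prefill rate
OVERHEAD_S = 0.1    # hypothetical fixed cost per request

for n in (30, 1000, 10000):
    # Short prompts are dominated by the fixed overhead, so the
    # apparent rate only approaches PEAK_TPS as the prompt grows.
    print(n, round(measured_prefill_tps(n, PEAK_TPS, OVERHEAD_S), 1))
```

Under those assumptions the apparent rate only gets close to the peak once the prompt is in the thousands of tokens, which is why a tiny prompt isn't a fair PP test.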
If it's just the coding agent system prompt and tools, you can cache that.
Yeah, the problem is that's just the start of the context. There's, you know, all the tool call results and file reads and stuff.
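Rough sketch of that point, assuming a simple prefix cache that can only reuse the longest token prefix shared with an earlier request. The helper names and the token counts (2,000 for system prompt plus tools, 8,000 for tool results and file reads) are hypothetical, picked only to show the shape of the problem.

```python
def common_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Number of leading tokens shared by the cached sequence and the new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_prefill(cached: list[int], prompt: list[int]) -> int:
    """Tokens that still need prompt processing after reusing the cached prefix."""
    return len(prompt) - common_prefix_len(cached, prompt)


# Hypothetical token IDs and sizes, for illustration only.
system_and_tools = list(range(2000))        # cached once, reused every turn
tool_results = list(range(5000, 13000))     # new context: tool calls, file reads

prompt = system_and_tools + tool_results
print(tokens_to_prefill(system_and_tools, prompt))
```

With those made-up sizes the cache saves the first 2,000 tokens, but the other 8,000 still go through prefill, so PP speed still matters.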