by simonw 13 hours ago

by embedding-shape 12 hours ago

Comparison with an RTX Pro 6000, with DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf:

prefill: 121.76 t/s, generation: 47.85 t/s

Main target seems to be Apple's Metal, so makes sense. Might be fun to see how fast one could make it go though :) The model seems really good too, even though it's in IQ2.

by xienze 12 hours ago

I don't want to be a jerk but 31t/s prefill is basically unusable in an agentic situation. A mere 10k in context and you're sitting there for 5+ minutes before the first token is generated.
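
The arithmetic behind that estimate can be sketched as below. This is a rough back-of-the-envelope model, not a benchmark: it assumes prefill runs at a constant rate and that nothing is served from a prompt cache. The function name `time_to_first_token_s` is made up for illustration; the rates are the figures quoted in this thread.

```python
def time_to_first_token_s(context_tokens: int, prefill_tps: float) -> float:
    """Seconds spent on prefill before the first token can be generated.

    Assumes a constant prefill rate and no prompt caching (a simplification).
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return context_tokens / prefill_tps


# Prefill rates quoted in this thread, in tokens per second.
rates = {
    "31 t/s (the figure being criticized)": 31.0,
    "RTX Pro 6000 (121.76 t/s)": 121.76,
    "M4 Max, low end (200 t/s)": 200.0,
    "M4 Max, high end (300 t/s)": 300.0,
}

for label, tps in rates.items():
    seconds = time_to_first_token_s(10_000, tps)
    print(f"{label}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

At 31 t/s a 10k-token context works out to a bit over five minutes, which matches the "5+ minutes" above; at the higher rates quoted elsewhere in the thread the same context takes well under two minutes.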

by fgfarben 11 hours ago

That prefill number isn't right. M4 Max hits 200-300 t/s: https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...

by hadlock 9 hours ago

M5 Studio is gonna sell like hot cakes

by throwdbaaway 5 hours ago
