Personally, I've tried to squeeze more tok/s out of DeepSeek V4 Flash on a single DGX Spark, but only got marginal improvements. There's still work to do on fusing kernels and other optimizations, but those are already on antirez's roadmap, so it's not worth duplicating the effort.

I've had positive experiences running GLM 4.7 via vLLM: tool calling works well and inference is fast. Do you run DeepSeek V4 Flash on vLLM?

by wolttam 4 hours ago

Yep, those are the numbers I'm getting with DSv4 Flash on vLLM across 2 Sparks.

by doctorpangloss 1 hour ago

DeepSeek V4 Flash MTP is a training optimization. It doesn't make inference run faster, because it must run the entire model forward as the "verifier." This is in the paper, and this is why the docs they release do not mention using it for accelerated inference.

Eventually, I'm going to stop writing stuff like this @dang, because even though it is literally being read by a human, it's going to just be copy and pasted into a chatbot, which will actually spend the time trying to comprehend what I am saying.

by wolttam 54 minutes ago

> MTP in Inference. Our MTP strategy mainly aims to improve the performance of the main model, so during inference, we can directly discard the MTP modules and the main model can function independently and normally. *Additionally, we can also repurpose these MTP modules for speculative decoding to further improve the generation latency.*[1]

(emphasis mine)

> Instead of predicting just the next single token, DeepSeek-V3 predicts the next 2 tokens through the MTP technique. Combined with the framework of speculative decoding (Leviathan et al., 2023; Xia et al., 2023), it can significantly accelerate the decoding speed of the model.[2]
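To make the quoted mechanism concrete, here is a toy sketch of draft-and-verify decoding. It is not DeepSeek's or vLLM's implementation: `toy_target`, `toy_draft`, and both decode functions are hypothetical stand-ins, and it uses simple greedy matching rather than the rejection sampling described by Leviathan et al. The assumption that matters is that a real transformer can score all drafted positions in one batched forward pass, so the loop counts each verification round as a single pass of the full model.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'main model': a deterministic next-token rule."""
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical cheap draft head: agrees with the target most of the time."""
    guess = toy_target(ctx)
    # Deliberately wrong whenever the context length is a multiple of 5.
    return (guess + 1) % 7 if len(ctx) % 5 == 0 else guess


def greedy_decode(target: NextToken, prompt: List[Token], n_new: int) -> Tuple[List[Token], int]:
    """Baseline: one full-model pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq, n_new


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 1
) -> Tuple[List[Token], int]:
    """Draft k tokens, then verify them in what is counted as one full-model pass."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        # 1) The cheap draft proposes k tokens.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2) One verification round. A real model would score every drafted
        #    position in a single batched forward pass; here it is simulated.
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # always emit the target's own token
            if tok != expected:
                break  # first mismatch: discard the remaining drafts
        else:
            # Every draft matched, so the same pass also yields a bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], passes


plain, plain_passes = greedy_decode(toy_target, [1], 20)
spec, spec_passes = speculative_decode(toy_target, toy_draft, [1], 20, k=1)
print(spec == plain, plain_passes, spec_passes)
```

The output sequence is identical to plain greedy decoding, but the full model runs fewer times, because every verification round emits at least one token and usually two when `k=1`. Whether that turns into a real speedup depends on the acceptance rate and on how cheap the draft is relative to the verifier.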

> As DeepSeek-V3, DeepSeek-V4 series also set MTP modules and objectives. Given that the MTP strategy has been validated in DeepSeek-V3, we adopt the same strategy for DeepSeek-V4 series without modification.[3]

[1]: https://arxiv.org/pdf/2412.19437#subsection.2.2

[2]: https://arxiv.org/pdf/2412.19437#subsubsection.5.4.3

[3]: https://arxiv.org/pdf/2606.19348v1#subsection.2.1

Side comment: I feel you may be too cynical towards your fellow commenters.