> All of the optimizations Deepseek have done are in software and it goes down to the PTX assembly level

DeepSeek still trains on NVIDIA hardware (hence PTX), but has already moved inference to Huawei Ascend chips, and inference speed is what this paper addresses.

> Compared to Anthropic who are celebrating in fixing a flickering issue in a terminal app which took months to fix.

It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: in a number of situations they kept dumping the entire history of the chat back into the terminal, and relied on the terminal to then end up in the correct state.
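
To make the difference concrete, here is a minimal sketch of the two redraw strategies. It is not Claude Code's actual rendering code; the function names and the 200-message history are made up for illustration, and the only real things in it are the standard ANSI escape sequences (`CSI 2J`, `CSI H`, `CSI 2K`).

```python
CLEAR_SCREEN = "\x1b[2J\x1b[H"  # erase the display, then move the cursor home


def full_redraw(lines):
    """Hypothetical 'dump everything' strategy: clear, then rewrite every line."""
    return CLEAR_SCREEN + "\r\n".join(lines)


def diff_redraw(old, new):
    """Hypothetical incremental strategy: rewrite only the rows that changed."""
    out = []
    for row in range(max(len(old), len(new))):
        before = old[row] if row < len(old) else None
        after = new[row] if row < len(new) else ""
        if before != after:
            # CSI <row>;1H moves the cursor (1-based), CSI 2K erases that line.
            out.append(f"\x1b[{row + 1};1H\x1b[2K{after}")
    return "".join(out)


# Assumed workload: a 200-line chat history that gains one new line.
history = [f"message {i}" for i in range(200)]
updated = history + ["message 200"]

full_bytes = len(full_redraw(updated))
diff_bytes = len(diff_redraw(history, updated))
print(f"full redraw: {full_bytes} bytes, diff redraw: {diff_bytes} bytes")
```

On a slow terminal the full redraw is visible as a flash of a cleared screen followed by the history scrolling back in, while the incremental version only touches the one row that changed.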

Anthropic almost certainly also has optimized software down to the assembly level, considering this take-home interview challenge they published: https://github.com/anthropics/original_performance_takehome/... which is all about instruction-level performance optimizations. That they don't prioritize UI fixes just means they consider other things more important.
Unlikely: that product is written completely by AI, which they have no shortage of.

More likely, an AI-generated codebase is impossible for humans to fix, and SOTA models were not able to figure it out until now.

that's pretty silly to use as a measure of what they do internally
It's pretty representative of what they do internally
All frontier labs are working down to the PTX level (and lower)