> - And whose inference benefits are negated at scale. Transformers are still cheaper if you want to serve lots of users.

But my entire point is about the reverse of this: the context I'm bringing it up in is single-user scenarios, which is where these diffusion models really make a large difference in performance.

Sure, I agree it's not a good fit for every single use case out there. But after playing around with it more closely myself, I think people are dismissing it a bit too quickly, at least if you're interested in running local models on your own hardware.
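A rough roofline estimate shows why both the single-user and the at-scale claims can hold at once. All figures below (accelerator peak FLOP/s, memory bandwidth, an 8B-parameter fp16 model) are made-up assumptions, and `tokens_per_pass` is a hypothetical knob standing in for any scheme that emits several tokens per forward pass, such as diffusion or MTP. The sketch ignores KV-cache traffic, attention cost, and the extra refinement passes a diffusion model may need.

```python
# Hypothetical hardware and model figures, chosen only for illustration.
PEAK_FLOPS = 100e12      # assumed accelerator peak, FLOP/s
MEM_BW = 1e12            # assumed memory bandwidth, bytes/s
PARAMS = 8e9             # assumed dense model size, parameters
BYTES_PER_PARAM = 2      # fp16/bf16 weights


def step_time(batch: int, tokens_per_pass: int) -> float:
    """Roofline lower bound (seconds) for one forward pass.

    Ignores KV cache and attention: ~2 FLOPs per parameter per token,
    and the weights are read from memory once per pass.
    """
    flops = 2 * PARAMS * batch * tokens_per_pass
    bytes_moved = PARAMS * BYTES_PER_PARAM
    # The pass takes as long as the slower of compute and memory traffic.
    return max(flops / PEAK_FLOPS, bytes_moved / MEM_BW)


def tokens_per_second(batch: int, tokens_per_pass: int) -> float:
    """Aggregate throughput across the whole batch."""
    return batch * tokens_per_pass / step_time(batch, tokens_per_pass)


for batch in (1, 256):
    for tpp in (1, 8):
        print(f"batch={batch:>3} tokens/pass={tpp}: "
              f"{tokens_per_second(batch, tpp):,.1f} tok/s")
```

With these assumed figures, going from 1 to 8 tokens per pass takes a single user from about 62.5 to 500 tokens/s, because a batch-1 pass is memory-bound and the extra compute is nearly free. At a batch of 256 the hardware is already compute-bound, so throughput stays at 6,250 tokens/s either way.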

I don't think you're really getting the point I'm trying to make. Everyone who trains LLMs regularly cares about serving users at scale and about quality per unit of compute invested. It's not just OpenAI, Anthropic, or Google: Qwen, DeepSeek, Moonshot, whoever. They all care about it very much and basically can't afford to take a step back in those areas.

Since training models is currently very expensive, diffusion LLMs are destined to be relegated to the occasional research artifact at best. As things stand, making a serious commitment to them is basically the equivalent of throwing money into a fire pit, and things are expensive enough as it is.

Alternative architectures that match transformers in quality much more closely have basically gone nowhere, yet you expect one that is worse in every way the labs care about not to? I'm not trying to 'dismiss' dLLMs. I'm interested in them for the same reason you are. I'm just stating the factors at play plainly.

Single-user scenarios can also use MTP (multi-token prediction) to make autoregressive inference more compute-intensive, putting otherwise idle compute to work, with no loss of quality.
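A minimal sketch of the draft-and-verify idea behind using MTP this way, assuming greedy decoding. `toy_model` and the draft functions are hypothetical stand-ins for a real model and its MTP head. Every emitted token comes from the main model, so the output matches plain greedy decoding exactly; a good draft only reduces the number of verification passes. (With sampling, the rejection-sampling variant of speculative decoding preserves the output distribution instead.)

```python
from typing import Callable, List, Tuple

# A "model" here is a toy deterministic next-token function: context -> token.
NextToken = Callable[[List[int]], int]


def toy_model(ctx: List[int]) -> int:
    """Hypothetical stand-in for the main model's greedy next token."""
    return (3 * ctx[-1] + 1) % 17


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model pass per token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(model: NextToken, draft: NextToken, prompt: List[int],
                       n: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify; returns (tokens, verification passes).

    In a real system the k checks in one round run as a single batched
    forward pass of the main model; here they are separate calls, which
    keeps the logic visible but not the speedup.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        # 1. Draft k tokens cheaply (e.g. from an MTP head).
        proposal, ctx = [], list(out)
        for _ in range(k):
            ctx.append(draft(ctx))
            proposal.append(ctx[-1])
        passes += 1
        # 2. Verify: always append the main model's own token, and stop
        #    at the first disagreement or once n tokens are produced.
        for t in proposal:
            expected = model(out)
            out.append(expected)
            if expected != t or len(out) - len(prompt) >= n:
                break
    return out[len(prompt):], passes


tokens, passes = speculative_decode(toy_model, toy_model, [2], 10, k=4)
print(tokens == greedy_decode(toy_model, [2], 10), passes)
```

With a perfect draft, 10 tokens take 3 verification passes instead of 10; with a draft that is always wrong, it degrades to 10 passes, but the tokens are identical in both cases.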