Since training models is currently a very expensive procedure, diffusion LLMs are destined to be relegated to the occasional research artifact at best. As things stand, making a serious commitment to them is basically the equivalent of throwing money into a fire pit, and things are expensive enough as it is.
Alternative architectures that do a much better job of matching transformers in quality have basically gone nowhere, but you expect that one which is worse in every way the labs care about won't? I'm not trying to 'dismiss' dLLMs. I'm interested in them for the same reason you are. I'm just stating the factors at play plainly.