
The difficulty of scaling is not the only issue. Nobody is going to be particularly invested in scaling an architecture that has:

- consistently lagged behind its autoregressive counterparts in quality. Look at the Diffusion-Gemma benchmarks: pretty steep dropoffs, and the harder the benchmark, the worse the dropoff. That's not a good look, and it's not like it's some artifact of Google's release. Every dLLM is like this.

- speed benefits that are negated at scale (Google themselves say Diffusion-Gemma was more expensive to serve at scale).

Put yourself in the shoes of all the labs, even open source ones. Why would this be anything more than a passing experiment?