Not really. Unlike with physical goods like batteries, the hardware for training a diffusion model and an autoregressive language model is essentially identical.

That said, the lab behind this research (Chris Ré and Tri Dao are involved) is run by world experts at squeezing every last drop of performance out of CUDA and Nvidia hardware.

At the API level, the primary difference will be the addition of text-infill capabilities for language generation. I also somewhat expect certain types of generation to be more coherent (e.g., comedy or stories, where you need to think of the punchline or ending first!).
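To make the infill point concrete, here is a minimal sketch of what such a request might look like. Everything here is hypothetical (the function, fields, and "infill" mode are illustrative, not a real library's API); the point is that an infill-capable model conditions on both the text before and after the blank, which a purely left-to-right API cannot do.

```python
# Hypothetical sketch, not a real API: illustrates the shape of a
# text-infill request vs. ordinary left-to-right (autoregressive) generation.

def build_infill_request(prefix: str, suffix: str, max_tokens: int = 64) -> dict:
    """Assemble a request payload for a hypothetical text-infill endpoint."""
    return {
        "mode": "infill",       # vs. "autoregressive" left-to-right decoding
        "prefix": prefix,       # text before the blank
        "suffix": suffix,       # text after the blank -- an AR-only API has no slot for this
        "max_tokens": max_tokens,
    }

# The model would fill the blank so it lands on the given ending -- the
# "think of the punchline first" case from above.
req = build_infill_request(
    prefix="The comedian walked on stage and said, '",
    suffix="' -- and the crowd roared.",
)
print(req["mode"])
```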

Same with digital vs analog.

Digital came later but beat analog at almost everything?