I worked on it for a more specialized task (query rewriting). It’s blazing fast.

A lot of inference code is set up for autoregressive decoding right now; diffusion is less mature. I'm not sure whether Ollama or llama.cpp support it.
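To make the difference concrete, here's a toy sketch (not any real library's API, and the "model" is just a random choice stand-in): autoregressive decoding needs one forward pass per token, while diffusion-style decoding starts fully masked and fills in several positions per refinement pass.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]

def autoregressive_decode(length=5, seed=0):
    """One token per forward pass: pass count equals sequence length."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        tokens.append(rng.choice(VOCAB))  # stand-in for model(prefix)
    return tokens

def diffusion_decode(length=5, seed=0):
    """Start fully masked; each pass unmasks about half the remaining slots."""
    rng = random.Random(seed)
    tokens = [None] * length  # None plays the role of a [MASK] token
    passes = 0
    while None in tokens:
        passes += 1
        masked = [i for i, t in enumerate(tokens) if t is None]
        for i in masked[: (len(masked) + 1) // 2]:
            tokens[i] = rng.choice(VOCAB)  # stand-in for model(tokens)
    return tokens, passes
```

With a length-5 sequence the diffusion loop finishes in 3 passes instead of 5, which is where the speedup comes from, and also why it needs inference code structured around whole-sequence refinement rather than a token-by-token loop.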

reply
Did you publish anything you could link to w.r.t. query rewriting?
reply
How was the quality?
reply
Quality was about the same. I will say it was a pain to train, since it isn't as popular and there's no out-of-the-box support.
reply
Interesting, thanks! That's pretty cool though!
reply
Based on my experience running diffusion image models, I really hope this isn't going to take over anytime soon. Parallel decoding may be great if you have a nice parallel GPU or NPU, but it's dog slow on CPUs.
reply
Because diffusion models have a substantially different refinement process, most current software isn't built to support them. So I've also been struggling to find a way to play with these models on my machine. I might see if I can cook something up myself before someone else does...
reply