I was thinking of something like LLaDA, which uses a Transformer to predict the tokens masked out by its forward process:

https://arxiv.org/abs/2502.09992
