I don't think the MoE part has anything to do with it, but the current generation of multimodal models can interleave thinking with autoregressive(?) image generation, so it's probably not long before this gets baked into the RL process, the same way native chain-of-thought obviated the need for "think carefully, step by step" prompts.
LLMs are rather devolving at this point.