You mean Mercury 2, by Inception: https://openrouter.ai/inception/mercury-2
That's completely different. That's like saying you want to compare the Nvidia 5090 GPU to the latest Call of Duty.
Mamba-3 is an architecture while diffusion is, I believe, a type of objective. So these are not mutually exclusive and therefore not comparable.
Not wrong, but I think it's more accurate to say:

Mamba is an architecture for the middle layers of the network (the trunk) that assumes decoding is autoregressive, i.e. tokens are emitted one at a time, in order. This is the SSM (state space model) they talk about.
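To make the "SSM" part concrete, here is a toy sketch of a linear state space recurrence that reads a sequence left to right. The scalar parameters `a`, `b`, `c` and the function name `ssm_scan` are made up for illustration; this is not Mamba-3's actual parameterization (Mamba makes these parameters depend on the input, which this sketch does not do).

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Toy scalar linear state space recurrence, processed left to right.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    The parameters a, b, c are illustrative constants, not learned values.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the state carries a decaying summary of the past
        ys.append(c * h)
    return ys


# An impulse at the first step decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```

The point is just that each output depends only on the current input and a fixed-size state summarizing everything before it, which is why this kind of layer pairs naturally with in-order decoding.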

Diffusion is an alternative to the autoregressive approach: decoding happens through iterative refinement of a whole batch of tokens, instead of processing one token at a time and locking each one in while only looking forward. This can require different architectures for the trunk and the output heads, plus modifications to the objective to make the whole thing trainable. Could Mamba-like ideas be useful in diffusion networks? Maybe, but it's a different problem setup.
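A rough sketch of the two decoding loops, in case it helps. Everything here is hypothetical: `toy_predict`, `TARGET`, and `CONF` stand in for a trained model, and the refinement loop uses confidence-based unmasking, which is only one flavor of iterative refinement (real diffusion language models may also revise tokens they already committed, and I am not claiming this is how Mercury works).

```python
MASK = "_"
# Stand-ins for a trained model: the token it would predict at each
# position and how confident it is. Both tables are invented.
TARGET = ["the", "cat", "sat", "down"]
CONF = [0.9, 0.4, 0.7, 0.6]


def toy_predict(tokens, pos):
    """Hypothetical model call: returns (token, confidence) for one position.

    A real model would condition on `tokens`; this toy ignores it.
    """
    return TARGET[pos], CONF[pos]


def decode_autoregressive(length):
    """Emit tokens strictly left to right; each one is locked in for good."""
    tokens = []
    for pos in range(length):
        token, _ = toy_predict(tokens, pos)
        tokens.append(token)
    return tokens


def decode_iterative(length, per_step=2):
    """Start fully masked; each step commits the most confident positions."""
    tokens = [MASK] * length
    history = []
    while MASK in tokens:
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        guesses = {i: toy_predict(tokens, i) for i in masked}
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = guesses[i][0]
        history.append(list(tokens))  # snapshot after this refinement step
    return tokens, history


print(decode_autoregressive(4))
final, history = decode_iterative(4)
for step in history:
    print(step)
```

The autoregressive loop takes one step per token and fills positions in order. The iterative loop finishes this example in two steps and fills positions 0 and 2 before 1 and 3, so the model has to score every position given partial context on both sides. That is the "different problem setup" part.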
