I don't work this way, so this is all hypothetical to me, but the possibility space is larger than _any_ model can handle; models are effectively applying a really complex prior over a giant combinatorial space. I think the idea behind a swarm of small models (probably with higher temperature?) on a well-defined problem is akin to, e.g., multi-chain MCMC: several independent samplers started from different points each explore a different region, and pooling their results covers more of the space than any single chain would.
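To make the multi-chain MCMC analogy concrete, here is a minimal sketch in Python. It is a toy, not anything taken from an actual model swarm: a few independent Metropolis chains on a bimodal 1-D target, each started from a different point. The target, the names `log_target`, `run_chain`, and `run_swarm`, the start points, and the `temperature` knob (which flattens the target to p^(1/T)) are all assumptions made up for illustration.

```python
import math
import random


def log_target(x):
    # Hypothetical toy target: unnormalized equal mixture of two
    # Gaussians centred at -4 and +4, each with standard deviation 0.5.
    a = -((x + 4.0) ** 2) / (2 * 0.25)
    b = -((x - 4.0) ** 2) / (2 * 0.25)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def run_chain(start, steps, rng, step_size=0.5, temperature=1.0):
    # Random-walk Metropolis. Dividing the log-ratio by `temperature`
    # samples from p(x)^(1/T); T > 1 flattens the target so uphill
    # barriers are easier to cross.
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = (log_target(proposal) - log_target(x)) / temperature
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


def run_swarm(starts, steps=2000, burn_in=500, seed=0):
    # One independent chain per start point, each with its own RNG stream.
    chains = []
    for i, start in enumerate(starts):
        rng = random.Random(seed + i)
        chains.append(run_chain(start, steps, rng)[burn_in:])
    return chains


chains = run_swarm([-6.0, -5.0, 5.0, 6.0])
pooled = [x for chain in chains for x in chain]
means = [sum(chain) / len(chain) for chain in chains]
```

With these settings each chain should stay stuck in the mode nearest its start, since the barrier between the modes is far too high for a small step size to cross, so no single chain sees the whole target. Only the pooled samples cover both modes, which is the sense in which a swarm can cover more of the space than any one member.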