I think it's fundamentally not useful as long as there are other open source model releases. E.g., suppose you make a SotA model at a particular size via decentralized training. Amazing. A month later, Qwen/DeepSeek/etc. release a new model which is better. So why would you use the "decentralized" one?

Models have a limited shelf life while things are improving rapidly, and decentralized training is just more wasteful than centralized training.

However, things might change if we get to what Karpathy calls a "cognitive core": a stable model backbone which can be extended via skills/adapters/etc. Development of extensions to the core can be a lot more decentralized.

But for now, these decentralized training attempts function largely as a deterrent to anti-open-source collusion.
