Eh I think the small model thing is kind of a no-go.

Reason being that many AI workloads are dynamically mixed: training draws on multiple subjects, and you just can't know exactly what mix a given task will require ahead of time.

I was hoping LoRAs would do this for us as well, but they don't really seem to have worked out for LLMs (compared to the image/video diffusion space).

Perhaps some future model will have some sort of "core" that can load/unload portions of itself dynamically at runtime. Like go for a very horizontal architecture with hundreds of MoE experts, and load/unload those paths/weights whenever a parent gating value meets or exceeds some minimum, hmmm.
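To make the idea concrete, here's a toy sketch of what that gating-driven load/unload could look like. Everything here is hypothetical (the `ExpertPool` class, the threshold policy, the string stand-in for weights); a real system would be paging weight tensors between disk and VRAM, not shuffling dict entries.

```python
# Toy sketch: keep an MoE expert resident only while its gate score
# meets a minimum threshold; evict it when the score drops below.
class ExpertPool:
    def __init__(self, num_experts, threshold):
        self.num_experts = num_experts
        self.threshold = threshold   # minimum gate score to stay resident
        self.loaded = {}             # expert_id -> weights (stand-in string)

    def _load(self, expert_id):
        # Stand-in for reading weights from disk into memory/VRAM.
        self.loaded[expert_id] = f"weights[{expert_id}]"

    def _unload(self, expert_id):
        self.loaded.pop(expert_id, None)

    def update(self, gate_scores):
        """Load experts whose score crosses the threshold, evict the rest."""
        for expert_id, score in enumerate(gate_scores):
            if score >= self.threshold and expert_id not in self.loaded:
                self._load(expert_id)
            elif score < self.threshold and expert_id in self.loaded:
                self._unload(expert_id)
        return sorted(self.loaded)

pool = ExpertPool(num_experts=4, threshold=0.25)
print(pool.update([0.1, 0.6, 0.3, 0.05]))  # [1, 2] resident
print(pool.update([0.5, 0.1, 0.3, 0.4]))   # [0, 2, 3]: expert 1 evicted
```

The hard part in practice is latency: swapping expert weights per token would be far too slow, so you'd want the gate to predict residency over a longer horizon (a whole prompt, or session) rather than per step.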
