upvote
From my own empirical research, the generalized models acting as specialists outperform both the tiny models acting as specialists and the generalist models acting as generalists. It seems that if peak performance is what you're after, then having a broad model act as several specialized models is the most impactful.
reply