upvote
Of course it requires SOTA, people will always choose better models over some compact thing that is obviously more limited. You can't control the truth with models nobody wants to use.
reply
People choose SOTA right now because of the heavily subsidised model subscriptions. People aren't going to pay 20x the price for a model that's maybe 10% better.
reply
And the fact that "better" is highly subjective and domain/task/vibe-specific
reply
Why do I want the model I use for coding to know Shakespeare or vice versa?
reply
Because you communicate with it using natural language and real-world references and descriptions of what you want, you use emotion and emphasis (especially when re-prompting), you use examples and illustrative stories and common expressions. Understanding and interpreting all of that and replying in kind, to some degree, requires a large body of non-computation, cultural knowledge, or else the prompts are just meaningless words, and the replies will look like compiler output.
reply
That sounds intuitively true, but I’m not convinced that it is actually the case. I don’t think we know enough about neural network training to say what training and how many parameters are necessary for what kind of performance on which tasks. To me it looks like we currently guess that more is better and try to throw as much compute and data at the problem as is economically feasible. There is little incentive for companies to invest into small model research since their moat is huge models that require special hardware to run.
reply
Small models are the future.
reply