by tw1984 15 hours ago

Of course it requires SOTA. People will always choose a better model over a compact one that is obviously more limited. You can't control the truth with models nobody wants to use.

by columnarx3 14 hours ago

People choose SOTA right now because of the heavily subsidised model subscriptions. People aren't going to pay 20x the price for a model that's maybe 10% better.

by ezst 13 hours ago

And the fact that "better" is highly subjective and domain/task/vibe-specific.

by adrianN 13 hours ago

Why do I want the model I use for coding to know Shakespeare or vice versa?

by Jare 12 hours ago

Because you communicate with it using natural language, real-world references, and descriptions of what you want. You use emotion and emphasis (especially when re-prompting), examples, illustrative stories, and common expressions. Understanding and interpreting all of that, and replying in kind, requires to some degree a large body of non-computational, cultural knowledge. Otherwise the prompts are just meaningless words, and the replies will look like compiler output.

by adrianN 8 hours ago

That sounds intuitively true, but I’m not convinced that it is actually the case. I don’t think we know enough about neural network training to say what kind of training and how many parameters are necessary for a given level of performance on a given task. To me it looks like we currently guess that more is better and throw as much compute and data at the problem as is economically feasible. There is little incentive for companies to invest in small-model research, since their moat is huge models that require special hardware to run.

by Der_Einzige 7 hours ago

This is why: https://www.emergent-misalignment.com/

by rjzzleep 13 hours ago

Small models are the future.