I take "you can't have human-level intelligence without roughly the same number of parameters (hundreds of trillions)" as a null hypothesis: true until proven otherwise.
It's unfair to pick some high number that is itself disputed, or to assume that parameter-count parity with the brain even means anything.
> level of quality
What is quality, though? Do my fellow humans really know what "quality" actually consists of? Do I hear someone yell "QUALITY IS SUBJECTIVE" from the cheap seats?
I'll explain.
You might care about accuracy (the repetition of learned or given text) more than about actual cognitive ability (the classic clothesline riddle: 12 shirts on the line, how long do they take to dry?).
From my perspective, the ability to repeat given/learned text has nothing to do with "high quality". Any idiot can do that.
Here's a simple example:
Stupid doctors exist. Plentifully so, even. Every doctor can pattern-match symptoms to medication or further tests, but not every doctor is capable of recognizing when two seemingly different symptoms are actually connected (say, a stiff neck caused by sinus issues).
There is not one person on the planet who wouldn't prefer a doctor who is deeply considerate of the complexities and feedback loops of the human body over a doctor who is simply not smart enough for that. The latter can learn texts all he wants, but memorizing text does not require deeper understanding.
There are plenty of benefits to running multiple models in parallel. A big one is specialization and caching. Another is context expansion, which is what "reasoning" models can be observed doing when they support themselves with their own feedback loop.
One does not need a "hundred" small models to achieve whatever you might consider worthy of being called "quality". These models can not only reason independently of each other, but also interact contextually, expanding each other's contexts around what actually matters.
They also don't need to learn all the information about "everything" the way big models do. That simply isn't necessary anymore. We have very capable systems for retrieving information and feeding it to models with gigantic context windows, if needed. We can create purpose-built models. And the capability we get per parameter keeps increasing.
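Purely as an illustration of that retrieval-plus-context pattern (everything here is hypothetical: the toy corpus, the keyword retriever, and `call_small_model` as a stand-in for whatever inference API you actually use):

```python
# Sketch: fetch relevant texts at inference time instead of baking all
# world knowledge into the weights, then hand them to a small model.

CORPUS = {
    "sinusitis": "Sinus inflammation can refer pain to the neck and cause stiffness.",
    "meningitis": "A stiff neck with fever and headache may indicate meningitis.",
    "laundry": "Shirts on a clothesline dry in parallel, not sequentially.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy keyword retriever; a real system would use embeddings or BM25."""
    scored = sorted(
        CORPUS.values(),
        key=lambda doc: -sum(w in doc.lower() for w in query.lower().split()),
    )
    return scored[:k]

def call_small_model(prompt: str) -> str:
    """Stub for an actual small-model inference call (assumption)."""
    return f"[answer conditioned on {len(prompt)} chars of context]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    return call_small_model(f"Context:\n{context}\n\nQuestion: {question}")

print(answer("Why would a stiff neck come from sinus issues?"))
```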
Multiple small models, specifically trained for strong reasoning/cognitive capabilities and given access to relevant texts, can develop multiple perspectives on a matter in parallel, massively boosting context expansion.
A single model cannot refactor its own train of thought during an inference run, which is massively inefficient. It can only provide itself with feedback one step after another, while multiple models can do it all in parallel.
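A minimal sketch of that parallel feedback round, with stubbed-out model calls (the specialist names and both functions are assumptions, not any real API):

```python
from concurrent.futures import ThreadPoolExecutor

SPECIALISTS = ["clinical", "pharmacology", "anatomy"]  # hypothetical specialist models

def draft(specialist: str, question: str) -> str:
    """Stub for an independent small-model inference call."""
    return f"{specialist} perspective on: {question}"

def critique(specialist: str, peer_drafts: list[str]) -> str:
    """Second pass: each model expands its context with the others' output."""
    return f"{specialist} revision, informed by {len(peer_drafts)} peer drafts"

def parallel_round(question: str) -> list[str]:
    with ThreadPoolExecutor() as pool:
        # All drafts run concurrently -- no model waits on another.
        drafts = list(pool.map(lambda s: draft(s, question), SPECIALISTS))
        # Every model now sees every draft: the cross-model feedback loop.
        return list(pool.map(lambda s: critique(s, drafts), SPECIALISTS))

print(parallel_round("Can sinus issues cause a stiff neck?"))
```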
See... there are two things that fundamentally underpin the above:
1. No matter how you put it, we've learned that models are "smarter" when there is at least one feedback loop involved.
2. No matter how you put it, you can always have yet another model process the output of a previously run model.
These two things, in combination, strongly indicate that the way to go is multiple small, high-efficiency models running in parallel, providing one another with the independent feedback they need to actually expand contexts in depth.
Or, in other words:
Big models scale Parameters; many small models scale Insight.
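To make those two observations concrete, here is the shape of the loop they imply, again with stubbed model calls (both functions are placeholders, not anyone's actual API):

```python
def generator(prompt: str) -> str:
    """Stub for the first model's inference call."""
    return f"draft answering: {prompt[:60]}"

def reviewer(draft: str) -> str:
    """Stub for a second model that processes the first model's output (point 2)."""
    return f"critique of: {draft[:60]}"

def refine(question: str, rounds: int = 3) -> str:
    draft = generator(question)
    for _ in range(rounds):  # the feedback loop (point 1)
        feedback = reviewer(draft)
        draft = generator(f"{question}\nReviewer feedback: {feedback}")
    return draft

print(refine("Connect the stiff neck to the sinus infection."))
```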
But a smart person who hasn’t read all the texts won’t be a good doctor, either.
Chess players spend enormous amounts of time studying openings for a reason.
> Multiple small models, specifically trained for high reasoning/cognitive capabilities, given access to relevant texts
So, even assuming that one can train a model on reasoning/cognitive abilities, how does one pick the relevant texts for a desired outcome?
Repeat ad nauseam.
I wish the people who quote the blog post actually read it.