Some amount of knowledge is required for reasoning. Maybe such a model could dynamically load knowledge domains organized as a taxonomy. For example, a model can't reason effectively about a development task if it has no knowledge of development best practices. But the population of New York or recipes can definitely be loaded at run time with tools.
This is the root of the problem. If you think about STEM universities, they don't really teach you the things you need in the real world. They teach you what you need to know in order to go out there and accumulate the necessary information, which can then be used to solve problems. Giving a person access to the internet or a super powerful calculator (like Mathematica) doesn't mean that they can do anything useful. They need tons of experience to use these tools in an effective way. That experience is basically all the implicit adjacent knowledge that we pick up along the way while getting our degrees. And LLMs pick that up during pre-training. Drop this part and the outcome will be worthless.
In my school, the math teacher gave me prose, which I converted to math notation. I could argue that this prose→reasoning conversion is not required at training time, and can be obtained at inference time with search tools.
Our computers can already do everything and have access to all the tools and information, yet they still need a human/intelligence to use them and apply them to specific problems.
Even defining the problem requires knowledge.
As for the tools, if the model has access to 1000 tools, how would it know which one to use if it doesn't have any knowledge itself?
What if I ask for "table tennis spin" and it had a "magnus effect calculator"? How would it know to make the connection between the two?
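To make the tool-selection point concrete, here is a minimal sketch of a router that picks tools purely by word overlap, with no knowledge of its own. The tool names and descriptions are hypothetical, invented for this illustration, and real tool routers are not necessarily built this way.

```python
import re

# Hypothetical tool registry: names and descriptions are made up for illustration.
TOOLS = {
    "magnus_effect_calculator": "Compute the lift force on a rotating sphere moving through a fluid",
    "recipe_search": "Find cooking recipes by ingredient",
    "population_lookup": "Return the population of a city",
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def lexical_score(query: str, name: str, description: str) -> int:
    """Count the words shared by the query and the tool's name + description."""
    return len(tokens(query) & tokens(name + " " + description))


def rank_tools(query: str) -> list[tuple[str, int]]:
    """Rank tools by shared-word count, best first (a knowledge-free baseline)."""
    scored = [(name, lexical_score(query, name, desc)) for name, desc in TOOLS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this registry, `rank_tools("table tennis spin")` scores every tool 0, while a literal query such as "population of New York" does surface `population_lookup`. Bridging the first case takes prior knowledge that a spinning ball is subject to the Magnus effect.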
This is only one example, plus if the topic is more complex, maybe it would have to search/learn everything (what is table tennis, what is spin, what is a human, what is a ball, etc.). So it would be like spawning a baby human, have it spend an (instant) life learning about the world before providing an answer. Maybe this could work in 10 years, if models get stronger, with huge context lengths and almost instant data retrieval. Is it the best way to go about things though? Most animals have most of their core abilities embedded in their DNA and "instincts". A cat doesn't have to learn what a bird is in order to hunt it; it's already "embedded" in its neural pathways, or even deeper, at a full-body level. Those types of systems are a lot more efficient than the learned ones. Maybe the best future AI will have everything already embedded, instead of just being a strong reasoning machine. All AI responses should be instant and like "reflexes" instead of reasoned steps.
I think grounding your abstract problem in an example makes it more trivial than it sounds in general.
> How would it know about Wikipedia and when to use it?
Two general concepts, "You have to get a good understanding of the subject area before you take action" + "Wikipedia is a good source of knowledge about subject areas", will get a model there.
> spawning a baby human, have it spend an (instant) life learning
Humans spend 99% of their life on boring repeating tasks, not learning anything, just navigating on heuristics.
(what is Turkish) -> (parse lots of potentially relevant/irrelevant context, because I have no way of knowing which, if any, of this informs the doner kebab before I've looked at it)
>dish made of meat
(what is meat) -> (parse lots of potentially ir/relevant context, because I don't know whether the specific origin/chemistry/mechanics or Maillard reactions are important before I learn about them)
>cooked on a vertical rotisserie.
(what is a rotisserie) -> etc etc etc
Seems significantly less efficient than having the various (how to cook > meat, tools > rotisserie, how to cook > seasoning > tomato; lettuce; cabbage; onion with sumac; fresh or pickled cucumber or chili; various sauces, etc) already built into the weights.
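A toy sketch of that cost, assuming a made-up glossary in which every term has to be looked up before the terms in its definition can be understood. The glossary entries and the `lookups_needed` helper are hypothetical, chosen only to count lookups, not to model a real retrieval system.

```python
# Toy glossary: each term maps to the terms its definition depends on.
# The entries are invented for illustration only.
GLOSSARY = {
    "doner kebab": ["turkish", "dish", "meat", "rotisserie"],
    "turkish": ["country"],
    "dish": ["food"],
    "meat": ["animal", "food"],
    "rotisserie": ["tool", "heat"],
    "country": [],
    "food": [],
    "animal": [],
    "tool": [],
    "heat": [],
}


def lookups_needed(term, known, seen=None):
    """Count the lookups required to fully resolve `term`.

    `known` holds terms already "in the weights"; they cost nothing.
    `seen` prevents counting the same term twice within one query.
    """
    if seen is None:
        seen = set()
    if term in known or term in seen:
        return 0
    seen.add(term)
    return 1 + sum(lookups_needed(dep, known, seen) for dep in GLOSSARY.get(term, []))


# A model that knows nothing has to resolve every term in the chain.
cold = lookups_needed("doner kebab", known=frozenset())
# A model that already knows the basics needs a single lookup.
warm = lookups_needed("doner kebab", known=frozenset({"turkish", "dish", "meat", "rotisserie"}))
```

In this toy setup the cold start resolves all ten glossary terms, while the model with the basics built in resolves one. A real knowledge graph is far larger, so the gap would be correspondingly wider.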
Yes, but still "how to cook" is not atomic. It involves knowing how to move stuff, how to measure, what "cooked" looks like in different environments (e.g. different lighting) or with variations in ingredients, and how to recover from specific failures (e.g. a good cook can fix accidentally adding too much salt by counter-balancing with an ingredient that absorbs the extra salt). And this is only one skill.
It's a bit like how deep image neural nets work: simply detecting shape primitives is not enough; the net also has to capture the connections and relations between those primitives.
Even saying the AI should just have the "cooking" or "coding" skill trivializes the problem.
> Humans spend 99% of their life on boring repeating tasks
But we are also unconsciously learning about the world non-stop, from the analog stream of inputs and from seeing the immediate result/feedback. Even looking at a static picture is like over-training on a specific dataset.
Because if the recipe just says "boil for 10 minutes" but the thing being cooked really needs a temperature of 212F for 10 minutes, the thing isn't going to be cooked if you're not actually at 212 for 10.
E.g. if you put a graph in its context window and ask it to find a Hamiltonian cycle, can it do it?
Probably this could be a next step for more powerful AIs: a layer that abstracts away the facts in its context window, and a layer that solves these types of abstractions.
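For reference, once the graph has been abstracted out of the context, the Hamiltonian cycle part needs no world knowledge at all. Below is a minimal backtracking sketch, assuming an undirected graph given as an adjacency dict; the function name and graph encoding are choices made for this example. It is exponential in the worst case, as expected for an NP-complete problem.

```python
from typing import Optional


def hamiltonian_cycle(graph: dict[str, set[str]]) -> Optional[list[str]]:
    """Return a cycle visiting every vertex exactly once, or None if none exists.

    `graph` maps each vertex to the set of its neighbours.
    """
    if not graph:
        return None
    start = next(iter(graph))
    path = [start]
    visited = {start}

    def extend() -> bool:
        # All vertices are on the path: succeed only if we can close the cycle.
        if len(path) == len(graph):
            return start in graph[path[-1]]
        for nxt in sorted(graph[path[-1]]):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                if extend():
                    return True
                # Dead end: undo the choice and try the next neighbour.
                visited.discard(nxt)
                path.pop()
        return False

    return path + [start] if extend() else None


# A 4-cycle (square) has a Hamiltonian cycle; a 3-vertex star does not.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
star = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
```

The hard part for a model is arguably the abstraction step (turning whatever is in the context into `graph`), not the search itself.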
If "all the knowledge" is what our models do now, what exactly would the most extreme "none of the knowledge + search" be?
> language specifications.
It would load in all the knowledge to figure out what "language" means, then it would continue by trying to decode what "specifications" means.
That might sound absurd, but to figure out the population of New York it has two options: just Google it, or derive it from primary sources.
But how is it ever going to interpret the primary sources? It needs to understand the question, how complex the question is, how complete an answer is, and how things relate. That's just _too_ much language.
There might be a way to compact this down into an LLM-native language, such that a request like `the population of New York` or `use best practices` is encoded without our messy human language for a reasoning model to work with, but the encoding itself has to be done by the "all the knowledge" LLM. Now it seems we've just rebuilt something related to MoE with extra steps, afaict.
Turns out that without the world knowledge to have a base of facts, it is not.
So I don't think it's true that relevant knowledge was deprioritized. At least it wasn't supposed to be.
First, if you know nothing you don't even know what you're missing or what to search for.
Then, without unlimited context, you have to do research for every task all over again every time.
RAG on the initial prompt would be the first thing to try.
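A minimal sketch of what "RAG on the initial prompt" could look like, assuming a tiny in-memory corpus and plain word-overlap scoring. The corpus text and the `retrieve` and `augment` helpers are hypothetical; a real setup would more likely use a search index or embeddings.

```python
import re

# Hypothetical corpus: placeholder documents invented for this example.
CORPUS = [
    "Notes on the population of New York",
    "A doner kebab recipe: meat cooked on a vertical rotisserie",
    "Development best practices: write tests and review code",
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(prompt: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return up to k documents sharing at least one word with the prompt."""
    query = tokens(prompt)
    scored = [(len(query & tokens(doc)), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def augment(prompt: str, corpus: list[str], k: int = 2) -> str:
    """Prepend the retrieved documents to the prompt before it reaches the model."""
    docs = retrieve(prompt, corpus, k)
    if not docs:
        return prompt
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {prompt}"


augmented = augment("What is the population of New York?", CORPUS)
```

Note that this only works when the prompt already contains the right words, which is the first objection above: without some knowledge, the model does not know what to search for.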
> Then, without unlimited context, you have to do research for every task all over again every time.
Thing is, we're really really good at building very fast search engines. Doing research all over again every time shouldn't be a problem.