Imagine you only know how to cook (the frying-pan skill) and how to make an omelette (a recipe). You get the task of cooking a doner kebab. How many Wikipedia pages do you need to read to get a good understanding? I'd guess 5 at most.

I think grounding your abstract problem in a concrete example makes it more trivial than it sounds in general.

> How would it know about Wikipedia and when to use it?

Two general concepts, "You have to get a good understanding of the subject area before you take actions" and "Wikipedia is a good source of knowledge about subject areas", will get a model there.

> spawning a baby human, have it spend an (instant) life learning

Humans spend 99% of their life on boring repeating tasks, not learning anything, just navigating on heuristics.

>Doner kebab or döner kebab is a Turkish

(what is Turkish) -> (parse lots of potentially relevant/irrelevant context, because I have no way of knowing which of it, if any, informs the doner kebab before I've looked at it)

>dish made of meat

(what is meat) -> (parse lots of potentially relevant/irrelevant context, because I don't know whether the specific origin/chemistry/mechanics or Maillard reactions are important before I learn about them)

>cooked on a vertical rotisserie.

(what is a rotisserie) -> etc etc etc

Seems significantly less efficient than having the various concepts (how to cook > meat, tools > rotisserie, how to cook > seasoning > tomato; lettuce; cabbage; onion with sumac; fresh or pickled cucumber or chili; various sauces, etc.) already built into the weights.
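To make the lookup chain above concrete, here is a toy sketch. The concept graph is entirely made up for illustration (it is not real Wikipedia link data), and `pages_to_read` is a hypothetical helper. It only shows that the number of pages you end up reading depends heavily on which concepts you already know.

```python
from collections import deque

# Hypothetical concept graph: each page mentions terms that may need their own lookup.
# These entries are invented for illustration, not scraped from Wikipedia.
LINKS = {
    "doner kebab": ["Turkish cuisine", "meat", "rotisserie", "seasoning"],
    "Turkish cuisine": ["Turkey", "Ottoman cuisine"],
    "meat": ["Maillard reaction", "butchery"],
    "rotisserie": ["roasting"],
    "seasoning": ["sumac", "sauces"],
}


def pages_to_read(start, known):
    """Breadth-first list of pages visited when every unknown term is looked up."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for term in LINKS.get(page, []):
            # Skip terms we already understand or have already queued.
            if term not in seen and term not in known:
                seen.add(term)
                queue.append(term)
    return order


# Knowing nothing: every term gets expanded.
print(len(pages_to_read("doner kebab", set())))
# Already knowing the first-level concepts: only the recipe page is needed.
print(len(pages_to_read("doner kebab",
                        {"Turkish cuisine", "meat", "rotisserie", "seasoning"})))
```

In this toy graph the count goes from 12 pages to 1 once the first-level concepts are already known, which is the gap between the two positions in this thread.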

I'm just playing devil's advocate here.

Yes, but "how to cook" is still not atomic. It involves knowing how to move stuff, how to measure, what "cooked" looks like in different environments (e.g. different lighting) or with variations in ingredients, and how to recover from specific failures (e.g. a good cook can fix accidentally adding too much salt by counter-balancing with an ingredient that absorbs the extra salt). And this is only one skill.

It's a bit like how deep image neural nets work: simply detecting shape primitives is not enough, because the net is also the connections and relations between those primitives.

Even saying that the AI should just have the "cooking" or "coding" skill trivializes the problem.

> Humans spend 99% of their life on boring repeating tasks

But we are also unconsciously learning about the world non-stop, from the analog stream of inputs and from seeing the immediate result/feedback. Even looking at a static picture is like over-training on a specific dataset.

Just boiling water would be difficult. Do I just add heat until I see bubbles? Or should I have a world model in which I understand that the boiling point varies with altitude and with the liquid being boiled?

Because if the recipe just says "boil for 10 minutes" but the thing being cooked really needs a temperature of 212°F for 10 minutes, it isn't going to be cooked if you're not actually at 212°F for those 10 minutes.
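A rough sketch of that point, using the common kitchen rule of thumb that water's boiling point drops by about 1°F for every 500 ft of elevation. This is an approximation, not a physical model, and the function names are made up for illustration.

```python
SEA_LEVEL_BOILING_F = 212.0
# Rule-of-thumb assumption: roughly 1°F lower per 500 ft of elevation.
DROP_F_PER_500_FT = 1.0


def approx_boiling_point_f(altitude_ft):
    """Approximate boiling point of water in °F at a given altitude in feet."""
    return SEA_LEVEL_BOILING_F - DROP_F_PER_500_FT * altitude_ft / 500.0


def boiling_reaches(target_f, altitude_ft):
    """True if plain boiling water gets at least as hot as target_f at this altitude."""
    return approx_boiling_point_f(altitude_ft) >= target_f


print(approx_boiling_point_f(0))      # sea level
print(approx_boiling_point_f(5000))   # roughly a mile up
print(boiling_reaches(212.0, 5000))   # does "boil for 10 minutes" ever hit 212°F here?
```

Under this approximation, water at 5,000 ft boils at about 202°F, so "see bubbles, wait 10 minutes" never reaches the 212°F the recipe silently assumes.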
