>Even the most basic questions such as put a ball in a cup and place it on a table upside down then pick up the cup and put it in a box.

That reminds me: this used to be my go-to question for smaller models, and one on which they would always fail miserably:

A small strawberry is placed in a large cup. The cup is placed upside down on the kitchen table. Someone then lifts the cup as-is and puts it in the microwave. Where is the strawberry when the cup is in the microwave?

Here's what the 1.9GB VibeThinker-3B-GGUF:Q4_K_M answered:

Answer: The strawberry is still on the kitchen table – it fell out when the cup was turned upside‑down, and the subsequent lift‑and‑microwave move doesn’t change that.

So there seems to be definite progress here: the model is specialized, yet it also shows improved common sense on things outside its domain of specialization.

Is that learned common sense or has it learned the structure of that particular problem?

What happens if you ask:

A small strawberry is placed in a large cup. The cup is placed upside down on a saucer on the kitchen table. Someone then lifts the cup and saucer as-is and puts them in the microwave. Where is the strawberry when the cup is in the microwave?
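
To make the difference between the two prompts concrete, here is a toy sketch of the state tracking a reader does implicitly. It assumes only two rules: loose contents fall out when an open container is turned upside down, and anything resting on a moved object travels with it. The function names and the dictionary representation are made up for illustration; this is not how any model works internally.

```python
def invert_onto(location, container, surface):
    """Turn an open container upside down onto a surface.

    Assumption: loose contents fall out and end up resting on the surface.
    """
    for obj, support in list(location.items()):
        if support == container:
            location[obj] = surface
    location[container] = surface


def where_is(location, obj):
    """Follow the 'rests in/on' chain from obj outward."""
    chain = []
    while obj in location:
        obj = location[obj]
        chain.append(obj)
    return chain


def cup_only():
    location = {"strawberry": "cup"}
    invert_onto(location, "cup", "table")
    # Only the cup is lifted; the strawberry rests on the table, not in the cup.
    location["cup"] = "microwave"
    return where_is(location, "strawberry")


def cup_and_saucer():
    location = {"saucer": "table", "strawberry": "cup"}
    invert_onto(location, "cup", "saucer")
    # Cup and saucer are lifted together; whatever rests on the saucer comes along.
    location["saucer"] = "microwave"
    return where_is(location, "strawberry")


print(cup_only())        # ['table']
print(cup_and_saucer())  # ['saucer', 'microwave']
```

The surface wording of the two prompts is nearly identical, but the expected answers differ, which is what makes the saucer variant a reasonable check for pattern matching.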

The hard part was always the number of 'r's.

> Even the most basic questions such as put a ball in a cup and place it on a table upside down then pick up the cup and put it in a box.

I do not think this is a great example. First, it is not a question. Second, it seems closely related to robotics. A model itself cannot put a ball anywhere; it can only call tools and answer in text, images, etc.

An LLM seeing "put an x in a y and place it on a z upside down, then pick up the y and put it in a z2", followed by a question about what happens, could check a RAG store for the properties of x, y, z, and z2 and still answer. Alternatively, this could be useful for coding, for example. And that is a very extreme example. Some basic language ability plus tool use could go quite far. I think it is a very interesting direction compared with "here is a GPU for the price of a car."
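
A rough sketch of what I mean, with a plain dictionary standing in for the retrieval step. The property store, the `open_top` flag, and `final_location` are all hypothetical names invented for this example; a real setup would retrieve such facts from documents rather than hard-code them.

```python
# Hypothetical property store; stands in for whatever a retrieval step returns.
PROPERTIES = {
    "cup": {"open_top": True},
    "sealed jar": {"open_top": False},
}


def final_location(x, y, z, z2, properties=PROPERTIES):
    """Where x ends up after: put x in y, place y upside down on z, move y to z2.

    Returns None when nothing is known about the container y.
    """
    facts = properties.get(y)
    if facts is None:
        return None  # unknown container: no basis for an answer
    # An open container loses loose contents when inverted, so x stays on z.
    # A closed one keeps x inside, so x travels with y to z2.
    return z if facts["open_top"] else z2


print(final_location("ball", "cup", "table", "box"))         # table
print(final_location("ball", "sealed jar", "table", "box"))  # box
```

The model only has to map the sentence onto the x, y, z, z2 slots; the physical facts come from the lookup.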

I wasn't explicitly stating the question; I was paraphrasing a common test question for world knowledge.

That you don't need a ball, a cup, a table, or even the ability to perform physical actions in order to consider where the ball ends up is in itself required knowledge.

The thing is, we tried that for decades, using more formal logic to build reasoning engines, and we never got it to be even a fraction as good and general as learning-based LLMs are today.

I don't think my point is getting across. This is in the context of how much world knowledge a model needs to be trained on, not LLM vs. not LLM.