I thought we were talking about actual arithmetic, not silly puzzles. And there are many human adults who would fail this, never mind children.
>LLMs can match an example at exactly that trivial level because it can be predicted from context. However, if you construct a more complex example with several rules, especially with rules that have contradictions and have specified logic to resolve conflicts, they fail badly.
Even if that were true (have you actually tried?), you do realize many humans would also fail once you did all that, right?
>They can't even reliably play Chess or Poker without breaking the rules despite those being extremely well-represented in the dataset already, never mind a made-up set of logical rules.
LLMs can play chess just fine (99.8% legal move rate, ~1800 Elo):
https://arxiv.org/abs/2403.15498
I don't like to throw the word "intelligence" around, but when we talk about intelligence, we are usually talking about human behavior. And there is nothing human about being extremely good at curve fitting in a high-dimensional parameter space.