>They still make these errors on anything that is out of distribution. There is literally a post in this thread linking to a chat where Sonnet failed a basic arithmetic puzzle: https://news.ycombinator.com/item?id=47051286

I thought we were talking about actual arithmetic, not silly puzzles, and there are many adult humans who would fail this, never mind children.

>LLMs can match an example at exactly that trivial level because it can be predicted from context. However, if you construct a more complex example with several rules, especially with rules that have contradictions and have specified logic to resolve conflicts, they fail badly.

Even if that were true (have you actually tried?), you do realize many humans would also fail once you did all that, right?

>They can't even reliably play Chess or Poker without breaking the rules despite those extremely well-represented in the dataset already, nevermind a made-up set of logical rules.

LLMs can play chess just fine (99.8% legal move rate, ~1800 Elo):

https://arxiv.org/abs/2403.15498

https://arxiv.org/abs/2501.17186

https://github.com/adamkarvonen/chess_gpt_eval

by runarberg 4 hours ago

I have yet to be convinced that LLMs are anything other than super fancy (and expensive) curve-fitting algorithms.

I don't like to throw the word intelligence around, but when we talk about intelligence we are usually talking about human behavior. And there is nothing human about being extremely good at curve fitting in a multi-parameter space.