AGI's 'general' is the wrong word, I think. Humans aren't general, we're jagged: strong in some areas, weak in others, and already surpassed in many domains.

LLMs are way past us at languages, for instance. Calculators passed us at calculating, etc.

reply
We don't call a calculator intelligent.

A calculator is extremely useful, but it is not intelligent.

A computer is extremely useful, but it is not intelligent.

Airplanes don't have wings, but they're damn sure useful, and also not intelligent.

If LLMs cannot learn to beat not-that-difficult of games better than young teens, they are not intelligent.

They are extremely useful. But they are not AGI.

Words matter.

reply
> If LLMs cannot learn to beat not-that-difficult of games better than young teens, they are not intelligent.

I agree, with unresolved questions. Does it count if the LLM writes code which trains a neural network to play the game, and that neural network plays the game better than people do? Does that only count if the LLM tries that solution without a human prompting it to do so?

reply
I disagree that LLMs cannot solve "unsolved problems." This is already happening, in fundamental mathematics and medicine (the fields most demanding when it comes to quality).

The idea that we haven't taught LLMs to come up with new answers... That doesn't even sound plausible. Just crank up the temperature, and an LLM will throw out so many ideas you'll exhaust yourself trying to sort through them.
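To make the temperature point concrete, here is a minimal sketch of temperature-scaled sampling (my own illustration in plain Python, not any particular LLM stack): dividing logits by a temperature above 1 flattens the distribution, so sampling produces far more varied outputs.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample an index from logits after temperature scaling.

    Higher temperature flattens the softmax distribution, producing
    more diverse (and less reliable) picks; lower temperature
    concentrates probability on the top logit.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs)[0]
```

At a low temperature the top logit wins almost every time; at a high temperature all options show up, which is exactly the "so many ideas you'll exhaust yourself" regime.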

So what haven't we taught LLMs?

- Have we not taught them to "filter"? We just haven't equipped them with experience and intuition, because we only feed them either "absolute fakes" or "verified facts." We don't feed them the actual path of problem-solving and research; those datasets simply don't exist.

- Have we not taught them to "double-check"? They are already excellent at verifying the credibility of our work.

- Have we not taught them to "defend" their ideas? They can lay out ironclad logic and spot potentially "flaky" logic better than any human.

- Have we not taught them to "publish" and "present to the scientific community"? It's just that the previous steps aren't fully polished yet.

And if you look at the question of "creating completely new ideas" from this angle and in this level of detail... To me personally, it doesn't seem at all like LLMs are incapable of this kind of work.

We simply haven't taught them how to do it yet, purely because we don't have a sufficient volume of the right training materials.

reply
Solving an unsolved problem does not necessarily require learning; it may just require effort.

ARC is trying to test if LLMs can actually learn how to play the game.

reply
So your definition of intelligence would be exactly equal to a human, or some subset of humans you choose? Could a dog solve ARC-AGI? Probably not, but I would not say dogs lack intelligence. Same with a fruit fly. What if the calculator is powered by actual living neurons? I think you need to know where you draw the line between organic machinery and intelligence before making blanket statements.

A modern LLM in a loop with a harness for memory and behavior modification in a body would probably fool me.
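As a rough sketch of what such a loop might look like (the `llm` callable and all names here are hypothetical stand-ins for a real model API, not anyone's actual harness):

```python
def run_agent(llm, steps=5):
    """Minimal sketch of an LLM in a loop with a memory harness.

    `llm` is any callable taking a prompt string and returning text.
    Each turn, the accumulated memory and latest observation are fed
    back into the prompt, and the chosen action is logged to memory.
    """
    memory = []
    observation = "start"
    for _ in range(steps):
        prompt = (
            "Memory:\n" + "\n".join(memory) +
            f"\nObservation: {observation}\nNext action?"
        )
        action = llm(prompt)
        memory.append(f"obs={observation} -> act={action}")
        observation = action  # in a real harness, the environment would respond here
    return memory
```

The point is only that memory lives in the harness, outside the model, which is exactly the objection raised in the reply below.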

reply
"a harness for memory", so it still requires external tools to work well. The whole point of this benchmark is to validate that systems can solve problems without any sort of outside help.
reply
> Airplanes don't have wings

???

reply
Interesting take.

Just to drive that thought further.

What are you suggesting? Should we rename it? To me the fundamental question is this:

Do we still have tasks that humans can do better than AIs?

I like the question. I think another good test is "make money". There are humans who can generate money from their laptop; I don't think an AI would come out net positive there.

I’ve tried to create a Polymarket trading bot with Opus 4.6. The ideas were full of logical fallacies and many many mistakes.

But also I'm not sure how they would compare against an average human with no statistics background.

I think it's really about establishing whether by AGI we mean better than the average human or better than the best human.

reply
I don't have a good alternative, sadly. Human Equivalent Intelligence? ChatGPT suggests "Systems that increasingly Pareto-dominate human intelligence across domains". Not so catchy.

The "things that currently make money" definition is interesting, because those are exactly the things automation can't currently do: if they could be automated, the price would tend to zero and nobody could make money at them.

reply
We are jagged, but we can smooth that jaggedness if we choose to do so. LLMs stay jagged.
reply
There's no objective measure for comparing intelligences; we only say an LLM is jagged compared to humans.
reply
I’d actually focus on something else entirely here.

Let's be honest: we are giving LLMs and humans the exact same tasks, but are we putting them on an equal playing field? Specifically, do they have access to the same resources and behavioral strategies?

- LLMs don't have spatial reasoning.

- LLMs don't have a lifetime of video game experience starting from childhood.

- LLMs don't have working memory or the ability to actually "memorize" key parameters on the fly.

- LLMs don't have an internal "world model" (one that actively adapts to real-world context and the actual process of playing a game).

... I could go on, but I've outlined the core requirements for beating these tests above.

So, are we putting LLMs and humans in the same position? My answer is "no." We give them the same tasks, but their approach to solving them—let alone their available resources—is fundamentally different. Even Einstein wouldn't necessarily pass these tests on the first try. He’d first have to figure out how to use a keyboard, and then frantically start "building up new experience."

P.S. To quickly address the idea that LLMs and calculators are just "useful tools" that will never become AGI—I have some bad news there too. We differ from calculators architecturally; we run on entirely different "processors." But with LLMs, we are architecturally built the same way: it is a Neural Network that processes and makes decisions. This means our only real advantage over them is our baseline configuration and the list of "tools" connected to our neural network (senses, motor functions, etc.). To me, this means LLMs don't have any fundamental "architectural" roadblocks. We just have a head start, but their speed of evolution is significantly faster.

reply
>But with LLMs, we are architecturally built the same way: it is a Neural Network that processes and makes decisions.

There are high-level similarities between ANNs and the human brain but they are very, very, very different in a ton of ways.

reply
LLMs haven't passed us at language; a child can learn a language with far less data than an LLM requires.
reply
Isn't that more about rate of learning? Agreed, LLMs consume a lot of data.

But your average LLM understands more languages than anyone alive. So: superhuman understanding of various text-based languages.

reply
Rate of learning and general applicability of what is learned is essentially the point of ARC-AGI.

That's why all the AIs score abysmally until humans step in to guide them (fine-tuning, harnesses, etc.).

reply
The thing is.. this is more akin to testing a blind person's performance on a driving test than testing his intelligence.

I would imagine if you simply encoded the game in textual format and asked an LLM to come up with a series of moves, it would beat humans.

The problem here is more around perception than anything.
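For what it's worth, the textual encoding itself is trivial; a minimal sketch (my own illustration, assuming ARC-style grids are 2D arrays of small integers):

```python
def grid_to_text(grid):
    """Render a 2D grid of small integers (ARC-style color codes)
    as plain text, one row per line, suitable for pasting into an
    LLM prompt in place of an image."""
    return "\n".join(
        " ".join(str(cell) for cell in row)
        for row in grid
    )
```

So the experiment is easy to run; whether the model then reasons well over that text is the open question.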

reply
I had the same theory back when ARC-AGI-2 came out, and surprisingly encoding it into text didn't help much - LLMs just have a huge blind spot around spatial reasoning, in addition to being bad at vision. The sorts of logic and transformations involved in this just don't show up much in the training data (yet)

I still agree that this is like declaring blind people lack human intelligence, of course.

reply
It only tests puzzle solving; intelligence is cost compression that powers itself.
reply
Previous iterations of ARC-AGI were reminiscent of IQ tests. This one is just too easy, and the fact that models do terribly on it probably means there is an input-mode or operation-mode mismatch.

If model creators are willing to teach their LLMs to play computer games through text, it's going to be solved in one minor bump of the model version. But honestly, I don't think they are going to bother, because it's just too silly and they don't expect their models to learn anything useful from it.

Especially since there are already models that can learn how to play 8-bit games.

It feels like ARC-AGI has jumped the shark. But who knows, maybe the people who train models for robots will run with it.

reply