There is a long history of people arguing that intelligence is actually the ability to predict accurately.
https://www.explainablestartup.com/2017/06/why-prediction-is...
> Intelligence isn't just about memorizing facts, it's about reasoning.
Initially, LLMs were basically intuitive predictors, but with chain of thought and, more recently, agentic experimentation, we do have reasoning in our LLMs that is quite human-like.
That said, there is definitely a bias toward training-set material, but that is also the case for the large majority of humans.
For the Esoland benchmarks, I would be curious how much adding a SKILLS.md file for each language would boost performance.
I am pretty confident that we are in the AGI era. It is unsettling, and I think it gives people cognitive dissonance, so we want to deny it, nitpick it, etc.
There sure is, and in psychological circles there appears to be an argument that that is not the case.
https://gwern.net/doc/psychology/linguistics/2024-fedorenko....
> Initially, LLMs were basically intuitive predictors, but with chain of thought and, more recently, agentic experimentation, we do have reasoning in our LLMs that is quite human-like.
If you handwave the details away, then sure, it's very human-like, though the reasoning models just kind of feed the dialog back to themselves to get something more accurate. I use Claude Code like everyone else, and it will get stuck on the strangest details that humans actively wouldn't.
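The "feed the dialog back to itself" behavior can be sketched roughly like this. Here `llm` is a hypothetical stub standing in for a model call (real reasoning models do this internally, and the refinement is of course not a literal string append):

```python
def llm(prompt):
    # Stub model call: pretend each pass appends one refinement step.
    # A real model would produce new reasoning conditioned on the transcript.
    return prompt + " -> refined"

def reason(question, steps=3):
    """Chain-of-thought as iterated self-prompting: each pass re-reads
    the whole transcript so far and extends it."""
    transcript = question
    for _ in range(steps):
        transcript = llm(transcript)
    return transcript

print(reason("Q"))  # the transcript grows by one refinement per pass
```

The point of the sketch: nothing "new" enters the loop except the model's own prior output, which is why it can converge on a wrong detail and stay stuck there.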
> For the Esoland benchmarks, I would be curious how much adding a SKILLS.md file for each language would boost performance.
Tough to say since I haven't done it, though I suspect it wouldn't help much, since there's still basically no training data for advanced programs in these languages.
> I am pretty confident that we are in the AGI era. It is unsettling, and I think it gives people cognitive dissonance, so we want to deny it, nitpick it, etc.
Even if you're right about this being the AGI era, that doesn't mean that current models are AGI, at least not yet. It feels like you're actively trying to handwave away details.
Much of our reasoning is based on stimulating our sensory organs, either via imagination (self-stimulation of our visual system) or via subvocalization (self-stimulation of our auditory system), etc.
> it will get stuck on the strangest details that humans actively wouldn't.
It isn't a human. It is AGI, not HGI.
> It feels like you're actively trying to handwave away details.
Maybe. I don't think so though.
Personally, I've used LLMs to debug hard-to-track code issues and AWS issues among other things.
Regardless of whether that was done via next-token prediction or not, it definitely looked like AGI, or at least very close to it.
Is it infallible? Not by a long shot. I always have to double-check everything, but at least it gave me solid starting points to figure out said issues.
It would've taken me probably weeks to figure that out without LLMs, instead of the 1 or 2 hours it did.
In that context, I have a hard time imagining what a "real" AGI system would look like, if it's not the current one.
Not saying current LLMs are unequivocally AGI, but they are darn close for sure IMO.
Why is it that LLMs could ace nearly every written test known to man, but need specialized training in order to do things like reliably type commands into a terminal or competently navigate a computer? A truly intelligent system should be able to 0-shot those types of tasks, or in the absolute worst case 1-shot them.
Being able to actually reason about things without exabytes of training data would be one thing. Hell, even with exabytes of training data, doing actual reasoning for novel things that aren't just regurgitating things from Github would be cool.
Being able to learn new things would be another. LLMs don't learn; they're a pretrained model (it's in the name, GPT): you send in inputs and get an output. RAG is cool, but it isn't really "learning"; it just eats a bit more context to give a facsimile of learning.
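To make the "facsimile of learning" point concrete, here is a minimal sketch of the RAG pattern, with naive keyword-overlap retrieval standing in for a real vector search. The documents, query, and `build_prompt` helper are all invented for illustration; the thing to notice is that the model's weights never change, only the prompt does:

```python
def retrieve(query, documents, k=1):
    """Naive retrieval: rank documents by shared words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, documents):
    """The 'learning' step is just prepending retrieved text to the context."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The deploy script lives in scripts/deploy.sh.",
    "Unit tests run with pytest under tests/.",
]
prompt = build_prompt("where is the deploy script?", docs)
print(prompt)
```

The frozen model downstream answers from whatever landed in the context window; drop the retrieval step and the "knowledge" is gone, which is exactly the difference from learning.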
Going to the extreme of what you're saying, then `grep` would be "darn close to AGI". If I couldn't grep through logs, it might have taken me years to go through and find my errors or understand a problem.
I think they're very neat, but ultimately pretty straightforward input-output functions.
Well, I guess you lose artificial if there’s a human brain hidden in the box.