And even if you think the chance is zero, unless you also think there is a zero chance they will be capable of pivoting quickly, it might still be beneficial.
I think his views are largely flawed, but chances are there will still be lots of useful science coming out of it as well. Even if current architectures can achieve AGI, it does not mean there can't also be better, cheaper, more effective ways of doing the same things, and so exploring the space more broadly can still be of significant value.
I believe he didn't think that reasoning/CoT would work well or scale like it has.
Of course now we know this was delusional and it seems almost funny in retrospect. I feel the same way when I hear that 'just scale language models' suddenly created something that's true AGI, indistinguishable from human intelligence.
Whenever I see people think the model architecture matters much, I think they have a magical view of AI. Progress comes from high-quality data; the models are good as they are now. Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.
To stay with the miasma theory of disease analogy: if you think architecture is the key, then look at how humans dealt with pandemics... The Black Death in the 14th century killed half of Europe, and no one could think of the germ theory of disease. Think about it - it was as desperate a situation as it gets, and no one had the simple spark to practice hygiene.
The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example, 1B users do more for an AI company than a better model; they act like human-in-the-loop curators of LLM work.
Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant; it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.
It's only with hindsight that we think contagionism is obviously correct.
It really depends what you mean by 'we'. Laymen? Maybe. But people said it was wrong at the time with perfectly good reasoning. It might not have been accessible to the average person, but that's hardly to say that only hindsight could reveal the correct answer.
I'm not aware that we had notably different data sources before and after transformers, so what confounding event are you suggesting transformers 'lucked' into being contemporaneous with?
Also, why are we seeing diminishing returns if only the data matters? Are we running out of data?
The METR time-horizon benchmark shows steady exponential growth. Frontier lab revenue has been growing exponentially from basically the moment the labs had any revenue. (The latter has confounding factors. For example, it doesn't just depend on the quality of the model but also on the quality of the apps and products using the model. But model quality is still the main component; the products seem to pop into existence the moment the necessary model capabilities exist.)
On the contrary, I believe that the hunt for better data is an attempt to climb the local hill and get stuck there without reaching the global maximum. Interactive environments are good, they can help, but they are just one of the possible ways to learn about causality. Is it the best way? I don't think so; it is the easier way: just throw money at the problem and eventually you'll get something that you'll claim to be the goal you chased all this time. And yes, it will have something in it you will be able to call "causal inference" in your marketing.
But current models are notoriously difficult to teach. They eat an enormous amount of training data; a human needs much less. They eat an enormous amount of energy to train; a human needs much less. It means that the very approach is deficient. It should be possible to do the same with a tiny fraction of the data and money.
> The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model.
Well, I learned English almost all the way to B2 by reading books. I was too lazy to use a dictionary most of the time, so it was not interactive: I didn't interact even with a dictionary, I was just reading books. How many books did I read to get to B2? ~10 or so. Well, I read a lot of English on the Internet too, and watched some movies. But let's multiply 10 books by 10. Strictly speaking it was not B2: I was almost completely unable to produce English, and my pronunciation was not just bad, it was worse. Even now I sometimes stumble on words I cannot pronounce. I know the words and have mentally constructed a sentence with them, but I cannot say it, because I don't know how. So to pass B2 I spent some time practicing speaking, listening and writing. And learning some stupid topic like "travel" to have the vocabulary to talk about it at length.
How many books does an LLM need to consume to get to B2 in a language unknown to it? How many audio recordings does it need to consume? A lifetime wouldn't be enough for me to read and/or listen to that much.
If there were a human who needed to consume as much information as an LLM to learn, they would be the stupidest person in all of human history.
It was empirical and, though ultimately wrong, useful. Apply as you will to theories of learning.
I won't comment on Yann LeCun or his current technical strategy, but if you can avoid the sunk cost fallacy and pivot nimbly I don't think it is bad for Europe at all. It is "1 billion dollars for an AI research lab", not "1 billion dollars to do X".
Sure, LLMs are getting better and better, and at least for me more and more useful, and more and more correct. Arguably better than humans at many tasks, yet terribly lagging behind in some others.
Coding-wise, one of the things it does “best”, it still has many issues. For me the biggest are still lack of initiative and lack of reliable memory. When I use it to write code, the first shows up as it often sticking to a suboptimal yet overly complex approach. The second shows up in that I have to keep reminding it of edge cases (or else it often breaks functionality), or to stop reinventing the wheel instead of using functions/classes already implemented in the project.
All that can be mitigated by careful prompting, but no matter the claims about information recall accuracy, I still find that even with that information in the prompt it is quite unreliable.
And more generally, the simple fact that when you talk to one the only way to “store” these memories is externally (i.e. not by updating the weights) is kinda like dealing with someone who can’t retain memories and has to keep writing things down to even get a small chance to cope. I get that updating the weights is possible in theory, but it's still just not practical.
What's still missing is the general reasoning ability to plan what to build or how to attack novel problems, i.e. how to assess the consequences of deciding to build something a given way. I doubt that auto-regressively trained LLMs are the way to get there, but there is a huge swathe of apps that are so boilerplate in nature that this isn't the limitation.
I think that LeCun is on the right track to AGI with JEPA - hardly a unique insight, but significant to now have a well funded lab pursuing this approach. Whether they are successful, or timely, will depend on whether this startup executes as a blue-skies research lab or in more of an urgent engineering mode. I think at this point most of the things needed for AGI are engineering challenges rather than what I'd consider research problems.
Wait, we have another acronym to track. Is this the same/different than AGI and/or ASI?