More specifically, it was really AlexNet, the 2012 ImageNet entry, running on two NVIDIA GTX 580s, that highlighted the practicality and utility of running large-scale neural nets on affordable hardware. CUDA had been announced in 2006 (with the first SDK following in 2007), but cuDNN (the CUDA library for neural nets) didn't come out until 2014, after AlexNet had already kickstarted the demand.
What followed from AlexNet was a few years of intense competition on the ImageNet benchmark, with ever larger and deeper neural nets (CNNs), which gave rise to a lot of the algorithms and concepts still used today, such as residual connections (originally from ResNet), Adam (the optimizer), ReLU and related activations, normalization, dropout, etc.: all the fundamentals that made building large neural nets possible.
Schmidhuber continually reminding everyone that he was working on neural nets back in the 1990s is beyond tiresome. Yes, he should have been recognized alongside Hinton/Bengio/LeCun as one of the pioneers, but it's time for him to get over it.
Particular architectures don't matter so much yet. It's quite possible that S3-Mamba or xLSTM could be used in lieu of transformers and we would still have LLMs.
I disagree. But more critically, I'd argue it's the legacy of the PDP project that led to what became foundation models today.
One interesting thing to note in the PDP handbook is the mentions by LeCun and Hinton of what would later be called CNNs, which LeCun claims to have invented. It seems that Hinton deserves just as much credit as LeCun, and in any case these are discussed simply as locally connected models using shared weights as an optimization.
2012 really fundamentally changed everything for the AI community, I’d argue because TensorFlow/Keras/PyTorch followed and made the infrastructure for distributed training accessible.
Did any country voluntarily choose to join? Maybe Belarus?
And do you think the people in Europe are voluntarily a part of your undeclared, but very much existing, empire?
And then, as a whole, this weighs in favor of European scholars and also should properly inform the funding of similar research in the EU.
Writing the last in light of a month-and-a-half wait (to date) for EuroHPC to process their own form, through which we submitted a funding request backed by no less than a university, a private company already established in the area, and 4 alumni: two PhDs and one postdoc. Zero response since.
https://www.youtube.com/watch?v=AIiwuClvH6k
When it comes to attention, details matter, since the idea itself is obvious - weighted inputs, and implicit attention is present in every neural network - this is what weights are.
The specific form of attention used by the Transformer is key-based associative attention, aka "Bahdanau attention" introduced in Bahdanau's paper "Neural Machine Translation by Jointly Learning to Align and Translate". It's perhaps worth noting that the word "attention" is barely even mentioned in this paper, other than noting that this weighted input mechanism can be seen as a form of attention (presumably mentioned since attention was at that time a recurring theme in various types of neural network).
Bahdanau attention - not just the general concept of attention - seems to be a very critical piece of the Transformer architecture, since this is what allows the Transformer to find things in context and is behind the "induction head" mechanism that appears central to how Transformers operate.
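To make the "find things in context" point concrete, here is a minimal sketch of a key-based lookup in plain Python. It uses the scaled dot-product scoring from the Transformer; Bahdanau's paper scores query/key pairs with a small feed-forward network instead, so treat this as an illustration of the general idea rather than either paper's exact formulation. The function names (`softmax`, `attend`) and the toy vectors are assumptions made up for the example.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, output) for one query over lists of key/value vectors."""
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors.
    dim_v = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]
    return weights, output

# Toy context: three positions, each with a 2-d key and a 1-d value.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
query = [4.0, 0.0]  # points in the same direction as the first key

weights, output = attend(query, keys, values)
print(weights, output)
```

The query lines up with the first key, so most of the weight lands on the first value. The lookup is driven by content rather than by fixed position, which is what distinguishes this from the static weighting every network already has.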
And while it is very true that often the research coming out of Academia is useless, what is always neglected are the roots of the research done in private labs.
When Jürgen Schmidhuber and team published their work on Neural Nets back in 1991 it was also useless. Unless you had a supercomputer and very, very deep pockets you were not going to do anything with what came out of their lab.
But still, 30 years later here we are, standing on top of the shoulders of this useless research.
To put it more simply, people with academic credentials should not demand acknowledgement of their current intellectual work while denigrating and ridiculing the importance of very similar work done in the past.
And that's where Schmidhuber goes off the rails: publicly shaming published papers into citing you isn't good academic practice. It's bullying.
You can't claim independence from past work simply because you didn't look directly at it. The job of an academic researcher is to know the landscape of relevant ideas, where they come from, where they're going, and to hopefully contribute a few new good ones.
Citation chains should extend back from your work, along a reasonable line of conceptual inheritance, back to a reasonable point of origin. Schmidhuber has different definitions for both of these reasonables than the bulk of the ML research community, to a point that makes him difficult to satisfy.
Spamming citations is unnecessary.
For example, take a look at Albert Einstein's Google Scholar profile. He's not the top cited physicist. Not even close. It's because other researchers don't explicitly cite his papers. https://scholar.google.com/citations?user=qc6CJjYAAAAJ&hl=en...
Same with Tim Berners-Lee and the World Wide Web. Imagine if his original paper were cited every time someone deployed a web site.
If I’m in the private sector, and I rediscover something from first principles, it is not my responsibility to go search all academia to see if someone’s done it before so I can cite their work.
If I rely on a code library that doesn’t explicitly cite papers it was built on, it is also not my responsibility to go find all the papers that it might’ve been built from and cite those papers.
But if you build on them you should have read them. I don't know about the specifics and I don't know if Schmidhuber is out of line or not, and citations and impact factors are a terrible mess, but generally speaking, you are responsible for finding and reading and citing any related work that needs to be cited, and if you work on neural networks in an academic context you probably have been forced to read that particular one at some point. Citation obligations don't just disappear because you don't want to do the research.
What you're referring to is the "development" part of that. In some sense: the job you have _exists precisely because it's not part of the research phase_, and it's just as valuable as the research part. Research is the proof of concept; development is scaling up, making things production-ready, finding small efficiencies and so on.
From an industry perspective, it's tempting to conflate these, because that's what industry research labs are designed to do: integrated R&D. But that is not at all how academic research labs work.
Soon we will also blame academia for not providing iOS and android apps
The goal of academia isn't to be practical, "only" learning.
Many ideas come from philosophy, which many find useless.
Heraclitus discovered change back in ancient Greece, and I don't know where we would be in scientific research without that (deliberately ignoring the debate about the originality of what we know about Heraclitus's work). I bet his contemporaries found his "research" useless.
The closest to that that I've seen is that traditional academic approaches are too far removed from practical applications for highly applied fields like software engineering, or too slow for fast-moving fields like modern-day ML (thus, all the preprints).
I used to work at Nokia Research when they still made phones. Probably the closest thing Europe had to Silicon Valley twenty years ago. Except it was in Helsinki. Lots of stuff got invented there. Nokia didn't really manage to capitalize on its own inventions of course. Or rather it got caught up in its own clumsy attempts throwing babies out of the window by the bucket load. But others sure did. A lot of modern smart phones still have tech in them that Nokia pioneered before either Google or Apple shipped a smart phone.
At the time there was a lot of talk about the demise of industrial research labs. Bell Labs (now actually owned by Nokia!), Xerox PARC, IBM, and all the other big US labs that produced amazing stuff are shadows of their former selves. There is some truth in that.
But you could argue that Google and Apple picked up some of the slack. And the current AI boom came out of Google cherry-picking the best universities for their AI talent and putting them all together in a research group that then got free rein. Like Nokia, that involved a lot of throwing babies out with the bath water. But it seems to have spawned lots of new startups that can trace their roots back to that research group at Google.
You don't know ahead of time where the breakthrough will come from.
There is a ton of research that sits on the shelf, and then years later it gets recombined with some other useless research, and boom, some big breakthrough.
This current attitude that all research is worthless, so it should be cancelled, is shooting our future selves in the face.
Just as the Dewey Decimal System really only served the purpose of providing the facetious nominal linearization of an arbitrary-depth ontological oversimplification, so too humans are much more like random pattern-matching machines than fastidious sense-makers glued to absolutes derived from false appeals to static mono-perspective ontological hierarchies. The same is becoming lived experience in the LLM age; although the tiktokked youth apparently cannot string ten words together or focus longer than three seconds to attest to it, I'd wager they can feel it. Are we losing something by rejecting the habit of rigorously manually tending to spurious and temporary ontologies? Yes. Is it necessarily a loss in the long term? Probably not, in the same way we no longer write long-form letters or leave calling cards. Are we gaining something in response? Yes, at a minimum much stronger cross-pollination between ivory towers by fearless exploratory pragmatists who disrespect the would-be scope of nominal professions in favor of holistic thinking... both AI and human.
[0] https://en.wikipedia.org/wiki/Science_and_Civilisation_in_Ch...
Practically no one is against hard science research, properly conducted. The issues are rampant fraud / p-hacking / unreproducible garbage mixed with an unhealthy dose of ideological monoculture and indoctrination, garnished with rising tuition prices while sitting on huge endowments in case of the Ivy Leagues.
As long as you do that with your own money (or money freely given by other people), sure.
If you use taxpayer money, that's a different game.
However I often see this going from "there's issues" to discounting academia altogether and positioning private labs as a good or only alternative.
After all, most people in the open science collaboration which published the seminal paper kicking off the replication crisis were from academia.
Well... that's "starve the beast" in action. A lot of things we take for granted, that underpin our modern ways of life, came to be due to government investing. Laser, radar, microwaves, the early Internet, that all was military R&D.
"Unfortunately" (well, for the rich and the MIC, at least) there is no way for people to siphon off money in government-funded research, so once the libertarian/small-state BS completely took over following the collapse of the USSR, a lot of that got torn down or supplemented with enough bureaucracy to make Germans cry... and that's why reusable rockets were not invented at NASA but at SpaceX instead.
https://en.wikipedia.org/wiki/Jet_Propulsion_Laboratory
> Founded in 1936 by California Institute of Technology (Caltech) researchers, the laboratory is now owned and sponsored by NASA and administered and managed by Caltech.
Minimum-Landing-Error Powered-Descent Guidance for Mars Landing Using Convex Optimization http://larsjamesblackmore.com/BlackmoreEtAlJGCD10.pdf
Elon originally wanted parachutes and was convinced by Lars to go with self landing rockets.
Unfortunately, as the early history of SpaceX shows, it required a lot of failures to learn from to design the current crop of rockets. And that's the advantage that private R&D has... as long as the person in charge has money, failure is an option, because in anything publicly funded, any failure will relentlessly be blamed on the currently governing party by the opposition.
If sentiment on HN were as you say, how could your pro-academia and anti-big tech comment be sitting at the top as the most upvoted comment?
Yes, it's very easy to forget, because the trillion is not being made in Europe. If it was really conceived in Munich (like the maps that also got stolen), it shows how incompetent Europe is at keeping its technology and protecting European companies.
It is painful to read this article.
It's like saying it's painful that the Web was invented in Europe and opened for everybody rather than being kept at CERN to protect European companies.
In the Schmidhuber case there are 20 years and a chain of countless other works in between the two.
The real root of the current AI boom is a master's thesis from the University of Toronto.
The thesis demonstrated that neural networks much deeper than before could be trained simply by excluding a random fraction of the neurons during forward and back propagation.
That's how we got practical deep neural networks. Without that we would still be in AI winter.
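For anyone who hasn't seen it written down, the trick described above (dropout) fits in a few lines. This is a minimal sketch in plain Python, assuming the "inverted" variant that rescales the surviving units during training; the original work rescaled at test time instead. The `dropout` helper and its arguments are hypothetical names for this example, not any library's API.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Zero each activation with probability p_drop and rescale the survivors.

    Inverted-dropout sketch: dividing by the keep probability during training
    keeps the expected activation unchanged, so nothing is needed at test time.
    """
    assert 0.0 <= p_drop < 1.0
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the example is repeatable
layer = [1.0] * 1000

train_out = dropout(layer, 0.5, rng, training=True)
eval_out = dropout(layer, 0.5, rng, training=False)

print(sum(1 for a in train_out if a == 0.0), "of", len(layer), "units dropped")
```

A dropped unit outputs zero, so it also receives zero gradient on the backward pass for that step. That is the sense in which the neurons are excluded from both forward and back propagation.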
Indeed I remember buying a set of three conference-papers-as-books around that time, titled Artificial Neural Networks .. proceedings of the whatever the conference was.
No doubt Schmidhuber made important contributions, but I see him pop up claiming to be the 'root' of it all every couple of years.
related paragraph from Wikipedia:
Modern backpropagation was first published by Seppo Linnainmaa as "reverse mode of automatic differentiation" (1970)[26] for discrete connected networks of nested differentiable functions.[27][28][29]
In 1982, Paul Werbos applied backpropagation to MLPs in the way that has become standard.
Both papers are direct applications of the chain rule applied to estimate the gradient of a multivariate function.
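To illustrate the chain-rule point, here is a minimal sketch for a single sigmoid unit with a squared-error loss, checked against finite differences. The one-neuron model, the variable names and the numbers are all assumptions picked for the example; they are not taken from either paper.

```python
import math

def forward_backward(w, b, x, y):
    """Return (loss, dloss/dw, dloss/db) for loss = (sigmoid(w*x + b) - y)**2."""
    # Forward pass, keeping the intermediate values.
    z = w * x + b
    a = 1.0 / (1.0 + math.exp(-z))
    loss = (a - y) ** 2
    # Reverse pass: apply the chain rule from the loss back to the parameters.
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)        # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    dloss_dw = dloss_dz * x      # dz/dw = x
    dloss_db = dloss_dz          # dz/db = 1
    return loss, dloss_dw, dloss_db

def numeric_grad(f, v, eps=1e-6):
    # Central finite difference, used only to sanity-check the analytic gradient.
    return (f(v + eps) - f(v - eps)) / (2 * eps)

w, b, x, y = 0.5, -0.2, 1.5, 1.0
loss, gw, gb = forward_backward(w, b, x, y)
gw_num = numeric_grad(lambda v: forward_backward(v, b, x, y)[0], w)
gb_num = numeric_grad(lambda v: forward_backward(w, v, x, y)[0], b)

print(gw, gw_num)
print(gb, gb_num)
```

The reverse pass reuses the intermediates from the forward pass and costs about as much as the forward pass itself, which is what makes "reverse mode" practical for networks with many parameters.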
Name a single aspect of something modern like the Transformer architecture or how it is trained, that is even indirectly attributable to Schmidhuber.
No doubt he'd be jumping up and down wanting to take credit for residual connections, but where was Schmidhuber in the ImageNet era when everyone else was discovering how to build deep neural nets? Why didn't Schmidhuber invent ResNets, but instead waited until someone else (Kaiming He) did, then claim credit for it?
I'll bet Schmidhuber isn't done yet ... when someone eventually comes up with an architecture for AGI, Schmidhuber will come out of the woodwork and point to a note he made on a napkin in 1800 that predicted it all.
It's nauseating how all the researchers who happened to work for big tech got tons of media coverage but Schmidhuber and his team were getting zero coverage yet they made massive contributions. I bet there are many others not mentioned.
Nobody even knows about Frank Rosenblatt. It's insane how distorted our perception of innovation is.
Even science has been corrupted. It makes one doubt every story we're told about who invented what.
Very Trump-like statement - "Not many people know this, but ...". Yes, a lot of people know this. Any class that says even a little about the history of NNs will talk about Rosenblatt and the Perceptron.
Sure. I think it starts to get more interesting when the influences that Rosenblatt explicitly cites in his seminal Perceptron paper (e.g. Hayek) become part of the discussion (which rarely happens in my experience).