I agree. I also think it's about the hardware and, obviously, about recognizing automatic differentiation (AD) as the fundamental primitive.

Particular architectures don't matter so much yet. It's quite possible that S3-Mamba or xLSTM could be used in lieu of transformers and we would still have LLMs.

by LogicFailsMe 6 hours ago

And Google's acquisition of DNN Research to get the ball rolling with conv nets and AI moneyball, followed by the acquisition of DeepMind. Schmidhuber IMO *has* been recognized as one of the 4 horsemen, and rightly so, but what has he done lately? Just noticed they now say the 3 godfathers of AI. This is what people hate about academia. It's not academia itself, it's the mean girl politics that emerge from the tenure system. And at this point, tenure should be abolished IMO, having been utterly weaponized to defend the status quo.

by Scroll_Swe 20 minutes ago

Thanks AI for destroying my hobby. :)

by AndrewKemendo 4 hours ago

This is well put.

2012 really fundamentally changed everything for the AI community, I’d argue because TensorFlow/Keras/PyTorch followed, and that made the infrastructure for distributed training accessible.

by alephnerd 2 hours ago

> The current AI boom has more to do with NVIDIA, and the popularity of computer gaming giving us GPU compute, than who was using neural networks back in the 1990s

I disagree. But more critically, I'd argue it's the legacy of the PDP project that led to what became foundation models today.

by HarHarVeryFunny 1 hour ago

The PDP project was very early. It's relevant in terms of neural net history, of course, but it's hard to see much there that bears on today's large models other than Hinton's reinvention of SGD as an alternative to the layer-wise training that was then the norm.

One interesting thing to note in the PDP handbook is that LeCun and Hinton both mention what would later be called CNNs, which LeCun claims to have invented. It seems that Hinton deserves just as much credit as LeCun, and in any case these are discussed merely as locally connected models that use shared weights as an optimization.