undefined

points

[-]

This absolutely was the crux of the (first) mystery, and I would argue that "deep learning theory" really only took off once it recognized this. There are other mysteries too, like the feasibility of transfer learning, neural scaling laws, and now more recently, in-context learning.