Perhaps more important, just because it is easy to escape any local minimum does not mean that there is necessarily a trend towards a really good optimum, as it can just bounce between a bunch of really bad ones for a long time. This actually happens almost all the time if you try to design your entire architecture from scratch, e.g. highly connected networks. People who are new to the field sometimes don't seem to understand why SGD doesn't just always fix everything; this is why. You need very strong inductive biases in your architecture design to ensure that the loss (which is data-dependent so you cannot ascertain this property a priori) exhibits a global bowl-like shape (we often call this a 'funnel') to provide a general trajectory for the optimizer toward good solutions. Sometimes this only works for some optimizers and not others.
This is why architecture design is something of an art form, and explaining "why neural networks work so well" is a complex question involving a ton of parts, all of which contribute in meaningful ways. There are often plenty of counterexamples to any simpler explanation.
If they were all correlated with each other that does not seem far fetched.
E.g. you could land perfectly on a local minima but you won’t stay the unless your step size was minute or the minima was quite substantial.