Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.
Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.
I'm sure people will be doing mechanistic interpretability work on these models to extract which patterns they match on for prediction.
This might be a totally wrong approach, but I think it might make sense to try to model a matched filter based on previous stock sell-off/bull-run trigger events, and then see if it has any predictive ability. Likewise, the market reaction usually seems to be some sort of delayed impulse-like activity: the whales react quickly, and then a distribution of less savvy investors follow up on the signal with various delays.
I'm sure other smarter people have explored this approach much more in depth before me.
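A rough sketch of what that could look like, under loudly stated assumptions: trigger events are known by index, each market reaction is an impulse convolved with a made-up delay kernel (an immediate spike for the "whales" plus a gamma-shaped tail for slower followers), and the filter template is just the average window after past triggers. The names `response_kernel`, `build_template` and `matched_filter` are hypothetical, and the data is simulated.

```python
import numpy as np

def response_kernel(length=30, fast_weight=0.6, slow_shape=3.0, slow_scale=4.0):
    """Hypothetical reaction profile: an immediate spike (the "whales") plus a
    gamma-shaped lagged tail (slower investors arriving with various delays)."""
    t = np.arange(length, dtype=float)
    slow = t ** (slow_shape - 1) * np.exp(-t / slow_scale)
    slow /= slow.sum()
    kernel = (1 - fast_weight) * slow
    kernel[0] += fast_weight
    return kernel

def build_template(series, event_indices, length):
    """Average the windows that start at past trigger events, zero-mean the result
    (so a constant level in the series contributes nothing) and scale to unit norm."""
    windows = [series[i:i + length] for i in event_indices if i + length <= len(series)]
    template = np.mean(windows, axis=0)
    template = template - template.mean()
    return template / np.linalg.norm(template)

def matched_filter(series, template):
    """score[k] = sum_j series[k + j] * template[j]: large where the window
    starting at k resembles the average past reaction."""
    return np.correlate(series, template, mode="valid")

rng = np.random.default_rng(0)
n, length = 2000, 30
kernel = response_kernel(length)

# Simulated sell-off shocks; the later two play the role of "unseen" events.
past_events, future_events = [200, 600, 1000], [1400, 1800]
impulses = np.zeros(n)
impulses[past_events + future_events] = -5.0
# Causal convolution: each shock's response starts at its trigger index.
series = np.convolve(impulses, kernel)[:n] + rng.normal(scale=0.3, size=n)

template = build_template(series, past_events, length)
scores = matched_filter(series, template)

# Strongest two matches after the "training" period.
later = np.arange(1200, len(scores))
top = np.sort(later[np.argsort(scores[later])[-2:]])
print(top)
```

Note that the filter only scores how closely a window resembles past reactions once a reaction has started; whether that buys any usable lead time is exactly the open question.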
NNs do ok on those time series problems where it is really about learning a function directly off time. This is nonlinear regression where time is just another input variable.
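To make "time is just another input variable" concrete, here is a minimal sketch (plain NumPy, no framework assumed): a one-hidden-layer tanh network trained by full-batch gradient descent to regress a synthetic series directly on a rescaled time index. The target function, width, learning rate and step count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic series observed at t = 0..199: linear trend + 50-step cycle + noise.
t = np.arange(200, dtype=float)
y = 0.02 * t + np.sin(2 * np.pi * t / 50) + rng.normal(scale=0.1, size=t.size)

# Time is the only input feature; rescale it to [-1, 1] for the tanh units.
x = (2 * t / t.max() - 1).reshape(-1, 1)
target = y.reshape(-1, 1)

hidden, lr, steps = 16, 0.02, 8000
W1 = rng.normal(scale=2.0, size=(1, hidden))
b1 = rng.normal(scale=1.0, size=hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)          # hidden activations, shape (N, hidden)
    return h, h @ W2 + b2             # prediction, shape (N, 1)

def mse():
    return float(np.mean((forward(x)[1] - target) ** 2))

initial_mse = mse()
for _ in range(steps):
    h, pred = forward(x)
    err = pred - target                       # gradient of 0.5 * squared error
    dh = (err @ W2.T) * (1 - h ** 2)          # backprop through tanh (uses old W2)
    # Plain full-batch gradient descent on the mean loss (in-place updates).
    W2 -= lr * (h.T @ err) / len(x)
    b2 -= lr * err.mean(axis=0)
    W1 -= lr * (x.T @ dh) / len(x)
    b1 -= lr * dh.mean(axis=0)
final_mse = mse()
print(f"MSE before training: {initial_mse:.3f}, after: {final_mse:.3f}")
```

The network just fits y = f(t) and treats the residuals as independent noise, which is the assumption the correlated-errors case breaks.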
Cases where one has to adjust for temporally correlated errors seem to be harder for NNs. BTW, I am talking about accuracies beyond what typical RNN variants achieve, which is pretty respectable. More complicated DNNs don't seem to do much better in spite of their significantly greater model complexity.
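For contrast, the classical statistical route writes the error process down explicitly. A minimal sketch, assuming AR(1) errors and using the Cochrane-Orcutt procedure (a standard textbook technique swapped in here for illustration, not one the comment names): fit OLS, estimate the residual autocorrelation rho, quasi-difference the data, refit, and repeat. The data is simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho_true = 5000, 0.8

# Simulate y_t = 1 + 2 * x_t + u_t with AR(1) errors u_t = rho * u_{t-1} + e_t.
x = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = rho_true * u[i - 1] + e[i]
y = 1.0 + 2.0 * x + u

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def cochrane_orcutt(x, y, iters=10):
    """Alternate between estimating rho from residuals and refitting OLS on
    quasi-differenced data."""
    X = np.column_stack([np.ones_like(x), x])
    coef = ols(X, y)                     # start from plain OLS
    rho = 0.0
    for _ in range(iters):
        resid = y - X @ coef             # estimated errors u_t
        rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        # y_t - rho*y_{t-1} = a*(1 - rho) + b*(x_t - rho*x_{t-1}) + e_t
        y_star = y[1:] - rho * y[:-1]
        X_star = np.column_stack([np.full(len(y) - 1, 1.0 - rho), x[1:] - rho * x[:-1]])
        coef = ols(X_star, y_star)       # coef = [a, b] in the original model
    return coef, rho

coef, rho_hat = cochrane_orcutt(x, y)
print(f"intercept={coef[0]:.2f}, slope={coef[1]:.2f}, rho={rho_hat:.2f}")
```

Here the correlation structure is specified by hand; a neural network has to discover it from data, which is plausibly part of why these cases are harder for NNs.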
The M series of competitions changes the tasks from one edition to the next to explore which models perform best under different scenarios. As I mentioned, neural-network-based models win here and there, but their performance is very spotty overall.
Or, you know, maybe they aren't. Thermometers and photon counts are related to weather sometimes, but not holidays. Holidays are related to traffic sensors and to markets, but not Geiger counters.
> Well studied behavioral stuff like "the stock market takes the stairs up and the elevator down" which is not really captured by "traditional" modelling tools.
Prices are the opposite: up like a shot during shocks, falling slowly like a feather. So that particular pattern seems like a great example of the danger of over-fitting, and of why you wouldn't expect mixing series of different types to work very well.
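One way to see the danger is that the same summary statistic flips direction between the two regimes. A minimal sketch, assuming a hypothetical `rise_fall_asymmetry` measure (mean size of upward moves divided by mean size of downward moves, a crude proxy chosen for illustration, not a standard indicator) applied to idealised sawtooth series.

```python
import numpy as np

def sawtooth(n_cycles, slow_len=20, fast_len=2, rising_slowly=True):
    """Idealised cycles: a gradual ramp over slow_len steps and an abrupt move
    over fast_len steps."""
    slow = np.linspace(0.0, 1.0, slow_len, endpoint=False)   # gradual ramp 0 -> 1
    fast = np.linspace(1.0, 0.0, fast_len, endpoint=False)   # abrupt move 1 -> 0
    cycle = np.concatenate([slow, fast])                      # stairs up, elevator down
    if not rising_slowly:
        cycle = 1.0 - cycle                                   # mirror: rocket up, feather down
    return np.tile(cycle, n_cycles)

def rise_fall_asymmetry(series):
    """Mean upward step size divided by mean downward step size."""
    steps = np.diff(series)
    ups, downs = steps[steps > 0], -steps[steps < 0]
    return ups.mean() / downs.mean()

stairs = sawtooth(10, rising_slowly=True)
rockets = sawtooth(10, rising_slowly=False)
print(rise_fall_asymmetry(stairs))   # ~0.1: small rises, large falls
print(rise_fall_asymmetry(rockets))  # ~10: large rises, small falls
```

A model that learned "slow rises, sharp falls" from equity-like series would have the direction exactly backwards on series of the second kind.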
The model will have a library of patterns and will be able to pattern-match subtle ones to deduce "this time series has the kind of micro-patterns which appear in strongly weather-influenced time series", and use this to activate the weather pattern cluster.
To use your example, when served thermometer data, the model notices that the holiday pattern cluster doesn't activate/match at all, and ignores it.
And then it makes sense to train it on the widest possible range of time series, so it can build a vast library of patterns and find correlations of activation between them.
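The gating behaviour described above can be sketched as a toy, under heavy simplifying assumptions: each "pattern cluster" is represented by one hand-picked detector (the share of variance at a weekly period for a human/holiday cluster, at an annual period for a weather cluster, on daily data), and cluster weights are a softmax over detector scores, so a cluster that doesn't match gets close to zero weight. `power_fraction`, `DETECTORS` and `cluster_weights` are hypothetical names; this illustrates the idea, not how any particular foundation model works.

```python
import numpy as np

def power_fraction(x, period):
    """Share of the series' variance carried by a sinusoid of the given period
    (in samples), from the DFT evaluated at that single frequency."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    t = np.arange(x.size)
    coeff = np.sum(x * np.exp(-2j * np.pi * t / period))
    return 2 * np.abs(coeff) ** 2 / (x.size * np.sum(x ** 2))

# Hypothetical "pattern library": one detector per cluster, daily data assumed.
DETECTORS = {
    "human/holiday": lambda x: power_fraction(x, 7.0),   # weekly rhythm
    "weather": lambda x: power_fraction(x, 365.25),      # annual cycle
}

def cluster_weights(x, temperature=0.1):
    """Softmax over detector scores: clusters whose patterns don't match get ~0 weight."""
    names = list(DETECTORS)
    scores = np.array([DETECTORS[name](x) for name in names])
    z = np.exp((scores - scores.max()) / temperature)
    return dict(zip(names, z / z.sum()))

rng = np.random.default_rng(3)
days = np.arange(3 * 365)
thermometer = np.sin(2 * np.pi * days / 365.25) + rng.normal(scale=0.3, size=days.size)
weekend = (days % 7) >= 5
traffic = np.where(weekend, -1.0, 0.4) + rng.normal(scale=0.3, size=days.size)

print(cluster_weights(thermometer))  # weather cluster should dominate
print(cluster_weights(traffic))      # human/holiday cluster should dominate
```

A trained model would learn many overlapping features instead of two fixed frequencies, but the "match, then route" structure is the same one the comment describes.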