Shockingly, we seem to have found a self-attention mechanism of that quality; it just has the sad property of growing as O(N^2) in compute and memory, where N is the context length.
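A minimal sketch of why the cost is quadratic (toy scaled dot-product attention in NumPy; the shapes and dimensions here are illustrative, not any particular model's):

```python
import numpy as np

def attention_scores(n, d=64):
    """Toy scaled dot-product attention scores for a sequence of length n.

    The score matrix Q @ K.T has shape (n, n), so memory and compute
    grow quadratically in the context length n.
    """
    rng = np.random.default_rng(0)
    q = rng.standard_normal((n, d))  # queries, one per token
    k = rng.standard_normal((n, d))  # keys, one per token
    return q @ k.T / np.sqrt(d)      # shape (n, n)

# Doubling the context length quadruples the score matrix:
print(attention_scores(128).size)  # 16384
print(attention_scores(256).size)  # 65536
```

Doubling N doubles the rows *and* the columns of that score matrix, which is exactly the O(N^2) blow-up.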
Nvidia uses ML for fine-tuning and architecting their chips. This might be one use case.
Another one would be to put EVERYTHING from your company into the context window. That would make it easier to create 'THE' model for every company or person. It might also be safer than training a model on your data, because you never end up with a model that contains all your data, only a memory of it.
But maybe that’s enough tokens to feed in an entire lifetime of user behaviour for the digital-twin dystopia?