One of my favorite bits of my PhD dissertation was factoring an intractable 3-dimensional integral

\iiint f(x, y, z)\, dx\, dy\, dz = \int \left[ \int g(x, y)\, dx \right] \left[ \int h(y, z)\, dz \right] dy

(possible because the integrand separates as f(x, y, z) = g(x, y) h(y, z))

This factorization greatly accelerated the numerical integration: with n quadrature points per dimension, it takes O(n^2) integrand evaluations instead of O(n^3).

My advisor was not particularly impressed, and objectively I could have skipped it and let the simulations take a bit longer (quite a bit longer, actually: this integration ran millions of times, for different function parameters, in an inner loop). But it was clever and all mine, and I was proud of it.
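
For the curious, the trick in miniature (a NumPy sketch; the g and h below are made-up stand-ins, not the actual functions from the dissertation):

    import numpy as np

    # Stand-in separable integrand: f(x, y, z) = g(x, y) * h(y, z).
    def g(x, y):
        return np.exp(-(x - y) ** 2)

    def h(y, z):
        return 1.0 / (1.0 + (y + z) ** 2)

    n = 100
    xs = ys = zs = np.linspace(-3.0, 3.0, n)

    # Brute force: O(n^3) integrand evaluations over the full 3-D grid.
    X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
    brute = np.trapz(np.trapz(np.trapz(g(X, Y) * h(Y, Z), zs, axis=2), ys, axis=1), xs)

    # Factored: two batches of 1-D integrals, O(n^2) evaluations total.
    G = np.trapz(g(xs[:, None], ys[None, :]), xs, axis=0)  # G(y) = integral of g(x, y) dx
    H = np.trapz(h(ys[:, None], zs[None, :]), zs, axis=1)  # H(y) = integral of h(y, z) dz
    factored = np.trapz(G * H, ys)

    print(brute, factored)  # agree to quadrature accuracy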

reply
Convolution is a local operation.

Attention is a global operation.
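
A toy contrast in plain NumPy (not any particular framework's API): each convolution output sees only a k-wide window, while each attention output mixes all T positions.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d, k = 8, 4, 3
    x = rng.normal(size=(T, d))

    # Convolution: output at position t depends only on the window x[t:t+k].
    w = rng.normal(size=(k, d))
    conv = np.array([np.sum(w * x[t:t + k]) for t in range(T - k + 1)])

    # Attention: every output row is a weighted mixture of all T positions.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    q, key, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ key.T / np.sqrt(d)              # (T, T): all pairs interact
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    attn = weights @ v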

reply
This brings me back to DSP class; man, learning about the FFT was eye-opening.
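
Presumably the eye-opener was the convolution theorem: transform, multiply pointwise, transform back, and an O(n^2) convolution drops to O(n log n). A quick NumPy check:

    import numpy as np

    rng = np.random.default_rng(1)
    a, b = rng.normal(size=256), rng.normal(size=256)

    direct = np.convolve(a, b)  # direct linear convolution: O(n^2)

    # Convolution theorem: pointwise product in the frequency domain.
    m = len(a) + len(b) - 1
    via_fft = np.fft.irfft(np.fft.rfft(a, m) * np.fft.rfft(b, m), m)

    assert np.allclose(direct, via_fft)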
reply
That's like saying sorting can be done in O(n) because radix sort exists. If you assume some structure, you lose generality, i.e. there will be some problems it's no longer able to solve: it can no longer approximate an arbitrary function that needs perfect memory over the whole sequence.
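
For concreteness, a minimal LSD radix sort; it is linear in n precisely because it assumes fixed-width non-negative integer keys, and it cannot sort under an arbitrary comparison, where the Omega(n log n) lower bound applies:

    def radix_sort(xs, width=32):
        # One stable counting pass per byte, least significant byte first.
        for shift in range(0, width, 8):
            buckets = [[] for _ in range(256)]
            for x in xs:
                buckets[(x >> shift) & 0xFF].append(x)
            xs = [x for bucket in buckets for x in bucket]
        return xs

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    # [2, 24, 45, 66, 75, 90, 170, 802]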
reply