Computational power. Without self-attention, you have a sloppy implementation of something called a PDA (pushdown automaton), a machine whose only working memory is a stack, like an old HP calculator. With it, you have an even sloppier implementation of a Turing machine.
So (modulo a _lot_ of details) self-attention increases the power from that of a "calculator" to that of a "computer".
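To make the analogy a bit more concrete, here is a rough Python sketch of the two memory models. It is only an illustration under my own assumptions, not how any real network is built: `rpn_eval` and `attend` are hypothetical names, the first mimics an HP-style calculator that can only push and pop a stack, and the second is a toy "hard attention" read that can pull back the contents of any earlier position by matching a query against keys.

```python
def rpn_eval(tokens):
    """Stack-only memory: evaluate a postfix expression with push/pop alone."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            # Only the top of the stack is ever visible.
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()


def attend(keys, values, query):
    """Toy hard-attention read (an assumption for illustration):
    return the value at the position whose key best matches the query."""
    scores = [sum(k * q for k, q in zip(key, query)) for key in keys]
    best = max(range(len(scores)), key=scores.__getitem__)
    return values[best]


print(rpn_eval("3 4 + 2 *".split()))                # 14.0
print(attend([[1, 0], [0, 1]], ["a", "b"], [0, 1]))  # b
```

The point of the contrast is that `rpn_eval` never looks below the top of its stack, while `attend` can reach back to any stored position in one step, which is the kind of random access the "computer" half of the analogy leans on.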