As you said, it's not in any way intrinsic to the LLM, though it may be a very necessary optimization on today's hardware.
IMO, we are probably talking about a 6x slowdown (for typical English). You would need to be absolutely stupid not to implement some kind of optimization along these lines.
Slower and maybe a little dumber, but it would work.
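
For anyone who wants to sanity-check the 6x figure: here is a rough sketch, assuming the optimization in question is subword tokenization and the alternative is feeding the model raw UTF-8 bytes. `crude_tokens` is a made-up stand-in (one word or one punctuation mark per token), not any real tokenizer, so the ratio it gives is only a ballpark.

```python
import re


def crude_tokens(text: str) -> list[str]:
    # Hypothetical stand-in for a subword tokenizer (an assumption, not a
    # real vocabulary): a word with at most one leading whitespace
    # character, or a single punctuation mark.
    return re.findall(r"\s?\w+|[^\w\s]", text)


def length_ratio(text: str) -> float:
    # How many times longer the input sequence gets when the model sees
    # raw UTF-8 bytes instead of the crude tokens above.
    n_tokens = len(crude_tokens(text))
    if n_tokens == 0:
        raise ValueError("text contains no tokens")
    return len(text.encode("utf-8")) / n_tokens


if __name__ == "__main__":
    sample = "It would be slower and maybe a little dumber, but it would work."
    print(f"bytes per crude token: {length_ratio(sample):.2f}")
```

A real subword vocabulary splits rare words into several pieces, so it will give a different ratio than this. Also note the ratio is only the sequence-length blowup; self-attention cost grows quadratically with sequence length, so the actual compute penalty could be worse than the raw ratio suggests.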