Try this out with a local LLM. You'll see that as the conversation grows, each prompt takes longer to process. The growth isn't exponential, but it's significant. This is in fact how all autoregressive LLMs work.
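For a concrete (if unscientific) way to see it, here's a minimal sketch that times each turn of a growing chat against a local OpenAI-compatible server (llama.cpp's llama-server, Ollama, etc.). The URL and model name are placeholders for whatever you run locally:

```python
# Minimal sketch: time each turn of a growing chat against a local
# OpenAI-compatible server. URL and model name are placeholders.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # adjust for your setup
messages = []

for turn in range(10):
    messages.append({"role": "user",
                     "content": f"Question number {turn}, answer briefly."})
    t0 = time.time()
    resp = requests.post(URL, json={"model": "local", "messages": messages})
    elapsed = time.time() - t0
    reply = resp.json()["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    # Each request resends the whole history, so prompt processing
    # time grows with total conversation length.
    print(f"turn {turn}: {len(messages)} messages, {elapsed:.2f}s")
```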
Also by the way, caching does not make LLM inference linear. It's still quadratic, but the constant in front of the quadratic term becomes a lot smaller.
Touché. Still, to a reasonable approximation, caching makes the dominant term linear, or equivalently, it linearly scales the expensive bits.
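To put rough numbers on that: with a KV cache, each new token costs roughly 2P FLOPs in the linear layers (constant in context length) plus an attention term that grows with context. The figures below are illustrative (roughly 7B-model-shaped) and use the standard back-of-envelope approximations, nothing exact:

```python
# Back-of-envelope FLOPs per generated token, with a KV cache.
P = 7e9        # parameter count -> linear-layer FLOPs ~ 2*P per token
L = 32         # transformer layers
d = 4096       # model (embedding) dimension

def flops_per_token(t):
    linear = 2 * P             # weight matmuls: constant in context length
    attention = 4 * L * d * t  # attending to t cached keys/values: O(t)
    return linear, attention

for t in (1_000, 10_000, 100_000):
    lin, attn = flops_per_token(t)
    print(f"context {t:>7}: linear {lin:.1e}, attention {attn:.1e} FLOPs")
```

With these numbers the constant linear-layer term dominates until the context reaches roughly 25-30k tokens, so total cost over a typical conversation looks approximately linear; the quadratic attention term only takes over at long contexts.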
This is essentially what happens for each message in an LLM chat, at the logical level: the complete context/history is sent in to be processed. If you wish to process only the additions, you must preserve the processed state server-side, in the KV cache. KV caches can be very large, e.g. tens of gigabytes.
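The "tens of gigabytes" figure is easy to sanity-check: the cache stores one key and one value vector per layer per token. A sketch with illustrative numbers (roughly Llama-3-70B-shaped, with grouped-query attention, fp16):

```python
# Rough KV-cache size estimate: 2 tensors (K and V) per layer per token.
# Figures are illustrative, roughly Llama-3-70B-shaped.
layers = 80
kv_heads = 8         # GQA: fewer KV heads than query heads
head_dim = 128
bytes_per_value = 2  # fp16/bf16

bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
print(f"{bytes_per_token / 1024:.0f} KiB per token")  # ~320 KiB

for context in (8_192, 32_768, 131_072):
    gib = bytes_per_token * context / 2**30
    print(f"{context:>7} tokens -> {gib:.1f} GiB of KV cache")
```

At a 128k-token context that works out to roughly 40 GiB for the cache alone, on top of the model weights, which is why serving long contexts statefully is expensive.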