I suspect the answer is that the AI chat app is built so that the LLM's response tokens are sent straight into the HTTP response as an SSE stream, without the intermediate state being stored in a database. But the _full_ response is stored in the database once the LLM stream completes; the intermediate tokens never are.
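A rough sketch of that pattern, with invented names (`fake_llm_tokens` and `db` stand in for the real model stream and the real database):

```python
db = {}  # stand-in for the messages table

def fake_llm_tokens():
    # stand-in for the token stream coming back from the LLM
    yield from ["The ", "answer ", "is ", "42."]

def stream_response(message_id):
    """Generator the web framework would iterate to emit SSE events."""
    parts = []
    for token in fake_llm_tokens():
        parts.append(token)
        yield f"data: {token}\n\n"  # flushed to the client immediately
    # Only here, after the stream ends, is the assembled text persisted.
    db[message_id] = "".join(parts)

events = list(stream_response("msg-1"))
print(db["msg-1"])  # full text exists in the DB only after streaming finished
```

A refresh mid-stream drops the generator, so the client loses the tokens already sent and the DB has nothing yet; once the stream finishes, reloads are served from `db`.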
If you look at the GIFs of the Claude UI in this post [1], you can see how the HTTP response breaks on page refresh, but some time later the full response is available again because it's now being served in full from the database.
[1]: https://zknill.io/posts/chatbots-worst-enemy-is-page-refresh...