upvote
If you think statefull LLMs would be easier to handle then stateless... Then I think you haven't done a lot of software engineering
reply
Maybe a charitable reading of the parent comment, but my interpretation of it was that while the _models_ are stateless, modern deployments of these models for inference rely on state.

For example, tiered pricing for cached context relies on state, even if the models don’t.

reply
For that matter, the agent harness accumulating "chat history" is state.
reply
Of course you can pass in your own state, but I always wondered about an LLM that has conversation context stay resident in GPU memory somehow.

Or maybe this already effectively covered by context caching and the gains would be minimal (stateless, but if you pass in the same context or the same head context, it’s already in GPU memory and doesn’t need to be loaded?).

reply
This. lol. If you think state makes things easier you're in for a big surprise.
reply
That does not seem to be related to llms? It is more about the harness that utilizes them, right?
reply
The parent comment is AI generated drivel, that’s why. Incredible that it has generated such a lively discussion.
reply
It costs tokens, so it helps the business model, so it’s not a bug but a feature.
reply