undefined

points

[-]

Maybe a charitable reading of the parent comment, but my interpretation of it was that while the _models_ are stateless, modern deployments of these models for inference rely on state.

For example, tiered pricing for cached context relies on state, even if the models don’t.

by zahlman2 hours ago|

parent|

[-]

For that matter, the agent harness accumulating "chat history" is state.

by seanmcdirmid2 hours ago|

prev|

[-]

Of course you can pass in your own state, but I always wondered about an LLM that has conversation context stay resident in GPU memory somehow.

Or maybe this already effectively covered by context caching and the gains would be minimal (stateless, but if you pass in the same context or the same head context, it’s already in GPU memory and doesn’t need to be loaded?).

by random32 hours ago|

prev|

[-]

This. lol. If you think state makes things easier you're in for a big surprise.