For example, tiered pricing for cached context relies on state, even if the models don’t.
Or maybe this already effectively covered by context caching and the gains would be minimal (stateless, but if you pass in the same context or the same head context, it’s already in GPU memory and doesn’t need to be loaded?).