Advantage of SML in between some outputs cannot be compressed without losing context, so a small model does that job. It works but most of these solutions still have some tradeoff in real world applications.
We compress tool outputs at each step, so the cache isn't broken during the run. Once we hit the 85% context-window limit, we preemptively trigger a summarization step and load that when the context-window fills up.