by elephanlemon 7 hours ago

by jaredsohn 2 minutes ago

[delayed]

by nr378 6 hours ago

Oh that's quite a nice idea - agentic context management (riffing on agentic memory management).

There are some challenges around the LLM having enough output tokens to easily specify what it wants its next input tokens to be, but "snips" should be expressible concisely (i.e. the next input should include everything sent previously except the chunk that starts XXX and ends YYY). The upside is tighter context; the downside is that it'll bust the prompt cache (perhaps the optimal trade-off is to batch the snips).
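A minimal sketch of the snip idea, assuming the context is a single string and each snip is named by the text that starts and ends the chunk. The `Snip` type, `apply_snips`, and the `[snipped]` placeholder are made up for illustration; applying a whole batch in one pass means the cache is invalidated once rather than once per snip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snip:
    """Hypothetical snip request: drop the chunk from `start` through `end`."""
    start: str  # text that begins the chunk to drop
    end: str    # text that ends the chunk to drop


def apply_snips(context: str, snips: list[Snip]) -> str:
    """Apply a batch of snips in one pass over the context."""
    for snip in snips:
        i = context.find(snip.start)
        if i == -1:
            continue  # start marker not found: leave the context unchanged
        j = context.find(snip.end, i + len(snip.start))
        if j == -1:
            continue  # end marker not found after the start marker
        context = context[:i] + "[snipped]" + context[j + len(snip.end):]
    return context


context = (
    "user: run tests\n"
    "BEGIN LOG line1 line2 line3 END LOG\n"
    "assistant: 2 tests failed"
)
batched = [Snip(start="BEGIN LOG", end="END LOG")]
trimmed = apply_snips(context, batched)
print(trimmed)
```

The model only has to emit the two short markers per snip, not the text it wants to keep.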

by mksglu 3 hours ago

Good point on prompt cache invalidation. Context-mode sidesteps this by never letting the bloat in to begin with, rather than snipping it out after. Tool output runs in a sandbox, a short summary enters context, and the raw data sits in a local search index. No cache busting because the big payload never hits the conversation history in the first place.

by FuckButtons 3 hours ago

Yeah, the fact that we have treated context as immutable baffles me. It’s not like human working memory keeps a perfect history of everything done over the last hour. It shouldn’t be that complicated to train a secondary model that just runs online compaction, e.g. it runs a tool call, the model determines what’s germane to the conversation and prunes the rest. Or some task gets completed: OK, just leave a stub in the context that says "completed X", with a tool available to see the details of X if it becomes relevant again.
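Roughly what the stub-plus-lookup part could look like, leaving the trained compaction model out entirely. This is only a sketch: `compact`, `get_details`, and the in-memory `_archive` dict are hypothetical stand-ins for whatever storage and tool interface an agent actually has.

```python
import itertools

_archive: dict[str, str] = {}  # stands in for wherever full details are kept
_ids = itertools.count(1)


def compact(task: str, full_output: str) -> str:
    """Archive the full output and return a one-line stub for the context."""
    ref = f"task-{next(_ids)}"
    _archive[ref] = full_output
    return f"[completed: {task}; details available via get_details('{ref}')]"


def get_details(ref: str) -> str:
    """Hypothetical tool the model can call if the details matter again."""
    return _archive.get(ref, f"no archived details for {ref}")


stub = compact("ran database migration", "step 1 ok\nstep 2 ok\nstep 3 ok")
print(stub)                   # the only text that stays in context
print(get_details("task-1"))  # full output, fetched on demand
```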

by mksglu 3 hours ago

That's pretty much the approach we took with context-mode. Tool outputs get processed in a sandbox, only a stub summary comes back into context, and the full details stay in a searchable FTS5 index the model can query on demand. Not trained into the model itself, but gets you most of the way there as a plugin today.
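To illustrate the general pattern (this is not context-mode's actual code), here is a small sketch using Python's built-in `sqlite3` module. It assumes a SQLite build with FTS5 enabled; the table name `outputs` and the helpers `index_output` and `search` are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One row per line of tool output, so a search returns only the matching lines.
db.execute("CREATE VIRTUAL TABLE outputs USING fts5(tool, body)")


def index_output(tool: str, raw: str) -> str:
    """Store the raw output in the index and return a short stub for the context."""
    lines = raw.splitlines()
    db.executemany(
        "INSERT INTO outputs (tool, body) VALUES (?, ?)",
        [(tool, line) for line in lines],
    )
    return f"[{tool}: {len(lines)} lines indexed, use search() for details]"


def search(query: str, limit: int = 3) -> list[str]:
    """Full-text query over the indexed output, best matches first."""
    rows = db.execute(
        "SELECT body FROM outputs WHERE outputs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [body for (body,) in rows]


raw = "GET /health 200\nGET /login 500 timeout\nGET /health 200"
summary = index_output("server-log", raw)
print(summary)
print(search("timeout"))
```

Only `summary` would enter the conversation; the raw lines stay in the index until the model asks for them.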

by esperent 2 hours ago

Is it because of caching? If the context changes arbitrarily every turn then you would have to throw away the cache.
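A toy model of why that happens, assuming the cache is prefix-based, i.e. only the leading part of the request that is identical to the previous one can be reused. The message lists and `cached_prefix_len` are illustrative, not any provider's API.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading messages shared by two requests (the reusable prefix)."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


turn1 = ["system", "user: fix bug", "tool: 5000 lines of logs", "assistant: found it"]

# Appending keeps the whole earlier request as a reusable prefix.
appended = turn1 + ["user: thanks"]

# Editing an earlier message invalidates everything from that point on.
snipped = ["system", "user: fix bug", "tool: [snipped]", "assistant: found it", "user: thanks"]

print(cached_prefix_len(turn1, appended))  # 4
print(cached_prefix_len(turn1, snipped))   # 2
```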

by 8note 36 minutes ago

I think something kinda easy for that could be to pretend that the pruned output was actually produced by a subagent: copy the detailed logs out, and replace them with a compacted summary.

by esperent 2 hours ago

> For example, if you’re working with a tool that dumps a lot of logged information into context

I've set up a hook that blocks directly running certain common tools and instead tells Claude to pipe the output to a temporary file and search that for relevant info. There's still some noise where it tries to run the tool once, gets blocked, then runs it the right way. But it's better than before.
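Something along these lines, as a rough sketch rather than my exact script. The `BLOCKED` list is made up, the redirect check is deliberately crude, and `main()` assumes the hook receives a JSON event on stdin with the command under `tool_input.command` and that exit code 2 blocks the call and sends stderr back to the model; check your tool's hook documentation before relying on either assumption.

```python
import json
import shlex
import sys

BLOCKED = {"pytest", "npm", "cargo"}  # hypothetical list of noisy tools


def decide(command: str) -> tuple[bool, str]:
    """Return (allowed, message); the message tells the model what to do instead."""
    try:
        words = shlex.split(command)
    except ValueError:
        return True, ""  # unparseable command: don't interfere
    if not words or words[0] not in BLOCKED:
        return True, ""
    if ">" in command:  # crude check: output is already redirected somewhere
        return True, ""
    return False, (
        f"Blocked: run `{command} > /tmp/out.log 2>&1` "
        "and search that file for the relevant lines instead."
    )


def main() -> int:
    """Hook entry point under the assumptions above (not called in this sketch)."""
    event = json.load(sys.stdin)
    command = event.get("tool_input", {}).get("command", "")
    allowed, message = decide(command)
    if not allowed:
        print(message, file=sys.stderr)
        return 2  # assumed to mean "block this tool call"
    return 0


print(decide("pytest -q"))
print(decide("pytest -q > /tmp/out.log 2>&1"))
```

An installed hook script would end with `sys.exit(main())`. The blocked-then-retried noise comes from the first `decide` call returning `False` before the model reruns the command with a redirect.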

by mksglu 3 hours ago

That's exactly what context-mode does for tool outputs. Instead of dumping raw logs and snapshots into context, it runs them in a sandbox and only returns a summary. The full data stays in a local FTS5 index so you can search it later when you need specifics.

by 8note 35 minutes ago

what i want is for the agent to initially get the full data and make the right decision based on it, then later it doesn't need to know as much about how it got there.

isn't that how thinking works? intermediate tokens that then get replaced with the result?
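a rough sketch of what i mean, assuming each tool message can carry a short `result` recorded after the agent has acted on it. `Message` and `collapse_old_tool_output` are made-up names: the newest tool output stays in full, older ones shrink to their result.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    result: str = ""  # short outcome, recorded once the output has been used


def collapse_old_tool_output(history: list[Message], keep_last: int = 1) -> list[Message]:
    """Keep the newest `keep_last` tool outputs in full; shrink older ones to their result."""
    tool_positions = [i for i, m in enumerate(history) if m.role == "tool"]
    recent = set(tool_positions[-keep_last:]) if keep_last > 0 else set()
    out = []
    for i, m in enumerate(history):
        if m.role == "tool" and i not in recent and m.result:
            out.append(Message("tool", f"[result: {m.result}]", m.result))
        else:
            out.append(m)  # untouched: non-tool, recent, or no result recorded yet
    return out


history = [
    Message("user", "why is CI red?"),
    Message("tool", "800 lines of test output", result="test_login fails on timeout"),
    Message("assistant", "test_login times out"),
    Message("tool", "300 lines of profiler output", result="slow DNS lookup"),
]
compacted = collapse_old_tool_output(history)
for m in compacted:
    print(m.role, "|", m.text)
```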