It's general attention collapse and it happens everywhere once you start noticing it.
The simplest example, which even frontier models fail at, is something of the form `A and not B', which they keep insisting means `A and B' after the text gets pushed far enough back in the context.
The only solution, I think, that is even theoretically capable of fixing this is using a different form of attention. One which innately understands tree-like structures and binds tree nodes close together regardless of overall distance from the end of the stream.
Incidentally this is what I'm also working on at $job.
src/forge/context/ - specifically TieredCompact in strategies.py. That's the furthest I took it. The tool-call collapse in particular has been useful in agentic coding, but I haven't formalized/generalized it yet. I think within forge it'll be a callable tool that will rely on the model knowing when to trigger it (as you said - "I'm done with the task, can collapse"). That's the part I need to abstract out of my bespoke implementation.
Your idea of using task shape to dynamically set those thresholds (or even move to model-triggered) I think is the key but is a trickier implementation. That's what I haven't gotten around to yet.
Definitely on my todo list but happy to check out a PR if you have something in mind.
Some additional info on my current public hack is also at: https://github.com/antoinezambelli/forge/blob/main/docs/USER...
1. Runtime-computed "context pressure" — tokens-since-last-compaction, depth of tool-call nesting, response/call ratio in recent turns. The runtime computes this; the model never sees it.
2. Model-emitted "natural breakpoint" — a tool call the model fires when it perceives it's done with a thread (file closed, task complete, branch abandoned).
Compaction fires on the AND of both. Keeps the model from compacting mid-reasoning-chain, and keeps the runtime from waiting until 90% context for the model to notice on its own.
Going to actually go read TieredCompact tonight — curious whether you've ended up tying triggers to task signals or kept them on model self-report.