Yeah, this is basically what I ran into too. I actually wrote about this in Stage 6 (
https://ivanmagda.dev/posts/s06-context-compaction/) I went with your option (1): once history crosses a token threshold, the agent asks the model to summarize everything so far, then swaps the full history for that summary. Keeps the context window clean, though you do lose the ability to go back and reference exact earlier tool outputs.
The hard part was picking when to trigger it. Too early and you're throwing away useful context. Too late and the model's already struggling. I ended up just using a simple token count — nothing clever, but it works.
And yeah, the Swift angle was genuinely fun. Defining tool schemas as Codable structs that auto-generate JSON schemas at compile time, getting compiler errors instead of runtime API failures is a huge win.