A good method seems to be to only make a skill or memory when the LLM gets something wrong, or when you actually observe that it's always doing the same step and you can get the model to the same place with fewer tokens.
Even the /handoff skill was written by the model…
I also imagine that varies by model.
It should be a first-class feature of the harness, tbh. It kind of is with the /compact [focus] parameter, but this is coarse and leaves no record. I find keeping the handoff files in the repo to be useful for historical context and later debugging.
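To make the "handoff files in the repo" idea concrete, here is a minimal sketch of what writing one could look like. Everything in it is an assumption for illustration: the `handoffs/` directory, the timestamped filename, the section headings, and the `write_handoff` helper itself are hypothetical, not what the /handoff skill or any harness actually does.

```python
from datetime import datetime
from pathlib import Path


def write_handoff(repo_dir, summary, next_steps, now=None):
    """Write a timestamped handoff note under <repo_dir>/handoffs/ and return its path.

    The directory name and file layout are hypothetical choices for this sketch.
    """
    now = now or datetime.now()
    out_dir = Path(repo_dir) / "handoffs"  # assumed location, kept in the repo
    out_dir.mkdir(parents=True, exist_ok=True)

    # One file per handoff, so the history stays readable in version control.
    path = out_dir / (now.strftime("%Y-%m-%d-%H%M%S") + ".md")

    lines = [
        "# Handoff " + now.strftime("%Y-%m-%d %H:%M"),
        "",
        "## Summary",
        summary,
        "",
        "## Next steps",
    ]
    lines += [f"- {step}" for step in next_steps]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Because each note is a plain file with a sortable name, committing them gives you the record that /compact doesn't leave behind.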
The solution that I've developed is to let the agent figure things out efficiently, without inflating the context. I have what I call a smart repo that better explains this at
https://github.com/gitsense/smart-ripgrep
The basic idea is, when the agent does a ripgrep it gets back files + matching lines + context.
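As a rough illustration of the "files + matching lines + context" shape of a search result, here is a small pure-Python sketch. It is not how ripgrep or the linked repo is implemented; the `search_with_context` name, the dict keys, and the default of one context line are all assumptions made for the example.

```python
import re
from pathlib import Path


def search_with_context(root, pattern, context=1):
    """Return one dict per matching line: file, line number, match, nearby lines.

    A simplified stand-in for a grep-style search; names are hypothetical.
    """
    regex = re.compile(pattern)
    results = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files

        for i, line in enumerate(lines):
            if regex.search(line):
                # Clamp the context window to the start and end of the file.
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                results.append({
                    "file": str(path.relative_to(root)),
                    "line": i + 1,  # 1-based, as grep-style tools report it
                    "match": line,
                    "context": lines[start:end],
                })
    return results
```

The point is that every hit already costs tokens for the file name, the line, and its neighbours, so whatever gets layered on top has to earn its place in the context.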
Those for whom the code is almost entirely a black box, and who cannot easily recover when something goes wrong, are much more keen on this context management and planning, because recovering from derailments is much harder (and takes longer): it's often a conversation with the LLM to try to get back to where they were before.
But are they really working, rather than making things worse? Are there any tests or real case studies done by users rather than the tool's author? In my experience, removing from context works more often than adding.
There I usually lay out stuff like "this is a personal greenfield project" and "don't bother with multi-user support" etc. Otherwise Claude will default to creating something WEBSCALE for a simple tool that won't run outside of my LAN-only Proxmox setup. It also lets the agent skip massive database migration support for a project that's 3 days old, which the agent has no way of knowing otherwise. I'm just dropping it on the project after a full memory wipe.
This means its changes will either be out of alignment with the overall project and its "style" and goals, or it wastes tokens re-getting to know the basics about the project each time.
No third case.