Yeah this is a very interesting angle. Our primary mechanism here is via agent created auto-memories today. The agent keeps track of the most useful steps, and more importantly, dead end steps as it executes runbooks. We think this offers a great bridge to suggest runbook updates and keep them current.
>> Curious how much savings do you observe from using runbook versus purely let Claude do the planning at first.
Really depends on runbook quality, so I don't have a straightforward answer. Of course, it's faster and cheaper if you have well defined steps in your runbooks. As an example, `check logs for service frontend, faceted by host_name`, vs. `check logs`. Agent does more exploration in the latter case.
We wrote about the LLM costs of investigating production alerts more generally here, in case helpful: https://relvy.ai/blog/llm-cost-of-ai-sre-investigating-produ...
In our experience, runbooks provide a consistent, fast, and reliable way of investigating incidents (or ruling out common causes). In their absence, the AI does its usual open-ended exploration.