I often find LLMs taking multiple steps to achieve a goal (e.g. performing certain operations against JIRA or GitLab). If the LLM's work seems useful, I instruct it to create a tool that achieves the task more directly and to revise the skill data to make use of that tool.

Granted, I've mostly let it vibecode those tools, so they might be garbage. I should perhaps have it do a refactoring round to make the tools more composable.

You are completely wrong, but one might get that impression from not using SOTA models in the Sonnet ballpark.
I think both preceding comments are a bit too strongly worded. I'm also experimenting with pairing deterministic programming with LLM use in a similar fashion, and I find that it lets you squeeze more out of smaller models than LLM-only agentic loops do. I also have no doubt that the large SOTA models can do far more in LLM-only agentic loops with less hassle and prep work. If you discount the hassle of actually running them, that is. So I guess it depends on what your objective is.