> I'm not sure I understand this argument. I create new tools all the time as part of my development work, and I have skills stored that tell agents how to use them. They use them flawlessly.

I highly doubt that your tool is like this:

> git branch -vv | grep ': gone]'| grep -v "*" | awk '{ print $1; }' | xargs -r git branch -d

Or:

> ffmpeg -i main_course.mp4 -i reaction_cam.mov -filter_complex "[1:v]scale=480:270[pip_scaled]; [0:v][pip_scaled]overlay=W-w-20:20[pip_video]; [pip_video]drawtext=text='LIVE RECORDING':fontcolor=white:fontsize=24:box=1:boxcolor=black@0.6:x=30:y=30[final_video]; [0:a][1:a]amix=inputs=2:duration=first:dropout_transition=2[final_audio]" -map "[final_video]" -map "[final_audio]" -c:v libx264 -crf 21 -preset fast -c:a aac -b:a 192k output_production.mp4

LLMs generate these for breakfast.

reply
It’s really wild watching LLMs construct those calls. They batch so many different checks and stuff into a single tool call, delimit them with markers, etc.

The crazy thing to me is that this kind of “composition of small tools to create something bigger” is the biggest vindication of the Unix philosophy I can think of.

I have to wonder how much of that behavior was trained into the model and how much of it is the secret herbs and spices they toss into the harnesses' own system prompts.

reply
Totally breaks the permission model in Claude Code.
reply
Personally I really dislike when the agents generate super long composed shell commands because they are really hard to audit. ffmpeg I'd whitelist, but if it makes a mistake in some super long chained git command it can have pretty scary consequences.
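
To make the auditing problem concrete, here is a rough Python sketch of checking every stage of a composed command against an allowlist. The `ALLOWED` set and both function names are made up for illustration, and this assumes Python 3.8+ for the `shlex` behaviour it relies on. It is not a real sandbox: it ignores subshells, `$(...)`, and anything a wrapper like `xargs` runs on your behalf.

```python
import shlex

# Hypothetical allowlist for illustration only.
ALLOWED = {"ffmpeg", "grep", "awk"}


def programs_in(command: str) -> list[str]:
    """Return the first word of each stage in a pipeline or command list."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and shell operators only
    programs = []
    expect_program = True
    for token in lexer:
        # Runs of |, & and ; come back as one token (|, ||, &&, ;).
        # Limitation: a quoted "|" argument looks identical after unquoting.
        if all(ch in "|&;" for ch in token):
            expect_program = True
        elif expect_program:
            programs.append(token)
            expect_program = False
    return programs


def not_allowed(command: str) -> list[str]:
    """List the stage programs that are missing from the allowlist."""
    return [p for p in programs_in(command) if p not in ALLOWED]
```

Run against the `git branch` one-liner quoted upthread, this flags `git` and `xargs`, but it never sees the `git branch -d` that `xargs` ends up running, which is exactly why these chains are hard to reason about.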
reply
I think the issue is that it goes against a very well-established pattern. I have a tool that wraps ripgrep so that search results always include context, and from time to time the agent will use ripgrep by itself. When I ask why, it says "yeah I should have done that".

There are workarounds, though. I am creating what I call knowledge triggers for Pi that are similar to Claude's "PreToolUse", so having the agent use oak all the time is not an issue in my opinion.

The challenge for oak is: why? I actually want to slow agents down so I can ensure they are doing the right thing, and the massive bottleneck is the LLMs themselves, so speed measured in milliseconds or even seconds will not concern many.

I thought oak was more about knowing how to inject context into the prompt based on code that is stored in oak, for example. Faster operations can help, but the use case is limited. The missing piece for better/correct code is context at the right time.

reply
> I think the issue is that it goes against a very well-established pattern. I have a tool that wraps ripgrep so that search results always include context, and from time to time the agent will use ripgrep by itself. When I ask why, it says "yeah I should have done that".

There's a limit to how many simultaneous instructions an agent can follow (the exact number depends on the specific model, so instructions that are fine for one model may overwhelm another). If this keeps happening, consider trimming your instructions or, even better, solving it at the harness level (for example, intercepting and rewriting ripgrep calls to use your tool, as rtk [0] does in agents that support this).
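
A minimal Python sketch of what such a rewrite could look like, assuming the harness lets you transform a shell command before it runs. `WRAPPER` and `rewrite_tool_call` are hypothetical names, not rtk's or any harness's actual API, and how the function gets wired in depends entirely on the agent you use.

```python
import shlex

# Hypothetical name for the ripgrep wrapper; substitute your own tool.
WRAPPER = "rg-with-context"

# Characters that suggest a composed command; those are left untouched here.
SHELL_SYNTAX = "|;&<>()`$"


def rewrite_tool_call(command: str) -> str:
    """Swap a plain `rg ...` call for the wrapper, leaving everything else alone."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return command  # unbalanced quotes: do not guess
    if not tokens or tokens[0] != "rg":
        return command
    if any(ch in command for ch in SHELL_SYNTAX):
        return command  # pipelines and substitutions need a real parser
    return shlex.join([WRAPPER, *tokens[1:]])
```

For instance, `rewrite_tool_call("rg -n 'foo bar' src")` gives back the same arguments with `rg-with-context` in front, while `git status` or `rg foo | head` pass through unchanged. The point is that the agent no longer has to remember anything.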

Overall, never leave it to the agent to follow an instruction that must be followed at all times. For example, doing things in a git hook beats relying on a multi-command workflow every time the agent commits.

Is this state of things forever? I don't think so. Very soon models will become so much better that this will be a non-problem.
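
As a sketch of the git hook approach, assuming you write the hook in Python: the `CHECKS` list is a placeholder for whatever your multi-command workflow actually is, and `run_checks` is a made-up helper name. Git aborts the commit when `.git/hooks/pre-commit` exits non-zero, so the checks run no matter what the agent remembers.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Placeholder checks; replace with your own workflow steps.
CHECKS = [
    ["git", "diff", "--cached", "--check"],  # flags whitespace errors in staged changes
]


def run_checks(checks: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"pre-commit check failed: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0


# In the hook file itself, finish with:
#     sys.exit(run_checks(CHECKS))
```

Save it as `.git/hooks/pre-commit`, add the final `sys.exit(...)` line, and make the file executable.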

[0] https://github.com/rtk-ai/rtk

reply
I use a new VCS already (jj, highly recommended) and Claude forgets to use it all the time despite many obvious instructions in many places.
reply