I'm not sure I understand this argument. I create new tools all the time as part of my development work, and I have skills stored that tell agents how to use them. They use them flawlessly.

When I say "benchmark the query engine using the foobar dataset and compare it to run 431", the agents go and run my special benchmark tool and use the different subcommands to compare results and so on.

I'm sure a new VCS would be a little less smooth sailing, but not by much.

reply
> I'm not sure I understand this argument. I create new tools all the time as part of my development work, and I have skills stored that tell agents how to use them. They use them flawlessly.

I highly doubt that your tool is like this:

> git branch -vv | grep ': gone]'| grep -v "*" | awk '{ print $1; }' | xargs -r git branch -d

Or:

> ffmpeg -i main_course.mp4 -i reaction_cam.mov \
>   -filter_complex \
>   "[1:v]scale=480:270[pip_scaled]; \
>   [0:v][pip_scaled]overlay=W-w-20:20[pip_video]; \
>   [pip_video]drawtext=text='LIVE RECORDING':fontcolor=white:fontsize=24:box=1:boxcolor=black@0.6:x=30:y=30[final_video]; \
>   [0:a][1:a]amix=inputs=2:duration=first:dropout_transition=2[final_audio]" \
>   -map "[final_video]" -map "[final_audio]" \
>   -c:v libx264 -crf 21 -preset fast \
>   -c:a aac -b:a 192k \
>   output_production.mp4

LLMs generate these for breakfast.

reply
It’s really wild watching LLMs construct those calls. They batch so many different checks and stuff into a single tool call, delimit them with markers, etc.

The crazy thing to me is that this kind of “composition of small tools to create something bigger” is the biggest vindication of the Unix philosophy I can think of.

I have to wonder how much of that behavior was trained into the model and how much is the secret herbs and spices they toss into the harnesses' own system prompts.

reply
Totally breaks the permission model in Claude Code.
reply
Personally, I really dislike it when agents generate super long composed shell commands, because they are really hard to audit. ffmpeg I'd whitelist, but if the agent makes a mistake in some super long chained git command, it can have pretty scary consequences.
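
One way to make the branch-cleanup one-liner quoted earlier in the thread auditable is to split it into named steps and default to a dry run. A minimal sketch: the function names `gone_branches` and `prune_gone_branches` are hypothetical, and it assumes the usual `git branch -vv` layout, where a deleted upstream shows up as `[origin/name: gone]`.

```shell
# Hypothetical helper: read `git branch -vv` output on stdin and print the
# local branches whose upstream is gone, one per line.
# Skips lines starting with "*" (current branch) or "+" (checked out in
# another worktree); neither can be deleted from here anyway.
gone_branches() {
  awk 'substr($0, 1, 1) != "*" && substr($0, 1, 1) != "+" && /: gone]/ { print $1 }'
}

# Hypothetical wrapper: list candidates first, delete only with --apply.
# `git branch -d` (not -D) still refuses to delete unmerged branches.
prune_gone_branches() {
  pgb_list=$(git branch -vv | gone_branches)
  if [ -z "$pgb_list" ]; then
    echo "nothing to prune"
    return 0
  fi
  printf 'branches with a gone upstream:\n%s\n' "$pgb_list"
  [ "$1" = "--apply" ] || return 0
  printf '%s\n' "$pgb_list" | xargs git branch -d
}
```

Run `prune_gone_branches` to review the list, then `prune_gone_branches --apply`. Unlike `grep -v "*"`, which drops any line containing an asterisk (including ones in commit subjects), this only skips lines whose first character marks a checked-out branch.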
reply
I think the issue is that it goes against a very well-established pattern. I have a tool that wraps ripgrep so that search results always include context, and from time to time the agent will use ripgrep directly; when I ask why, it goes "yeah, I should have done that."

There are workarounds, though. I am creating what I call knowledge triggers for Pi, similar to Claude's "PreToolUse" hooks, so having the agent use oak all the time is not an issue in my opinion.

The challenge for oak is answering "why?" I actually want to slow agents down so I can make sure they are doing the right thing, and because the massive bottleneck is the LLMs themselves, speed measured in milliseconds or even seconds will not matter to many people.

I thought oak was more along the lines of "we know how to inject context into the prompt based on the code stored in oak." Faster operations can help, but that use case is limited. The missing piece for better, more correct code is context at the right time.

reply
> I think the issues is, it is going against a very well established pattern. I have a tool that wraps ripgrep so that search results always includes context and from time to time, the agent will use ripgrep by itself and when I ask why, it would go "yeah I should have done that"

There's a limit to how many simultaneous instructions an agent can follow (the exact number depends on the model, so instructions that are fine for one model may overwhelm another). If this keeps happening, consider trimming your instructions or, even better, solving it at the harness level: for example, intercepting ripgrep calls and rewriting them to use your tool, as rtk [0] does in agents that support this.

Overall, never leave an instruction that must be followed at all times up to the agent. For example, doing things in a git hook beats asking the agent to run a multi-command workflow every time it commits.
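
As a concrete instance of the git-hook approach, here is a sketch that installs a `pre-commit` hook rejecting whitespace errors and leftover conflict markers, using the same `git diff-index --check` call as git's bundled sample hook. The installer name is hypothetical, and the check is only a stand-in for whatever your project must enforce on every commit, whoever makes it.

```shell
# Hypothetical helper: install a pre-commit hook into the repo at $1 (default: .).
# The body runs in a subshell so the `cd` does not leak into the caller.
install_pre_commit_hook() (
  cd "${1:-.}" || exit 1
  hooks=$(git rev-parse --git-path hooks) || exit 1
  mkdir -p "$hooks"
  cat > "$hooks/pre-commit" <<'EOF'
#!/bin/sh
# Compare against HEAD, or against the empty tree for the very first commit.
if git rev-parse --verify HEAD >/dev/null 2>&1; then
  against=HEAD
else
  against=$(git hash-object -t tree /dev/null)
fi
# Refuse the commit on whitespace errors or conflict markers in staged changes.
if ! git diff-index --check --cached "$against" --; then
  echo "pre-commit: fix the problems above before committing" >&2
  exit 1
fi
EOF
  chmod +x "$hooks/pre-commit"
)
```

Because git runs the hook itself, the check holds even when the agent forgets, or never read, the instruction.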

Is this state of things permanent? I don't think so. Models will soon get good enough that this is a non-problem.

[0] https://github.com/rtk-ai/rtk

reply
I use a new VCS already (jj, highly recommended) and Claude forgets to use it all the time despite many obvious instructions in many places.
reply
> Models know git because there's a monstrous amount of git in their training data. Models never heard of a new thing "for agents", so you have to teach them to use it via skills and docs.

Another option: when model invokes standard tool, rewrite the invocation to newfangled tool.

Bunch of ways of doing it:

(a) Invocation of standard tool returns error saying to use newfangled tool instead

(b) Invocation of standard tool returns message saying it has been dynamically rewritten to invoke newfangled tool, followed by newfangled tool output

(c) Invocation of standard tool in context is dynamically rewritten to invocation of newfangled tool, prior to execution

In case (c), the model ends up thinking it somehow knew about this new thing all along, even though it actually didn’t
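
A minimal sketch of the three strategies as a harness-side filter. It assumes the harness passes each shell command the model emits through `intercept` before running it, runs whatever it prints, and shows stderr to the model. `grep -rn` to `rg -n` stands in for "standard tool to newfangled tool"; the function names are hypothetical, and the matching is naive string prefixing, not real shell parsing.

```shell
# Hypothetical mapping from a standard-tool call to the newfangled one.
# Non-zero status means "no mapping for this command".
to_new_tool() {
  case $1 in
    "grep -rn "*) printf 'rg -n %s\n' "${1#grep -rn }" ;;
    "grep -r "*)  printf 'rg %s\n' "${1#grep -r }" ;;
    *) return 1 ;;
  esac
}

# $1: strategy (a, b or c), $2: the command the model emitted.
# ASSUMPTION: the harness runs what this prints and shows stderr to the model;
# a non-zero status means "run nothing".
intercept() {
  if ! new=$(to_new_tool "$2"); then
    printf '%s\n' "$2"                 # no mapping: pass through unchanged
    return 0
  fi
  case $1 in
    a) echo "error: use '$new' instead" >&2; return 1 ;;  # (a) refuse with a hint
    b) echo "note: rewritten to '$new'" >&2               # (b) rewrite and say so
       printf '%s\n' "$new" ;;
    c) printf '%s\n' "$new" ;;                            # (c) rewrite silently
    *) echo "unknown strategy: $1" >&2; return 2 ;;
  esac
}
```

For example, `intercept c 'grep -rn TODO src'` prints `rg -n TODO src`. The only difference between (b) and (c) is whether the notice reaches the model's context.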

reply
Options (a) and (b) add more bloat to the model’s context window, and option (c) seems to reduce to having similar functions that already existed. There is also the option of tricking the LLM into thinking it’s using the old function exactly as-is while the harness abstracts away a completely different methodology. Cursor often does exactly this: they use an internally built vectorized search when the model calls the default “find” bash command. The LLM is none the wiser that the function’s implementation is completely different.

Regardless, the implementation behind any of these options may be vastly superior to the “naive” one for agents, but the parent comment is right that an engineer would need to justify that implementation to users, not just make a loud conjecture. It’s a non-trivial claim that a bespoke solution, absent from tool-use training data and designed around context rot, would make a model perform better. Moreover, claiming an agent-specific efficiency gain that humans wouldn’t benefit from makes the claim even more non-trivial. By Sagan’s razor (extraordinary claims require extraordinary evidence), it’s reasonable for people to ask for a comparably non-trivial amount of evidence.

reply
Totally agree. I used to work with a team that built a project for creating ontologies of Git repositories. The goal was to help LLMs onboard faster and navigate the repo better.

In the end, it became heavy overengineering: people ended up understanding neither the repo itself nor the extra layer describing it. Meanwhile, coding assistants are already quite good at reading codebases directly.

reply
Git has worktrees, which provide a way to create branch-linked physical working directories. I built UI assistance for creating worktrees associated with the agent session into https://www.agentkanban.io (an agent-integrated kanban board for use with Copilot / Claude and VS Code). I agree: I would rather make use of a tool the agent is already familiar with, unless it's missing features the agent needs to achieve its goal (which git isn't).
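
A minimal sketch of one worktree per agent session with plain git. The `agent/<id>` branch name and the sibling-directory layout are assumptions for illustration, not how agentkanban.io does it, and both function names are hypothetical.

```shell
# Hypothetical helper: create branch "agent/<id>" checked out in its own
# directory next to the repo, and print that directory.
# $1: session id, $2: optional start point (defaults to HEAD).
agent_worktree() {
  aw_root=$(git rev-parse --show-toplevel) || return 1
  aw_dir="$(dirname "$aw_root")/$(basename "$aw_root")-agent-$1"
  # Send git's own messages to stderr so stdout carries only the path.
  git worktree add -q -b "agent/$1" "$aw_dir" "${2:-HEAD}" >&2 || return 1
  printf '%s\n' "$aw_dir"
}

# Hypothetical cleanup, run from the main checkout: remove the session's
# directory, then its branch (-D also deletes it if unmerged).
agent_worktree_remove() {
  awr_root=$(git rev-parse --show-toplevel) || return 1
  git worktree remove "$(dirname "$awr_root")/$(basename "$awr_root")-agent-$1" &&
    git branch -D "agent/$1"
}
```

Each session gets its own checkout, so parallel agents never mutate a shared working directory; the cost, raised in the next reply, is that every worktree is a full materialization of the files.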
reply
Yes, but each of these requires a full materialization of all files and assets, which for some repos means a non-trivial amount of waiting time and disk space when you are doing lots of things in parallel. You can also use a worktree workflow in oak by cloning into separate spaces. Depending on context, this is sometimes ideal even for our own workflows; I personally do it often. Mounts can be super advantageous in a lot of contexts, and if you work at a certain scale, those time savings matter. If you aren't hitting limitations with git, that is great for you and your workflow! Our general philosophy is that no one should feel limited by computers. We have been limited by git in certain contexts, so we are trying to solve our own problems, and we have opened it up in case it is useful to others who have run into similar problems.
reply
Running builds on FUSE all the time is likely more wasteful than cloning a typical repo once per session.
reply
* It dramatically improves the speed and context your agents need when working on serious projects: 50% fewer VCS-related tokens and 90% faster per operation.

Sounds like a good optimization to me. VCS is a waste of tokens for sure. I’m intrigued to hear more.

reply
Yep. Claude keeps "habitually" trying to use `rg -rn` instead of `rg -n`: Anthropic instructed it to use "rg" instead of "grep", but it still reaches for grep's arguments, as in `grep -rn`. My instructions and "memory" are not helping: "Oh, I did it again, and you've instructed me not to." Older tools are better for current "agents".
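
Why this habit actually hurts: in ripgrep, `-r` is short for `--replace`, so `rg -rn foo` parses as `--replace=n` and prints each match with the matched text replaced by `n`, while ripgrep already searches recursively by default. A minimal harness-side fix, assuming the harness can rewrite the command string before running it. The function name is hypothetical, and because it splits on whitespace, quoted arguments containing spaces are not handled.

```shell
# Hypothetical harness-side rewrite: drop a grep-style "r" from short-option
# clusters in an rg command line (-rn -> -n, -rni -> -ni).
# Naive whitespace splitting; runs in a subshell so `set -f` does not leak.
fix_rg_flags() (
  set -f                                # no globbing while splitting words
  case $1 in
    "rg "*) ;;
    *) printf '%s\n' "$1"; exit 0 ;;    # not an rg call: leave untouched
  esac
  out=
  for w in $1; do
    case $w in
      -r[a-zA-Z]*) w="-${w#-r}" ;;      # grep's recursive flag is not needed in rg
    esac
    out="${out:+$out }$w"
  done
  printf '%s\n' "$out"
)
```

The trade-off: a deliberate `rg -rX ...` (replace with `X`) would also be rewritten, so this only makes sense if nobody in your setup uses attached short-form replacements.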
reply
I couldn't agree more.
reply
Totally correct on the burden of proof here. Agents DO know git extremely well. There’s a huge amount of git in model training data, and anything new starts behind because you have to teach the model what it is, what commands to run, and where the sharp edges are. For us, “for agents” does not mean “new syntax that we hope agents can read docs for.”

The thing we’re trying to optimize is not whether an agent can remember the command. It’s the runtime shape of agent-driven development.

When an agent drives a VCS through a captured terminal, things that are tolerable for humans become direct costs: clone/setup time, worktree setup, full status output, huge diffs, branch cleanup, interactive prompts, shared-checkout mutation, repeated preflight checks. Those costs show up as wall time, bytes over the wire, transcript tokens, and recovery steps.
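
Some of these costs come from defaults aimed at humans: pagers, color, prompts, and the long-form `status` report. As an illustration of what "different defaults" can mean, here is a sketch of a wrapper that switches a few of them off in stock git. The wrapper names are hypothetical; the environment variables and options are standard git ones.

```shell
# Hypothetical wrapper: run git with no pager, no color, no credential
# prompts, and an "editor" (`true`) that exits immediately instead of waiting.
agit() {
  GIT_TERMINAL_PROMPT=0 GIT_PAGER=cat GIT_EDITOR=true \
    git -c color.ui=never -c advice.statusHints=false "$@"
}

# One line per changed path instead of the human-oriented status report.
agit_status() { agit status --porcelain=v1; }

# Summary first; the agent can then ask for the diff of specific paths.
agit_diff_summary() { agit diff --stat "$@"; }
```

This only trims output and prompts; it does nothing for clone time, worktree setup, or shared-checkout contention.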

So the Oak bet is narrower than “agents can’t use git.” They can. The bet is that if you assume branch-per-agent workflows, lots of parallel sandboxes, large repos, and non-interactive command execution, the VCS interface should have different defaults if you want to optimize for shipping speed and efficiency of token usage. If you're already going fast enough and not running out of tokens, then using oak seems pretty silly.

People do not need to ditch git to try Oak out. One workflow we care about is letting agents work in Oak where the agent-specific costs matter, then exporting back to git for the human review, CI, release, or compliance workflows.

Totally agree this should be provable and benchmarked. The homepage has Oak vs Git numbers because we do not want “for agents” to just be vibes. We’re measuring transcript bytes, estimated tokens, tool calls, wall time, large diff/status behavior, and contention in agent-style workflows. We’re also working on the benchmarks repo in the open: https://oak.space/oak/benchmarks
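
A rough sketch of this kind of measurement for a single command: transcript bytes, plus a token estimate using an assumed ~4 bytes per token. That ratio is a common rule of thumb, not a property of any particular model's tokenizer, so treat the estimate as approximate. This is not the Oak benchmark harness, and the function name is hypothetical.

```shell
# Hypothetical helper: print transcript bytes and an estimated token count
# for a command's combined stdout+stderr, i.e. roughly what an agent reads back.
# ASSUMPTION: ~4 bytes per token, rounded up.
measure_output() {
  mo_bytes=$("$@" 2>&1 | wc -c | tr -d ' ')
  printf 'bytes=%s est_tokens=%s\n' "$mo_bytes" $(( (mo_bytes + 3) / 4 ))
}
```

For example, running `measure_output git status` and `measure_output git status --porcelain` in the same repository shows how much of the difference is presentation alone.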

The exciting part to me is that we can already improve on tokens and timing despite starting with the model-prior deficit you’re describing. If we can win on measured agent workflows while git still has the advantage of being deeply baked into the models, I’m incredibly bullish on where Oak can get to as the tool and the ecosystem matures.

Longer term, if Oak proves useful and sticks around, future frontier models will likely have more Oak examples in their training data, which would lower the upfront learning tax and give Oak an extra boost.

reply
How did you speed things up (e.g. clone, worktree setup) compared to Git? Could the same work for human-facing tools?
reply
Mostly through networked filesystem mounts backed by FSKit/FUSE when working on tasks in parallel. It may be applicable to human-facing tools, but I think workflows there are already pretty settled around having files locally, and mounts need some lifecycle management that agents are probably better at handling.
reply