You can always narrow down a capability (get a new capability pointing to a subdirectory or file, or remove the writing capability so it is read only) but never make it more broad.
In a system designed for this it will be used for everything, not just file system. You might have capabilities related to network connections, or IPC to other processes, etc. The latter is especially attractive in microkernel based OSes. (Speaking of which, Redox OS seems to be experimenting with this, just saw an article today about that.)
See also https://en.wikipedia.org/wiki/Capability-based_security
Admittedly, there’s a little more friction and agent confusion sometimes with this setup, but it’s worth the benefit of having zero worries about permissions and security.
Wrong layer. You want the deletion to actually be impossible from a privilege perspective, not be made practically harder to the entity that shouldn't delete something.
Claude definitely knows how to reimplement `rm`.
I have seen the AI break out of (my admittedly flimsy) guards, like doing simply
safepath/../../stuff or something even more convoluted like symlinks.
That would make it far less useful in general.
1) more guardrails in place
2) maybe more useful error messages that would help LLMs
3) no friction with needing to get any patches upstreamed
External tool calling should still be an option ofc, but having utilities that are usable just like what's in the training data, but with more security guarantees and more useful output that makes what's going on immediately obvious would be great.
But that's also the most damaging actions it could take. Everything on my computer is backed up, but if Claude insults my boss, that would be worse.
Oh, I'm totally not arguing for cutting off other capabilities, I like tool use and find it to be as useful as the next person!
Just that the shell tools that will see A LOT of usage have additional guardrails added on top of them, because it's inevitable that sooner or later any given LLM will screw up and pipe the wrong thing in the wrong command - since you already hear horror stories about devs whose entire machines get wiped. Not everyone has proper backups (even though they totally should)!
And when that fails for some reason it will happily write and execute a Python script bypassing all those custom tools