undefined

points

[-]

Weirdly, the existing first party solutions around denying commands don't seem to help here.

Often enough, when one of the agents prompts for running "sudo", and I reject it, it will do what looks very much like malicious exploration to figure out how to handle things anyway, including once hijacking a separate shell's pty where I did have a valid sudo session already in order to execute some commands.

We don't yet have the capability to make these models behave in a consistent, deterministic, or safe manner yet, so a first party solution isn't even necessarily that much better. Especially if it gives a false sense of security.