You can install pi, then install pi-sandbox locked to the current version. This issue describes how pi-sandbox plus an additional extension gives you a sandbox by default, with the option to fall back to unsandboxed execution once you approve it: https://github.com/carderne/pi-sandbox/issues/50
My solution to this is to only run agents in a sandbox of my own making (a locked-down Podman container).
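For anyone wondering what "locked down" could look like in practice, here is a rough sketch rather than the actual setup described above. It only builds the `podman run` argument list, so it can be inspected without Podman installed. The helper name `podman_sandbox_argv`, the image name, the paths, and the resource limits are all made up for illustration; the flags themselves are standard Podman options.

```python
import shlex


def podman_sandbox_argv(workdir, image, command, network="none"):
    """Build the argv for a restrictive `podman run` (hypothetical helper)."""
    return [
        "podman", "run", "--rm",
        f"--network={network}",              # "none" means no network at all
        "--cap-drop=all",                    # drop every Linux capability
        "--security-opt=no-new-privileges",  # block setuid-style privilege gain
        "--read-only",                       # read-only root filesystem
        "--pids-limit=256",                  # assumed limit, tune to taste
        "--memory=2g",                       # assumed limit, tune to taste
        "--volume", f"{workdir}:/work:Z",    # only the project dir is writable
        "--workdir", "/work",
        image,
        *command,
    ]


# Assumed example values: the image and path are placeholders.
argv = podman_sandbox_argv(
    "/home/me/project", "localhost/agent-sandbox:latest", ["bash"]
)
print(shlex.join(argv))
```

An agent that talks to a hosted model needs some network access, so `--network=none` would probably have to be relaxed to something narrower than the default rather than kept as-is.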
But an LLM has a limited "memory". The instructions might land in there and carry enough priority to be "respected", but it only takes a single instance of that memory getting too full, or of the LLM autocompleting the workaround because that was the statistical "best" solution, and any barriers that exist only in LLM instructions and not in hardcoded guards will evaporate like so much morning fog.
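To make the distinction concrete, here is a minimal sketch of a hardcoded guard, assuming the agent's tool calls pass through code you control before anything gets executed. The allowlist and the function name `is_allowed` are hypothetical. The point is only that this check runs the same way no matter what is or isn't in the model's context.

```python
import shlex

# Hypothetical allowlist: programs the agent may run without asking.
ALLOWED_PROGRAMS = {"ls", "cat", "grep", "git"}

# Characters that could chain or redirect commands when passed to a shell.
SHELL_METACHARACTERS = set(";|&`$><\n")


def is_allowed(command_line: str) -> bool:
    """Return True only for a single, plain command from the allowlist."""
    if SHELL_METACHARACTERS & set(command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if not tokens:
        return False
    return tokens[0] in ALLOWED_PROGRAMS
```

An allowlist like this is not a sandbox on its own (`cat` can still read anything the process can), which is why it would sit in front of a container, not replace it.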