by simonw 11 hours ago

by e1g 11 hours ago

Thank you for your work - I have sent many of your links to my people.

Your point is totally fair for evaluating security tooling. A few notes:

1. I implemented this in Bash to avoid having an opaque binary in the way.

2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)

3. There are E2E tests validating sandboxing behavior under real agents

4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.

5. This whole repo should be a StrongDM-style readme to copy and paste to your clanker. I might just do that "refactor", but for now I've added LLM instructions to create your own sandbox-exec profiles: https://agent-safehouse.dev/llm-instructions.txt

by big_toast 9 hours ago

I love this implementation. Do you find the SBPL deficient in any ways?

Would xcodebuild work in this context? Presumably I'd watch a log (or have an agent) and add permissions until it works?

by e1g 4 hours ago

SBPL is great for filesystem controls and I haven’t hit roadblocks yet. I wish it offered more control over outbound network requests (i.e. filtering by domain), but I understand why it doesn't.

Yes, Safehouse should work for xcodebuild workloads in the way you described: try to run it, watch for failures, extend the profile, try again. Your agent can run this loop by itself. Just feed it the repo, since many integrations that are not enabled by default will help it.
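
The run, watch, extend, retry loop could be sketched roughly like this. Everything here is an assumption for illustration: the helper names are made up, the `run` callback stands in for "launch the build under sandbox-exec and collect denial log lines", and the regex assumes denial lines shaped like `deny(1) file-read-data /some/path`, which may differ across macOS versions. Paths containing spaces are not handled.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumed shape of a sandbox denial log line; verify against your own logs.
DENY_RE = re.compile(r"deny\(\d+\) (?P<op>[a-z0-9*-]+) (?P<path>/\S+)")


def parse_denials(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Extract (operation, path) pairs from denial log lines; other lines are skipped."""
    found = []
    for line in log_lines:
        match = DENY_RE.search(line)
        if match:
            found.append((match.group("op"), match.group("path")))
    return found


def extend_profile(profile: str, denials: Iterable[Tuple[str, str]]) -> str:
    """Append one narrow `literal` allow rule per new denial; return the profile unchanged if none."""
    rules: List[str] = []
    for op, path in denials:
        rule = f'(allow {op} (literal "{path}"))'
        if rule not in profile and rule not in rules:
            rules.append(rule)
    if not rules:
        return profile
    return profile.rstrip("\n") + "\n" + "\n".join(rules) + "\n"


def tighten_loop(
    profile: str,
    run: Callable[[str], Iterable[str]],
    max_rounds: int = 10,
) -> str:
    """Repeat run -> parse -> extend until a round adds no rules or max_rounds is reached.

    `run` is a hypothetical callback: it receives the current profile text,
    runs the workload under it, and returns the denial log lines it observed.
    """
    for _ in range(max_rounds):
        updated = extend_profile(profile, parse_denials(run(profile)))
        if updated == profile:
            break
        profile = updated
    return profile
```

Blindly allowing every denial defeats the point of the sandbox, so a human (or at least a careful agent) should review each proposed rule before it goes into the profile.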

by kstenerud 6 hours ago

If you're looking for one that is better documented and tested, you might like https://github.com/kstenerud/yoloai

by okanesen 6 hours ago

I'm having trouble understanding what makes this "better documented and tested". Care to elaborate on how the testing was done? What are the differences?

by vasco 5 hours ago

So create a 'destroy my computer' test harness and run it whenever you test another wrapper. If it works, you'll be fine. If it doesn't, you buy a new computer.