undefined

points

[-]

That's a decent practice from the lens of reducing blast radius. It becomes harder when you start thinking about unattended systems that don't have you in the loop.

One problem I'm finding discussion about automation or semi-automation in this space is that there's many different use cases for many different people: a software developer deploying an agent in production vs an economist using Claude Vs a scientist throwing a swarm to deal with common ML exploratory tasks.

Many of the recommendations will feel too much or too little complexity for what people need and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop.

AI Evals, sandboxing, observability seem like 3 key pillars to maintain intent in automation but how to help these different audiences be safely productive while fast and speak the same language when they need to product build together is what is mostly occupying my thoughts (and practical tests).

by daveguy11 hours ago|

parent|

[-]

Current LLMs are nowhere near qualified to be autonomous without a human in the loop. They just aren't rigorous enough. Especially the "scientist throwing a swarm to deal with common ML exploratory tasks." The judgement of most steps in the exploratory task require human feedback based on the domain of study.

> Many of the recommendations will feel too much or too little complexity for what people need and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop.

Completely agreed. This is because LLMs are atrocious at judgement and guiding the sequence of exploration is critically dependent on judgement.

by shich5 hours ago|

prev|

[-]

The proxy approach for secret injection is the right mental model, but it only works if the proxy itself is hardened against prompt injection. An agent that can't access secrets directly can still be manipulated into crafting requests that leak data through side channels — URL params, timing, error messages.

The deeper issue: most of these guardrails assume the threat is accidental (agent goes off the rails) rather than adversarial (something in the agent's context is actively trying to manipulate it). Time-boxed domain whitelists help with the latter but the audit loop at session end is still reactive.

The /revert snapshot idea is underrated though. Reversibility should be the first constraint, not an afterthought.

by buremba5 hours ago|

parent|

[-]

> but it only works if the proxy itself is hardened against prompt injection.

Yes, I'm experimenting using a small model like Haiku to double check if the request looks good. It adds quite a bit of latency but it might be the right approach.

Honestly; it's still pretty much like early days of self driving cars. You can see the car can go without you supervising it but still you need to keep an eye on where it's going.

by Doublon11 hours ago|

prev|

[-]

I'd like to try a pattern where agents only have access to read-only tools. They can read you emails, read your notes, read your texts, maybe even browse the internet with only GET requests...

But any action with side-effects ends up in a Tasks list, completely isolated. The agent can't send an email, they don't have such a tool. But they can prepare a reply and put it in the tasks list. Then I proof-read and approve/send myself.

If there anything like that available for *Claws?

by swid8 hours ago|

parent|

[-]

There is no real such thing as a read only GET request if we are talking about security issues here. Payloads with secrets can still be exfiltrated, and a server you don’t control can do what it wants when it gets the request.

by zahlman7 hours ago|

parent|

prev|

[-]

GET and POST are merely suggestions to the server. A GET request still has query parameters; even if the server is playing by the book, an agent can still end up requesting GET http://angelic-service.example.com/api/v1/innocuous-thing?pa... and now your `dangerous-secret` is in the server logs.

You can try proxying and whitelisting its requests but the properly paranoid option is sneaker-netting necessary information (say, the documentation for libraries; a local package index) to a separate machine.

by fnord7711 hours ago|

prev|

[-]

> 1. Don't let it send emails from your personal account, only let it draft email and share the link with you.

Right now there's no way to have fine-grained draft/read only perms on most email providers or email clients. If it can read your email it can send email.

> 3. Don't let your agents see any secret. Swap the placeholder secrets at your gateway and put human in the loop for secrets you care about.

harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)

by buremba11 hours ago|

parent|

[-]

> Right now there's no way to have fine-grained draft/read only perms on most email providers or email clients. If it can read your email it can send email.

> harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)

You should never give any secrets to your agents, like your Gmail access tokens. Whenever agents needs to take an action, it should perform the request and your proxy should check if the action is allowed and set the secrets on the fly.

That means agents should not have access to internet without a proxy, which has proper guardrails. Openclaw doesn't have this model unfortunately so I had to build a multi-tenant version of Openclaw with a gateway system to implement these security boundaries.

by zahlman7 hours ago|

parent|

[-]

> That means agents should not have access to internet without a proxy, which has proper guardrails. Openclaw doesn't have this model unfortunately so I had to build a multi-tenant version of Openclaw with a gateway system to implement these security boundaries.

I wonder how long until we see a startup offering such a proxy as a service.

by arianvanp11 hours ago|

parent|

prev|

[-]

Literally every email client on the planet has supported `mailto:` URIs since basically the existence of the world wide web.

Just generate a mailto Uri with the body set to the draft.

by zahlman7 hours ago|

parent|

prev|

[-]

> harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)

It's easy, and you did it the right way. Read "don't let your agents see any secret" as "don't put secrets in a filesystem the agents have access to".

by livestories5 hours ago|

parent|

prev|

[-]

I think mailto: links they output (a la

https://mailtolink.me/

) are a great way to get these drafts out even.