You could run them in a container and put access to highly sensitive personal data behind a "function" that requires a human in the loop for every interaction. For example, the access might happen in a "subagent" whose context gets wiped afterwards, except for a sanitized response that the human can verify.
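A minimal sketch of that gate in Python (llm_call and ask_human are hypothetical stand-ins, not any particular API):

    # Hypothetical: the sensitive lookup runs in a throwaway subagent;
    # only a human-approved, sanitized summary survives.
    def read_sensitive(query, llm_call, ask_human):
        subagent_context = [
            {"role": "system", "content": "Answer from the sensitive store only."},
            {"role": "user", "content": query},
        ]
        raw = llm_call(subagent_context)       # subagent does the lookup
        sanitized = raw.strip()[:500]          # crude sanitization step
        if not ask_human("Release this to the main agent?\n" + sanitized):
            sanitized = "[withheld by user]"
        # subagent_context goes out of scope here: the transcript is wiped,
        # and only the human-checked summary reaches the main agent.
        return sanitized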

There could be similar safeguards for posting to external services: each post might require direct confirmation, or be performed by a fresh subagent given a sanitized, human-checked prompt and context.
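A rough sketch of that confirmation gate (ask_human and send are again hypothetical stand-ins):

    # Hypothetical: outbound posts need direct confirmation, and the
    # sender only ever sees the human-checked draft, nothing else.
    def post_externally(draft, send, ask_human):
        if ask_human("Send exactly this text?\n" + draft):
            send(draft)            # fresh subagent sees only the draft
        else:
            print("Post withheld.")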

reply
So you approve access to the secret once; how can you be sure it wasn't sent somewhere else or persisted somehow for future sessions?

Say you gave it access to Gmail for the sole purpose of emailing your mom. Are you sure the email it sent didn’t contain a hidden pixel from totally-harmless-site.com/your-token-here.gif?

reply
Access to the secret, the long-term persistence/reasoning, and the posting should all be done by separate subagents, and every exchange of data among them should be monitored. This is easy in principle, since the data exchanged is just plain-text context.
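A rough sketch of that monitoring, assuming the subagents are plain callables and using a toy keyword scan in place of real checks:

    import re

    # Hypothetical: reader, reasoner, and poster are separate subagents,
    # and every message between them is plain text passing through here.
    def monitor(sender, receiver, text):
        print(f"[{sender} -> {receiver}] {text!r}")   # full audit trail
        if re.search(r"api[_-]?key|password|secret", text, re.I):
            raise RuntimeError("possible secret crossing an agent boundary")
        return text

    def pipeline(reader, reasoner, poster, task):
        facts = monitor("reader", "reasoner", reader(task))
        draft = monitor("reasoner", "poster", reasoner(facts))
        return poster(draft)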
reply
I don't have one yet, but I would just give it access to function calling for things like communication.

Then I can surveil and route the messages at my own discretion.

If I gave it access to email my mom (I actually did this with an assistant I built after the ChatGPT launch), I would really be giving it access to a function I wrote that results in an email.

The function can handle the data any way it pleases, for instance by stripping HTML.
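A rough sketch of such a function (send_plaintext_email is a made-up placeholder, not a real Gmail API); forcing plain text also kills tracking pixels like the gif mentioned above:

    from html.parser import HTMLParser

    # Hypothetical: the model never touches Gmail directly; it calls this
    # function, which strips all markup, so a hidden <img> pixel can't survive.
    class TextOnly(HTMLParser):
        def __init__(self):
            super().__init__()
            self.chunks = []
        def handle_data(self, data):
            self.chunks.append(data)

    def email_mom(body_from_model, send_plaintext_email):
        parser = TextOnly()
        parser.feed(body_from_model)
        plain = "".join(parser.chunks).strip()   # tags and pixels are gone
        print("Outgoing mail:\n" + plain)        # surveil before routing
        send_plaintext_email(to="mom", body=plain)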

reply