undefined

points

[-]

I don't trust the harness, and I especially don't trust that the LLM won't be able to subvert the harness, or trick me via the harness. I assume that the LLM will be able to leak any secret in the harness context to arbitrary internet destinations, or somehow encode the secret in a work product. Eg space characters at the end of lines encoding access tokens.

Having the harness in one VM, and tool use applied to user data in another, is about as safe as you can be at present. You can mount filesystem fragments from the data VM into the harness VM, but tool execution remains painful.

Having all authorisation and access control exist outside of the harness layer is essential. It should only have narrowly scoped and time limited credentials that are bound to its IP, and even then that is problematic.

by TeMPOraL5 hours ago|

prev|

[-]

> Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.

"Lethal trifecta" is basically describing phishing but in a way more palatable to people who would rather die before allowing themselves to anthropomorphize LLMs even a little bit. It's not a problem you can fix with better coding, like some SQL injection. You can only manage risk around it (for which sandboxing is one of many solutions that can help).

So on one hand, I agree with you - you need to be mindful of what you're actually dealing with. On the other hand, you always have this, and need this, for the agent to be able to do anything useful.

by 3form1 minutes ago|

parent|

[-]

Phishing is only a subset of the issue, so I don't think that name's appropriate, besides being used for other things in other contexts (which would be another reason for me not to try and overload it).

by aluzzardi12 hours ago|

prev|

[-]

Author here.

I should have made it more clear that the article is about agent / harness building (not about running third party agents).

> I barely trust the harness more than the LLM

Since we built it, I trust it just as much as I trust our API server :)

The latter gets untrusted inputs from the internet, while the former gets untrusted inputs from the LLM

by bauerd3 hours ago|

prev|

[-]

>Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?).

I run a single-node k3d cluster on each of my MacBooks which uses Agent Sandbox[0] to keep harnesses isolated. Harnesses access models through LiteLLM only. I have aliases for `kubectl exec`ing into whatever harness I need.

[0] https://agent-sandbox.sigs.k8s.io

by gmerc9 hours ago|

prev|

[-]

The LLM has harness control in claude ;) “Let me switch off the sandbox and try again”

by tantalor9 hours ago|

prev|

[-]

> if your harness has an ability to do something the LLM can't

What does this even mean. The only capability of an LLM is generate text.

by jbstack5 hours ago|

parent|

[-]

The LLM can only generate text. The harness can do more than just generate text. By joining the two you're allowing the LLM (through text) to carry out whatever actions the harness can take.

My brain can only generate electrical signals. My hand responds to electrical signals and can interact with the real world. The two together can do more than just what my brain alone can do.

If you don't trust a particular brain, don't put a gun in the hand which is connected to it. If you don't trust a LLM, don't connect it to a harness which has access to your production database and only recent backups (https://www.theregister.com/2026/04/27/cursoropus_agent_snuf...).

by girvo4 hours ago|

parent|

prev|

[-]

We’ve trained models on JSON schemas for “tool calls”, and then built software to interpret and run those calls for the LLMs