AFAIK Claude doesn't support it, but if you're willing to go the extra mile, you can get creative with a bash script: https://pastebin.com/raw/m9YQ8MyS (I generated this a second ago, just to get the point across)

To be clear, I don't do this. I have never seen an agent cheat by peeking or anything like that, and I really did look through their logs.

I'd be very interested to see Claude Code and other tools support this pattern when dispatching agents, just to be really sure.

by achierius, 3 hours ago

> To be clear, I don't do this.

How do you know that it works then? Are you using a different tool that does support it?

by skybrian, 3 hours ago

So what do you do? Do you define roles somewhere and tell the agent to assign these roles to subagents?

by ssk42, 2 hours ago

Fun to see you not on tildes.

Setting up a clean room is one of the only ways to do evals on agentic harnesses. It matters especially with Windsurf, which doesn’t have an easy CLI start.

So how? The easiest answer, when allowed, is Docker: literally a new image per prompt. There are also flags with Claude to not use memory, and from there you can use -p to have it behave like a normal CLI tool. Windsurf requires the manual effort of starting it up in a new dir.
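A rough sketch of the Docker approach, in case it helps. It only builds the command line for one throwaway container per prompt and prints it rather than running anything. The image name `agent-sandbox:latest`, the `/work` mount point, and the helper names are placeholders I made up; the only agent flag used is `-p`, and the memory-related flags and credentials are left out.

```python
import shlex
import tempfile
from pathlib import Path


def fresh_workdir() -> Path:
    """Create a brand-new empty scratch directory for a single prompt."""
    return Path(tempfile.mkdtemp(prefix="cleanroom-"))


def clean_room_command(
    prompt: str,
    workdir: Path,
    image: str = "agent-sandbox:latest",  # hypothetical image with the agent CLI installed
) -> list[str]:
    """Build a `docker run` argv that runs one prompt in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                    # remove the container as soon as it exits
        "-v", f"{workdir}:/work",  # mount only the fresh scratch directory
        "-w", "/work",             # start the agent inside that directory
        image,
        "claude", "-p", prompt,    # -p: take one prompt, print the result, exit
    ]


if __name__ == "__main__":
    # One new directory and one new container per prompt, so nothing carries over.
    for prompt in ["Fix the failing test", "Add a README"]:
        cmd = clean_room_command(prompt, fresh_workdir())
        print(shlex.join(cmd))  # pass `cmd` to subprocess.run to actually execute it
```

Because each prompt gets its own directory and a container that is deleted on exit, one run cannot peek at files left behind by another.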

by skybrian, 22 minutes ago

Sounds interesting, but I'm not quite getting the relevance for people writing code with an agent. Should I be doing evals?