the "inside your own browser" angle is actually the right intuition here. a real user's browser has built up a consistent fingerprint profile across sessions. the moment you run an agent in a context where those signals differ from that baseline, you're detectable. curious whether you've run into this on sites with aggressive bot detection, or whether the use case has mostly been internal/enterprise apps where that's not a concern?
Out of curiosity, have you had it attempt captchas?
If so, what were the results?
I use a text-based approach. Captchas like “crossroad” usually need a screenshot, a visual model and coordinate-based mouse events.
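To make the coordinate part concrete, here's a rough sketch of the geometry only: a visual model names a tile in an image grid, and that tile index has to be converted into page coordinates before a mouse event can be sent. The `Box` type, the `tile_center` helper, and the 3x3 grid are my own illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of the grid image in page pixels (hypothetical shape)."""
    left: float
    top: float
    width: float
    height: float


def tile_center(box: Box, rows: int, cols: int, index: int) -> tuple[float, float]:
    """Return the page coordinates of the centre of tile `index`.

    Tiles are numbered row by row, starting at 0 in the top-left corner.
    """
    if not 0 <= index < rows * cols:
        raise ValueError("tile index out of range")
    row, col = divmod(index, cols)
    tile_w = box.width / cols
    tile_h = box.height / rows
    # Aim at the middle of the tile rather than its top-left corner.
    return (box.left + (col + 0.5) * tile_w, box.top + (row + 0.5) * tile_h)


# Example: a 300x300 grid image whose top-left corner sits at (100, 200).
grid = Box(left=100, top=200, width=300, height=300)
x, y = tile_center(grid, rows=3, cols=3, index=4)  # middle tile
```

A text-based approach never sees any of this: it works on DOM text and selectors, and an image grid exposes neither, which is why the screenshot and coordinate steps become necessary.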