undefined

points

[-]

The thing that seems to bring up these extremely unlikely destructive token sequences and it totally seems to be letting agents just run for a long time. I wonder if some kind of weird subliminal chaos signal develops in the context when the AI repeatedly consumes its own output.

Personally I don't even let my agent run a single shell command without asking for approval. That's partly because I haven't set up a sandbox yet, but even with a sandbox there is a huge "hazard surface" to be mindful of.

I wonder if AI agent harnesses should have some kind of built-in safety measure where instead of simply compacting context and proceeding, they actually shut down the agent and restart it.

That said I also think even the most advanced agents generate code that I would never want to base a business on, so the whole thing seems ridiculous to me. This article has the same energy as losing money on NFTs.

by mike_hearn5 hours ago|

parent|

[-]

I don't think it's that. It's really all about context. Humans always have at least a bit of context so it's hard for us to imagine what it's like to have none at all. But the AI genuinely has none. And it's under (training) pressure to get the task done quickly, be a yes man, and so on.

Humans do make mistakes like these. I'm not sure where the fault really lies here. I can imagine a human under time pressure making the same error. It's maybe a goof in the safety design of railway. It shouldn't be possible to delete all your backups with a single API call using a normal token.

by coalstartprob14 hours ago|

prev|

[-]

[dead]