Do kids learn well when you only tell them what NOT to do? Of course not! You should be explaining how to do things correctly, and most importantly the WHY, as well as providing examples of both the "correct" and "incorrect" ways (also explaining why an example is incorrect).
They have a vast latent knowledge base, infinite patience and zero capacity for making personal judgement calls. You give one a goal and it will try to meet that goal.
A scary image, if we consider that agents might develop anything like a conscience at some point. Of course, with the current approach they might never do so, but are we so sure?
Bbbbut a guy from Anthropic, just this last Friday, told me to think of Claude as my "brilliant coworker"! Are you telling me that's not true!?
I think the better route is to be honest: say that database integrity is a primary foundation of the company, say that there's no task worth pursuing that would require touching the database, and specifically ask it to think hard before doing anything that gets close to the production data.
I run a much lower-stakes version where an LLM has a key that could delete a valuable product database if it were so inclined. I've built a strong framework around how and when destructive edits can be made (they cannot), and specifically I say that any destructive command (DROP, rm, etc.) needs to be handed to the user to run. Between that framework and Claude Code via the CLI, it's very cautious about running anything that writes to the database, and the new Claude plan permissions system is pretty aggressive about reviewing any proposed action, even if I've given it blanket permission otherwise.
I've tested it a few times by telling it to go ahead, "I give you permission", but it still gets stopped by the global Claude safety/permissions layer in Opus 4.7. IMO it's pretty robust.
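For anyone who wants the hand-off rule as code rather than as prompt text, here is a minimal sketch. It is not the framework described above; the pattern list and the names `is_destructive` and `route_command` are hypothetical, and a keyword match like this is easy to evade, so treat it as one extra layer and not as the safeguard.

```python
import re

# Hypothetical patterns. Only DROP and rm come from the comment above;
# the other keywords are assumptions added for illustration.
DESTRUCTIVE_PATTERNS = [
    # SQL statements that remove or restructure data.
    re.compile(r"\b(drop|truncate|delete|alter)\b", re.IGNORECASE),
    # Shell rm at the start of a command or after a separator.
    re.compile(r"(^|[\s;&|])rm\s", re.IGNORECASE),
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def route_command(command: str) -> str:
    """Return 'handoff' when a human must run the command, else 'run'."""
    return "handoff" if is_destructive(command) else "run"
```

The agent's tool wrapper would call `route_command` before executing anything and print the command for the user whenever it gets back `handoff`.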
Food for thought.
This is recklessly negligent and I would personally not tolerate a coworker or report doing it. What's next, sending long-lived access tokens out over email and asking pretty please for nobody to cc/forward?
The standard rule is that you never let your developers at the production instance. So I can't see why an LLM would get a break.
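A rough sketch of what that rule looks like when enforced by the credential instead of by instructions. SQLite stands in for the production database here purely as an assumption, and `open_read_only`, `demo`, and the `products` table are made-up names. The `mode=ro` URI parameter is standard SQLite; on a server database the equivalent would be a role with only read grants.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open the database through a read-only URI so writes fail in the driver."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def demo() -> str:
    """Create a throwaway database, then show a read-only handle refusing a DROP."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prod.db")

        # Set up the stand-in "production" data with a normal read-write handle.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
        setup.execute("INSERT INTO products (name) VALUES ('widget')")
        setup.commit()
        setup.close()

        # This is the only handle the agent would ever be given.
        conn = open_read_only(path)
        try:
            rows = conn.execute("SELECT name FROM products").fetchall()
            try:
                conn.execute("DROP TABLE products")
                return "write succeeded"
            except sqlite3.OperationalError:
                # SQLite rejects the write regardless of what the prompt says.
                return f"read ok ({len(rows)} row), write refused"
        finally:
            conn.close()
```

The point of the design is that no amount of "I give you permission" changes the outcome, because the refusal happens below the model.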
That's stretching the definition of 'research'. It basically checks if the texts are close enough.
Delete can occur in various contexts, including safe contexts. It simply checks if a close enough match is available and executes. It doesn't know if what it is doing is safe.
Unfortunately, a wide variety of such unsafe behaviours can show up. I'd even say that, for someone who does things without understanding them, any write operation of any kind can be deemed unsafe.
Probably because telling someone not to do something works the 99% of the time they weren't going to do it anyway. But telling somebody "here's how to do something" and seeing them have the judgment not to do it gives you information right away, as does them actually taking the honeypot. At the heart of it, delayed catastrophic implosions are much worse than fast, guarded, recoverable failures. I suppose that's been part of lean startup methodology forever; it's just always easy in theory and tricky in practice.