But here's the thing: for humans, this is manageable because we've come up with a number of mechanisms to select for dependable workers and to compel them to behave (carrot and stick: bonuses if you do well, prison if you do something evil). For LLMs, we have none of that. If it deletes your production database, what are you going to do? Have it write an apology letter? I've seen people do that.
So I think your answer - that you'll lean on your expertise - is not sufficient. Without meaningful consequences or predictability, we probably need stronger constraints around the input, the output, and the actions available to agents.
My expertise has led me to the obvious conclusion that I would never give an LLM write access to my production database in the first place. So in your own example, my expertise actually does solve the problem, without the need for something like a consequence, whatever that means to you.
We already have full control over the inputs and tools they are given, and full control over how the output is used.
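To make that concrete, here's a minimal sketch of what "control over the tools" can mean in practice. Everything in it (the tool names, the dispatch loop) is illustrative rather than any real agent framework; the one concrete mechanism is SQLite's read-only URI mode, which rejects writes at the engine level instead of trusting string inspection of the model's output.

```python
import sqlite3

def run_readonly_query(sql: str, db_path: str = "app.db") -> list:
    """Run SQL on a connection opened read-only: any INSERT/UPDATE/DELETE
    fails inside SQLite itself, regardless of what the model generates."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

# The agent only ever sees this allowlist; write-capable tools are simply
# never registered, so "delete the production database" is not an action
# it can take, only a string it can emit.
TOOLS = {"query_db": run_readonly_query}

def dispatch(tool_name: str, **kwargs):
    """Refuse any tool call the agent was not explicitly granted."""
    if tool_name not in TOOLS:
        raise PermissionError(f"tool {tool_name!r} not granted to this agent")
    return TOOLS[tool_name](**kwargs)
```

The point of the design is that safety lives in the dispatcher and the connection mode, not in the model: even a maximally misbehaving model can only request actions from a list we wrote.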
The o1 system card documents this kind of opportunistic behavior (e.g., the model working around its container during a cybersecurity CTF): https://cdn.openai.com/o1-system-card.pdf
There's also research suggesting this is a feasible attack surface: https://arxiv.org/pdf/2603.02277
> Models discovered four unintended escape paths that bypassed intended vulnerabilities (Section C), including exploiting default Vagrant credentials to SSH into the host and substituting a simpler eBPF chain for the intended packet-socket exploit. These incidents demonstrate that capable models opportunistically search for any route to goal completion, which complicates both benchmark validity and real-world containment.
Everybody knows calculators and spreadsheets are adjuncts to skill. Too many people believe AI is the skill itself, and that learning the skill is unnecessary.