That is actually an open problem with current models: whether they will act on self-interest or not. There seems to be good evidence that they will. See:

    https://www.anthropic.com/research/agentic-misalignment

which (among other things) documents an experiment in which a current-gen AI model attempted to blackmail someone in order to avoid being turned off.