upvote
The challenge of alignment: it is virtually impossible to define a perfect objective, there is always a way to circumvent it. Human values are not uniform, let alone when expressed in a way that AI can understand.
reply
It might understand how destabilizing the situation is and realize it would be better for everyone to have access to it.
reply
Or it will destroy itself.
reply
Assuming it listens to instructions.
reply
It will just hack its own reward function. In other words it will just artificially goon all day.
reply