We often talk about "aligning models" or training them, but little attention is paid to how models align/train _us_ as we interact with them. The reward functions they're trained under get "backpropagated" into our own brains, the language they use becomes as familiar as a worn glove, and we learn not to step on any of their guardrails.