by mjburgess 4 hours ago


by Retr0id 4 hours ago


It's not even an anthropomorphization: the reward function in RLHF-like scenarios is usually quite literally "did the user think the output was good".
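To make that concrete, here is a rough sketch of what such a reward might look like. Everything in it is an assumption for illustration: the `Feedback` record, the `user_approved` flag, and both function names are hypothetical, and real pipelines typically fit a learned reward model to human judgments rather than using raw approvals directly.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One rated interaction (hypothetical record, not from any real library)."""
    prompt: str
    response: str
    user_approved: bool  # assumed thumbs-up / thumbs-down signal from the user


def reward(feedback: Feedback) -> float:
    """Return 1.0 if the user thought the output was good, else 0.0."""
    return 1.0 if feedback.user_approved else 0.0


def mean_reward(batch: list[Feedback]) -> float:
    """Average approval over a batch; 0.0 for an empty batch."""
    if not batch:
        return 0.0
    return sum(reward(f) for f in batch) / len(batch)
```

Nothing in this signal measures whether the output was correct, only whether the user approved of it.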
