by wwind123 13 hours ago

by ben_w 12 hours ago

We could call this "reinforcement learning from human feedback" (RLHF) :)

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
