Hacker News
Submitted by vincnetas, 13 hours ago
by wwind123, 12 hours ago
This kind of approach would generally still need human guidance; otherwise these models might get stuck in weird niche corners of the problem space that are not relevant to any real-world project.
by ben_w, 12 hours ago (in reply to wwind123)
We could call this "reinforcement learning from human feedback" (RLHF) :)
https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...