undefined

points

[-]

I'm noticing terms related to DL/RL/NLP are being used more and more informally as AI takes over more of the cultural zeitgeist and people want to use the fancy new terms of the era, even if inaccurately. A friend told me he "trained and fine tuned a custom agent" for his work when what he meant was he modified a claude.md file.

by hexaga3 hours ago|

prev|

[-]

There is a nontrivial amount of RL training (RLHF, RLVR, ...), so it would be reasonable to call it an RL model.

And with that comes reward hacking - which isn't really about looking for more reward but rather that the model has learned patterns of behavior that got reward in the train env.

That is, any kind of vulnerability in the train env manifests as something you'd recognize as reward hacking in the real world: making tests pass _no matter what_ (because the train env rewarded that behavior), being wildly sycophantic (because the human evaluators rewarded that behavior), etc.

by lagrange7751 minutes ago|

parent|

[-]

> There is a nontrivial amount of RL training (RLHF, RLVR, ...), so it would be reasonable to call it an RL model.

Hm, as i understand it, parts of the training of e.g. ChatGPT could be called RL models. But the subject to be trained/fine tuned is still a seq2seq next token predictor transformer neural net.

by hexaga9 minutes ago|

parent|

[-]

RL is simply a broad category of training methods. It's not really an architecture per se: modern GPTs are trained first on reconstruction objective on massive text corpora (the 'large language' part), then on various RL objectives +/- more post-training depending on which lab.

by magicalist4 hours ago|

prev|

[-]

> Is it really about rewards? Im genuinely curious. Because its not a RL model.

Ha, good point. I was using it informally (you could handwave and call it an intrinsic reward if a model is well aligned to completing tasks as requested), but I hadn't really thought about it.

Searching around, it seems like I'm not alone, but it looks like "specification gaming" is also sometimes used, like: https://deepmind.google/blog/specification-gaming-the-flip-s...

by nurettin4 hours ago|

prev|

[-]

They probably meant goal hacking. (I just made that up)