Even within a single agent, a different starting prompt will have you tracing a very different path through the model.

Maybe it still lands you in the same valley, but there are so many parameters and dimensions that I don't think that's likely unless the answer is also correct.

reply
It’s superstition that using one slop generator to “review” the output of a different brand of slop generator somehow makes things better. It’s slop all the way down.
reply
I think people are misunderstanding reward functions and LLMs.

LLMs don't actually carry a reward function at inference time the way some other ML setups (e.g. RL agents) do.

reply
They are trained with one (e.g. via RLHF), and when you look at DPO you can argue they encode an implicit one as well.
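As a sketch of what "implicit" means here: DPO rewrites the RLHF objective so that β·log(π_θ/π_ref) plays the role of a reward, up to a prompt-dependent constant that cancels in the loss. A minimal sketch, assuming scalar sequence log-probs (the numbers below are made-up, not from any real model):

```python
import math

def implicit_reward(logp_policy, logp_ref, beta=0.1):
    # DPO's implicit reward, up to a prompt-dependent constant log Z(x):
    #   r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x))
    return beta * (logp_policy - logp_ref)

def dpo_loss(logp_w_policy, logp_w_ref, logp_l_policy, logp_l_ref, beta=0.1):
    # -log sigmoid(r(x, y_w) - r(x, y_l)); the log Z(x) terms cancel,
    # so only the policy/reference log-prob ratios are needed.
    margin = (implicit_reward(logp_w_policy, logp_w_ref, beta)
              - implicit_reward(logp_l_policy, logp_l_ref, beta))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probs: policy prefers the chosen answer (y_w) more
# than the reference does, so the loss is below log(2).
loss = dpo_loss(logp_w_policy=-10.0, logp_w_ref=-12.0,
                logp_l_policy=-15.0, logp_l_ref=-13.0)
```

So "contain an implicit one" is literal: given the trained policy and the reference model, you can read a reward off their log-prob ratio.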
reply