by magnio 14 hours ago

by vincnetas 13 hours ago

We could call this "generative adversarial network" (GAN) :)

https://en.wikipedia.org/wiki/Generative_adversarial_network

by wwind123 12 hours ago

This kind of approach would generally still need human guidance; otherwise these models might get stuck in weird niche corners of the problem space that are not relevant to any real-world project.

by ben_w 12 hours ago

We could call this "reinforcement learning from human feedback" (RLHF) :)

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...

by olmo23 13 hours ago

How do you prevent degenerate strategies? I could trivially give a model a SHA256 hash and ask it to provide the source input.

In class you'd probably want a rule saying at least one LLM should be able to figure out the answer, but in a head-to-head I'm not sure how to solve it.
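To make the degenerate case concrete, here is a rough sketch in Python using only the standard library. It shows why a hash challenge is trivial to pose and to check but not to solve, plus one possible reading of the "at least one LLM should be able to figure out the answer" rule. The names `make_challenge`, `verify`, and `is_admissible`, and the idea of a solver as a plain callable, are made up for illustration and not taken from any real benchmark.

```python
import hashlib
import secrets


def make_challenge() -> tuple[str, bytes]:
    """Pose a degenerate question: return (public digest, private preimage)."""
    preimage = secrets.token_bytes(32)  # 256 random bits, infeasible to guess
    digest = hashlib.sha256(preimage).hexdigest()
    return digest, preimage


def verify(digest: str, candidate: bytes) -> bool:
    """Checking an answer is cheap: hash the candidate and compare."""
    return hashlib.sha256(candidate).hexdigest() == digest


def is_admissible(digest: str, solvers) -> bool:
    """Hypothetical rule: keep a question only if at least one solver gets it.

    Each solver is assumed to be a callable taking the digest and
    returning a candidate preimage as bytes.
    """
    return any(verify(digest, solve(digest)) for solve in solvers)
```

Under that rule a random-preimage question gets thrown out, since no honest solver can answer it. It does not obviously carry over to a head-to-head, where the only solver with the answer is the one who wrote the question.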

by victorbjorklund 10 hours ago

Maybe make the LLMs write questions that they themselves can solve (without seeing the question-writing context) but that other LLMs cannot.

On the other hand, a good strategy might then be to write questions whose answers the LLM just happens to have in a niche dataset in its training data: "what did user5455 say to user6835?"

Never mind my idea.
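For reference, the acceptance rule floated in this comment could be sketched roughly like this in Python. Everything here is hypothetical: `accept_question`, the `Solver` type, and exact string matching as the correctness check are illustrative assumptions. As noted above, the rule still lets through questions that only probe memorized niche data.

```python
from typing import Callable

# Assumption: a solver is any callable mapping a question to an answer string.
Solver = Callable[[str], str]


def accept_question(question: str, answer: str,
                    author: Solver, rivals: list[Solver]) -> bool:
    """Accept a question only if its author re-solves it and every rival fails.

    The author is assumed to be called in a fresh session, without the
    context in which the question was written.
    """
    author_ok = author(question) == answer
    rivals_fail = all(rival(question) != answer for rival in rivals)
    return author_ok and rivals_fail
```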

by wwind123 12 hours ago

Who knows. Maybe Mythos 5 already found a hole in SHA256, so this won't be too hard. :)

by eunos 8 hours ago

That was Fudan I think