>LLMs grading the answers is relying on the LLM knowing the answer and not just hallucinating it. You also have issues if/when the model refuses to answer, or if it gets stuck in a loop (e.g. if running locally with a heavily quantized model).

LLMs have gotten good at handling these issues, though. There is an asymmetry in difficulty between generating a solution and verifying that it is correct: checking an answer is often the easier task. And over time LLMs keep getting better, which allows training on synthetic data to improve them further.
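
To make the generate-vs-verify asymmetry concrete, here is a toy sketch using subset sum. It is only an analogy, not how an LLM grader actually works, and the names `generate` and `verify` are made up for illustration. Finding a subset that hits the target means searching through combinations, while checking a proposed subset is a single pass.

```python
from itertools import combinations


def verify(numbers, subset, target):
    """Cheap check: subset must be drawn from numbers and sum to target."""
    pool = list(numbers)
    for x in subset:
        if x not in pool:
            return False  # uses a value that is not available
        pool.remove(x)    # each element may be used only once
    return sum(subset) == target


def generate(numbers, target):
    """Expensive search: try subsets of increasing size (exponential in the worst case)."""
    for size in range(1, len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return list(combo)
    return None  # no subset sums to target


numbers = [3, 34, 4, 12, 5, 2]
target = 9
solution = generate(numbers, target)
print(solution, verify(numbers, solution, target))
```

The point of the analogy is that a grader only needs to do the `verify` half, which can be feasible even when producing the answer from scratch is hard.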