EDIT: Also, having the LLM botch a clue occasionally could be a feature. E.g. a bumbling character you might need to "interrogate" a bit before the clue comes out in a way that makes sense, and even then you can't be sure it's entirely correct. That could make some characters more realistic.
Basically you have your big clever LLM generating the outputs, and then a small dumb LLM reading them and going "did I understand that? Did it make sense?" - emulating the user before the response actually reaches them. If it's good, on it goes to the user; if not, the student queries Einstein with feedback to have another crack.
https://openai.com/index/prover-verifier-games-improve-legib...
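A minimal sketch of that loop, assuming you've wrapped your two models behind simple functions - `ask_big_model` and `ask_small_model` are hypothetical stand-ins here, not any real library's API:

```python
def ask_big_model(prompt: str) -> str:
    raise NotImplementedError("call your 'Einstein' model here")

def ask_small_model(prompt: str) -> str:
    raise NotImplementedError("call your 'student' model here")

def generate_legible_answer(question: str, max_rounds: int = 3) -> str:
    """Generate with the big model, verify with the small one."""
    feedback = ""
    answer = ""
    for _ in range(max_rounds):
        answer = ask_big_model(
            f"Question: {question}\n"
            + (f"A previous attempt was unclear: {feedback}\n" if feedback else "")
            + "Answer clearly."
        )
        # The student plays the role of the user: can it follow the answer?
        verdict = ask_small_model(
            f"Question: {question}\nAnswer: {answer}\n"
            "Did this answer make sense? Reply OK, or explain what was confusing."
        )
        if verdict.strip().upper().startswith("OK"):
            return answer  # legible enough - pass it through to the user
        feedback = verdict  # send the confusion back to Einstein
    return answer  # fall back to the last attempt after max_rounds
```

The nice property is that the verifier doesn't need to be smart enough to produce the answer, only to notice when it can't follow one - which is roughly what the prover-verifier paper above is exploiting.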