That seems a bit like rearranging deck chairs on the Titanic. The hard part isn't icon design; the hard part is (A) ensuring there is a clear list of what the NPC is supposed to make sure the user knows and (B) determining whether those goals were actually received.
For example, imagine a mystery/puzzle game where the NPC needs to inform the user of a clue for the next puzzle, but the LLM-layer botches it, either by generating dialogue that phrases it wrong, or by failing to fit it into the first response, so that the user must always do a few "extra" interactions anyway "just in case."
I suppose you could... feed the output into another document asking "Did this NPC answer correctly?" and pass that to another LLM... but down that path lies [more] madness.
EDIT: Also, having the LLM botch a clue occasionally could be a feature. E.g. a bumbling character that you might need to "interrogate" a bit before you actually get the clue in a way that makes sense, and even then you can't be sure it's entirely correct. That could make some characters more realistic.
Basically you have your big clever LLM generating the outputs, and then you have your small dumb LLM reading them and going “Did I understand that? Did it make sense?” - emulating the user before the response actually gets to the user. If it’s good, on it goes to the user; if not, the small model (the student) queries the big one (Einstein) with feedback to have another crack.
https://openai.com/index/prover-verifier-games-improve-legib...
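For what it's worth, the generate-then-check loop described above could be sketched roughly like this. Everything here is an assumption for illustration: `deliver_clue`, `toy_generate` and `toy_verify` are hypothetical names, and the two toy functions are plain-Python stand-ins for where the big and small LLM calls would go. It is not taken from the linked post.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures; in a real game these would wrap LLM calls.
Generator = Callable[[str, List[str]], str]        # (goal, feedback so far) -> NPC line
Verifier = Callable[[str, str], Tuple[bool, str]]  # (goal, NPC line) -> (understood?, feedback)


def deliver_clue(goal: str, generate: Generator, verify: Verifier,
                 max_attempts: int = 3) -> Optional[str]:
    """Return an NPC line the verifier accepted, or None if every attempt failed."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        line = generate(goal, feedback)   # the "big clever" model writes the line
        ok, note = verify(goal, line)     # the "small dumb" model plays the user
        if ok:
            return line                   # good enough, pass it on to the player
        feedback.append(note)             # otherwise retry with the complaint attached
    return None                           # caller decides what to do on failure


# Toy stand-ins so the sketch runs without any model.
def toy_generate(goal: str, feedback: List[str]) -> str:
    if not feedback:
        return "Ah, the weather has been strange lately..."  # botched first try
    return f"Listen closely: {goal}"


def toy_verify(goal: str, line: str) -> Tuple[bool, str]:
    if goal.lower() in line.lower():
        return True, ""
    return False, "The clue was not stated; say it explicitly."


if __name__ == "__main__":
    print(deliver_clue("the key is under the fountain", toy_generate, toy_verify))
```

The `max_attempts` cap is the bit that keeps this from turning into the "more madness" path: if the verifier never accepts, you get `None` back and can fall back to a scripted line.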