In absolute terms, sure, but the token stream's confidence changes as it comes out, right? Consumer LLMs typically wrap their answers in a lot of window dressing. My sense is that this encourages the model to stay on-topic, and that it's mostly "high confidence" fluff. As it's spewing tokens back at you, maybe you'd expect a sudden dip in confidence when it starts hallucinating?
You could color-code the output tokens by confidence so that abrupt changes stand out.
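Something like this rough sketch is what I have in mind. It assumes you can already get a per-token log-probability from whatever API or local model you're using. The function name `colorize_by_confidence`, the color cutoffs, the `dip_threshold` value, and the sample numbers are all made up for illustration, not taken from any real model output.

```python
import math

# ANSI color codes: green = confident, yellow = unsure, red = low confidence.
GREEN, YELLOW, RED, RESET = "\033[32m", "\033[33m", "\033[31m", "\033[0m"


def colorize_by_confidence(tokens, dip_threshold=0.5):
    """Color each token by its probability and flag abrupt drops.

    tokens: list of (text, logprob) pairs, logprob being the natural log
            of the probability the model assigned to the sampled token.
    dip_threshold: hypothetical cutoff; a token is flagged when its
            probability is at least this much lower than the previous one.
    Returns (colored_string, list_of_flagged_token_indices).
    """
    pieces = []
    dips = []
    prev_p = None
    for i, (text, logprob) in enumerate(tokens):
        p = math.exp(logprob)  # convert log-probability back to probability
        if p >= 0.8:           # arbitrary illustrative cutoffs
            color = GREEN
        elif p >= 0.4:
            color = YELLOW
        else:
            color = RED
        pieces.append(f"{color}{text}{RESET}")
        if prev_p is not None and prev_p - p >= dip_threshold:
            dips.append(i)
        prev_p = p
    return "".join(pieces), dips


if __name__ == "__main__":
    # Invented sample data: three confident tokens, then a sharp drop.
    sample = [("The", -0.05), (" capital", -0.10), (" is", -0.02), (" Zorp", -3.0)]
    colored, dips = colorize_by_confidence(sample)
    print(colored)
    print("abrupt dips at token indices:", dips)
```

Whether a dip like that actually lines up with hallucination is exactly the open question; the sketch only makes the dips easy to see.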
It seems kind of obvious, so I'm guessing people have tried this