In my experience LLMs often have really solid insights in the thinking chains then vomit a nonsense score that doesn't make sense.
Now I'm not sure if this is actually an LLM only thing. Because I think people probably do similar when you ask them to give a number to things without providing a concrete grading rubric...
The fact that the LLM appears to never assign an actual 0 or 10 makes me suspicious. Especially when the prompt includes explicit examples of what counts as a 10.