An LLM can only output the average, most likely next token, and then add a bunch of extra noise on top of that so it feels interesting and not repetitive.
So when an LLM was asked to analyze the unit distance conjecture, it just spat out a bunch of average-or-random tokens that coincidentally happened to correspond to a valid proof that had eluded humans for decades?
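For anyone following along, here's a rough sketch of what the "most likely token plus noise" step usually refers to: the model produces a score (logit) for every token in its vocabulary, those scores are turned into probabilities, and one token is drawn from that distribution. The function names and the toy scores are made up for illustration and aren't any particular library's API. It only shows the final sampling step, not how the scores get computed.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick a token index: argmax if temperature is 0, otherwise a weighted random draw."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a 4-token vocabulary (invented for this example).
toy_logits = [2.0, 1.0, 0.5, -1.0]

greedy = sample_next_token(toy_logits, temperature=0)  # always the top-scoring token
sampled = sample_next_token(toy_logits, temperature=0.8, rng=random.Random(0))
```

Note that the "noise" here is just the random draw, and the scores it draws from are recomputed from the entire preceding context at every step.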