It's possible LLMs can handle this after all! But so far we only have existence proofs of humans doing this, not LLMs, and I don't think it's easy to be certain how far away LLMs are from doing it. I should distinguish between LLMs and AI more generally here: I'm skeptical LLMs can do this, but I think some other, more complete kind of AI almost certainly can.
I suppose you could just, I dunno, randomly combine words into every conceivable sentence and treat each new sentence as a theory to somehow test, brute-forcing your way through the infinite possible theories you could come up with. But at that point you're closer to the whole infinite-monkeys-producing-Shakespeare thing than you are to any useful conclusion about intelligence.
This doesn't make any sense. By their nature, they can't "guess-and-check" things outside their training set.
And most of the mathematicians seem to welcome this "brute forcing" by the LLMs. It connects pieces that people didn't realize could be connected, which opens up a lot of avenues for further exploration.
Now, if the LLMs could just do something like ingesting the Mochizuki stuff and give us a decent confirmation or disproof ...