Unfortunately, it is not, and many of its attempts at mathematical proofs have major flaws. You shouldn't trust its proofs unless you are already able to evaluate them, which I think is pretty much all the OP is saying.
There is one exception to this: if the AI also delivers a proof that the math is correct, in a machine-checked format, and I understand the correctness theorem (not necessarily its proof), then I would use it without hesitation.
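To make the "understand the theorem, not necessarily its proof" point concrete, here is a toy sketch in Lean 4. The theorem name is made up for illustration, and a real correctness theorem would be far larger; the only point is that the statement on the first line is what a human has to read and agree with, while the checker takes responsibility for the proof term.

```lean
-- The statement: for all natural numbers a and b, a + b = b + a.
-- This line is the part a reader has to understand and agree with.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  -- The proof: accepted or rejected by Lean's checker, so the reader
  -- does not need to follow it in order to trust the statement.
  Nat.add_comm a b
```

The catch is that the statement has to say what you actually care about. A machine-checked proof of the wrong theorem is still a proof of the wrong theorem.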
I struggle to remember even relatively simple maths like working out "what percentage of X is Y", so if I write a formula like that I'll put in some simple values like 12 and 6, or 10,000 and 2,456, just to confirm I haven't got the values backwards or something. I've been shown sheets where someone put in a formula they didn't understand, checked it with numbers they couldn't easily eyeball, and just assumed it was right because the result was roughly in their ballpark, or because they had no idea what the end result should be.
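Roughly what that check looks like if you write it down, using the same numbers. This is only a sketch; `percent_of` is a name I made up, not anything from a real sheet.

```python
import math


def percent_of(part, whole):
    """Return what percentage of `whole` is `part`."""
    return part / whole * 100


# Easy case you can eyeball: 6 is half of 12.
easy = percent_of(6, 12)

# Same numbers with the arguments swapped. A result above 100 is the
# tell-tale sign that the values are the wrong way round.
swapped = percent_of(12, 6)

# Harder to eyeball, but still obviously "a bit under a quarter".
harder = percent_of(2456, 10_000)

print(easy, swapped)
print(math.isclose(harder, 24.56))
```

The easy pair is what catches the swapped-argument mistake. The 10,000 and 2,456 pair only confirms that nothing strange happens with less tidy values.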
Then again, I've also seen sheets where a 10% discount column always had a larger number than the standard price, so even obviously wrong things aren't always checked.
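That kind of mistake is cheap to catch mechanically. A minimal sketch, assuming the sheet has been exported as pairs of (standard price, discounted price); the function names and the sample rows are invented for illustration.

```python
def discounted_price(price, discount_rate):
    """Apply a fractional discount, e.g. 0.10 for 10% off."""
    return price * (1 - discount_rate)


def find_suspicious_discounts(rows):
    """Return the indices of rows whose discounted price exceeds the standard price.

    `rows` is a list of (standard_price, discounted_price) pairs.
    """
    return [i for i, (standard, discounted) in enumerate(rows) if discounted > standard]


# Invented sample data: the second row looks like someone multiplied
# by 1.10 instead of 0.90.
rows = [(100.0, 90.0), (50.0, 55.0), (20.0, 18.0)]
print(find_suspicious_discounts(rows))
```

This only checks the direction of the discount, not the amount, but the direction is exactly what went unnoticed in those sheets.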
I've reached solutions by trial and error too, and tried to rationalize them later, quite a few times. And it's easier to rationalize a working solution, however adversarial you claim to be in your rationalization.
I don't see using gen AI for the (not so) “brute force” exploration of the solution space as that different from trial and error and post hoc rationalization.
Normally with mathematical problems you have to prove the solution correct. Testing is not sufficient, unless you can test all possible inputs exhaustively.
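A classic illustration of why a limited set of tests is not a proof, plus the one case where testing does settle it. Euler's polynomial n^2 + n + 41 gives a prime for every n from 0 to 39 and then fails at n = 40, where it equals 41^2. The bit-trick check afterwards is exhaustive, but only because I have restricted the claim to 8-bit inputs. All function names are mine.

```python
def is_prime(n):
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def euler(n):
    """Euler's polynomial n^2 + n + 41."""
    return n * n + n + 41


# "Tested on 40 inputs, all pass" ...
passes_small = all(is_prime(euler(n)) for n in range(40))

# ... yet the claim "always prime" is false: euler(40) == 41 * 41.
first_failure = next(n for n in range(100) if not is_prime(euler(n)))


def is_power_of_two_fast(x):
    """Bit trick: a power of two has exactly one bit set."""
    return x != 0 and (x & (x - 1)) == 0


def is_power_of_two_slow(x):
    """Reference answer for 8-bit values, by enumeration."""
    return x in {1, 2, 4, 8, 16, 32, 64, 128}


# Exhaustive over the whole 8-bit domain, so for that domain (and only
# that domain) this check really does establish the claim.
exhaustive_ok = all(
    is_power_of_two_fast(x) == is_power_of_two_slow(x) for x in range(256)
)

print(passes_small, first_failure, exhaustive_ok)
```

Forty passing cases said nothing about the forty-first. The 256-case check is conclusive only because 256 cases is the entire input space.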
If it’s beyond our ability to review and we blindly trust it’s correct based on a limited set of tests… we’re asking for trouble.
I’m going to guess that this is Gell-Mann amnesia more than anything, and it’s going to get a lot of organizations into a lot of weird places.
... that can't even count.
"PhD level" just means you finished a bachelor's and a master's degree and are now doing a bit of original research as an employed research assistant.
Claude isn't "PhD level" anything. Calling it that shows a complete lack of understanding. Claude has read every single textbook in existence, so it can surface knowledge locked away in book chapters that people haven't read in years (nobody really reads those dense books on niche topics from start to finish).
Since Claude has infinite patience, you can just keep asking until you get it.