This is like saying that LLMs can evaluate paintings better than art experts, but only when the painting is described to them in text.
Of course they come out ahead in that comparison, because evaluating a painting that way makes no sense in the first place.
That actually seems like a good application – automatically get a quick AI second opinion on everything; if it dissents, the first (human) medic can re-review, note why the AI's take is slop, or escalate for a third opinion from a second human.
(I'm assuming most cases would be "You're absolutely right, that's an astute diagnosis.")