They are using the “gold standard for the evaluation of expert medical computing systems”, not a proxy for what a doctor actually does when diagnosing someone.

It may have some utility after diagnosis, but this test doesn’t demonstrate utility for patients.

It will also tell you you're God and/or a toaster. If you're gonna let benchmarks convince you to listen to an LLM on matters of health, it's your funeral; just don't get anyone else killed with you, please.
But I, SCP-426, am a toaster.