Much more so than modern AI systems are.
In humans, improvement in a new domain seems to follow a logarithmic curve: fast gains early on, diminishing returns later.
Why wouldn’t this be the same for an AI?
If anything, by using AI, they may improve even more than before.
Is medical diagnosis one of these high judgement tasks? Personally I don’t think so.
If the latter part of your post were true, how come the demand for radiologists has grown? The problem with this place is it’s full of people who don’t understand nuance. And your post demonstrates this emphatically.
There are two points here. The first is that a technical solution can be trained on _ALL_ medical data and have access to all of it in the moment. No doctor could plausibly achieve this.
The second is that, for medical cases, understanding the sum of all symptoms and the patient's vitals would lead to an accurate diagnosis a majority of the time. AI/ML is entirely about pattern recognition; combine this with point one and you end up with a system that can diagnose a large portion of patients in extremely short timeframes.
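To make the pattern-recognition point concrete, here's a toy sketch in Python using scikit-learn. Everything in it is made up - synthetic vitals, symptom flags, and a fake "infection" label - so treat it as an illustration of the shape of the approach, not a clinical tool:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 1000

    # Hypothetical inputs: a few vitals plus binary symptom flags.
    temp = rng.normal(37.0, 1.0, n)   # temperature (C)
    hr = rng.normal(80, 15, n)        # heart rate (bpm)
    sbp = rng.normal(120, 15, n)      # systolic BP (mmHg)
    cough = rng.integers(0, 2, n)     # symptom flag: cough
    fatigue = rng.integers(0, 2, n)   # symptom flag: fatigue
    X = np.column_stack([temp, hr, sbp, cough, fatigue])

    # Synthetic label: call it "infection" when feverish and tachycardic.
    y = ((temp > 38.0) & (hr > 90)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")

The model trivially recovers the pattern because the label was generated from the features; the open question with real medical records is whether the signal is that clean.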
On a different note, I think we can leave the ad hominem attacks at home, please.
Quite to the contrary, I think it's extremely trivial to find a task where humans beat LLMs.
For all the money that's been thrown at agentic coding, LLMs still produce substantially worse code than a senior dev. See my own prior comments on this for a concrete example [1].
These trivial failure cases show that there are dimensions to task proficiency - significant ones - that benchmarks fail to capture.
> Is medical diagnosis one of these high judgement tasks?
Situational. I would break diagnosis into three types:
1. The diagnosis comes from objective criteria - laboratory values, vital signs, visual findings, family history. I think LLMs are likely already superior to humans in this case (see the sketch after this list).
2. The diagnosis comes from "chart lore" - reading notes from prior physicians and realizing that new context now points to a different diagnosis. (That new context can be the benefit of hindsight into what they already tried and failed, and/or new objective data.) LLMs do pretty well at this when you point them at datasets where all the prior notes were written by humans, which means that those humans did a nontrivial part of the diagnostic work. What if the prior notes were written by LLMs as well? Will they propagate their own mistakes forward? That's yet to be studied in depth.
3. The diagnosis comes from human interaction - knowing the difference between a patient who's high as a bat on crack and one who's delirious from infection; noticing that a patient hesitates slightly before they assure you that they've been taking all their meds as prescribed; etc. I doubt that LLMs will ever beat humans at this, but if LLMs can be proven to be good at point 2, then point 3 alone will not save human physicians.
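For what it's worth, point 1 is the easiest of the three to prototype today. A minimal sketch, assuming the OpenAI Python client - the model name, prompt wording, and lab values below are all placeholders I made up, not a validated diagnostic workflow:

    from openai import OpenAI

    # Hypothetical structured objective data (point 1): labs and vitals only.
    patient = {
        "temperature_c": 38.9,
        "heart_rate": 112,
        "wbc_k_per_ul": 14.2,
        "lactate_mmol_l": 3.1,
    }

    prompt = (
        "Given only these objective findings, list the most likely "
        f"diagnoses with brief reasoning: {patient}"
    )

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)

Points 2 and 3 are the hard part precisely because they can't be reduced to a dictionary of lab values like this.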
[1] https://news.ycombinator.com/threads?id=Calavar#47891432
Agree with your division, but I'm baffled by this argument. If humans are better than machines at point 3 and can also use a machine to do point 2, then, unless they have particularly terrible biases against taking the point 2 data into account, they're going to be strictly better than machines alone. Doctors have costs, but they're costs people/society are generally willing to underwrite, and misdiagnosis also has costs...
I, and likely the person you replied to, don't find that existing studies show this to be true.