I'm not sure whether you meant to come across as having written this yourself, but it did nothing to change my view that this approach is flawed.
I was also disappointed that you didn't address the variability in scores. I infer that you believe the larger model takes care of the main observation in the post, but I don't see you addressing its points directly.
Maybe it's just me.
Reading this thread, I'm hoping to minimize the variability even further (even though I know it can't be fully removed).
Or are you using it to screen? I'm confused.
The rest of the ones with good scores (more than 40K, at least) were reviewed manually.
>>No human can read that many resumes well. So I built something to rank them, helping me decide which resumes to read first
Translation: it's an ATS.
>>the system was designed to rank resumes, not reject them
>>Only resumes at the very bottom of the distribution were filtered out
Translation: it was designed to reject CVs.