upvote
[author]

I agree it is not much additional evidence! If someone wanted to try running the same test on a series of N commits from that list including this one I'd be very curious to see the answer!

reply
Realistically, if you are scanning each kernel commit to check if they might be patching a security issue, you are going to be asking an LLM "is this security related, if so vaguely how" with low effort and taking "maybe" as a yes before feeding it to a more expensive model. You aren't trying to establish a probability of an ultimately unknowable fact, there is ground truth that you can find by producing an exploit, so you are just trying to pre-filter before spending the money to find it.
reply
Yeah, ideally we would need the phi coefficient (aka MCC, the binary Pearson correlation), which can be calculated from a confusion matrix of yes/no LLM classifications for all kernel diffs. (Number of true positives, true negatives, false positives, false negatives.)
reply