In TFA they talk a fair bit about how different models perform wrt false positives:

“The results show something close to inverse scaling: small, cheap models outperform large frontier ones.”
