1. Pass the raw text through the filter to obtain the spans.
2. Map all the spans back to the original text.
Now you have all the PII information.
I'm suggesting that a model designed for high-accuracy redaction can also be used to find all PII in unredacted text. For example, if I don't already know how to find PII (e.g., regex, NLP, etc.) I can use OpenAI's Privacy Filter model to do the work for me.
And because each span has a type (PRIVATE_NAME, etc.) I don't even need to do any work to find only the specific information I am looking for; something that simple diffing wouldn't do.
I'm not saying it's an issue, I just think it is interesting that a tool designed to protect PII can also be used to find it with minimal effort. And it looks like someone already implemented it: https://github.com/chiefautism/privacy-parser.