OpenAI Privacy Filter

upvote

OpenAI Privacy Filter

(openai.com)

229 points

by tanelpoder3 days ago |

upvote

by fzxu226 hours ago|

[-]

Working on this: https://github.com/KevinXuxuxu/anon_proxy, a sort of anonymization proxy to use with LLM providers. It does model (OpenAI privacy filter) + regex PII detection, and replaces them back-and-forth for API requests and responses. With locally hosted detection model, no PII leaves your local environment. I find it very useful especially when you're working on sensitive documents (legal, tax, immigration etc.), hope you find it helpful as well :)

reply

upvote

by stingraycharles1 hours ago|

[-]

How does it handle “unredaction” in responses? E.g. let’s say the LLM does something with the document. You redacted its input, so it emits redacted content. Now what?

reply

upvote

by blfr1 hours ago|

[-]

This is very cool because it allows you to use any model. Obviously, it still lets the model and its operator see the entire context of the conversation.

I quite like Moxie's Confer[1] approach to just encrypt the whole thing in such a way that no one except the end-user sees the plaintext.

[1] https://confer.to/

reply

upvote

by stratos1233 days ago|

[-]

There's some interesting technical details in this release:

> Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.

> The released model has 1.5B total parameters with 50M active parameters.

> [To build it] we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.

reply

upvote

by LatencyKills10 hours ago|

[-]

Couldn't this be used to locate private data in unstructured text without having to rely on other means of PII detection?

1. Pass the raw text through the filter to obtain the spans.

2. Map all the spans back to the original text.

Now you have all the PII information.

reply

upvote

by Everdred2dx9 hours ago|

[-]

Yep, and already has been done.

https://github.com/chiefautism/privacy-parser

reply

upvote

by yjftsjthsd-h8 hours ago|

[-]

If you have the redacted and unredacted versions, then you can diff them; that seems unsurprising? Unless I'm really misunderstanding "spans"?

reply

upvote

by LatencyKills1 hours ago|

[-]

> If you have the redacted and unredacted versions, then you can diff them; that seems unsurprising?

I'm suggesting that a model designed for high-accuracy redaction can also be used to find all PII in unredacted text. For example, if I don't already know how to find PII (e.g., regex, NLP, etc.) I can use OpenAI's Privacy Filter model to do the work for me.

And because each span has a type (PRIVATE_NAME, etc.) I don't even need to do any work to find only the specific information I am looking for; something that simple diffing wouldn't do.

I'm not saying it's an issue, I just think it is interesting that a tool designed to protect PII can also be used to find it with minimal effort. And it looks like someone already implemented it: https://github.com/chiefautism/privacy-parser.

reply

upvote

by nl8 hours ago|

[-]

I'm no where near as smart as OpenAI of course, but I did build https://tools.nicklothian.com/webner/index.html that uses a BERT based named-entity-recognition model running in your browser to do a subset of PII redaction.

It works pretty well for the use cases I was playing with.

The OpenAI model is small enough that I might enhance my tool to use it.

reply

upvote

by stingraycharles7 hours ago|

[-]

I just used it on a document, but the amount of false positives this generates make it faily difficult to use?

I fed it a ~ 100 line markdown document, took about 10 seconds, and it decided that "matter" (as in, frontmatter), "end" (as in, frontend), MCP (as in, mcp server) are organizations.

Most of them don't even make grammatical sense, e.g. "Following the discussion in <PERSON_1>, blahblah".

Brings me back to what NLP was like a decade ago. I always thought spaCy was a very nice project in that space.

reply

upvote

by nl7 hours ago|

[-]

Yeah this really is roughly NLP ~10 years ago.

It does work better on plain text than markdown because of casing. I can't see what you used (kinda the point - because it run all in your browser) but if you can share the markdown as a gist or something I can take a look and comment more concretely.

reply

upvote

by maciejzj4 hours ago|

[-]

On a side note, when I click the link it redirects me to machine-translated version of OpenAI website with completely botched meaning - the word “redacted” is translated to a false friend “redagować” which means to edit/refine text, not anonymize.

reply

upvote

by mplanchard3 days ago|

[-]

It would be nice if their examples weren’t mostly things that are easy to catch with regex, but it’s cool to see if released as an open, local model.

reply

upvote

by JLO6410 hours ago|

[-]

For my customers I use regexes to block them from potentially publishing personal emails/phone numbers to their websites but I really wouldn't mind running this in addition just for the extra peace of mind. I don't have a GPU on our server, but I hope this is light enough of a model to handle CPU only inference on less than 2k tokens at a time.

reply

upvote

by mayneack9 hours ago|

[-]

Curious how this compares to presidio which mixes regex with a model: https://microsoft.github.io/presidio/

reply

upvote

by pros5 hours ago|

[-]

[dead]

reply

upvote

by mentalgear2 days ago|

[-]

SuperagentLM made available on-edge PPI redaction models already a few years ago in sizes 20B, 3B, 200M. They still seem to be available via their legacy API - well worth checking out to compare against this one. https://docs.superagent.sh/legacy/llms/superagent-lm-redact-...

reply

upvote

by hiAndrewQuinn3 days ago|

[-]

I'm surprised nobody else has commented on this. This is a very straightforward and useful thing for a small locally runnable model to do.

reply

upvote

by apothegm3 days ago|

[-]

And also something that it’s dangerous to try to do stochastically.

reply

upvote

by agnishom18 minutes ago|

[-]

One could chain a regex based system together with this

reply

upvote

by hiAndrewQuinn3 days ago|

[-]

It's going to be stochastic in some sense whether you want it to be or not, human error never reaches zero percent. I would bet you a penny you'd get better results doing one two-second automated pass + your usual PII redaction than your PII redaction alone.

reply

upvote

by ori_b9 hours ago|

[-]

The advantage of computers was that they didn't make human errors; they did things repeatedly, quickly, and predictably. If I'm going to accept human error, I'd like it to come from a human.

reply

upvote

by creesch5 hours ago|

[-]

> The advantage of computers was that they didn't make human errors;

Sure they do, computers repeatedly, quickly, and predictably do what they are programmed to do. Which includes any human errors in that programming.

reply

upvote

by laserlight4 hours ago|

[-]

> predictably do what they are programmed to do

And now they predictably do what they are not programmed to do.

reply

upvote

by pigpag8 hours ago|

[-]

[dead]

reply

upvote

by cyanydeez3 days ago|

[-]

I think the problem is most secrets arn't stochastic; they're determinant. When the user types in the wrong password, it should be blocked. Using a probabilistic model suggests an attacker only now needs to be really close, but not correct.

Sure, there's some math that says being really close and exact arn't a big deal; but then you're also saying your secrets don't need to be exact when decoding them and they absolutely do atm.

Sure looks like a weird privacy veil that sorta might work for some things, like frosted glass, but think of a toilet stall with all frosted glass, are you still comfortable going to the bathroom in there?

reply

upvote

by CityOfThrowaway9 hours ago|

[-]

I dunno what use case you're thinking this is for.

The use case for this is that many enterprise customers want SaaS products to strip PII from ingested content, and there's no non-model way to do it.

Think, ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.

reply

upvote

by traceroute666 hours ago|

[-]

> Think, ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.

Credit card numbers are deterministic. A five year old could write a script to strip out credit card numbers.

As for other PII ? You're seriously expecting an LLM to find every instance of every random piece of PII ? Worldwide ? In multiple languages ? I've got an igloo I'd like to sell you ...

reply

upvote

by moralestapia2 days ago|

[-]

The alternative being?

reply

upvote

by Fraaaank2 hours ago|

[-]

From a compliance POV it's not enough. For example: "<NAME PERSON ONE> is president of the United States" is still identifiable even though the name has been redacted.

Since you can't be 100% certain that a filter redacts all personal data, you'd have to make sure that you have measures in place which allow OpenAI to legally process personal data on your behalf. Otherwise you'd technically have a data breach (from a GDPR pov).

And if OpenAI can legally process personal data on your behalf, why bother filtering if processing with filtering is also compliant?

reply

upvote

by hiAndrewQuinn8 hours ago|

[-]

For the confused: this link must have gotten revived or something, I posted this comment a few days ago. Looks like it's getting the accolades I claim it deserves now.

reply

upvote

by tanelpoder7 hours ago|

[-]

It was put into second-chance pool by moderators. I originally submitted this link a few days ago and today got this (semi?)automated email from HN, an excerpt below:

  The submission "OpenAI Privacy Filter" that you posted to Hacker News (https://news.ycombinator.com/item?id=47870901) looks good, but hasn't had much attention so far. We put it in the second-chance pool, so it will get a random placement on the front page some time in the next day or so.

  This is a way of giving good HN submissions multiple chances at the front page. If you're curious, you can read about it at https://news.ycombinator.com/item?id=26998308 and other links there.

reply

upvote

by ashwindharne3 days ago|

[-]

Same here, this is an incredibly useful thing to have in the toolkit

reply

upvote

by usdogu3 hours ago|

[-]

Someone has created the reverse of it: https://github.com/chiefautism/privacy-parser

reply

upvote

by 7777777phil3 days ago|

[-]

> The model is available today under the Apache 2.0 license on Hugging Face (opens in a new window) and Github (opens in a new window).

Bringing back the Open to OpenAI..

reply

upvote

by awestroke2 hours ago|

[-]

It's only open because nobody who's interested in this model would send their data to openai to be stripped of PII. If they thought otherwise, it would be closed-weights and API-only for "safety" reasons

reply

upvote

by Havoc3 days ago|

[-]

50M effective parameters is impressively light. Is there a similarly light model on the prompt injection side? Most of the mainstream ones seem heavier

reply

upvote

by freakynit9 hours ago|

[-]

Can someone explaon how can I reconstruct the original entities back if there are, for example, more than one person names?

reply

upvote

by pros6 hours ago|

[-]

You cannot — not with the model alone. It gives you spans + types, not identity.

You need to do that part yourself after the model runs. The filter gives you spans; for each one, assign a stable ID (PERSON_1, PERSON_2) and keep {PERSON_1: "Harry", PERSON_2: "Ron"} next to the document. Swap IDs in before the LLM call, swap originals back in the reply.

Scoping that map to a document/project keeps the same person consistent across calls, so Harry stays PERSON_1 instead of becoming PERSON_3 the next time he's mentioned.

(Disclosure: I'm building a Mac privacy tool, RedMatiq, that does exactly this. The mapping layer turned out substantially harder than detection.)

reply

upvote

by flashdesk7 hours ago|

[-]

This is exactly where stochastic approaches feel uncomfortable.

For anything touching security or privacy, even small inconsistencies can quickly erode trust.

reply

upvote

by I_am_tiberius4 hours ago|

[-]

I assume they use this model to be able to train new models with user data.

reply

upvote

by I_am_tiberius4 hours ago|

[-]

If they think they are respecting their users' privacy by doing so, they are very very wrong.

reply

upvote

by flashdesk6 hours ago|

[-]

This is where stochastic approaches start to feel a bit uncomfortable.

Even small mistakes can make something dealing with sensitive data hard to trust. It seems useful as a first pass, but I’d probably still want some deterministic checks or a human in the loop to feel confident using it.

reply

upvote

by fathermarz6 hours ago|

[-]

I built a community tool for exactly this, based on privacy first principals but around the what. It’s workflow based and not “put your sensitive data into ChatGPT and hope it captures the right stuff”. Mostly built for security folks but anyone can use it

Check it out: https://redact.cabreza.com

reply

upvote

by ares6234 hours ago|

[-]

This looks actually useful. But can someone help me understand how you address the non-perfect scores: "Privacy Filter achieves an F1 score of 96% (94.04% precision and 98.04% recall)."

How would you actually use this if it can fail redacting 4% of the data. How do you reliably know which 4% failed?

reply

upvote

by vessenes36 minutes ago|

[-]

My experience with models that can reach high 90%-ile benchmark rates on tests is that often that last few percentage is arguable, vague, and often experts would disagree. You could try it yourself by training an MNIST classifier and seeing which digits your model inevitably cannot guess -- you'll be like "...wait a minute..."

Anyway, I have no idea what the underlying data here looks like, but I bet it's pretty unusual.

When I was working on my first job out of college, we were given a large contract and told to redact with black Sharpie every name of a company; it was a basic document prep exercise ahead of a strategy session for a competitor. Standard practice was to share general information but not specific. Our redaction error rate on 200 pages of contract was ... not 100%.

reply

upvote

by ndom913 days ago|

[-]

Where's the gguf from Unsloth and co?

reply

upvote

by nickthegreek10 hours ago|

[-]

[dead]

reply

upvote

by aubinkure2 days ago|

[-]

[dead]

reply

upvote

by haricomputer9 hours ago|

[-]

[dead]

reply

upvote

by y0eswddl3 days ago|

[-]

[flagged]

reply

upvote

by klauserc3 days ago|

[-]

Was my first thought as well, but this is an open weights model. You can run it on your own hardware.

reply