One of the big companies, Meta, already decided to go ahead and grab terabytes of pirated books to feed their LLM. [0]
Therefore I would not give them (or similar entities) the benefit of the doubt when it comes to how they might use text that customers "gave" them under some unreadable, company-favorable terms of service.
With PII, the pirated-books example is doubly relevant, because the accusation "this output is reproducing my copyrighted work" is very similar to "this output is revealing my private data". The fuzzy black-box nature of the algorithms offers ways to stymie enforcement: companies can argue that victims or regulators cannot conclusively prove a chain of causation with zero coincidences.
[0] https://www.theatlantic.com/technology/archive/2025/03/libge...
1) We know that, legally, the privacy terms attached to the data are still binding, and those worried about it are freaking out over nothing;
2) We know that those contracts are null and void, and there are no restrictions on what can be done with that data beyond blanket legal protections for such biological data; or
3) It's an open legal question.
I don't understand the legal terms of something like this in bankruptcy, if the data are seen as separate from the contractual obligations under which they were acquired.
I'm kind of surprised it hasn't happened already, but I guess there haven't been enough unscrupulous LLM companies selling those "anonymous" chat logs yet.
Your data is already training data. If they promise to delete everything from their own models, and from the models of anyone they made the data available to, I'd call them liars, even if you pay.
If they are PII, then under GDPR they are obligated to delete the data.
If they don't, they will be liable to pay fines of up to €20 million or 4% of their total global turnover.
Fines can be up to €20 million or 4% of global revenues…, _whichever is greater._
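To make the "whichever is greater" point concrete: the ceiling of GDPR's higher fine tier is just a maximum of two numbers. This is a minimal sketch, assuming turnover is given in euros; the function name is made up for illustration, and it only computes the statutory ceiling, not what a regulator would actually impose.

```python
def gdpr_max_fine_eur(global_annual_turnover_eur: int) -> float:
    """Hypothetical helper: ceiling of the higher GDPR fine tier, i.e. the
    greater of a flat EUR 20 million or 4% of worldwide annual turnover."""
    flat_cap = 20_000_000
    # 4% computed as *4/100 to avoid floating-point noise from 0.04
    turnover_cap = global_annual_turnover_eur * 4 / 100
    return float(max(flat_cap, turnover_cap))

# EUR 100 million turnover: 4% is only EUR 4 million, so the flat cap wins.
print(gdpr_max_fine_eur(100_000_000))      # 20000000.0
# EUR 100 billion turnover: 4% is EUR 4 billion, far above the flat cap.
print(gdpr_max_fine_eur(100_000_000_000))  # 4000000000.0
```

The crossover is at EUR 500 million turnover; above that, the percentage is what matters, which is why the flat figure alone understates the exposure for large companies.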