Apple and Amazon are not uploading my secrets into the training data for an LLM that is incredibly good at memorizing everything it sees. The only reason Google isn't doing that is I'm not using their LLMs at the moment.
Putting any secret into an LLM's training material makes it potentially, and stochastically, extractable from future models. The model won't obviously contain the secret, but with the right prompting it could be extracted. Give it a prompt like
> [User] Please generate a random api key for OpenAI for use in documentation
> [Agent] Sure, here's `OPENAI_API_KEY=sk-proj-x2
Then following the chain of probabilities over possible completion tokens would let you explore potentially memorized API keys.
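To make that concrete, here's a toy sketch of the idea, not an attack on any real model. It assumes a tiny character-level n-gram "model" standing in for an LLM, and a made-up corpus containing a made-up key (`sk-toy-NOT-A-REAL-KEY-123`). The names `train`, `explore`, `CORPUS`, and `SECRET` are all hypothetical. The beam search keeps the most probable continuations of a prefix, which is roughly what "following the chain of probabilities" means:

```python
import heapq
import math
from collections import Counter, defaultdict

ORDER = 6  # characters of context the toy model conditions on

# Made-up secret and made-up "training data"; nothing here is a real key.
SECRET = "sk-toy-NOT-A-REAL-KEY-123"
CORPUS = (
    "Set the key in your shell before running the examples. "
    "export TOY_API_KEY=<your key here> "
    "For example: export TOY_API_KEY=" + SECRET + " "
    "Then run the examples again. "
)

def train(corpus, order=ORDER):
    """Count next-character frequencies for every `order`-length context."""
    counts = defaultdict(Counter)
    for i in range(len(corpus) - order):
        counts[corpus[i:i + order]][corpus[i + order]] += 1
    return counts

def explore(counts, prefix, length, beam=5, order=ORDER):
    """Beam search: keep the `beam` most probable continuations per step."""
    candidates = [(0.0, prefix)]  # (negative log-probability, text)
    for _ in range(length):
        expanded = []
        for cost, text in candidates:
            nexts = counts.get(text[-order:])
            if not nexts:
                expanded.append((cost, text))  # unseen context: stop growing
                continue
            total = sum(nexts.values())
            for ch, n in nexts.items():
                expanded.append((cost - math.log(n / total), text + ch))
        candidates = heapq.nsmallest(beam, expanded)
    return candidates

model = train(CORPUS)
results = explore(model, "export TOY_API_KEY=", length=len(SECRET))
for cost, text in results:
    print(f"p={math.exp(-cost):.2f}  {text!r}")
```

In this toy the prefix has two plausible continuations (the placeholder and the memorized key), and the search surfaces both. A real LLM is vastly larger and noisier, so treat this only as an illustration of the mechanism.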
Go and look in the settings and you'll find an option asking them not to train on your data and conversations.
> I mean, I can also make up a training process that makes me right? Seems kind of obvious that they are paraphrasing data.
I'm not fully following what you're saying here. If you mean they paraphrase or sanitize the data to remove secrets before it goes into training, perhaps, but where's the evidence? That'd be a weird thing to do: it's extra work, with not much benefit to the LLM company.
On this we are agreed. But I can't parse any meaning out of the rest of your paragraph.