upvote
Well, it's like robbing the robbers, when it comes to training data
reply
Except one of the robberers is a massive corporation with even bigger legal team...
reply
It is more like imitating the imitators. There is not much of a legal case here, but poisoning the data is fair game both for those producing original data as well as for those producing its regurgitations.
reply
I think its very hard for the 'websites' to poison the data for ai though, we dont have the 'single point of ingestion' to measure when its being pumped for training data.
reply
You could run Gemma models locally to distill them. Or any other model with tool use.
reply
Yeah, but we wanted Gemini
reply