https://www.tomshardware.com/tech-industry/artificial-intell...
> Facebook parent-company Meta is currently fighting a class action lawsuit alleging copyright infringement and unfair competition, among others, with regards to how it trained LLaMA. According to an X (formerly Twitter) post by vx-underground, court records reveal that the social media company used pirated torrents to download 81.7TB of data from shadow libraries including Anna’s Archive, Z-Library, and LibGen. It then used this information to train its AI models.
> Aside from those messages, documents also revealed that the company took steps so that its infrastructure wasn’t used in these downloading and seeding operations so that the activity wouldn’t be traced back to Meta. The court documents say that this constitutes evidence of Meta’s unlawful activity, which seems like it’s taking deliberate steps to circumvent copyright laws.
> so that its infrastructure wasn’t used in these downloading and seeding operations so that the activity wouldn’t be traced back to Meta.
(emphasis added)
If you'd like it from another source using different words, https://masslawblog.com/copyright/copyright-ai-and-metas-tor... has
> According to the plaintiffs’ forensic analysis, Meta’s servers re-seeded the files back into the swarm, effectively redistributing mountains of pirated works.
and specifically talks about that being a problem.
I will grant that until/unless the cases are decided, this is all alleged, so we'll see.
Do you think that OpenAI or Anthropic should get a pass for using torrents if they used special BitTorrent clients that only leeched? Do you think the RIAA would be cool with me if I did the same?
> There is no dispute that Meta torrented LibGen and Anna's Archive, but the parties dispute whether and to what extent Meta uploaded (via leeching or seeding) the data it torrented. A Meta engineer involved in the torrenting wrote a script to prevent seeding, but apparently not leeching. See Pls. MSJ at 13; id. Ex. 71 ¶¶ 16–17, 19; id. Ex. 67 at 3, 6–7, 13–16, 24–26; see also Meta MSJ Ex. 38 at 4–5. Therefore, say the plaintiffs, because BitTorrent's default settings allow for leeching, and because Meta did nothing to change those default settings, Meta must have reuploaded “at least some” of the data Meta downloaded via torrent. The plaintiffs assert further that Meta chose not to take any steps to prevent leeching because that would have slowed its download speeds. Meta responds that, even if it reuploaded some of what it downloaded, that doesn't mean it reuploaded any of the plaintiffs’ books. It also notes that leeching was not clearly an issue in the case until recently, and so it has not yet had a chance to fully develop evidence to address the plaintiffs’ assertions.
They did leeching but not seeding. https://caselaw.findlaw.com/court/us-dis-crt-n-d-cal/1174228...
> If I a civilian did this I would face time in prison
No, if you had leeched, it is very unlikely that you would face time in prison.
Wrong. Michael Clark testified under oath that they tried to minimize seeding and not that they prevented it entirely. His words were: "Bashlykov modified the config setting so that the smallest amount of seeding possible could occur" (https://storage.courtlistener.com/recap/gov.uscourts.cand.41...)
They could have used or written a client that was incapable of seeding but they didn't.
> No, if you had leeched, it is very unlikely that you would face time in prison.
Not the one who claimed that, but I think it's fair to say that doing what they did, at that scale, could easily result in me (and most people) being bankrupted by fines and/or legal expenses.
Do you not think an engineer who went to such efforts to disable seeding would go the full extent? Why not?