I think you're looking at the wrong end of the spectrum there. It's some of the biggest players who flout the rules.
"Several AI companies said to be ignoring robots dot txt exclusion, scraping content without permission: report" (2024) https://www.tomshardware.com/tech-industry/artificial-intell...
Even if you believe what the AI companies are doing is, or should be, a copyright violation, the Internet Archive is redistributing content in a more direct manner.