And whilst the IA will honour requests not to archive/index, more aggressive scrapers won't, and will disguise their traffic as normal human browser traffic.

So we're basically decided we only want bad actors to be able to scrape, archive, and index.

by JumpCrisscross 3 hours ago

> we're basically decided we only want bad actors to be able to scrape, archive, and index

AI training will be hard to police. But a lot of these sites inject ads in exchange for paywall circumvention. Just scanning Reddit for the newest archive.is or whatever should cut off most of the traffic.

by nullhole 32 minutes ago

Can you give a reference for The Guardian blocking IA? I just checked with an article from today - already archived, and a manual re-archive worked.

by fc417fc802 4 hours ago

Presumably someone has already built this and I'm just unaware of it, but I've long thought some sort of crowdsourced archival effort via a browser extension should exist. I'm not sure how such an extension would avoid archiving privileged data, though.

by ajb 3 hours ago

That exists for court documents (RECAP) but I think they didn't have to solve the issue of privilege as PACER publishes unprivileged docs.

by nxobject 7 minutes ago

In particular, habeas petitions against DHS and SSA appeals aren’t available online for public inspection: you have to go to a clerk’s office and pay for physical copies. (I think this may have been reasonable given the circumstances in past decades… not so now.)