More centralized web ftw.
My current problem is OpenAI, that scans massively ignoring every limit, 426, 444 and whatever you throw at them, and botnets from East Asia, using one IP per scrap, but thousands of IPs.
Good enough for me.
> More centralized web ftw.
This ain't got anything to do with "centralized web," this kind of epistemological vandalism can't be shunned enough.
I'm completely uncertain that the unsophisticated garbage I generated makes any difference, much less "poisons" the LLMs. A fellow can dream, can't he?
many scraper already know not to follow these, as it's how site used to "cheat" pagerank serving keyword soups
1. Simple, cheap, easy-to-detect bots will scrape the poison, and feed links to expensive-to-run browser-based bots that you can't detect in any other way.
2. Once you see a browser visit a bullshit link, you insta-ban it, as you can now see that it is a bot because it has been poisoned with the bullshit data.
My personal preference is using iocaine for this purpose though, in order to protect the entire server as opposed to a single site.