No. Open is open. Beyond DDoS protections, there should be no limits.

If load on the server is a concern, make the whole database available as a torrent. People who run scrapers tend to prefer that anyway.

This isn't someone's hobby project run from a $5 VPS - they can afford to serve 10k qps of read-only data if needed, and it would cost far less than the salary of one staff member.

reply
> Open is open.

I’d then ask OpenAI to be open too since open is open.

reply
Rate limiting is a DDoS protection.
reply
Pedantically: rate limiting is DoS prevention, not DDoS prevention. If you rate limit per IP, you're not mounting effective protection against a distributed attack. If you're rate limiting globally, you're taking your service offline for everyone.
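
To make the per-IP vs. global distinction concrete, here's a minimal sketch. Everything in it is made up for illustration: the class names, the limits, the addresses, and the simplification to a single window that never resets. It's not how any particular server does it.

```python
from collections import defaultdict


class PerIPLimiter:
    """Allow at most `limit` requests per source IP (one window, no reset)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, ip):
        if self.counts[ip] >= self.limit:
            return False
        self.counts[ip] += 1
        return True


class GlobalLimiter:
    """Allow at most `limit` requests in total, whoever sends them."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self, ip):
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def served(limiter, requests):
    """Count how many requests in the sequence the limiter lets through."""
    return sum(limiter.allow(ip) for ip in requests)


# 1000 requests from one address vs. 1000 requests from 1000 addresses.
single_source = ["10.0.0.1"] * 1000
distributed = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

single_served = served(PerIPLimiter(10), single_source)     # capped at 10
distributed_served = served(PerIPLimiter(10), distributed)  # nothing blocked

# Under a global cap, the flood uses up the budget before a normal user arrives.
global_limiter = GlobalLimiter(500)
served(global_limiter, distributed)
legit_allowed = global_limiter.allow("203.0.113.7")
```

The single noisy client gets cut off, the distributed flood sails through the per-IP limiter untouched, and the global limiter ends up refusing the one well-behaved user.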
reply
You're talking about a tragedy-of-the-commons situation. There is an organic query rate driven by the amount of public interest. Then there is the inorganic vacuuming of the entire dataset by someone who wants to exploit public services for private profit. There is zero reason why the public should socialize the cost of the excess capacity needed to serve private parties looking to profit from the public data.

I could have my mind changed if the public policy is that any public data ingested into an AI system makes that AI system permanently free to use at any degree of load. If a company thinks that they should be able to put any load they want on public services for free, they should be willing to provide public services at any load for free.

reply
The world is not black and white.
reply
The issue with that is people can then flood everything with huge piles of documents. That's bad enough if it's all clean, OCR'd digital data that you can quickly download in its entirety, but if you're stuck having to wait between downloading documents, you'll never find out what they don't want you to find out.

It's like having to search through sand: it's bad enough when you can use a sieve, but then they tell you that you can only use your bare hands, and your search efforts are made useless.

This is not a new tactic btw and pretty relevant to recent events...

reply
Systems running core government functions should be set up to be able to efficiently execute their functions at scale, so I'd say it should only restrict extreme load, i.e. DoS attacks.
reply
If the rate limit is reasonable (allows full download of the entire set of data within a feasible time-frame), that could be acceptable. Otherwise, no.
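
A rough way to check "feasible" is just to do the division. The sketch below assumes one document per request and a client that runs flat out at the limit. The 5,000,000-document corpus and the rates are hypothetical numbers, not figures for any real dataset.

```python
SECONDS_PER_DAY = 86_400


def days_to_mirror(num_documents, requests_per_second):
    """Days needed to fetch every document at a steady allowed request rate.

    Assumes one document per request and no retries or downtime.
    """
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return num_documents / requests_per_second / SECONDS_PER_DAY


# Hypothetical corpus size, tried against a few hypothetical rate limits.
for rate in (0.1, 1, 10):
    days = days_to_mirror(5_000_000, rate)
    print(f"{rate:>5} req/s -> {days:8.1f} days")
```

Whether the result counts as a feasible time-frame is still a judgment call, but it turns "reasonable" into a number you can argue about.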