- Caching helps, but is nowhere near a complete solution. Of the 4M requests, I've observed 1.5M unique paths, so most of them still reach the backend and overload my server (a rough cache sketch follows the list).
- Limiting per-request processing time might work, but is more likely to just cause issues for legitimate visitors: 5ms is not a lot for cgit, but with a higher limit you are unlikely to keep up with the flood of requests.
- IP ratelimiting is useless. I've observed 2M unique IPs, and the top one from the botnet only made 400 well-spaced-out requests.
- GeoIP blocking does wonders - just 5 countries (VN, US, BR, BD, IN) are responsible for 50% of all requests. Unfortunately, blocking them also causes problems for legitimate users (see the geoip2 sketch after the list).
- User-Agent blocking can catch some odd requests, but I haven't been able to make much use of it besides adding a few static rules (examples after the list). Maybe it could do more with TLS request fingerprinting, but that doesn't seem trivial to set up on nginx.
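
For concreteness, here's roughly what I mean by caching in front of cgit; a minimal sketch assuming nginx's proxy cache, with placeholder paths, zone name, TTLs and backend address rather than my actual config:

    # Hypothetical nginx proxy cache in front of cgit; zone name, sizes,
    # TTLs and the backend address are placeholders.
    proxy_cache_path /var/cache/nginx/cgit levels=1:2 keys_zone=cgit:50m
                     max_size=2g inactive=1h;

    server {
        listen 80;
        server_name git.example.org;

        location / {
            proxy_cache cgit;
            proxy_cache_valid 200 10m;       # cache successful pages briefly
            proxy_cache_use_stale updating;  # serve stale while revalidating
            proxy_pass http://127.0.0.1:8080;
        }
    }

With 1.5M unique paths out of 4M requests, though, over a third of the traffic misses the cache no matter how it's tuned.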
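The GeoIP part, as a sketch: this assumes the third-party ngx_http_geoip2 module and a MaxMind GeoLite2 country database, with a 403 returned for the five countries above; the paths and variable names are made up, and as noted it also locks out plenty of legitimate users:

    # Hypothetical country blocking via the ngx_http_geoip2 module (assumes the
    # module is built in and a GeoLite2 database exists at this path).
    geoip2 /var/lib/GeoIP/GeoLite2-Country.mmdb {
        $geoip2_country_code country iso_code;
    }

    # Flag the five countries responsible for ~50% of the requests.
    map $geoip2_country_code $blocked_country {
        default 0;
        VN 1;
        US 1;
        BR 1;
        BD 1;
        IN 1;
    }

    server {
        location / {
            if ($blocked_country) {
                return 403;
            }
            proxy_pass http://127.0.0.1:8080;
        }
    }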
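And the static User-Agent rules are along these lines; the patterns below are illustrative examples, not my actual list:

    # Hypothetical static User-Agent rules; the patterns are examples only.
    map $http_user_agent $blocked_ua {
        default 0;
        ""                  1;  # empty User-Agent
        "~*python-requests" 1;
        "~*Scrapy"          1;
        "~*Go-http-client"  1;
    }

    # Enforced the same way as the GeoIP flag:
    # if ($blocked_ua) { return 403; }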
This is something that keeps happening, and I've seen so many HN posts like these (IIRC Anubis was created out of exactly this kind of frustration): git servers being scraped to the point where it's effectively a DDoS.
2026-01-28 21'460
2026-01-29 27'770
2026-01-30 53'886
2026-01-31 100'114 #
2026-02-01 132'460 #
2026-02-02 73'933
2026-02-03 540'176 #####
2026-02-04 999'464 #########
2026-02-05 134'144 #
2026-02-06 1'432'538 ##############
2026-02-07 3'864'825 ######################################
2026-02-08 3'732'272 #####################################
2026-02-09 2'088'240 ####################
2026-02-10 573'111 #####
2026-02-11 1'804'222 ##################

Thoughts on having an ssh server with https://github.com/charmbracelet/soft-serve instead?
Let's not forget that scrapers can be quite stupid. For example, if you have phpBB installed, which by default puts the session ID in a query parameter when cookies are disabled, many scrapers will scrape every URL numerous times, each with a different session ID. Caching doesn't help you here either, since the URLs are unique per visitor.
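
If an nginx proxy cache sits in front of such a forum, one workaround is to drop the session parameter from the cache key so all the sid variants collapse into a single entry. A rough sketch, assuming phpBB's default "sid" parameter and an nginx proxy cache zone named "forum"; the regex and variable names are mine and untested:

    # Strip phpBB's per-visitor "sid" query parameter out of the cache key, so
    # /viewtopic.php?t=42&sid=aaa and ...&sid=bbb share one cached response.
    map $args $cache_args {
        default $args;
        "~^(?<before>.*?)&?sid=[0-9a-f]+(?<after>.*)$" $before$after;
    }

    proxy_cache_path /var/cache/nginx/forum keys_zone=forum:10m;

    server {
        location / {
            proxy_cache forum;
            # cache key ignores sid but keeps the rest of the query string
            proxy_cache_key "$scheme$request_method$host$uri$cache_args";
            proxy_pass http://127.0.0.1:8081;
        }
    }

Whether the pages are safe to cache at all once sessions are ignored is a separate question (logged-in views obviously aren't), so this only makes sense for anonymous traffic.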