points
https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main...
and the specific file that lists every host we've seen in the latest 3 crawls is: