There are also scrapers hiding behind normal browser user agents. When I looked up their IP ranges, at least some of them seemed to be coming from data centers in China.
Why? Data. Every bit of it might be valuable. And not to sound tin-foil-hatty, but we are getting closer to a post-quantum era (if we aren't there already).
As for what you can do on your own, it really depends on your network. OpenWrt routers can run tcpdump, so you can check for suspicious connections or DNS requests, but it gets really hard to tell what's suspicious if you have lots of cloud-tethered devices at home. IoT devices, browser extensions, and smartphone apps are the usual suspects.
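To make the tcpdump suggestion a bit more concrete, here is a minimal Python sketch (run on a separate machine, not the router) that tallies which LAN devices query which domains. It assumes you captured tcpdump's default one-line text output, e.g. `tcpdump -n -l udp port 53 > capture.txt` on the router, and copied the file off. The regex only handles the plain query format (`A? example.com.`), and the script and file names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches tcpdump's default one-line DNS *query* output, e.g.
# "12:00:00.000000 IP 192.168.1.50.53124 > 192.168.1.1.53: 12345+ A? example.com. (29)"
# Responses have no "TYPE?" token, so they are skipped.
QUERY_RE = re.compile(
    r"IP (?P<src>\d{1,3}(?:\.\d{1,3}){3})\.\d+ > .*? (?P<qtype>[A-Z0-9]+)\? (?P<name>\S+)"
)


def tally_dns_queries(lines):
    """Count (source IP, queried name) pairs in tcpdump text output."""
    counts = Counter()
    for line in lines:
        m = QUERY_RE.search(line)
        if m:
            # Drop the trailing root dot tcpdump prints ("example.com.").
            counts[(m.group("src"), m.group("name").rstrip("."))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python dns_tally.py capture.txt
    with open(sys.argv[1]) as f:
        for (src, name), n in tally_dns_queries(f).most_common(20):
            print(f"{n:6d}  {src:15s}  {name}")
```

A domain you don't recognize, queried over and over by one device, is probably a better lead than any single lookup, but with many cloud-tethered devices you'll still be sorting through a lot of noise.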
Your router may be able to log requests, but many can't, and even if yours can, how can you trust its logs if you suspect the device itself is compromised?
BUT, with all that said, these attacks are typically not very sophisticated. Most of the time they're looking for routers at 192.168.1.1 with admin/admin as the login credentials. If you have anything else set, you're probably safe from 97% of attackers (this number is entirely made up, but seriously, the percentage is high). You can also check for security advisories for your router model. If you find one that allows remote access, assume you're compromised.
---
As a final note, these days it's more likely that the devices running these bots are IoT devices and web browsers running malicious JavaScript.
Aside from the obvious smoke tests (are settings changing without your knowledge? Does your router expose access logs you can check?), I'm not sure there's any general-purpose way to check, but two things you can do are:
1. Search for your router's model number to see if it's known to be vulnerable, and if so, replace it with a brand-new one from a reputable vendor (and don't buy it from Amazon).
2. There are vendors selling "residential proxy IP databases" (e.g., [1]). I have no idea how good they are, but if you have a stable public IP address, you could check whether yours is listed.
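If you do get hold of such a list, the lookup itself is simple. This is a minimal sketch using Python's standard `ipaddress` module, assuming the database can be exported as plain text with one CIDR range per line (real vendor formats will differ; the file layout and names here are hypothetical). You'd still need to find your public IP yourself, e.g. from your router's WAN status page.

```python
import ipaddress
import sys


def load_networks(lines):
    """Parse one CIDR range per line (hypothetical format), ignoring blanks and '#' comments."""
    nets = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            # strict=False tolerates entries like "203.0.113.5/24" with host bits set.
            nets.append(ipaddress.ip_network(line, strict=False))
    return nets


def matching_ranges(ip, networks):
    """Return every listed range containing ip (IPv4/IPv6 mismatches never match)."""
    addr = ipaddress.ip_address(ip)
    return [net for net in networks if addr in net]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python check_ip.py proxy_ranges.txt 203.0.113.7
    with open(sys.argv[1]) as f:
        hits = matching_ranges(sys.argv[2], load_networks(f))
    print("listed in: " + ", ".join(map(str, hits)) if hits else "not listed")
```

A hit only tells you the vendor thinks your address was used as a proxy at some point; with a dynamic IP it may have been a previous holder of the address.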
But I think what OP is implying is insecure hardware being infected by malware, with access to that hardware then sold as a service to disreputable actors. For that, buy a good-quality router and keep it up to date.
It seems just as likely to me that people are installing LLM chatbot apps that do the occasional bit of scraping work on the sly, covered by some EULA they agreed to.
I can't provide evidence, as it's close to impossible to separate AI bots using residential proxies from actual users, and their IPs are considered personal data. But as the other reply shows, it's easy enough to find people selling this service.
Search for: "residential proxy" ai data scraping.
Start reading through thousands of articles.
Thanks for the info, wish I didn't know :-(
You said it yourself. If you're selling a cure, you might as well start a plague.