I don't want to defend them, because they gate away a good chunk of the internet with their "bot protection", but unless you do PoW (which is also ecologically a nightmare), probably fingerprinting is the way to go - completely destroying the privacy of everyone involved.
Cromite, a privacy-conscious fork of Chromium for Android, constantly has issues with Cloudflare Turnstile [2] because Cloudflare tries to fingerprint it in multiple ways before letting it pass the challenge. The only way to get it to work would be to join the Cloudflare Browser Developer program - which requires signing an NDA. Rightfully so, the project maintainer didn't want to do it.
If you want to see the extent of what Cloudflare does to fingerprint browsers, just have a look at the issue [2] and see which flags need to be disabled for the browser to pass Cloudflare's challenge.
I understand both sides, but at least Cloudflare could be flexible enough to fall back to PoW instead of just blocking people from sending forms or accessing websites...
They also gate away a good many people with their "bot protection". I am extremely worried about how so many seem to have outsourced the control over who can access their websites to a company, with no second thoughts whatsoever.
Can you expand? I don't see a problem with some napkin math. A 5 W load for 2 seconds is 10 J, or about 0.0028 Wh (smartphones have to be able to pass, and not by doing PoW for tens of seconds). 8 billion checks a day for a year ≈ 8 GWh.
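For anyone who wants to check the arithmetic, here is a minimal shell sketch. The function names are made up for illustration, and the 5 W, 2 s and 8 billion checks per day figures are only the assumptions of this estimate, not measurements:

```shell
#!/bin/sh
# Napkin math for the PoW energy estimate.
# Assumed inputs: load in watts, seconds per check, checks per day.

# Energy of a single check in watt-hours: W * s / 3600.
energy_wh() {
    awk -v w="$1" -v s="$2" 'BEGIN { printf "%.5f\n", w * s / 3600 }'
}

# Energy of a full year of checks in gigawatt-hours.
yearly_gwh() {
    awk -v w="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", w * s / 3600 * n * 365 / 1000000000 }'
}

energy_wh 5 2               # per-check energy in Wh
yearly_gwh 5 2 8000000000   # yearly total in GWh
```

Changing the first two arguments shows how quickly the total grows if the challenge runs longer or on hungrier hardware.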
In any case, according to some napkin math done by Kimi 2.6 (which by itself is probably already consuming more than all of my PoW challenges for the upcoming 5 years) - the situation looks incredibly in favor of PoW: https://www.kimi.com/share/19e7ef40-a432-8912-8000-0000b4a71...
Which makes me wonder why Cloudflare isn't switching to this already.
A non-default Firefox profile can be created like this:
./firefox -CreateProfile "profile-name /home/user/.mozilla/firefox/profile-dir/"
# For, say, cloudflare that would be:
./firefox -CreateProfile "cloudflare /home/user/.mozilla/firefox/cloudflare/"
And you can launch it like this:
./firefox -profile "/home/user/.mozilla/firefox/profile-dir/"
# For cloudflare that would be:
./firefox -profile "/home/user/.mozilla/firefox/cloudflare/"
So, given that /usr/bin/firefox is just a shell script, you can:
- create a copy of it, say, /usr/bin/firefox-cloudflare
- adjust the relevant line, adding the -profile argument
If you use an icon to run Firefox (say, /usr/share/applications/firefox.desktop), you'll need to copy and adjust the launcher file for the icon in the same way.
Of course, "./firefox" in the examples above should be replaced with the actual path to the executable. For a default installation of Firefox, that path can be found in the /usr/bin/firefox script.
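As a rough sketch of the same idea without copying the system script, the two commands can be wrapped in a small shell function. FIREFOX_BIN, PROFILE_ROOT, profile_dir and firefox_profile are hypothetical names chosen for this example; only the -CreateProfile and -profile flags come from the commands above, and the default paths are assumptions that may differ on your system:

```shell
#!/bin/sh
# Sketch of a per-profile Firefox launcher (all names are illustrative).
# Assumed defaults; override them via the environment if yours differ.
FIREFOX_BIN="${FIREFOX_BIN:-/usr/bin/firefox}"
PROFILE_ROOT="${PROFILE_ROOT:-$HOME/.mozilla/firefox}"

# Print the directory used for a named profile.
profile_dir() {
    printf '%s/%s\n' "$PROFILE_ROOT" "$1"
}

# Launch Firefox with the named profile, creating the profile first
# if its directory does not exist yet. Extra arguments are passed on.
firefox_profile() {
    name="$1"
    shift
    dir="$(profile_dir "$name")"
    [ -d "$dir" ] || "$FIREFOX_BIN" -CreateProfile "$name $dir"
    "$FIREFOX_BIN" -profile "$dir" "$@"
}

# Example (not run here): firefox_profile cloudflare https://example.com
```

Sourcing this from a shell startup file and calling firefox_profile cloudflare would then behave like the firefox-cloudflare copy described above, assuming the paths match your installation.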
So, you can have separate profiles for anything sensitive or invasive (LinkedIn, Cloudflare, shops, banks, etc.) and another profile for everything else.
And each profile can have its own set of extensions.
(That said, I still keep separate machines. One for doing "official" things, the other for everything else)
I think this was as recent as 25 years ago?
Recently they added some new UI. There was and still is (I think) the classic Profile Manager UI, which you can launch with
./firefox -ProfileManager
or access via about:profiles. But you don't have to use any of those anyway - see my comment above (a response to parent).
does it? same binary, same machine, same display, same 781 other heuristics.
Yeah, this needs to be burned to the ground.
That pref is there for the Tor Browser.
I'll make sure to fail all cloudflare turnshit in the future.
Nevertheless even for these high value cases, you can still argue that it disincentivizes the business model, it becomes less efficient.
But in principle I agree that there's no good answer to this. Scraping _is_ useful, and I bet most of us here have scraped something. It is the AI companies and their use of people's material for training, without consent or anything in return, that led us to this. (I know botting has existed on forums for as long as forums have, but that is easily handled by human moderators and keyword filters.)
So it’s not quite as horrible as it sounds.
I have setting up Anubis for my own sites on my todo list. And I wish more people did it too. I don’t really mind waiting a little bit extra every now and then before the page loads. What I do mind is ReCaptcha asking me to click all the pictures with buses in them etc. And especially when I have to do it several times over before it’s happy. I’d rather wait a minute for a page to load than to ever solve a ReCaptcha again, if given the choice.
Some sort of decentralized trust web seems like another option, though less viable.
They don't now, but enough "high value to the bots" pages turning on JS or complicated redirects will simply result in the bot authors adding JS execution or redirect following so they can continue "botting" the sites they want to scrape.
It's a hole with no bottom. Each one-up on the anti-bot side will eventually be handled on the bot side.
>Turns out it's because Cloudflare wants to have a fingerprint of your device via WebGL, the only reason for doing this would be tracking.
> So Cloudflare just banned all WebKitGTK browsers as I guess they put an exception for Safari.
This is false. I ran firefox with:
* hardware acceleration disabled (so software renderer, nothing to fingerprint)
* resistfingerprinting enabled, including letterboxing with default window size
* webgl disabled
* VPN enabled
* In a Windows VM
By all accounts this should be the most suspicious fingerprint ever, but turnstile happily lets me through. If they want to track people, they're doing a pretty bad job. My guess is that OP's browser is getting banned because his WebKitGTK has a weird fingerprint, not because of webgl or whatever.
> Such things are blocked in WebKit, and have been for years. Meaning it's tracking so awful that even Apple would block it, and as far as I can tell it's not the kind of privacy protection you can easily disable in it.
This is also false. Webgl fingerprinting works just fine on Safari. They might try to mitigate it by adding some noise, but that's not so different than what firefox does, and is certainly not "blocked".
fingerprintingProtection works fine on the other hand, but then again that's intentionally less intrusive.
So why is Cloudflare saying the author got blocked because of WebGL?
> > Such things are blocked in WebKit, and have been for years. Meaning it's tracking so awful that even Apple would block it, and as far as I can tell it's not the kind of privacy protection you can easily disable in it.
> This is also false. Webgl fingerprinting works just fine on Safari. They might try to mitigate it by adding some noise, but that's not so different than what firefox does, and is certainly not "blocked".
While I don't have an iDevice to try, the assumption that they are special cased is fair... because they are: https://blog.cloudflare.com/eliminating-captchas-on-iphones-...
(Yes, this is basically WEI in a shinier package.)
No idea. I can't even reproduce the error OP got with webgl disabled.
Obviously this is terrible, but I think there's a possibility it's the least terrible option? Another option is IP reputation, which I think is worse. Or scanning a code with a non-rooted phone, which I think is even worse than that!
There isn't one, and pretending otherwise is nonsense because humans will always provide their credentials to something to act on their behalf.
In the limit you end up with Chinese phone farms.
Cloudflare, Google Captcha, HCaptcha etc. are all shitty technical solutions because, as we are all discovering, they come at the cost of our privacy (i.e. our personal data may be used to monetise these services) and / or our computing resources and time. If current copyright laws aren't sufficient to prevent this, we have to acknowledge the system is broken. The answer could be enhancing it with some kind of Digital Millennium Copyright Act (DMCA)-like laws, but in favour of the creators against BigTech or rogue actors.
- Web-scraping and copyright law - https://www.neudata.co/blog/web-scraping-and-copyright-law
- Why DMCA Claims Against Web Scrapers Face Long Odds - https://capstonedc.com/insights/why-dmca-claims-against-web-...
As for issues like bots overloading websites or using too many resources, scaling laws will take care of it quickly; it's not like you can't serve thousands of RPS from a Raspberry Pi these days.
> we have to acknowledge the system is broken
The system is broken. It probably takes, what, 10 seconds or less to use a residential or foreign proxy, 6+ months to internationally track and prosecute a single offender? So like a million times more effort going the regulatory route.
Which sounds extremely difficult to differentiate
You can forget about it. It is not possible. Simple as that.
Then, when the order is "processing", do the checks in bulk and prioritize slower users. There's a huge opportunity to do bot checks after checkout without affecting user experience.
Also, for product launches you could, for example, add a unique field that requires user input; that way bots can't prepare for launches.
Also by default addons.mozilla.org is a privileged site so of course they include google tracking in it and they get the proper fingerprint no matter what you have configured.
I'm not good at creating petitions but would happily sign one. Same as with Stop Killing Games and the campaign against Chat Control.
I can imagine this getting traction if it's explained in a YouTube video for "normal" people.
And then legislation required those consent boxes back, so everyone built their own, instead of demanding that the default should be changed back.
Even simply changing the user agent was sabotaged at Firefox, and choosing one user agent per domain is wishful thinking.
b. Accept Only Necessary Fingerprinting