But I like this review of techniques. Even the simplest ones are very effective, which surprised me.
Perhaps my provider’s just great at filtering spam - but I kind of doubt it’s better than the major players (for years I’ve used Zoho for email - and it’s ‘okay’ enough that it’s not worth switching).
because harvesters don't care until one technique gets massive use. if you come up with a unique but simple enough scheme for your sites and keep a few dozen email addresses out of their reach.. they've still gathered a million addresses. it's not really worth their effort to get the last 0.0001% of extra email addresses
so it's best to just not advertise your solution and make sure it doesn't get any outside traction - if it gets popular the harvesters will defeat it
However, LLMs are quite good at generating spam and I think soon will evade most filters.
If an LLM can make a plausible email, the best another LLM can do is to rank it as plausible. Blackbox creation and detection have to be on the same level.
Perhaps if you said the detection LLM had all your context and websearch. That it could know that a Penny Pollytree at Coco Co isn’t a real person, but… that just seems like burning a ton of coal to detect fraud where the creation LLM was able to easily come up with the fictitious spam cheaply.
The real story here is that this will go beyond email verification: every system we have is going to need to up its security. Paper birth certificates, social security cards, email addresses, and all manner of identity are going to need new systems of auth. The challenge will be to prevent authoritarian centralization.
Or buy/rent domains/IPs that have good reputations, as there are services that specialize in just building up the reputation of stuff so they can sell it once it's "good". The same exists for user accounts on various platforms like reddit and so on.
Yes, that is indeed the point of those; "build up reputation -> sell/rent -> someone uses it to burn reputation -> rinse and repeat".
I never got SpamAssassin working very well, but since moving my email hosting to Apple (from my own server), spam has not been a problem.
When I view a commit on the github UI using view source, I can see the commit author's email address just as text with no special handling. It's bracketed by "<" and ">", so maybe that's enough to confuse harvesters.
I just looked at the spam folder of one of my personal accounts (where I sign up for services), and it has tons of stuff, most recently 2 or 3 with the subject "YOU PERVERT! I RECORDED YOU!".
It seems spammers are doing less harvesting and more purchasing of email lists from service vendors.
- git@mydomain.com
Presumably harvested from GitHub or GitLab
- contact@mydomain.com / admin@mydomain.com
Not actually an email address ever used, presumably people just guessing these exist from convention.
- <first name>@mydomain.com
I mean, if you know my name you can probably guess this but also this has been my primary email address for outbound email and so has ended up in marketing lists etc.
- ap@mydomain.com, finance@mydomain.com
This is a very recent trend but I've been getting emails to made up addresses like these ones quoting forged emails from myself (with various titles like CEO or CFO attached) claiming to authorize payments to other parties, usually backdated, and then asking that I process their invoice ASAP because look how long ago the CEO said it should be paid. I guess my website has ended up in some list of businesses despite being a personal site.
Ironically, the address that was in plain text in my HN profile for like 15 years gets very minimal spam.
(Similarly, I'm sure most links can be found by searching the bytestring for "href" and taking what's to the right of it.)
This would explain why HTML entities are so effective.
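To illustrate the point, here is a minimal sketch of a harvester that only pattern-matches the raw page bytes, as speculated above. The regex, the function names, and the `user@example.com` address are all assumptions made for the example, not anything taken from a real harvester.

```javascript
// Hypothetical naive harvester: a regex over the raw HTML, no parsing at all.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvest(rawHtml) {
  return rawHtml.match(EMAIL_RE) || [];
}

// Encode every character as a decimal HTML entity, e.g. "@" becomes "&#64;".
function toEntities(text) {
  return Array.from(text, (ch) => `&#${ch.codePointAt(0)};`).join("");
}

// What a harvester would need to add to get the address back:
// decode numeric entities before matching.
function decodeNumericEntities(html) {
  return html.replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)));
}

const plainPage = '<a href="mailto:user@example.com">mail me</a>';
const encodedPage = `<a href="mailto:${toEntities("user@example.com")}">mail me</a>`;

console.log(harvest(plainPage));                          // finds the address
console.log(harvest(encodedPage));                        // finds nothing
console.log(harvest(decodeNumericEntities(encodedPage))); // finds it again
```

The browser decodes the entities for a human visitor, so the link still works; only a scanner that skips that decoding step misses the address.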
On the other hand, surely the TLS handshake is far more expensive than HTML parsing? Maybe it's to avoid parser failure modes that consume a lot of resources?
Could also be that they learned that sending spam to obfuscated addresses doesn’t get much response. Such messages might get filtered out more and/or addressees might be less inclined to reply to them.
A dog will keep biting long after that is a disastrous plan.
Then you can hand each recipient an absolutely unique email which isn't just ole "name.morewords@" period trick — block those which receive SPAM.
----
OR: the even "easier" lifestyle of just not using email (like me). Obviously this is difficult for modern living, but that's what temp email is best for [i.e. circumventing ubiquitous `REQUIRED` email address fields].
Then I hit upon a simpler solution. Have one email address. Happily share it publicly. And whitelist senders' email addresses. Emails from senders not on the whitelist go into a quarantine folder that I glance at once in a while.
It's almost equivalent in efficacy, but much simpler to implement.
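A minimal sketch of that routing rule, assuming the mail client or filter exposes the sender address as a string. The function names and the `"inbox"` / `"quarantine"` labels are made up for the example.

```javascript
// Normalize so that "Alice@Example.com " and "alice@example.com" compare equal.
function normalize(address) {
  return address.trim().toLowerCase();
}

// Allowlisted senders go to the inbox, everyone else to quarantine.
function routeMessage(sender, allowlist) {
  return allowlist.has(normalize(sender)) ? "inbox" : "quarantine";
}

// Hypothetical allowlist for illustration.
const allowlist = new Set(["alice@example.com", "bob@example.org"].map(normalize));

console.log(routeMessage("Alice@Example.com", allowlist));   // inbox
console.log(routeMessage("stranger@example.net", allowlist)); // quarantine
```

Since a From header can be forged, this is a sorting aid rather than authentication; the quarantine folder still needs the occasional glance.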
This article however is talking about publishing your email address on a public website. It matches my experience, that simple javascript concatenation stops 100% of spam. Not that I would or ever did trust my primary email address to that.
When configured correctly each family member can reach you at a custom handle@, even seeing this custom reply address in response emails from you.
----
But yes, you're correct about the purpose of OP's article (website obfuscation). The topic-overlap is so close that it's still worth mentioning, IMHO.
Something like:
```
<a
  href="#"
  class="js-mailto ${className}"
  data-email-user="${local}"
  data-email-host="${host}"
  data-email-subject="${sub}"
>
  ${children}
</a>
```
And then some light vanilla JS to stitch it together. Works in the browser, and spam has dropped off a cliff since.
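The stitching script itself isn't shown, so the following is only a guess at what it might look like. It assumes the `js-mailto` class and `data-email-*` attributes from the markup above; the function names are invented.

```javascript
// Build the mailto: URL from its separately stored parts.
function buildMailto(user, host, subject) {
  const address = `${user}@${host}`;
  return subject
    ? `mailto:${address}?subject=${encodeURIComponent(subject)}`
    : `mailto:${address}`;
}

// Rewrite every a.js-mailto under `root` so its href becomes a real mailto: link.
function activateMailtoLinks(root) {
  for (const link of root.querySelectorAll("a.js-mailto")) {
    // data-email-user -> dataset.emailUser, and so on.
    const { emailUser, emailHost, emailSubject } = link.dataset;
    if (!emailUser || !emailHost) continue; // leave incomplete links untouched
    link.href = buildMailto(emailUser, emailHost, emailSubject);
  }
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => activateMailtoLinks(document));
}
```

The full address never appears in the served HTML; it only exists once the script has run, which is why a harvester that doesn't execute JavaScript would not see it.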
I occasionally get spam from people who took the time to create gmail accounts. Based on this advice, the honey pot email address would get spam from a Gmail account and your script would block Gmail servers.
Contact details: [any mailbox] [at] [the domain name of this web site]. Please don’t ask me to give interviews, sign books, appear on podcasts, attend conferences or conventions, or provide feedback or endorsements for works of fiction, scientific theories, or slabs of text disgorged by chatbots.
I have no idea how to decipher this obfuscation.
Like others mentioned, though, personally i haven't been bothered by email harvesting for years now since spam filters seem to do a decent job. I have my email posted in plaintext here (which i bet is harvested very often) and in various other places and the occasional spam i get is eclipsed by "spam" from services i've actually signed up for (cough linkedin cough).
Imagine someone visiting your blog who wants to e-mail you can burn some CPU cycles to "earn" an address that hasn't been given out to anybody else, e.g. user+TOKEN@example.com, where it is algorithmically-unlikely for them to be able to guess a different TOKEN that will work. Then if abuse occurs, you can just retire that one address. (In a non-interactive context, like a paper ad, you could just generate one yourself.)
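One way the address part of this could work is to sign a random label with an HMAC, so that only the server can mint a TOKEN that validates. This is a sketch under assumptions: it runs server-side on Node.js, the secret, the address layout (`user+label-tag@domain`), and all function names are invented here, and the proof-of-work step that "earns" the address is left out.

```javascript
const nodeCrypto = require("crypto"); // Node.js built-in module

// Assumption: a long random secret known only to the mail server.
const SECRET = "replace-with-a-long-random-secret";
const revokedLabels = new Set();

// 12 hex characters (48 bits) of HMAC-SHA256 over the label.
function tagFor(label) {
  return nodeCrypto.createHmac("sha256", SECRET).update(label).digest("hex").slice(0, 12);
}

// Mint a fresh address of the form user+<label>-<tag>@domain.
function issueAddress(user, domain) {
  const label = nodeCrypto.randomBytes(4).toString("hex"); // 8 hex characters
  return `${user}+${label}-${tagFor(label)}@${domain}`;
}

// Accept mail only if the tag matches the label and the label is not retired.
function isAcceptedRecipient(address) {
  const match = /^[^+@]+\+([0-9a-f]{8})-([0-9a-f]{12})@/.exec(address);
  if (!match) return false;
  const [, label, tag] = match;
  if (revokedLabels.has(label)) return false;
  // Both buffers are 12 bytes long, as timingSafeEqual requires.
  return nodeCrypto.timingSafeEqual(Buffer.from(tagFor(label)), Buffer.from(tag));
}

// Retire one address after abuse without affecting any other.
function revoke(address) {
  const match = /\+([0-9a-f]{8})-/.exec(address);
  if (match) revokedLabels.add(match[1]);
}
```

Someone who knows one valid address cannot derive another without the secret, and retiring an abused address is a single set insertion.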
Naturally, this would be best with an e-mail client that is aware of the scheme, and with a mail-service that has some API for generating new addresses, such as if you want to cold e-mail somebody and use a new from/return address.
Some years ago I had the fanciful idea of doing it with a phone-app, where it manages creating new addresses as-needed, disabling them, and keeping notes about who you gave them to.
I use it all the time in conjunction with Bitwarden to generate unique emails per site. You can have notes on each email, and they show up in a small banner in the forwarded email. And each one is individually disable-able, so you can easily cut it off if you see spam from it.
I was really interested in this space and made my own homegrown tool for this. I used it for a while until I discovered Addy and switched over. IIRC there are similar services by Mozilla, Apple, and Proton.
Basically each email gets written as a brainf*ck program and stored in a "data-" attribute. By default the HTML only includes a more primitively obfuscated statement, "Must enable Javascript to see e-mail.", which then gets replaced by a brainf*ck interpreter (in JS) with the output of the brainf*ck code. Since we only output ASCII we can reduce the size of the brainf*ck code by always adding 32 to each value it outputs. The Javascript is loaded from what looks like a 3rd party domain. There we filter based on heuristics and check if the "referer" matches before sending out the actual interpreter code.
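A rough sketch of how such an encoder and interpreter pair could look, including the "add 32 on output" trick. The commenter's actual code isn't shown, so the function names, the single-cell encoding strategy, and the `user@example.com` address are all assumptions.

```javascript
// Naive encoder: step one cell up or down to each target value, then print it.
// Values are stored minus `offset`, which saves `offset` "+" characters up front.
function encodeToBf(text, offset = 32) {
  let program = "";
  let cell = 0;
  for (const ch of text) {
    const target = ch.charCodeAt(0) - offset;
    if (target < 0 || target > 255) throw new RangeError(`cannot encode ${JSON.stringify(ch)}`);
    const delta = target - cell;
    program += (delta >= 0 ? "+" : "-").repeat(Math.abs(delta)) + ".";
    cell = target;
  }
  return program;
}

// Small interpreter: 8-bit wrapping cells, no input command, no pointer bounds checks.
function runBf(program, offset = 32) {
  // Pre-compute matching bracket positions.
  const jump = new Map();
  const stack = [];
  for (let i = 0; i < program.length; i++) {
    if (program[i] === "[") stack.push(i);
    else if (program[i] === "]") {
      if (stack.length === 0) throw new SyntaxError("unmatched ]");
      const open = stack.pop();
      jump.set(open, i);
      jump.set(i, open);
    }
  }
  if (stack.length > 0) throw new SyntaxError("unmatched [");

  const tape = new Uint8Array(30000);
  let ptr = 0;
  let out = "";
  for (let pc = 0; pc < program.length; pc++) {
    switch (program[pc]) {
      case "+": tape[ptr]++; break;
      case "-": tape[ptr]--; break;
      case ">": ptr++; break;
      case "<": ptr--; break;
      case ".": out += String.fromCharCode(tape[ptr] + offset); break; // the +32 trick
      case "[": if (tape[ptr] === 0) pc = jump.get(pc); break;
      case "]": if (tape[ptr] !== 0) pc = jump.get(pc); break;
    }
  }
  return out;
}

const program = encodeToBf("user@example.com");
console.log(runBf(program)); // user@example.com
```

In the page, `program` would sit in the `data-` attribute and `runBf` would replace the placeholder text with its output.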
Of course all this would not help if a scraper properly runs things through Javascript too.
Recently I read you soon will be able to run DOOM via CSS, so certainly it should be possible to have a brainf*ck interpreter in CSS? That would be the next step… just to get rid of the Javascript, but then I'm okay with all the downsides of using Javascript just for the e-mail obfuscation.
Anyway… I also regularly (at least once a year) rotate those public contact addresses.
/edit
And you can combine both approaches: XOR'ing the code first for good measure. :)
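A minimal sketch of the XOR step, assuming a single-byte key and ASCII input. The same functions could be applied to the address itself or to a brainf*ck program before it is stored; the key and names here are made up.

```javascript
// XOR each character with a one-byte key and emit two hex digits per character.
function xorEncode(text, key) {
  return Array.from(text, (ch) =>
    (ch.charCodeAt(0) ^ key).toString(16).padStart(2, "0")
  ).join("");
}

// Reverse the encoding: read two hex digits at a time and XOR with the same key.
function xorDecode(hex, key) {
  let out = "";
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return out;
}

const KEY = 0x5a; // hypothetical key
const encoded = xorEncode("user@example.com", KEY);
console.log(encoded);                 // hex digits only, no "@" in sight
console.log(xorDecode(encoded, KEY)); // user@example.com
```

This is obfuscation rather than encryption: the key ships with the page, so it only stops scanners that don't bother to run or reimplement the decoder.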
Anecdotal, but I’ve used HTML entities on a public static website for a long time using an href tag with mailto, and yet I’ve not seen any spam.
I guess any spammer who uses some level of GenAI to process and extract email addresses would have a lot more success against all the methods listed in this article.
For a similar reason I dislike ip2ban: my objective is not to block all attack attempts. I prefer receiving them, acknowledging them, and being immune to them.
The idea of ignoring attack attempts isn't very safe when you think about it. Your body doesn't do that; it creates antibodies upon subclinical exposures. Complete isolation means your immune system is weak and you are more vulnerable to the lightest of exposures.
The data source is the enormous data breaches that are becoming more and more frequent. There is more incentive to collect more information on someone you already know something about than to spam an email address you don't even know is valid.
The spam can also be much more effective, as it presents itself with personal information about the recipient.
Edit: that’s not to deny that big data leaks are a serious problem
It's obvious to any non-native English speaker: when you get spam in English, it is because they took the email from the web. When it's in your native language, it's usually from a data breach.
I'm vastly more spammed by the latter. I can confirm it with unique email addresses of the "+" form (but not with the + character).
Also, when I'm spammed in English, it's for Web3 crypto stuff, and when it's from a data breach, it's a phishing attempt.
But yeah, I’d say most junk mail is coming to (1) an address leaked from one Russian bank (!) I used, (2) the address listed in public business databases (I have a company in Estonia).
That solution doesn't apply to the use case in the article.
Also, a note to those who make fancy "me+someservice@somedomain.com" addresses: make really sure you are in control and these work. Some services (including mine) will need to E-mail you one day, for example to tell you that your account will be deleted because of inactivity. If you don't receive that E-mail because of your fancy spam defenses, your account will be deleted. I've seen people hurt themselves like this and it makes me sad.
On a constructive note: what works very well is spam filtering using LLMs. We have AI to help us with this problem today. I wrote an LLM despammer tool which processes my inbox via IMAP using a local LLM (for privacy reasons). I see >97% accuracy in my benchmarks on my (very difficult) testing corpus. It's nearly perfect in real life usage. I've tested many local models in the 4-32B range and the top practical choice is gpt-oss:20b (GGUF, I run it from LM Studio, MLX quantizations are worse) — not only does it perform very well, but it's also really fast.
If you use a catch-all on a domain, i.e. someservice@somedomain.com, I guess in theory that might break. But it seems about as likely as messing up the overall domain setup.
Also, my account on your service is likely much more disposable to me than my email address/domain. Anything I care about, I'd back up. Not just assume some random website is going to preserve it for me forever.
Also, the two can be complementary, anyways, so I am not sure what your point is.
Just wait until one of these companies demands an email from the registered email address of your account!