IIRC this particular rule is a SHOULD because MUAs often send messages without a Message-ID to their submission server, and the submission server adds one if necessary. https://www.rfc-editor.org/rfc/rfc6409.html#section-8.3 The SHOULD lets those messages be valid. Low-entropy devices that can't generate a good random ID are rare these days, but old devices remain in service, so the workaround is IMO justified.
I once had a job where reading standards documents was my bread and butter.
SHOULD is not a requirement. It is a recommendation. For requirements they use SHALL.
My team was writing code that was safety related. Bad bugs could mean lives lost. We happily ignored a lot of SHOULDs and were open about it. We did it not because we had a good reason, but because it was convenient. We never justified it. Before our code could be released, everything was audited by a 3rd party auditor.
It's totally fine to ignore SHOULD.
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
SHOULD is effectively REQUIRED unless it conflicts with another standards requirement or you have a very specific edge case.Note the use of the word "must" used twice there. Barring a sufficiently good reason and accepting the consequences, this becomes a very poorly worded "required".
The spec would have been far better starting with SHALL and then carving out the allowance for exceptions.
Those reasons can be anything. Legal, practical, technological, ideaological. You don't know. All you know is not using it is explicitly permitted.
SHOULD - Should really be there. It's not MUST, you can ignore it but do not come crying if your email is not delivered to some of your customers ! you should have though about that before.
for the client. If you're implementing a server, "the client SHOULD but didn't" isn't a valid excuse to reject a client either.
You can do it anyway, you might even have good reasons for it, but then you sure don't get to point at the RFC and call the client broken.
Yes it absolutely is: https://www.rfc-editor.org/rfc/rfc2119 is quite clear.
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
If the client SHOULD do something and doesn't, and your server does not know why, you SHOULD disconnect and move on.If the server has considered fully the implications of not having a Message-ID header, then it MAY continue processing.
In general, you will find most of the Internet specifications are labelled MUST if they are required for the protocol's own state-processing (i.e. as documented), while specifications are labelled SHOULD if they are required for application state-processing in some circumstances (i.e. other users of the protocol).
That is not a rule.
In this situation the server can reject any message if it wants to, and not doing a SHOULD tests the server's patience, but it's still ultimately in the "server wanted to" category, not the "RFC was violated" category.
The RFC is a request for comments. The specific one in question is about Internet Mail.
If server implementers want their mail to be delivered these are things they SHOULD do.
That's it.
It isn't something you can give to your lawyer, and nobody cares about your opinion about what you think "should" means you can make someone else do. This is how it is.
And the line of yours I quoted is still not supported by anything.
How does Google know whether or not the sender has a valid reason? They cannot know that so for them to reject an email for it means they would reject emails that have valid reasons as well.
You and I have different definitions of "clearly"
It is not required for the protocol of one SMTP client sending one message to one SMTP server, but it is required for many Internet Mail applications to function properly.
This one for example, is where if you want to send an email to some sites, you are going to need a Message-ID, so you SHOULD add one if you're the originating mail site.
> How does Google know whether or not the sender has a valid reason?
If the Sender has a valid reason, they would have responded to the RFC (Request For Comments) telling implementers what they SHOULD do, rather than do their own thing and hope for the best!
Google knows the meaning of the word SHOULD.
> it means they would reject emails that have valid reasons as well.
No shit! They reject spam for example. And there's more than a few RFC's about that. Here's one about spam that specifically talks about using Message-ID:
The server "considers" nothing. The considerations are for the human implementers to make when building their software. And they can never presume to know why the software on the other side is working a certain way. Only that the RFC didn't make something mandatory.
The rejection isn't to be compliant with the RFC, it's a choice made by the server implementers.
I don’t care what the protocol rfc says, the client arbitrarily rejecting an email from the server for some missing unimportant header (for deduction detection?) is silly.
Yes. https://www.rfc-editor.org/rfc/rfc2821#section-6.3 refers to servers that do this and says very clearly:
These changes MUST NOT be applied by an SMTP server that
provides an intermediate relay function.
That's Google in this situation.> Stop hiding behind policy and think for yourself.
Sometimes you should think for yourself, but sometimes, and friend let me tell you this is one of those times, you should take some time to read all of the things that other people have thought about a subject, especially when that subject is as big and old as email.
There is no good reason viva couldn't make a Message-ID, but there's a good reason to believe they can't handle delivery status notifications, and if they can't do that, they are causing bigger problems than just this.
“MAY This word, or the adjective "OPTIONAL", mean that an item is truly optional… An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)”
Note how it explicitly calls out interoperation with implementations that do or do not implement MAY. As a exception that proves the rule, we can reasonably assume that not interoperating with a system ignoring a SHOULD rule is a correct implementation and it is the fault of whoever is not implementing SHOULD.
The standards, to my observation, tend to lag the CVEs.
Side-note: If someone has built a reverse-database that annotates RFCs with overriding CVEs that have invalidated or rendered harmful part of the spec, I'd love to put that in my toolbox. It'd be nice-to-have in the extreme if it hasn't been created yet.
CVE classify a lot of things that have nothing to do with security.
Not having a Message-ID can cause problems for loop-detection (especially on busy netnews and mailing lists), and with reliable delivery status notification.
Dealing with these things for clients who can't read the RFC wastes memory and time which can potentially deny legitimate users access to services
> It seems that Gmail is being pedantic for no reason
Now you know that feeling is just ignorance.
That should have already happened. Google is not the "first stop".
> hard ban the sender server version until they confirm
SMTP clients do not announce their version.
Also I don't work for you, stop telling me what to do.
> A midway point that involves a doom switch is not a good option.
No shit. That's almost certainly a big part of why Google blocks messages from being transited without a Message-ID.
Is it still a strong spam signal? Hard to say. Sources disagree. But as with laws, heuristics, once added, are often sticky.
"SHOULD" is basically, if you control both sides of conversation, you can decide if it's required looking at your requirements. If you are talking between systems where you don't control both sides of conversation, you should do all "SHOULD" requirements with fail back in cases where other side won't understand you. If for reason you don't do "SHOULD" requirement, reason should be a blog article that people understand.
For example, "SHOULD" requirement would be "all deployable artifacts SHOULD be packaged in OCI container". There are cases where "SHOULD" doesn't work but those are well documented.
I’m doing some work with an email company at the moment. The company has been in the email space for decades. Holy moly email is just full of stuff like this. There is an insane amount of institutional knowledge about how email actually works - not what the specs say but what email servers need to actually do to process real emails and deal with real email clients and servers.
The rfcs try to keep up, but they’re missing a lot of details and all important context for why recommendations are as they are, and what you actually need to do to actually process real email you see in the wild. (And talk to the long tail of email software).
This conversation makes me think about cornering some of the engineers with a microphone. It’d be great to talk through the specs with them, to capture some of that missing commentary.
Note "the full implications must be understood and carefully weighed before choosing a different course". Gmail and the other big hosters have full-time spam teams who spend a lot of time weighing implications, so I assume the implications of this was weighed.
Must = external requirement
I cannot fathom how you think should* would act as a requirement in any sense of the world.
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
Not sure how the people at Google interpreted this about the message-idI think it's a gray area
- If the receiver declines your message because "Message-id" is required - then I blame the receiver; because that's not true
- If the receiver declines your message because "most systems do include it, and it's lack of presence is highly correlated with spam email", then it's on the sender
Admittedly, the end result is the same.
Now, let's assume that if it is the latter (it's spam related), and Google were to accept the message, but then internally bin the message, it would be worse. At least in this case, they are bouncing the message. Because of this, the sender is at least aware that the message wasn't delivered.
Also, the author was able to get their mail delivered to a personal gmail.com address. The issue was with a Google Workspace custom email domain. This further makes me think of this as a security/spam related issue. Google is clearly capable of processing the message without a Message-id, they are just refusing for business customers.
My takeaway is that I think that Google is doing the least-wrong thing. And by being explicit in how they are handling it, it at least made the debugging for the author possible.
Also note: in a quick reading of RFC5321 (SMTP), rejecting messages for "policy reasons" is an acceptable outcome. I'm not sure if it applies completely here. The author should probably also be taking into account RFC5321 (SMTP) instead of just 5322 (message format).
That's the annoying part to me.
An email is an email. By applying different rules for rejection on different mailboxes, gmail creates a system where it's harder for would-be implementers to test compliance.
If tomorrow gmail creates a new type of mailbox, will there be a third set of rules to have your message delivered?
This here is a trivial case of simply not testing deliverability at all.
That is, when some hotline tell me that they just sent and email with the information, I ensure they hold the line until I got the actual email and checked it delivers the relevant information to fulfill the intended process. And when I want to make sure an email was received, I call the person and ask to check, waiting until confirmation.
It’s not that much SMTP/IMAP per se as the whole ecosystem. People can legitimately get fatigue of "is it in my junk directory", "it might be relayed but only after the overloaded spam/junk analyzer accept it", or whatever can go wrong in the MUA, MSA, MTA, MX, MDA chain. And of course people can simply lie, and pretend the email was sent/received when they couldn’t bother less with the actual delivery status.
There are of course many cases where emails is fantastic.
They once made all emails from my very reputable small German email provider (a company that has existed and provided email services long before Google existed) go into a black whole - not bounce them back or anything like that, mind you, their servers accepted them and made them disappear forever. I was in contact with the technicians then to get the problem fixed and they told me it's very difficult for them to even reach anyone at Google. It took them several days to get the problem fixed.
Of course, no one will ever be able to prove an intention behind these kind of "technical glitches." Nothing of significance ever happened when Google had large optics fiber connections with NSA installed illegally and claimed to have no knowledge of it, so certainly nothing will happen when small issues with interoperability occur and drive more people to Gmail.
For what it's worth: having seen some of how the sausage is made, Google isn't particularly interested in screwing over a small reputable German provider. But they also aren't particularly interested in supporting such a provider's desire to route messages to their users because the provider is small. At their scale, "I've forgotten how to count that low" is a real effect. And email, as a protocol, has been pretty busted for decades; it's not Google that made the protocol so open-ended and messy to implement in a way that providers will agree is correct.
> Nothing of significance ever happened when Google had large optics fiber connections with NSA installed illegally and claimed to have no knowledge of it
Nothing of significance outside Google. Inside, Google initiated a technical lift that turned their intranet into an untrusted-by-default ecosystem so that data was encrypted on the fiber (as well as between machines within a datacenter, to head off future compromised-employee attacks). That process took at least five years; I suppose there's a scenario where it was all smoke and mirrors, but being on the inside in the middle of the process, I watched several C-suite who are not particularly good actors be bloody pissed at the US government for putting itself into Google's "threat actor" box and making that much work for the system engineering teams.
Also, an engineer at Google then made an end-to-end email crypto plugin for Chrome, including a flag that was a nod-and-middle-finger to the information revealed in the Snowden documents. https://techcrunch.com/2014/06/04/nsa-mocking-easter-egg-fou...
After all, linguistics is full with examples of words that are spelled the same, but have different meaning in different cultures. I'm glad the RFC spelled it out it for everyone.
On the other end, we may receive messages with or without. Both are valid. We MUST therefore accept both variations.
The second one is a consequence of the former. So yes Google is the violating party.
When the docs disagree with the reality of threat-actor behavior, reality has to win because reality can't be fooled.
So, it's fairly explicit that the sender should use message-id unless there's a good reason to not do so. The spec is quiet about the recipients behavior (unless there's another spec that calls it out).
(No seriously, I’m asking; are there examples of where it’s actually different from a MUST)?
Also this reminds me of something I read somewhere a long time ago: when specifying requirements don’t bother with SHOULD. Either you want it or you don’t. Because if it’s not a requirement, some people won’t implement it.
I guess the one time it’s good is if you want an optional feature or are moving towards requiring it. In this case Google has decided it’s not an optional feature.
backward compatibility makes it hard to add MUST. using SHOULD is a good alternative
No it absolutely does not mean that. It means, by explicit definition which is right here, that text is exactly that definition, that no one requires it. They can't require it, and still be conforming to the spec or rfc. That's the entire point of that text is to define that and remove all ambiguity about it.
It's not required by anyone.
The reason it's there at all, and has "should" is that it's useful and helpful and good to include, all else being equal.
But by the very definition itself, no people require it. No people are allowed to require it.
Any that do, are simply violating the spec.
SHOULD means that if you don't that, bad things are likely to happen, but it will not immediately break at the protocol level and during discussion in the IETF some people thought there could be valid reasons to violate the SHOULD.
Typically, IETF standards track RFCs consider the immediate effects of the protocol but often do not consider operational reality very well.
Sometimes operational reality is that a MUST gets violated because the standard is just wrong. Sometimes a SHOULD becomes required, etc.
Certainly for email, there is a lot you have to do to get your email accepted that is not spelled out in the RFCs.
Should just means the thing is preferred. It's something that is good and useful and helpful to do.
That is not "must unless you can convince me that you should be excused".
For most cases you should use three points of contact. However, there may be other situations for example if someone is giving you a leg up, or you can pole vault, where another solution is preferred.
Anyway, in general you can expect that doing unusual but technically valid things with email headers will very often get your messages rejected or filtered as spam.
For consumers, ignoring a SHOULD mostly affects their own robustness.
But here Google seems to understand it as a MUST... maybe the scale of spam is enough to justify it. Users are stuck between two parties that expect the other to behave.
This is 100 percent the case, and why these things are this way.
If you wanted to make email two point oh, I dont think it would look a lot like what we have today.
But gmail accepts emails without message-id on personal mailboxes apparently.
Would this make mass emails and spam harder, absolutely. Would it be a huge burden for actual communications with people, not so much. From there actual white/black listing processes would work all that much better.
It would simply close the loop and push the burden of the messages onto the sender's system mostly.
And yes, you can decide from the envelope, and a higher chance of envelope validity.
> If you wanted to make email two point oh, I dont think it would look a lot like what we have today.
Rspamd and spamassassin have missing MID check in their default rules, I am sure that most antispam software is same.
So at that point the ID has no value to me except being obliged to carry it around with the message, so maybe the originating system can at some point make sense of it. But then there is obviously no reason to ever reject mail without it, it's an ID valid for the sender and the sender didn't care to include one, great, we save on storage.
Your framework of analysis is based on someone else's database key ids being irrelevant to you. That's true.
But another framework of analysis is tracking statistical correlations of what spam looks like. Lots of spam often don't have message ids. Therefore it's used as a heuristic in scoring it as potential spam. That's why other postmasters even without SpamAssassin independently arrive at the same answer of trying to block messages without a message id. Example: https://serverfault.com/questions/629923/blocking-messages-w...
2. You can't really implement mail stuff just based on RFCs:
- There docent overlapping RFCs which can sometimes influence each other and many of them obsolete older versions why others still relevant RFCs reference this older versions. This makes it hard to even know what actually is required/recommendation.
- Then you have a lot of "irrelevant" parts, which where standardized but are hardly supported/if at all. You probably should somewhat support them as recipient but should never produce them as sender today (mostly stuff related to pro-"everything is utf8" days). Like in general the ideas of "how mail should probably work" in old RFCs and "how it is done IRL today" are in some aspects _very_ far away.
- Lastly RFCs are not sufficient by themself. They don't cover large parts of the system for "spam detection/suspicious mail rejection". So it's a must have to go to the support pages of all large mail providers and read through what they expect of mails. And "automated mails need a message id" is a pretty common requirement. In addition you have to e.g. make sure the domain you use isn't black listed (e.g. due to behavior of a previous user), and that your servers IP addresses aren't black listed (they never should be black listed long term, but happens anyway, and e.g. MS has based on very questionable excuses "conveniently" black listed smaller local data center competition while also being one of the most widely used providers for commercial mail in that area).
An unrelated frustration of mine is that Message-ID really should not be overridden but SES for instance throws away your Message-ID and replaces it with another one :(
Should in most RFCs also mean "do it as long as you don't have a very good technical reason not to do it". Like it's most times a "weak must". And in that case the only reason it isn't must is for backward compatibility with older mail system not used for sending automated mails.
And it is documented if you read any larger mail providers docs about "what to do that my automated mails don't get misclassified as spam". And spam rejection is a whole additional non-standardized layer on top of RFCs anyone working with mail should be aware of. In any decades old non centralized communication system without ever green standards having other "industry standard/de-factor" but not "standardized" requirements is pretty normal btw.
Sure, you can send email with whatever headers you want, use weird combos, IP addresses, reply-to, and it might be still a technically valid email, but not something that should land in people's inboxes.
Also, a payment processor not testing their email on the most popular email provider in the world is quite ridiculous.
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
[0] https://www.rfc-editor.org/rfc/rfc2119- In most cases, you are expected to follow it.
- You can choose not to follow it, but you must have a very good reason.
For example, RFC 7231 say that there should be DATE header but some embedded devices have no real-time clock so it ok not to implement.
Having said that, I regret my original characterization of the Message-ID header as a "requirement" and have updated the blogpost to be fair to all sides.
Thank you for bringing this up.
For example, why does Google handle this differently for consumer and enterprise accounts? Well it's Google so the answer could always just be "they are disorganized" but there's a good chance that in both cases, it was the pragmatic choice given the slightly different priorities of these types of customers.
Once you deviate a bit from the standard, you're down a slippery slope. Its not that difficult to use pragmatism to justify wrongdoing.
Battle with spam has been for long part just trying to algorithmically fingerprint the scam bots and reject the message if it looks like it wasn't sent by "real" mail server/client.
So a lot of things that are optional like SPF/DKIM are basically "implement this else your mail have good chance of being put into spam automatically".
So these are mostly quality-of-life reasons, it’s not a reason to reject an email.
That is the difference.
Another example may be a lightweight implementation of a spec in a limited and/or narrow environment, which remains technically compliant with full implementations of a spec but interaction with such a limited/narrow environment comes with awareness about such limitations.