by FranklinJabar 10 hours ago

comments

by Terr_ 8 hours ago

> horrifying.

IMO it's worth distinguishing between different points on the spectrum of "email object", e.g.:

1. Here is an Email object with detailed properties or methods for accessing its individual portions, changing things to/from canonical forms (e.g. lowercase Punycode domain names), running standard (or nonstandard) comparisons, etc.

2. Here is an immutable Email object which mainly wraps an arbitrary string, so that it isn't easily mixed up with other notable strings we have everywhere.

__________

For e-mails in particular, implementing the first is a nightmare--I know this well from recent tasks fixing bad/subjective validation rules. Even if you follow every spec with inhuman precision and cleverness, you'll get something nobody will like.

In contrast, the second provides a lot of bang for your buck. It doesn't guarantee every Email is valid, but you get much better tools for tracing flows, finding where bad values might be coming from, and for implementing future validation/comparison rules (which might be context-specific) later when you decide you need to invest in them.
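A minimal sketch of the second option, assuming Python and a frozen dataclass as the wrapper. The names `Email` and `send_welcome` are illustrative only, not taken from any library, and nothing here checks that the string is a valid address:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Opaque, immutable wrapper around an arbitrary string; performs no validation."""
    raw: str

    def __str__(self) -> str:
        return self.raw


def send_welcome(to: Email) -> str:
    # Hypothetical consumer: the signature documents that an Email is expected,
    # so a static type checker can flag a bare str passed here by mistake.
    return f"Sending welcome message to {to}"
```

Because every address flows through one type, searching for constructions of `Email` shows where values enter the system, and stricter rules can be added to that one class later.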

by FranklinJabar 8 minutes ago

> IMO it's worth distinguishing between different points on the spectrum of "email object"

If it's neither, who cares? This is an obvious nightmare for all involved

by squeaky-clean 9 hours ago

The easiest and most robust way to deal with email is to have 2 fields: string email and bool isValidated. (And you'll need some additional way to handle a time-based validation code.) Accept the user's string, fire off an email to it and require them to click a validation link or enter a code somewhere.

Email is weird and ultimately the only decider of a valid email is "can I send email to this address and get confirmation of receipt".

If it's a consumer website you can do some client-side validation against something like ".*@.*\..*" to catch easy typos. That will end up rejecting a very small number of users but they can usually deal with it. Validating against known good email domains and whatnot will just create a mess.
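A rough sketch of the two-field approach with a time-based code, assuming Python. The `Account` shape, the 24-hour lifetime, and the function names are assumptions made for illustration; actually sending the email is left to the caller:

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

CODE_LIFETIME = timedelta(hours=24)  # assumed expiry window


@dataclass
class Account:
    email: str                                 # stored exactly as the user typed it
    is_validated: bool = False
    pending_code: Optional[str] = None
    code_expires_at: Optional[datetime] = None


def start_validation(account: Account, now: Optional[datetime] = None) -> str:
    """Generate a one-time code; the caller would email it to account.email."""
    now = now or datetime.now(timezone.utc)
    code = secrets.token_urlsafe(16)
    account.pending_code = code
    account.code_expires_at = now + CODE_LIFETIME
    return code


def confirm(account: Account, code: str, now: Optional[datetime] = None) -> bool:
    """Mark the address validated if the code matches and has not expired."""
    now = now or datetime.now(timezone.utc)
    if account.pending_code is None or account.code_expires_at is None:
        return False
    if now > account.code_expires_at:
        return False
    # Constant-time comparison on bytes, so arbitrary user input is accepted.
    if not secrets.compare_digest(account.pending_code.encode(), code.encode()):
        return False
    account.is_validated = True
    account.pending_code = None
    account.code_expires_at = None
    return True
```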

by lock1 8 hours ago

In the spirit of "Parse, Don't Validate", rather than encode "validation" information as a boolean to be checked at runtime, you can define `Email { raw: String }` and hide the constructor behind a "factory function" that accepts any string but returns `Option<Email>` or `Result<Email,ParseError>`.

If you need a stronger guarantee than just a "string that passes simple email regex", create another "newtype" that parses the `Email` type further into `ValidatedEmail { raw: String, validationTime: DateTime }`.

While it does add some "boilerplate-y" code no matter what kind of syntactic sugar is available in the language of your choice, this approach uses the type system to enforce the "pass only non-malformed & working email" rule wherever the `ValidatedEmail` type appears, without you having to constantly remember to check `email.isValidated`.

This approach's benefit varies depending on the programming language and what you are trying to do. Some languages offer it at zero runtime cost, like Haskell's `newtype` or Rust's `#[repr(transparent)]`; others carry non-negligible runtime overhead. Even then, it depends on whether the overhead is acceptable or not in exchange for "correctness".
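The two-step parse described above might look roughly like this in Python. This is a sketch under assumptions: the loose regex, `parse`, `mark_validated`, and `send_newsletter` are invented for illustration, `None` stands in for `Option`, and Python can neither truly hide the constructor nor make the wrapper free at runtime:

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Deliberately loose pattern (an assumption): something, "@", something, ".", something.
_LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


@dataclass(frozen=True)
class Email:
    raw: str

    @staticmethod
    def parse(text: str) -> Optional["Email"]:
        """Factory function: by convention the only way callers obtain an Email."""
        candidate = text.strip()
        return Email(candidate) if _LOOSE_EMAIL.fullmatch(candidate) else None


@dataclass(frozen=True)
class ValidatedEmail:
    raw: str
    validation_time: datetime


def mark_validated(email: Email, when: Optional[datetime] = None) -> ValidatedEmail:
    """Call only after the user has confirmed receipt, e.g. clicked the link."""
    return ValidatedEmail(email.raw, when or datetime.now(timezone.utc))


def send_newsletter(to: ValidatedEmail) -> str:
    # Accepting ValidatedEmail means there is no isValidated flag to forget.
    return f"newsletter queued for {to.raw}"
```

Code that only has a plain string or an `Email` cannot call `send_newsletter` without a type checker complaining, which is the guarantee the comment is after.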

by squeaky-clean 5 hours ago

I would still usually prefer email as just a string and validation as a separate property, and they both belong to some other object. Unless you really only want to know if XYZ email exists, it's usually something more like "has it been validated that ABC user can receive email at XYZ address".

Is the user account validated? Send an email to their email string. Is it not validated? Then why are we even at a point in the code where we're considering emailing the user, except to validate the email.

You can use similar logic to what you described, but instead with something like User and ValidatedUser. I just don't think there's much benefit to doing it with specifically the email field and turning email into an object. Because in those examples you can have a User whose email property is a ParseError and you still end up having to check "is the email property result for this user type Email or type ParseError?" and it's very similar to just checking a validation bool except it's hiding what's actually going on.

by mejutoco 8 hours ago

My preferred solution would be:

You have 2 types

UnvalidatedEmail

ValidatedEmail

Then ValidatedEmail is only created in the function that does the validation: a function that takes an UnvalidatedEmail and returns a ValidatedEmail or an error object.

by squeaky-clean 7 hours ago

That can work in some situations. One thing I won't like about it in some other situations is that you now have 2 nullable fields associated with your user, or whatever that email is associated with. It's annoying or even impossible in a lot of systems to have a guaranteed validation that user.UnvalidatedEmail or user.ValidatedEmail must exist but not both.

by mejutoco 5 hours ago

I see. In my example they would be just types and internally a newtype string.

So an object could have a field

email: UnvalidatedEmail | ValidatedEmail

Nothing would be nullable there in that case. You could match on the type and break if not all cases are handled.
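A sketch of that union field, assuming Python. The `User` class, the `confirmed` flag (a stand-in for whatever real check is used), and `next_action` are hypothetical, and the "error object" is modelled here as a raised exception:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UnvalidatedEmail:
    raw: str


@dataclass(frozen=True)
class ValidatedEmail:
    raw: str


class ValidationError(Exception):
    pass


def validate(email: UnvalidatedEmail, confirmed: bool) -> ValidatedEmail:
    """By convention, the only place a ValidatedEmail is created."""
    if not confirmed:
        raise ValidationError(f"{email.raw} has not been confirmed")
    return ValidatedEmail(email.raw)


@dataclass
class User:
    name: str
    # Exactly one of the two types, never None and never both.
    email: Union[UnvalidatedEmail, ValidatedEmail]


def next_action(user: User) -> str:
    # Dispatch on the type; an unexpected type fails loudly instead of silently.
    if isinstance(user.email, ValidatedEmail):
        return "send message"
    if isinstance(user.email, UnvalidatedEmail):
        return "send validation link"
    raise TypeError(f"unhandled email type: {type(user.email).__name__}")
```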

by cogman10 9 hours ago

I've seen some devs prefer that route of programming and it very often results in performance problems.

An undiscussed issue with "everything is a string or dictionary" is that strings and dictionaries both consume very large amounts of memory, particularly in a language like Java.

A Java object with 2 fields, an int and a long, spends about half of its memory on the object header. You end up with 12 bytes of payload plus a 12- or 16-byte header on a typical 64-bit HotSpot JVM, depending on whether compressed class pointers are enabled (Valhalla can't come soon enough). But when you talk about a HashMap in Java, just the map structure itself ends up blowing way past that. The added overhead of a String key for each of the 2 fields plus a boxed Java `Long` and `Integer` just balloons that memory requirement. It's even worse if someone decided to represent those numbers as Strings (I've seen that).

Beyond that, every single lookup is costly: you have to hash the key to look up the value, and you have to compare the key.

In a POJO, when you say "foo.bar", it's just an offset in memory that Java ends up doing. It's absurdly faster.

Please, for the love of god, if you know the structure of the data you are working with, turn it into your language's version of a struct. Stop using dictionaries for everything.
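The same trade-off shows up outside Java. As a rough analogue, assuming CPython, here is a fixed-layout class next to a dictionary holding the same two numbers. The `Reading` class and its field names are made up for illustration, and `sys.getsizeof` reports shallow sizes only, so this is an indication rather than a benchmark:

```python
import sys


class Reading:
    """Fixed-layout record; __slots__ avoids a per-instance dict."""
    __slots__ = ("sensor_id", "timestamp")

    def __init__(self, sensor_id: int, timestamp: int) -> None:
        self.sensor_id = sensor_id
        self.timestamp = timestamp


as_struct = Reading(7, 1_700_000_000)
as_dict = {"sensor_id": 7, "timestamp": 1_700_000_000}

# Shallow sizes: the int objects and the dict's key strings are not counted.
struct_bytes = sys.getsizeof(as_struct)
dict_bytes = sys.getsizeof(as_dict)
print(f"slots object: {struct_bytes} bytes, dict: {dict_bytes} bytes")
```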

by ronjakoi 9 hours ago

I work with PHP, where classes are supposedly a lot slower than strings and arrays (PHP calls dictionaries "associative arrays").

by cogman10 9 hours ago

Benchmark it, but from what I can find this is dated advice. It might be faster on first load but it'd surprise me if it's always faster.

Edit: looking into how PHP has evolved, PHP 8.0 added a JIT in late 2020. That will almost certainly make it faster to use a class rather than an associative array. Associative arrays are very hard for a JIT to see through and optimize around.

by esafak 9 hours ago

Obviously one where no-one who cared or knew better had any say.