I've seen a mix of stringly typed apps and strongly typed apps. The strongly typed apps had an upfront cost but were much better to work with in the long run. Define types for things like names, email addresses, ages, and the like. Convert the strings to the appropriate type on ingest, and then use only the correct types inside your system.
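
A minimal Python sketch of converting on ingest. The `Name`, `Age`, `Person`, and `ingest_person` names, and the 0 to 150 age range, are illustrative assumptions rather than anything from a particular codebase:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    value: str


@dataclass(frozen=True)
class Age:
    value: int

    def __post_init__(self) -> None:
        # Assumed plausibility range; pick whatever your domain needs.
        if not 0 <= self.value <= 150:
            raise ValueError(f"implausible age: {self.value}")


@dataclass(frozen=True)
class Person:
    name: Name
    age: Age


def ingest_person(raw: dict) -> Person:
    """Turn raw strings into domain types once, at the system boundary."""
    return Person(name=Name(raw["name"].strip()), age=Age(int(raw["age"])))
```

Everything past `ingest_person` receives a `Person`, so a bad age string fails at the edge instead of deep inside business logic.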

by FranklinJabar 8 hours ago

> On your precise exemple, I can even say that I never saw something like an "Email object".

Well that's.... absolutely horrifying. Would you mind sharing what industry/stack you work with?

by Terr_ 6 hours ago

> horrifying.

IMO it's worth distinguishing between different points on the spectrum of "email object", e.g.:

1. Here is an Email object with detailed properties or methods for accessing its individual portions, changing things to/from canonical forms (e.g. lowercase Punycode domain names), running standard (or nonstandard) comparisons, etc.

2. Here is an immutable Email object which mainly wraps an arbitrary string, so that it isn't easily mixed up with other notable strings we have everywhere.

__________

For e-mails in particular, implementing the first is a nightmare--I know this well from recent tasks fixing bad/subjective validation rules. Even if you follow every spec with inhuman precision and cleverness, you'll get something nobody will like.

In contrast, the second provides a lot of bang for your buck. It doesn't guarantee every Email is valid, but you get much better tools for tracing flows, finding where bad values might be coming from, and for implementing future validation/comparison rules (which might be context-specific) later when you decide you need to invest in them.
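
A rough Python sketch of that second kind of object, assuming a frozen dataclass is an acceptable stand-in for "immutable wrapper". The `Email` class and `describe_recipient` function are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Opaque, immutable wrapper: no parsing, no normalisation, no validation."""

    raw: str


def describe_recipient(to: Email) -> str:
    # Only code holding an Email can call this without tripping a type checker.
    return f"will send to {to.raw}"
```

An `Email` never compares equal to a bare string, which is what makes it hard to mix up with the other strings floating around; validation or comparison rules can be added to the class later without touching its callers.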

by squeaky-clean 7 hours ago

The easiest and most robust way to deal with email is to have two fields: string email and bool isValidated. (You'll also need some additional way to handle a time-based validation code.) Accept the user's string, fire off an email to it, and require them to click a validation link or enter a code somewhere.

Email is weird and ultimately the only decider of a valid email is "can I send email to this address and get confirmation of receipt".

If it's a consumer website you can do some client-side validation against ".*@.*\\..*" to catch easy typos. That will end up rejecting a very small number of users, but they can usually deal with it. Validating against known-good email domains and whatnot will just create a mess.
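
A sketch of the two-field approach in Python, under some assumptions: the `EmailRecord` shape, the 15-minute code lifetime, and the loose `.+@.+\..+` pattern are all made up for illustration, and actually sending the message is left out.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Deliberately loose "something@something.something" check (an assumption).
LOOSE_EMAIL = re.compile(r".+@.+\..+")
CODE_TTL_SECONDS = 15 * 60  # assumed lifetime of a validation code


@dataclass
class EmailRecord:
    email: str
    is_validated: bool = False
    pending_code: Optional[str] = None
    code_expires_at: float = 0.0


def looks_like_email(text: str) -> bool:
    return LOOSE_EMAIL.fullmatch(text) is not None


def start_verification(record: EmailRecord, now: float) -> str:
    """Create a one-time code; the caller mails it to record.email."""
    record.pending_code = secrets.token_urlsafe(16)
    record.code_expires_at = now + CODE_TTL_SECONDS
    return record.pending_code


def confirm(record: EmailRecord, code: str, now: float) -> bool:
    """Mark the address validated only if the code matches and is still fresh."""
    if record.pending_code is None or now > record.code_expires_at:
        return False
    if not secrets.compare_digest(record.pending_code.encode(), code.encode()):
        return False
    record.is_validated = True
    record.pending_code = None
    return True
```

The clock is passed in as `now` so the expiry logic can be tested without waiting; the only real proof of validity remains the round trip through the user's inbox.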

by lock1 6 hours ago

In the spirit of "Parse, Don't Validate", rather than encode "validation" information as a boolean to be checked at runtime, you can define `Email { raw: String }` and hide the constructor behind a "factory function" that accepts any string but returns `Option<Email>` or `Result<Email,ParseError>`.

If you need a stronger guarantee than just a "string that passes simple email regex", create another "newtype" that parses the `Email` type further into `ValidatedEmail { raw: String, validationTime: DateTime }`.

While it does add some "boilerplate-y" code no matter what kind of syntactic sugar is available in the language of your choice, this approach uses the type system to enforce the "pass only non-malformed & working email" rule wherever the `ValidatedEmail` type appears, without your having to remember to check `email.isValidated` every time.

How much this approach pays off depends on the programming language and on what you are trying to do. Some languages offer wrappers with zero runtime cost, like Haskell's `newtype` or Rust's `#[repr(transparent)]`; others carry non-negligible runtime overhead. Even then, it depends on whether that overhead is an acceptable price for "correctness".
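
The same idea sketched in Python, with the caveats that Python cannot really hide a constructor and that the guarantees only hold if you run a static checker. `parse_email`, `mark_validated`, and `send_newsletter` are hypothetical names, and the shape check is a stand-in for the "simple email regex":

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Union


@dataclass(frozen=True)
class ParseError:
    reason: str


@dataclass(frozen=True)
class Email:
    raw: str


@dataclass(frozen=True)
class ValidatedEmail:
    raw: str
    validation_time: datetime


def parse_email(text: str) -> Union[Email, ParseError]:
    """Factory function: accepts any string, returns an Email or a ParseError."""
    cleaned = text.strip()
    local, sep, domain = cleaned.rpartition("@")
    if not sep or not local or "." not in domain:
        return ParseError(f"not an email-shaped string: {text!r}")
    return Email(cleaned)


def mark_validated(email: Email, when: datetime) -> ValidatedEmail:
    # Call this only after the user has confirmed receipt (link or code).
    return ValidatedEmail(email.raw, when)


def send_newsletter(to: ValidatedEmail) -> str:
    # The signature is the rule: an unconfirmed address cannot get here.
    return f"newsletter -> {to.raw}"
```

By convention, nothing outside `parse_email` and `mark_validated` constructs these types directly, so a `ValidatedEmail` in a signature stands in for the runtime `isValidated` check.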

by squeaky-clean 3 hours ago

I would still usually prefer email as just a string and validation as a separate property, and they both belong to some other object. Unless you really only want to know if XYZ email exists, it's usually something more like "has it been validated that ABC user can receive email at XYZ address".

Is the user account validated? Send an email to their email string. Is it not validated? Then why are we even at a point in the code where we're considering emailing the user, except to validate the email.

You can use similar logic to what you described, but instead with something like User and ValidatedUser. I just don't think there's much benefit to doing it with specifically the email field and turning email into an object. Because in those examples you can have a User whose email property is a ParseError and you still end up having to check "is the email property result for this user type Email or type ParseError?" and it's very similar to just checking a validation bool except it's hiding what's actually going on.

by mejutoco 6 hours ago

My preferred solution would be:

You have 2 types

UnvalidatedEmail

ValidatedEmail

Then ValidatedEmail is only created in the function that does the validation: a function that takes an UnvalidatedEmail and returns a ValidatedEmail or an error object.

by squeaky-clean 5 hours ago

That can work in some situations. One thing I don't like about it in other situations is that you now have two nullable fields associated with your user, or whatever that email is associated with. It's annoying or even impossible in a lot of systems to guarantee that exactly one of user.UnvalidatedEmail and user.ValidatedEmail exists.

by mejutoco 3 hours ago

I see. In my example they would be just types and internally a newtype string.

So an object could have a field

email: UnvalidatedEmail | ValidatedEmail

Nothing would be nullable there in that case. You could match on the type and break if not all cases are handled.
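
Roughly what that could look like in Python, assuming a static checker such as mypy or pyright does the exhaustiveness checking. The `User` class, `next_step`, and the local `assert_never` helper are illustrative:

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class UnvalidatedEmail:
    raw: str


@dataclass(frozen=True)
class ValidatedEmail:
    raw: str


AnyEmail = Union[UnvalidatedEmail, ValidatedEmail]


@dataclass
class User:
    name: str
    email: AnyEmail  # always exactly one of the two; never None


def assert_never(value: NoReturn) -> NoReturn:
    # A static checker reports an error at the call site if a variant is unhandled.
    raise AssertionError(f"unhandled variant: {value!r}")


def next_step(user: User) -> str:
    email = user.email
    if isinstance(email, ValidatedEmail):
        return f"send mail to {email.raw}"
    if isinstance(email, UnvalidatedEmail):
        return f"send confirmation link to {email.raw}"
    assert_never(email)
```

If a third variant is later added to `AnyEmail`, the `assert_never` call stops type-checking until `next_step` handles it.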

by cogman10 7 hours ago

I've seen some devs prefer that route of programming and it very often results in performance problems.

An undiscussed issue with "everything is a string or dictionary" is that strings and dictionaries both consume very large amounts of memory, particularly in a language like Java.

A Java object with two fields, an int and a long, spends roughly half of its memory on the object header: 12 bytes of payload plus a 12- to 16-byte header on a typical 64-bit HotSpot JVM (Valhalla can't come soon enough). But when you talk about a HashMap in Java, just the map structure itself ends up blowing way past that. The added overhead of two Strings for the field names, plus a boxed `Long` and `Integer` for the values, makes the memory footprint far worse. It's even worse if someone decided to represent those numbers as Strings (I've seen that).

Beyond that, every single lookup is costly: you have to hash the key to look up the value, and you have to compare the key.

In a POJO, when you say "foo.bar", it's just an offset in memory that Java ends up doing. It's absurdly faster.

Please, for the love of god, if you know the structure of the data you are working with, turn it into your language's version of a struct. Stop using dictionaries for everything.
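
The comment is about Java, but the same trade-off can be sketched in Python (CPython specifically). This is an illustration under assumptions, not a benchmark: `Sample` is a hypothetical class, and `sys.getsizeof` reports shallow sizes only.

```python
import sys


class Sample:
    """Struct-like class: fixed fields, no per-instance dictionary."""

    __slots__ = ("count", "total")

    def __init__(self, count: int, total: int) -> None:
        self.count = count
        self.total = total


as_object = Sample(3, 10_000_000_000)
as_dict = {"count": 3, "total": 10_000_000_000}


def shallow_sizes() -> tuple:
    # Shallow sizes of the two containers; the int values are not included.
    return sys.getsizeof(as_object), sys.getsizeof(as_dict)
```

Besides the smaller footprint, a misspelled field such as `as_object.totl = 1` raises `AttributeError`, whereas `as_dict["totl"] = 1` silently creates a new key.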

by ronjakoi 7 hours ago

I work with PHP, where classes are supposedly a lot slower than strings and arrays (PHP calls dictionaries "associative arrays").

by cogman10 7 hours ago

Benchmark it, but from what I can find this is dated advice. It might be faster on first load but it'd surprise me if it's always faster.

Edit: looking into how PHP has evolved, PHP 8 added a JIT in 2020. That will almost certainly make it faster to use a class rather than an associative array. Associative arrays are very hard for a JIT to see through and optimize around.

by esafak 7 hours ago

Obviously one where no-one who cared or knew better had any say.

by hathawsh 5 hours ago

Python has an "email object" that you should definitely use if you're going to parse email messages in any way.

https://docs.python.org/3/library/email.message.html

I imagine other languages have similar libraries. I would say static typing in scripting languages has arrived and is here to stay. It's a huge benefit for large code bases.

by Thaxll 7 hours ago

Trying to parse email will result in bad assumptions. Better a plain string than a bad regex.

For example, many websites reject the + character, which is totally valid and which Gmail uses for temporary addresses.

Same for addresses.

by jghn 7 hours ago

A lot of posts in this thread are conflating two separate but related topics. Statically typing a string as EmailAddress does not imply validating that the string in question is a valid email address. Both operations have their merits and downsides, but they don't need to be tied together.

Having a type wrapper of EmailAddress around a string with no business logic validation still allows me to take a string I believe to be an email address and be sure that I'm only passing it into function parameters that expect an email address. If I misorder my parameters and accidentally pass it to a parameter expecting a type wrapper of UserName, the compiler will flag it.
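
In Python terms this is roughly what `typing.NewType` gives you, assuming you run a static checker such as mypy or pyright (the interpreter itself won't object). The `send_invite` function is a made-up example:

```python
from typing import NewType

# Distinct static types over plain str; no validation, no runtime wrapper.
EmailAddress = NewType("EmailAddress", str)
UserName = NewType("UserName", str)


def send_invite(to: EmailAddress, invited_by: UserName) -> str:
    return f"invite for {to} from {invited_by}"


email = EmailAddress("ada@example.com")
user = UserName("ada")

ok = send_invite(email, user)
# send_invite(user, email)  # swapped arguments: rejected by the type checker
```

At runtime both values are ordinary strings, so the wrapper costs nothing there; the protection comes entirely from the checker.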

by abnercoimbre 7 hours ago

Recently got a bank account which allowed my custom domain during registration, but rejected it as invalid during login. The problem? Their JS client code has a bad regex rejecting TLDs longer than 4 chars (trivial for a dev to bypass, but wow.)

by tracker1 8 hours ago

What's funny is that this is exactly one of the reasons I happen to like JavaScript: at its core, the type coercion and falsy boolean rules work really well (imo) for ETL-type work, where you're dealing with potentially untrusted data. How many times have you had to import a CSV with a bad record/row? It seems to happen all the time. Why? Because people use and manually manipulate data in spreadsheets.

In the end, it's a big part of why I tend to reach for JS/TS first (Deno) for most scripts that are even a little complex to attempt in bash.

by rileymichael 8 hours ago

this is likely an ecosystem sort of thing. if your language gives you the tools to do so at no cost (memory/performance) then folks will naturally utilize those features and it will eventually become idiomatic code. kotlin value classes are exactly this and they are everywhere: https://kotlinlang.org/docs/inline-classes.html

by gr4vityWall 7 hours ago

Haxe has a really elegant solution to this in the form of Abstracts[0][1]. I wonder why this particular feature never became popular in other languages, at least to my knowledge.

0 - https://code.haxe.org/category/abstract-types/color.html

1 - https://haxe.org/manual/types-abstract.html

by Boxxed 8 hours ago

Well that's terrifying

by mattmanser 5 hours ago

Clearly never worked in any statically typed language then.

Almost every project I've worked on has had some sort of email object.

Like I can't comprehend how different our programming experiences must be.

Everything is parsed into objects at the API layer, I only deal with strings when they're supposed to be strings.

by eptcyka 8 hours ago

My condolences, I urge you to recover from past trauma and not let it prohibit a happy life.

by krick 6 hours ago

At first I had a negative reaction to that comment and wanted to snap back something along the lines of "that's horrible" as well, but after thinking for a while, I decided that if I have anything to contribute to the discussion, I have to kinda sorta agree with you, and even defend you.

I mean, of course having a string, when you mean "email" or "date" is only slightly better than having a pointer, when you mean a string. And everyone's instinctive reaction to that should be that it's horrible. In practice though, not only did I often treat some complex business-objects and emails as strings, but (hold onto yourselves!) even dates as strings, and am ready to defend that as the correct choice.

Ultimately, it's about how much we are ready to assume about the data. I mean, that's what modelling is: making a set of assumptions about the real world and rejecting everything that doesn't fit our model. Making a neat little model is what every programmer wants. It's the "type-driven design" the OP praises. It's beautiful, and programmers must make beautiful models and write beautiful code, otherwise they are bad programmers.

Except, unfortunately, programming has nothing to do with beauty; it's about making some system that gets some data from here, displays it there, and makes it possible for people and robots to act on the given data. A beautiful model is essentially only needed for us to contain the complexity of that system in something we can understand and keep working. The model doesn't truly need to be complete.

Moreover, as everyone with 5+ years of experience must know (I imagine), our models are never complete; it always turns out that the assumptions we make are naïve at best. It turns out there was time before 1970, there are leap seconds, time zones, and DST, whose shift is a matter of minutes rather than whole hours and which doesn't necessarily happen on the same date every year (at least not in terms of the Gregorian calendar; it may be bound to Ramadan, for example). There are so many details about the real world that you, brave young 14- (or 40-) year-old programmer, don't know yet!

So, when you model data "correctly" and turn "2026-02-10 12:00" (or better yet, "10/02/2026 12:00") into a "correct" DateTime object, you are making a hell of a lot of assumptions, and some of them, I assure you, are wrong. Hopefully, it just so happens that it doesn't matter in your case; this is why such modelling works at all.

But what if it does? What if it's the datetime on a ticket that a third party provided to you, and you are providing it to a customer now? And you get sued if it ends up the wrong date because of some transformations that happened inside of your system? Well, it's best if it doesn't happen. Fortunately, no other computations in the system seem to rely on the fact it's a datetime right now, so you can just treat it as a string. Is it UTC? Event city timezone? Vendor HQ city timezone? I don't know! I don't care! That's what was on the ticket, and it's up to you, dear customer, to get it right.
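
To make the ticket case concrete, here is a small Python sketch. The two format strings are guesses a developer might plausibly make, and `TicketTimeText` and `render_ticket` are hypothetical names:

```python
from dataclasses import dataclass
from datetime import datetime

TICKET_TEXT = "10/02/2026 12:00"

# Two equally "reasonable" parses of the same text give different dates,
# and both results are naive: neither knows which time zone was meant.
day_first = datetime.strptime(TICKET_TEXT, "%d/%m/%Y %H:%M")
month_first = datetime.strptime(TICKET_TEXT, "%m/%d/%Y %H:%M")


@dataclass(frozen=True)
class TicketTimeText:
    """Exactly what the vendor printed; never parsed, never reformatted."""

    as_printed: str


def render_ticket(when: TicketTimeText) -> str:
    return f"Starts: {when.as_printed}"
```

Passing the text through untouched means the system never has to pick between 10 February and 2 October on the customer's behalf.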

So, ultimately, it's about where you are willing to put the boundary between your model and the scary outer world, and, pragmatically, it's often better NOT to do any "type-driven design" unless you need to.