In 99% of the projects I've worked on in my professional life, anything coming from human input is handled as a string, and most of the time it stays that way through all of the application layers (with more or fewer checks along the way).
On your specific example, I can even say that I have never seen something like an "Email object".
Well that's... absolutely horrifying. Would you mind sharing what industry/stack you work with?
IMO it's worth distinguishing between different points on the spectrum of "email object", e.g.:
1. Here is an Email object with detailed properties or methods for accessing its individual portions, changing things to/from canonical forms (e.g. lowercase Punycode domain names), running standard (or nonstandard) comparisons, etc.
2. Here is an immutable Email object which mainly wraps an arbitrary string, so that it isn't easily mixed up with other notable strings we have everywhere.
__________
For e-mails in particular, implementing the first is a nightmare--I know this well from recent tasks fixing bad/subjective validation rules. Even if you follow every spec with inhuman precision and cleverness, you'll get something nobody will like.
In contrast, the second provides a lot of bang for your buck. It doesn't guarantee every Email is valid, but you get much better tools for tracing flows, finding where bad values might be coming from, and for implementing future validation/comparison rules (which might be context-specific) later when you decide you need to invest in them.
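To make that second flavour concrete, here's a minimal Java sketch (names invented for illustration). The type performs no validation at all; it just keeps the string from being confused with every other string in the system:

    import java.util.Objects;

    // A thin, immutable wrapper: no validation, just identity.
    // Parsing/validation policy can be layered on later.
    public record Email(String raw) {
        public Email {
            Objects.requireNonNull(raw, "raw");
        }
    }

A signature like `void sendWelcome(Email to)` then documents itself, and searching for `new Email(` finds every place an arbitrary string crosses into "email land".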
Email is weird, and ultimately the only real test of a valid email address is "can I send email to this address and get confirmation of receipt?"
If it's a consumer website, you can do some client-side validation along the lines of `.+@.+\..+` to catch easy typos. That will end up rejecting a very small number of users, but they can usually deal with it. Validating against known-good email domains and whatnot will just create a mess.
If you need a stronger guarantee than just a "string that passes simple email regex", create another "newtype" that parses the `Email` type further into `ValidatedEmail { raw: String, validationTime: DateTime }`.
While it does add some boilerplate-y code no matter what syntactic sugar your language of choice offers, this approach uses the type system to enforce the "pass only non-malformed & working email" rule wherever a `ValidatedEmail` shows up, instead of making you constantly remember to check `email.isValidated`.
This approach's benefit varies by language and by what you are trying to do. Some languages offer zero runtime cost, like Haskell's `newtype` or Rust's `repr(transparent)`; others carry non-negligible runtime overhead. Even then, it depends on whether the overhead is acceptable in exchange for "correctness".
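Java has no zero-cost `newtype`, but records get close ergonomically. Here's a sketch of the two-stage shape described above, with invented names and a deliberately naive placeholder check:

    import java.time.Instant;
    import java.util.Optional;

    // Stage 1: any string a caller claims is an email.
    record Email(String raw) {}

    // Stage 2: should only be produced by validate() below;
    // in real code you'd restrict its constructor's visibility.
    record ValidatedEmail(String raw, Instant validationTime) {}

    final class EmailValidation {
        private EmailValidation() {}

        // Placeholder check; a real system might only succeed after
        // a confirmation mail has actually been received.
        static Optional<ValidatedEmail> validate(Email email) {
            return email.raw().matches(".+@.+\\..+")
                ? Optional.of(new ValidatedEmail(email.raw(), Instant.now()))
                : Optional.empty();
        }
    }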
Is the user account validated? Send an email to their email string. Is it not validated? Then why are we even at a point in the code where we're considering emailing the user, except to validate the email?
You can use similar logic to what you described, but with something like User and ValidatedUser instead. I just don't think there's much benefit to doing it specifically for the email field and turning the email into an object. In those examples you can have a User whose email property is a ParseError, and you still end up having to check "is this user's email property an Email or a ParseError?", which is very similar to just checking a validation bool, except it's hiding what's actually going on.
You have two types:
UnvalidatedEmail
ValidatedEmail
Then ValidatedEmail is only created in the function that does the validation: a function that takes an UnvalidatedEmail and returns a ValidatedEmail or an error object.
So an object could have a field
email: UnvalidatedEmail | ValidatedEmail
Nothing would be nullable there in that case. You could match on the type, and the build would break if not all cases are handled.
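In Java 21 that union can be spelled with a sealed interface, and the switch is checked for exhaustiveness exactly as described. A sketch:

    sealed interface EmailField permits UnvalidatedEmail, ValidatedEmail {}
    record UnvalidatedEmail(String raw) implements EmailField {}
    record ValidatedEmail(String raw) implements EmailField {}

    class Mailer {
        // No default branch needed: adding a third permitted type
        // breaks compilation here until the new case is handled.
        static String describe(EmailField email) {
            return switch (email) {
                case UnvalidatedEmail u -> "needs validation: " + u.raw();
                case ValidatedEmail v -> "ok to send to: " + v.raw();
            };
        }
    }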
An undiscussed issue with "everything is a string or dictionary" is that strings and dictionaries both consume very large amounts of memory, particularly in a language like Java.
A Java object with two fields, an int and a long, will spend most of its memory on the object header. You end up with an object that has 12 bytes of payload and 32 bytes of object header (Valhalla can't come soon enough). But when you talk about a HashMap in Java, just the map structure itself ends up blowing way past that. The added overhead of two Strings for the field keys, plus a boxed `Long` and `Integer`, dwarfs that memory requirement. It's even worse if someone decided to represent those numbers as Strings (I've seen that).
Beyond that, every single lookup is costly: you have to hash the key to look up the value, and then compare the key.
In a POJO, when you say "foo.bar", it's just an offset-in-memory read that Java ends up doing. It's absurdly faster.
Please, for the love of god, if you know the structure of the data you are working with, turn it into your language's version of a struct. Stop using dictionaries for everything.
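For a concrete (Java) picture of the contrast; field names and values here are arbitrary, and exact sizes vary by JVM and settings:

    import java.util.HashMap;
    import java.util.Map;

    // Struct-like: two fields at fixed offsets, one object header.
    record Point(int x, long y) {}

    class Demo {
        public static void main(String[] args) {
            Point p = new Point(1, 2L);
            long a = p.y();               // a plain offset read

            // Dictionary-like: map header, bucket array, an entry
            // object per field, String keys, boxed values - and
            // every read hashes and compares the key.
            Map<String, Object> m = new HashMap<>();
            m.put("x", 1);                // autoboxed to Integer
            m.put("y", 2L);               // autoboxed to Long
            long b = (Long) m.get("y");   // hash + equals + unbox
        }
    }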
Edit: looking into how PHP has evolved, PHP 8 added a JIT. That will almost certainly make it faster to use a class rather than an associative array. Associative arrays are very hard for a JIT to see through and optimize around.
https://docs.python.org/3/library/email.message.html
I imagine other languages have similar libraries. I would say static typing in scripting languages has arrived and is here to stay. It's a huge benefit for large code bases.
For example, many websites reject the + character, which is totally valid; Gmail uses it for temporary addresses.
Same for addresses.
Having a type wrapper of EmailAddress around a string with no business logic validation still allows me to take a string I believe to be an email address and be sure that I'm only passing it into function parameters that expect an email address. If I misorder my parameters and accidentally pass it to a parameter expecting a type wrapper of UserName, the compiler will flag it.
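In Java that's a one-liner per wrapper, and the compiler catches the swap. A sketch with invented names:

    record EmailAddress(String value) {}
    record UserName(String value) {}

    class Registration {
        static void register(UserName name, EmailAddress email) { /* ... */ }

        public static void main(String[] args) {
            var name = new UserName("ada");
            var email = new EmailAddress("ada@example.com");
            register(name, email);    // compiles
            // register(email, name); // compile error; with two raw
            // Strings this mix-up would sail through silently
        }
    }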
In the end, it's a big part of why I tend to reach for JS/TS first (Deno) for most scripts that are even a little too complex to attempt in bash.
Almost every project I've worked on has had some sort of email object.
Like I can't comprehend how different our programming experiences must be.
Everything is parsed into objects at the API layer, I only deal with strings when they're supposed to be strings.
I mean, of course having a string when you mean "email" or "date" is only slightly better than having a pointer when you mean a string. And everyone's instinctive reaction to that should be that it's horrible. In practice, though, not only did I often treat some complex business objects and emails as strings, but (hold onto yourselves!) even dates as strings, and I am ready to defend that as the correct choice.
Ultimately, it's about how much we are ready to assume about the data. I mean, that's what modelling is: making a set of assumptions about the real world and rejecting everything that doesn't fit our model. Making a neat little model is what every programmer wants. It's the "type-driven design" the OP praises. It's beautiful, and programmers must make beautiful models and write beautiful code, otherwise they are bad programmers.
Except, unfortunately, programming has nothing to do with beauty; it's about making some system that gets some data from here, displays it there, and makes it possible for people and robots to act on that data. A beautiful model is essentially only needed for us to contain the complexity of that system in something we can understand and keep working. The model doesn't truly need to be complete.
Moreover, as everyone with 5+ years of experience must know (I imagine), our models are never complete; it always turns out that the assumptions we make are naïve at best. It turns out there was time before 1970, there are leap seconds, time zones, and DST offsets that come in minutes, not just hours, and DST doesn't necessarily start on the same date every year (at least not in terms of the Gregorian calendar; it may be bound to Ramadan, for example). There are so many details about the real world that you, brave young 14 (or 40) year old programmer, don't know yet!
So, when you model data "correctly" and turn "2026-02-10 12:00" (or better yet, "10/02/2026 12:00") into a "correct" DateTime object, you are making a hell of a lot of assumptions, and some of them, I assure you, are wrong. Hopefully it just so happens that they don't matter in your case; this is why such modelling works at all.
But what if it does matter? What if it's the datetime on a ticket that a third party provided to you, and you are providing it to a customer now? And you get sued if it ends up being the wrong date because of some transformation that happened inside your system? Well, it's best if that doesn't happen. Fortunately, no other computation in the system seems to rely on the fact that it's a datetime right now, so you can just treat it as a string. Is it UTC? The event city's timezone? The vendor's HQ city's timezone? I don't know! I don't care! That's what was on the ticket, and it's up to you, dear customer, to get it right.
So, ultimately, it's about where you are willing to put the boundary between your model and scary outer world, and, pragmatically, it's often better NOT to do any "type-driven design" unless you need to.
I think you have a very rose-tinted view of the past: while on the academic side static types were intended for proofs, on the industrial side they were for efficiency. C didn't get static types in order to prove your code was correct (and it's really not great at doing that); it got static types so you could account for memory and optimise it.
Java didn't help either: when every type has to be a separate file, the cost of individual types is humongous, even more so when every field then needs two methods.
> In most strong statically typed languages, you wouldn't often pass strings and generic dictionaries around.
In most strongly statically typed languages you would not, but in most statically typed codebases you would. Just look at the Windows interfaces. In fact, while Simonyi's original "Apps Hungarian" had dim echoes of static types, those got completely washed out in "Systems Hungarian", which was used widely in C++, itself already a statically typed language.
I think they also forgot the entire Perl era.
(Decades later, my "magnum opus" has been through multiple mental redesigns and unsatisfactory partial implementations. This time, for sure...)
It's tricky because `class` conflates a lot of semantically-distinct ideas.
Some people might be making `Date` objects to avoid writing defensive code everywhere (since classes are types), but...
Other people might be making `Date` objects so they can keep all their date-related code in one place (since classes are modules/namespaces, and in Java classes even correspond to files).
Other people might be making `Date` objects so they can override the implementation (since classes are jump tables).
Other people might be making `Date` objects so they can overload a method for different sorts of inputs (since classes are tags).
I think the pragmatics of where code lives, and how the execution branches, probably have a larger impact on such decisions than safety concerns. After all, the most popular way to "avoid writing defensive code everywhere" is to... write unsafe, brittle code :-(
There's nothing natural about this. It's not like we're born knowing good object-oriented design. It's a pattern that has to be learned, and the linked article is one of the well-known pieces that helped a lot of people understand this idea.
The bitter lesson of programming languages is that whatever clever, fast, safe, low-level features a language has, someone will come along and create a more productive framework in a much worse language.
Note, this framework - perhaps the very last one - is now ‘AI’.
One bug was in a system that had an Email type but didn't actually enforce the invariants of emails. The invariant that caused the problem was case-insensitive comparison, which it didn't enforce. Trivial to fix, but it was encased in layers of stuff that made tracking it down difficult.
The other was a home-grown ORM that used the same optional/maybe type to represent both "leave this column as the default" and "set this column to null". It should be obvious how this could go wrong. Easy to fix, but it fucked up some production data.
Both of these are failures to apply "parse, don't validate". The former didn't enforce the invariants it had supposedly parsed the data into; the latter didn't differentiate two different parse results.
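The ORM bug is the classic case for a three-state type. A hedged Java sketch (names invented) of the distinction the single optional type collapsed:

    // "Leave as default" and "set to NULL" become different types,
    // so they can no longer be conflated in an UPDATE statement.
    sealed interface ColumnUpdate<T> {
        record LeaveDefault<T>() implements ColumnUpdate<T> {}
        record SetNull<T>() implements ColumnUpdate<T> {}
        record SetValue<T>(T value) implements ColumnUpdate<T> {}
    }

    class Example {
        ColumnUpdate<String> nickname = new ColumnUpdate.SetNull<>();       // explicit NULL
        ColumnUpdate<String> createdBy = new ColumnUpdate.LeaveDefault<>(); // column default
    }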
As per [RFC 5321](https://www.rfc-editor.org/rfc/rfc5321.html):
> the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.
You're not allowed to do that. The email address `foo@bar.com` is identical to `foo@BAR.com`, but not necessarily identical to `FOO@bar.com`. If we're going to talk about "commonly applied normalisations at most email providers", where do you draw that line? Should `foo+whatever@bar.com` be considered equal to `foo@bar.com`? That sounds weird, except that is exactly how Gmail works, a couple of other mail providers have taken up that particular torch, and if your aim is to uniquely identify a "recipient", you can hardcode that `a@gmail.com` and `a+whatever@gmail.com` definitely, guaranteed, end up at the same mailbox.
In practice, yes, users _expect_ that email addresses are case insensitive. Not just users, even - various intermediate systems apply the same incorrect logic.
This gets to an intriguing aspect of hardcoding types: you mostly lose the flexibility. Types are still better - the alternative is that you try to reliably write the same logic (or at least a call to some logic) to disentangle this mess every time you do anything with a string you happen to know is an email address. That is terrible, but it does give you the option of intentionally not applying the usual logic when you don't want to.
That's no way to program, hence actual types and the general trend that comes with them (namely: we do this right, we write it once, and there is no flexibility left). Programming is too hard to leave room for exotic cases that programmers aren't going to think about when dealing with this concept. And if you do need to deal with it, it can still be encoded in the type, but that then makes visible things that in untyped systems are invisible. If my email type only has a `.compare(boolean caseSensitive)` style method, and is not itself inherently comparable because of the case-sensitivity thing, that makes it _seem_ much more complicated than plain old strings. This is a lie: emails in strings *ARE* complicated. They just are. You can't make that go away. But you can hide it, and shoving all data into overly generic data types (numbers and strings) tends to do that.
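One hedged sketch (Java, invented names) of how a type can carry exactly the RFC 5321 rule and nothing more: normalise only the domain at construction time, and equality falls out of the record:

    import java.util.Locale;

    // Per RFC 5321: the domain is case-insensitive; the local part
    // may only be interpreted by the receiving host.
    record EmailAddress(String localPart, String domain) {
        EmailAddress {
            domain = domain.toLowerCase(Locale.ROOT); // safe to normalise
            // localPart is deliberately left untouched
        }

        // Naive split: quoted local parts etc. are out of scope here.
        static EmailAddress parse(String raw) {
            int at = raw.lastIndexOf('@');
            if (at < 1 || at == raw.length() - 1)
                throw new IllegalArgumentException("not an email: " + raw);
            return new EmailAddress(raw.substring(0, at), raw.substring(at + 1));
        }
    }

With this, `parse("foo@BAR.com").equals(parse("foo@bar.com"))` is true, while `parse("FOO@bar.com")` stays distinct, which is exactly the asymmetry the RFC demands.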
If you don’t want to lose potentially important email then you need to make sure your own systems are case-insensitive everywhere. Otherwise you’ll find out the hard way when a customer or supplier is using a system that capitalises entire email addresses (yes, I have seen this happen) and you lose important messages.
Java makes it a pain though, so most code ends up primitive obsessed. Other languages make it easier, but unless the language and company has a strong culture around this, they still usually end up primitive obsessed.
record PhoneNumber(String value) {}
Huge pain. We tried “typed” strings like this on a project once for business identifiers.
Overall it worked in making sure that the wrong type of ID couldn’t accidentally be used in the wrong place, but the general consensus after moving on from the project was that the “juice was not worth the squeeze”.
I don’t know if other languages make it easier, but in C# it felt like the language was mostly working against you. For example, data comes in and goes out over an API in string form, meaning you have to do manual conversions all the time.
In C# I use named arguments most of the time, making it much harder to accidentally pass the wrong string into a method or constructor’s parameter.
But the context this type of an alias should exist in is one where a string isn't turned into a PhoneNumber until you've validated it. All the functions taking a string that might end up being a PhoneNumber need to be highly defensive - but all the functions taking a PhoneNumber can lean on the assumptions that go into that type.
It's nice to have tight control over the string -> PhoneNumber parsing that guarantees all those assumptions are checked. Ideally that'd be done through domain based type restrictions, but it might just be code - either way, if you're diligent, you can stop being defensive in downstream functions.
Yeah, I can't relate at all to not using a type for this, after having had to write gross defensive code a couple of times: e.g. if it's not a phone number, do you return undefined or throw an exception? The typed approach is shorter, cleaner, self-documenting, reduces bugs, and makes refactoring easier.
Even if you don't do any validation as part of the construction (and yeah, having a separate type for validated vs unvalidated is extremely helpful), universally using type aliases like that pretty much entirely prevents the class of bugs where a string/int-typed value is accidentally passed into a variable of the wrong stringy/inty type, e.g. mixing up different categories of id or name or whatever.
void callNumber(string phoneNumber);
void associatePhoneNumber(string phoneNumber, Person person);
Person lookupPerson(string phoneNumber);
Provider getProvider(string phoneNumber);
I pass in "555;324+289G". Are you putting validation logic into all of those functions? You could have a validation function you write once and call in all of those functions, but why? Why not just parse the phone number into an already-validated type and pass that around?

PhoneNumber PhoneNumber(string phoneNumber);
void callNumber(PhoneNumber phoneNumber);
void associatePhoneNumber(PhoneNumber phoneNumber, Person person);
Person lookupPerson(PhoneNumber phoneNumber);
Provider getProvider(PhoneNumber phoneNumber);
Put all of the validation logic into the type conversion function. Now you only need to validate once, from string to PhoneNumber, and you can safely assume it's valid everywhere else. That's what I see as the primary value of this sort of typing. Enforcing the invariants is a separate matter.
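A hedged sketch of what that conversion function might look like in Java (the cleaning and length rules here are invented, not real-world phone validation):

    import java.util.Optional;

    record PhoneNumber(String digits) {
        // The one place validation lives; downstream code can assume
        // a PhoneNumber holds only digits of a plausible length.
        static Optional<PhoneNumber> parse(String raw) {
            String digits = raw.replaceAll("[\\s().+-]", "");
            return digits.matches("\\d{10,15}")
                ? Optional.of(new PhoneNumber(digits))
                : Optional.empty();
        }
    }

"555;324+289G" is rejected here, once, instead of being re-checked (or worse, trusted) in four different functions.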
You can get ever so gradually stricter with your types, which means that the operations you perform on a narrow type are even more solid.
It is also 100% possible to do in dynamic languages; it's a cultural thing.
When the type is more complex, specific constraints should be used. For a real-life example: I designed a type for the occupancy of a room in a hotel booking application. The number of occupants of a room must be positive, and a child must be accompanied by at least one adult. My type Occupants has a constructor Occupants(int adults, int children) that verifies those conditions on construction (and also some maximum values).
Note that this does not violate the "Parse, Don't Validate" rule. This rule does not prevent you from doing stupid things with a "parsed" type.
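In Java that constructor check might look like the following (a sketch; the maximums are invented):

    record Occupants(int adults, int children) {
        Occupants {
            if (adults < 0 || children < 0)
                throw new IllegalArgumentException("counts must be non-negative");
            if (adults + children < 1)
                throw new IllegalArgumentException("a room needs at least one occupant");
            if (children > 0 && adults < 1)
                throw new IllegalArgumentException("a child must be accompanied by an adult");
            if (adults > 8 || children > 8) // invented maximums
                throw new IllegalArgumentException("too many occupants for one room");
        }
    }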
In other cases, I use its cousin `unchecked` on int values, when an overflow is okay, such as when calculating an int hash code.
So things stay as maps or arrays all the way through.
Email honestly seems much more straightforward than dates... Sweden had a February 30 in 1712, and there are all sorts of date ranges that never existed in most countries (e.g. the American colonies skipped September 3-13 in 1752).
Essentially the article says that each data type should have a single location in code where it is constructed, which is a very class-based way of thinking. If your Java class only has a constructor and getters, then you're already home free.
Also, for the method to be efficient, you need to be able to know where an object was constructed. Fortunately, class instances already track this information.
This is one of the really key ideas behind OOP that tends to get overlooked. A constructor's job is to produce a semantically valid instance of a class. You do the validation during construction so that the rest of the codebase can safely assume that if it can get its hands on a Foo, it's a valid Foo.
Like, my preferred alternative is not "return an invalid Email object" but "return a sum type representing either an Email or an Error", because I like languages with sum types and pattern matching and all the cultural aspects those tend to imply.
But if you are writing Python or Java, that might look like "throw an exception in the constructor". And that is still better than "return an Email that isn't actually an email".
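Side by side, the two shapes might look like this in Java (a sketch; the `contains("@")` check is a stand-in for real parsing):

    // Sum-type style: failure is part of the signature.
    sealed interface EmailResult {
        record Ok(Email email) implements EmailResult {}
        record Err(String reason) implements EmailResult {}
    }

    record Email(String raw) {
        // Constructor style: the Java/Python idiom.
        Email {
            if (!raw.contains("@"))
                throw new IllegalArgumentException("not an email: " + raw);
        }

        static EmailResult parse(String raw) {
            return raw.contains("@")
                ? new EmailResult.Ok(new Email(raw))
                : new EmailResult.Err("no '@' in: " + raw);
        }
    }

Either way, nothing downstream can ever hold "an Email that isn't actually an email".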
I definitely agree returning a sum type is ideal.
If you have onerous validation in the constructor, you will run into extremely obvious problems during testing. You just want a banana, but you also need the ape and the jungle.
`String -> Result<Email, Error>` shouldn't need any other parameters?
But you should ideally still have some simple field-wise constructor (whatever that means is language-dependent) anyway; the function from String would delegate to that after either extracting all of the necessary components or returning/throwing an error.
C-like languages have this a little bit, in that you'll probably make a struct/class from whatever you're looking at and pass it around rather than a dictionary. But dates are probably just stored as untyped numbers with an implicit meaning, and optionals are a foreign concept (although implicit in pointers).
Now, I know that this stuff has been around for decades, but it wasn't something I'd actually use until relatively recently. I suspect that's true of a lot of other people too. It's not that we forgot why strong static type checking was invented, it's that we never really knew, or just didn't have a language we could work in that had it.