Stop writing jack-in-the-box code
06 Apr 2020
In my previous article I explained why it’s time to move away from using string as the type of things in your code just because they’re meant to contain text. That article was more of an appetizer; most of what matters is repeated here, so feel free to continue reading even if you haven’t read the previous one.
Jack-in-the-box, the toy
Jack-in-the-box is a 14th-century children’s toy that looks like a box with a crank that can be turned to play music. If one keeps turning the crank eventually something pops out to startle them. The good ones pop out randomly, as opposed to at the end of the song.
Jack-in-the-box, the code
I like to think of loosely typed variables, such as an email field typed as string, as little jacks-in-the-box in your code. Most of the time the box plays a beautiful song, but turn that crank for long enough and you hit an edge case, and a nasty exception springs out of the box because that email string turned out to be an empty string. After all, that’s why you take your laptop when going on vacation.
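To make the toy concrete, here is a minimal sketch of a crank that eventually pops. The helper domainOf is hypothetical (made up for this illustration, not taken from any library), and like most code it assumes its input is a well-formed email:

```fsharp
// Hypothetical helper: assumes its input is a well-formed email address.
let domainOf (email: string) =
    email.Split('@').[1]   // index 1 only exists when the input contains '@'

let ok = domainOf "jack@example.com"   // "example.com"

// An empty string passes the type checker and fails at run time.
let blewUp =
    try
        domainOf "" |> ignore
        false
    with
    | :? System.IndexOutOfRangeException -> true
```

The signature promises nothing more than string, so the compiler is equally happy with both calls.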
Your vacation deserves better
Just like the better quality versions of jack-in-the-box, invalid content in your domain usually breaks at unexpected places. This is not a problem with your architecture. Most (if not all) of your code assumes that an email field is not an empty string, is not multiline, does not contain spaces, etc., because assuming the opposite means validating everywhere, which is just not realistic.
Of course you can just validate the known entry points such as user input, but start adding shady APIs and morally flexible document DBs into the mix, and soon enough you’ll have more “validation” code than “code” code in your solution.
Goodbye Jack, you won’t be missed
Saying goodbye to Jack is easy, and the technique even has a name: making illegal states unrepresentable. It’s a mouthful, but as far as mouthfuls go, it’s one of the most important ones I’ve come across. It means that if a thing is not supposed to be a different thing, you design your domain so it can’t possibly be that different thing.
Think about it: you type integer fields as int and text fields as string precisely so that an integer doesn’t turn out to be a string when you need to use it. But why settle for this rudimentary safety? If an email is always different from a country code field, why give them the same type? They are different types in real life, and it’s time for them to be different types in your code.
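As a minimal sketch of that idea, two single-case unions are enough to make the compiler reject a country code where an email is expected. The names EmailAddress, CountryCode and sendWelcome are hypothetical, chosen only for this illustration:

```fsharp
// Hypothetical wrapper types: both hold a string, yet they are not interchangeable.
type EmailAddress = EmailAddress of string
type CountryCode = CountryCode of string

// Hypothetical function that only accepts an EmailAddress.
let sendWelcome (EmailAddress address) =
    sprintf "Welcome mail queued for %s" address

let email = EmailAddress "jack@example.com"
let country = CountryCode "NL"

let message = sendWelcome email
// sendWelcome country   // does not compile: a CountryCode is not an EmailAddress
```

These wrappers only keep the two concepts apart; they do not check the content yet, which is where validation comes in.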
Don’t just do it in the one place where you expect errors, do it everywhere. A string is never a string. Look closely and that endpoint that expects a string probably actually expects a string no longer than 32,768 characters with no tabs, so why use string when there’s a readily available 32KnoTabString type for you to use? Or at least there will be after you create it.
Ok, it requires some code
So the idea is to create types. Lots of types. Creating lots of types might not sound appealing depending on your programming language of choice, but this is absolutely a language limitation, not a technical one.
This is how little code is necessary to declare a new type that represents non-empty strings in F#, ready to replace all your strings that were probably never meant to be empty anyway. Good riddance.
```fsharp
open System

// Non-empty string, single-case union style
type ActualText = private ActualText of string with
    static member New = function
        | s when String.IsNullOrEmpty s -> None
        | s -> Some (ActualText s)
    member x.Value = let (ActualText s) = x in s

// DISCLAIMER:
// Meant to illustrate the point above, it's not particularly good code
```
While this is more verbose than not declaring anything and using strings everywhere, think of all the exceptions these 5 lines of code prevent, and all the content-checking and validation code they render useless.
It’s usually cheaper (both in lines of code and in potential errors) to fix the problem at the source, and the source is your domain — where you define what a thing is.
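The same pattern stretches to the 32,768-character, no-tabs string mentioned earlier. The following is only a sketch under assumptions: the names NoTabString32K and maxLength are hypothetical, and the rules (length limit, no tab characters, no null) are my reading of that example rather than a real endpoint’s contract:

```fsharp
// Assumed limit, taken from the "no longer than 32,768 characters" example.
let maxLength = 32768

// Hypothetical type: a string of at most maxLength characters that contains no tabs.
type NoTabString32K = private NoTabString32K of string with
    static member New (s: string) =
        if isNull s || s.Length > maxLength || s.Contains("\t") then None
        else Some (NoTabString32K s)
    member x.Value = let (NoTabString32K s) = x in s

let accepted = NoTabString32K.New "plain text"      // Some value
let rejected = NoTabString32K.New "tab\tseparated"  // None
```

Because the constructor is private, New is the only way in, so every NoTabString32K that exists has already passed the check.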
Tell it like it isn’t
So it all boils down to making domains more explicit about what things are (an email is a string) and aren’t (an email is not a multiline string), and this is done using types with all the necessary validation embedded into them. The concept is simple, and so is the code. Let’s take a look at one way (of many possible ways) to define an Email type:
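```fsharp
open System
open System.Text.RegularExpressions

// embedding validation in the type itself (object oriented style)
type Email private (s: string) =
    static member Validate = function
        | s when String.IsNullOrWhiteSpace s -> Error "input is empty"
        // anything other than word characters, '-', '+', '_', '.' and '@' is rejected
        | s when Regex.IsMatch(s, @"[^\w\-+_.@]") -> Error "input contains invalid characters"
        // the dot is escaped so it matches a literal '.' rather than any character
        | s when Regex.IsMatch(s, @"^[^@]+@\w+\.\w+$") -> Ok (Email(s))
        | _ -> Error "invalid email"
    member x.Value = s
```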
Note that here a conscious choice was made to have more validation cases than necessary in order to have more meaningful errors. If brevity is your concern you can always have a single validation case with a universal “invalid” error message. It’s still safer than using strings, albeit not particularly user friendly.
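For comparison, here is a rough sketch of what that brief variant might look like. TerseEmail is a hypothetical name, and the single pattern is a deliberately simple assumption, not a complete email grammar:

```fsharp
open System.Text.RegularExpressions

// Hypothetical terse variant: one pattern, one generic error message.
type TerseEmail private (s: string) =
    static member Validate (input: string) =
        // \z anchors at the very end of the input, so a trailing line break is rejected too
        if not (isNull input) && Regex.IsMatch(input, @"^[\w\-+.]+@\w+\.\w+\z") then
            Ok (TerseEmail input)
        else
            Error "invalid email"
    member x.Value = s

let accepted = TerseEmail.Validate "jack@example.com"
let rejected = TerseEmail.Validate "do\nemail@example.com"
```

Both bad and merely unusual input get the same "invalid email" answer, which is the trade-off: less code, less helpful errors.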
Using your shiny new types
These types require some care. Remember, we got rid of jack-in-the-box, but we still have a box, and we still don’t know what’s in it until we open it. Call it an appropriately labeled container. Or call it Result. You may still find Result less convenient to use than string bindings, but consider the following two things:
- Result will never blow up in your face (this is a good thing)
- Functional languages have things that start with ‘M…’ but shall not be named that allow you to write almost exactly the same code that you would using strings (also a good thing)
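A minimal sketch of that second point, using only Result.bind and Result.map from the F# core library and still not naming the thing. The validators nonEmpty and singleLine and the function normalize are hypothetical, made up for this illustration:

```fsharp
// Hypothetical validators, each returning a Result instead of throwing.
let nonEmpty (s: string) =
    if System.String.IsNullOrWhiteSpace s then Error "input is empty" else Ok s

let singleLine (s: string) =
    if s.Contains("\n") || s.Contains("\r") then Error "input is multiline" else Ok s

// Result.bind runs the next step only when the previous one returned Ok;
// Result.map transforms the value inside Ok and leaves an Error untouched.
let normalize input =
    input
    |> nonEmpty
    |> Result.bind singleLine
    |> Result.map (fun s -> s.Trim().ToLowerInvariant())

let good = normalize "  Jack@Example.com "    // Ok "jack@example.com"
let bad = normalize "do\nemail@example.com"   // Error "input is multiline"
```

The pipeline reads much like the string-based version would, except that the first failure short-circuits the rest instead of surfacing later as an exception.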
I’m not going to go deeper into the topic of Result; it’s a huge topic and beyond the scope of this article. For now we’ll just use a plain old match expression (POME) to unpack that box. Turns out this particular one would’ve blown up in our face. Bloody linebreakers…
```fsharp
// safely create and consume an email using embedded validation
// (consumeValidEmail stands for any function that accepts a valid email string)
let result = Email.Validate "do\nemail@example.com"

match result with
| Ok x -> consumeValidEmail x.Value
| e -> printfn "%A" e

// OUTPUT> Error "input contains invalid characters"
```
There will be blocks
There’s not a lot of code in this article, and it’s not particularly good code either. The next one will have more and better code, but hopefully it’s enough to illustrate the concept of designing with types.