April 18, 2012 / cdsmith

Why Do Monads Matter?

(A Side Note: I’ve been formulating the final thoughts on this post for about a week now. In an entirely unrelated coincidence, a good friend of mine and fellow Haskell programmer, Doug Beardsley, ended up writing two posts about monads over the weekend as well. Weird! But don’t fret; this isn’t really the same thing at all. I’m not writing to teach Haskell programmers how to use monads. I’m writing about a kind of intuition about why these concepts turn out to matter in the first place. You won’t find much here by way of how to program in Haskell.)

Category Theory for Software Development?

Match made in heaven? Or abstraction distraction?

If you’re a software developer, have you heard about monads and wondered what they were? Have you tried to learn Haskell, and struggled with them? Wondered why people worry so much about them? Have you watched the videos from Microsoft’s “Channel 9” and heard a bunch of researchy Microsoft folk talk about them, but had trouble relating them to your day-to-day programming experience?

Or if you’re interested in mathematics, have you heard murmurs in the past about how category theory interests computer science people? Looked for some clear statement of why we care, and what problems we might be interested in? Wondered if it’s really true at all? Perhaps you are like a friend of mine (and a first-rate algebraist, too, so it’s entirely reasonable to have these questions) who asked me about this a year or so ago, remembered hearing a lot of excitement in the early 90s about category theory and computer science, but never heard whether it had really panned out or was a dead end?

These are the kinds of questions I begin with. My goal is to demonstrate for you, with details and examples:

Where category-based intuition and ideas, and monads in particular, come from in computer programming.
Why the future of programming does lie in these ideas, and their omission in today’s mainstream languages has cost us dearly.
What the state of the art looks like in applying category-based ideas to problems in computer programming.

If you’re coming into this without a knowledge of category theory, never fear; this may be one of the gentlest introductions to the idea of categories and monads that you will find. But you’ll want to slow down and take a moment to understand the definition of a category and related ideas like function composition; these are absolutely crucial. Then you want to completely skip or just skim through the section called “What’s This Got To Do With Monads?” where I tell you how what we’re talking about here relates to the traditional math meaning of monads. Don’t worry, you don’t need to know that at all.

On the other hand, if you’re a mathematician, you may want to skim the bits where I review basic category theory, and just dig in where I am talking about the computer programming perspective. Just be forewarned, my introduction to monads will be via Kleisli categories, so take a minute when we get to that part and make sure you’re familiar with how the relationship works out.

Ready? Here goes!

Computer Programming and Functions: A Tenuous Relationship

Quick quiz: Do computer programmers use functions?

Ask any computer programmer you know, and you will hear: YES! Functions are some of the most basic tools computer programmers use. Then you’ll get odd looks, for asking such a silly question. Of course computer programmers use functions. That’s like asking if carpenters use nails! Right?

The truth, though, is a bit more complicated. To a mathematician, a function is just an association of input values to output values… and that is all! Any two functions that associate the same input values to the same output values are the same. Yes, you can represent functions by formulas (sometimes, anyway), but you can also represent them with just tables of inputs and outputs, or if they are functions between real numbers, as graphs. If you ask computer programmers for examples of functions, though, you will start hearing about some pretty bizarre things. I call these the “I must have skipped that day of calculus” functions. These are things that computer programmers are quite happy referring to as functions, but that to a mathematician are not really functions at all!

“Functions” that return randomly chosen numbers… and if evaluated several times, will give a different answer each time.
“Functions” that return one answer on Sundays, but a different answer on Mondays, yet another on Tuesdays, and so on.
“Functions” that cause words to appear on some nearby computer screen every time you calculate their values.

What’s going on here? Most computer programmers go about their lives happily calling these things functions, but really they are something different. But wait a second! They do have quite a lot in common with functions. Namely, they have: (a) parameters, representing their domain; and (b) return values, representing their range. (Many computer programmers are happy to talk about functions that have no parameters, or no return values… but there’s no need to be overly picky here. We can just regard their domains and ranges as one-element sets, so that no actual information is conveyed, but we can keep up appearances.)

Even more importantly, these “functions” share one more thing with the functions of mathematicians: they are constantly being composed by taking the result from one function and passing it along as a parameter to another function. When I say composed, I mean it almost exactly in the basic mathematics sense of function composition: (f · g)(x) = f(g(x)). In fact, the whole reason our “functions” exist at all is to be composed with each other! Once upon a time, in the early days of computers, we liked to keep track of information by just sticking it in known places in the computer’s memory; but all this shared knowledge about where to find information made it hard to write parts separately and fit them together, so we mostly switched to this idea of functions and composition instead.

Here’s the executive summary so far:

When computer programmers talk about functions, they do not mean exactly what mathematicians do.
What they do mean is the idea of having inputs (domains), outputs (ranges), and most importantly composition.

Along Came The Category…

So in the previous section, we ended up with our hands full of things that sort of look like functions. They have domains and ranges, and they can be composed. But at the same time, they are not functions in the mathematics sense. Baffling? No, not really. Mathematicians deal with stuff like that a lot. They have a name for systems of function-esque things of exactly that form. That name is… cue the drumroll, please… CATEGORIES!

In math-speak, categories are:

collections of “objects” (you should think of sets),
and “arrows” (you should think of functions between sets),
where each arrow has a domain and a range,
each object has an “identity” arrow (think of the identity function, where f(x) = x)
and arrows can be composed when the domains and ranges match up right.

Before we agree to call something a category, we also throw in a few rules, such as if you compose any function with an identity, it doesn’t actually change, and composing functions obeys the associative property. These should be unsurprising, so if they seem strange to you, please take a moment, grab a pencil, and try working it out using the definition of function composition earlier: (f · g)(x) = f(g(x)), and simplifying.

The nice thing about categories is this: it’s not just some pointless abstraction that a bunch of mathematicians made up. Categories are defined that way because people have looked at literally hundreds of things that all look sort of like functions with domains and ranges and compositions. Things from algebra, like groups and rings and vector spaces; things from analysis, like metric spaces and topological spaces; things from combinatorics, like elements of partially ordered sets and paths in graphs; things from formal logic and foundations, like proofs and propositions. Almost without fail, they can be described using all the ideas we just looked at! In short, categories are the right intuition for talking about composing things with domains and ranges, which is exactly the situation we’re in.

The Four Horsemen of the Catapocalypse

Now you can see why categories come into the picture: they are the right intuition for things that maybe aren’t functions, but can be composed like functions. But just because a category exists doesn’t mean it’s worth talking about. What makes this worth talking about is that the category-related ideas aren’t just there, but actually express common concerns for computer programmers.

It’s now time to get a little more specific, and introduce the four examples that will guide us the rest of the way through this exploration. Each example highlights one way that the “functions” used by computer programmers might be different from the functions that mathematicians talk about. These examples represent actual kinds of problems that computer programmers have run into and solved, and we’ll look more at the practical side of them later. For now, we’ll just be happy getting familiar with the general ideas.

The First Horseman: Failure

The first problem is failure. Computer programmers do lots of things that might fail. Reading from files (they might not exist, or on a computer with more than one user, they might not be set to allow you to read them), talking over the internet (the network might be broken or too slow), even just doing plain old calculations with a large amount of data (you might run out of memory). Because of this, dealing with failure is a constant concern.

In general, in modern computer programming tools, it’s always understood that a function might fail. You may get an answer, but you also may get a reason that the task could not be completed. When that happens, programmers are responsible for dealing with it responsibly: letting someone know, cleaning up the leftover mess in computer memory from a half-complete task, and just otherwise putting the pieces back together. A major factor in programming techniques or tools is how easy they make it for programmers to cope with the constant possibility of failure.

The Second Horseman: Dependence

The second problem is dependence on outside information. While functions of mathematics are nice and self-contained, computer programmers often don’t have that luxury. Computer programs are messes of configuration. Even simple mobile phones have pages and pages of settings. What language does the user speak? How often should you save off a copy of their work? Should you encrypt communication over the network? Rare is the application today that doesn’t have a “Settings” or “Preferences” menu item. In many other contexts, too, computer programs depend on information that is a sort of “common knowledge” throughout the application, or some part of the application.

Ways of dealing with this have progressed through the ages. When everything was stored in well-known memory locations anyway, it was easy enough to just look there for information you need; but that led to problems when different parts of a program needed different information and sections of programs could step on each other’s toes. The massively influential technique known as object-oriented programming can be seen as partly an attempt to solve exactly this problem by grouping functions into a context with information that they depend on. The simplest and most flexible answer would be to just pass the information around to all the functions where it is needed… but when that’s a lot of places, passing around all those parameters can be very, very inconvenient.

The Third Horseman: Uncertainty

The third problem is uncertainty, also known as non-determinism. A normal function associates an input to an output. A non-deterministic function associates an input to some number of possible outputs. Non-determinism is less well-known than the first two problems, but possibly only because it hasn’t yet seen a convincing solution in a general purpose language! Consider:

Theoretical computer science talks about non-determinism all the time, because it’s the right approach for discussing a lot of computational problems, ranging from parsing to search to verification. That language just hasn’t made its way into the programming practice.
Non-determinism comes up when querying, searching, or considering many possible answers. These are precisely the places that programmers end up relying on a variety of domain specific languages, ranging from SQL to Prolog, and more recently language-integrated technologies like LINQ.
Even with specialized languages for heavy-duty querying and search tasks, we still end up writing a lot of our own nested looping and control structures for the purpose of looking through possibilities when it’s not worth crossing that language barrier. This kind of thing is responsible for some of the more complex code structures you find these days.

While the first two problems of failure and dependence are at least partly solved by current mainstream programming languages, non-determinism is as yet solved mostly by special-purpose sub-languages, with LINQ as the notable exception.

The Fourth Horseman: Destruction

Finally, the fourth problem is destruction. Evaluating a math-type function is observable only in that you now know the answer. But in computer programming, functions can have permanent effects on the world: displaying information, waiting on responses from other computers or people, printing documents, even quite literally exploding things, if they are running on military systems! Because of this, things that aren’t specified in mathematics, like the order in which evaluation happens, matter quite a lot here.

The destructive nature (by which we just mean having effects that can’t be undone) of computer programming functions has plenty of consequences. It makes programming more error-prone. It makes it harder to divide up a task and work on different parts simultaneously, such as you might want to do with a modern multi-core computer, because doing the parts in the wrong order might be incorrect. But at the same time, these destructive effects are in a sense the whole point of computer programming; a program that has no observable effects would not be worth running! So in practically all mainstream programming languages, our functions do have to cope with the problem of destruction.

Back To The Function

Now we’ve seen the faces of some problems we find in the computer programming world. We build software that might fail, has to deal with a ton of extra context, models non-deterministic choice, and sometimes has observable effects on the world that constrain when we can perform the computation.

It may now seem that we’ve left the nice and neat world of mathematical functions far behind. We have not! On closer inspection, we’ll see that if we can just squint hard enough, each of these quasi-functions can actually be seen as true, honest-to-goodness functions after all. There is a cost, though. To turn them into real functions, we need to change the range of those functions to something else. Let’s see how it works for each of our function types in turn:

Functioning With Failure

Our first example of pseudo-functions were those that might fail. It’s not hard to see that a function that could fail is really just a function whose results include two things:

successes, which are the intended possible results; and
failures, which are descriptions of why the attempt failed.

So for any set A, we’ll define a new set called Err(A) to be just A together with possible reasons we might have failed. Now a possibly failing function from a set A to a set B is really just an ordinary function from A to Err(B).

Functioning With Dependence

Our second type of pseudo-functions were those that depended on information that they got from the world around them: perhaps preferences or application settings. We play a similar trick here, but for a set A, we will define the set Pref(A) to be the set of functions from application settings to the set A. Now watch closely: a function in context from A to B is just an ordinary function from A to Pref(B). In other words, you give it a value from the set A, and it gives you back another function that maps from application settings to the set B.

As confusing as that might sound, a function whose range is another function is really just a function of two parameters, except that it takes its parameters one at a time! Take a minute to convince yourself of this. The conversion between these two equivalent ideas is sometimes called “currying”. So by changing the range of our function, we actually effectively added a new parameter, and it now receives the application settings as a parameter. Remember that except for being inconvenient (we’ll deal with that later), that’s exactly what we wished for.

Functioning With Uncertainty

This is perhaps the most obvious example of all. Our third type were those that represent non-determinism: instead of one specific answer, they have many possible answers. This is easy enough: for each set A, define P(A) to be the power set of A, whose members are themselves sets of values of A. Then a non-deterministic function from A to B is just an ordinary function from A to P(B).

Functioning With Destruction

Our final trick is to deal with functions that have destructive effects. Here we’ll need to be a bit more elaborate in constructing a new range: for each set A, we define IO(A) (standing for input/output, which captures the notion of effects that interact with the rest of the world). An element of the set IO(A) is a list of instructions for obtaining a member of A. It is not a member of A, merely a way to obtain one, and that procedure might have any number of observable effects.

Now we play the same trick and change the range: a destructive function from A to B is just an ordinary plain old mathematical function from A to IO(B). In other words, if you give me an A, then as a plain old function I can’t actually do the steps to get a B, but I can certainly tell you what they are.

But what about composition? It’s great to be back in the world of plain functions, but remember what got us here in the first place? We liked functions because we liked composition; but it seems we’ve now lost composition! If I have a possibly failing function from A to B, and another from B to C, well now I’ve turned them into functions from A to Err(B) and then B to Err(C). Those function domains and ranges don’t match up, and I can’t compose them!

Oh no…

Hold Your Horses, Heinrich Kleisli to the Rescue!

Well, all is not lost. I just haven’t yet told you how to compose these “special” functions.

Because some math dude found these things before us, we call our “special” functions by a name: Kleisli arrows. There are two things going on here at once, so keep your eyes open: first, Kleisli arrows are just plain old ordinary functions, but with weird-looking ranges. Since they are just functions, you can compose them as functions, and that’s just fine. But at the same time, they are “special”, and we can compose them as Kleisli arrows, too.

Remember what we decided earlier? The right way to think about composition is by talking about a category. Sets are a category, and that’s fine if you want plain function composition. But now we want a new kind of category, too. It’s called the Kleisli category. If you don’t remember what all the parts of a category are, take a second to review them. To define a category, I need objects, arrows, identities, and composition.

To keep things simple, the objects in this new category will be the same: they are just sets of things.
The arrows in this category are, unsurprisingly, the Kleisli arrows.
I haven’t told you yet what the identities and composition look like, so let’s do that next.

First, we look at failure. We’re given a failure Kleisli arrow from A to B, and one from B to C. We want to compose them into a Kleisli arrow from A to C. In other words, we have an ordinary function from A to Err(B), and a function from B to Err(C), and we want one from A to Err(C). Take a minute to think about what to do.

The central idea of error handling is that if the first function gives an error, then we should stop and report the error. Only if the first function succeeds should we continue on to the second function, and give the result from that (regardless of whether it’s an error or a success).

To summarize:

If g(x) is an error, then (f · g)(x) = g(x).
If g(x) is a success, then (f · g)(x) = f(g(x)).

To complete the definition of a category, we also need to decide about the identity Kleisli arrows. These are the ones that don’t do anything, so that if you compose them with any other Kleisli arrow, it doesn’t change the other one. Identities are functions from A to Err(A), and it turns out these are just the functions f(x) = x, just like for sets. Notice that means they never return an error; only a successful result.

I’ll run more briefly through the remaining three examples, but I encourage readers who aren’t yet clear on how this will work to write them out in more detail and use this opportunity to become more comfortable with defining a second category of Kleisli arrows.

Next we have Klesli arrows for dependence, which are functions from A to Pref(B). Recall that adding the Pref to the range is equivalent to adding a new parameter for the application preferences. The key idea here is that if I have two functions that both need to know the application preferences, I should give the same preferences to both. Then composing two of these Kleisli arrows just builds a new function that gets the extra preferences parameter, and passes the same one along to the two component parts. And identities? A Kleisli identity will get that extra preferences parameter, but will ignore it and just return its input anyway.

The Kleisli arrows for uncertainty, or non-determinism, are functions from A to P(B), the power set of B. The key idea for non-determinism is that at each stage, we want to try all possible values that might exist at this point, and collect the results from all of them. So the composition calculates the second function for each possible result of the first, then the resulting possibilities are merged together with a set union. The identities, of course, aren’t really non-deterministic at all, and just return one-element sets containing their input.

Finally, Kleisli arrows for destructive effects are functions from A to IO(B). The key idea here is to combine instructions by following them in a step-by-step manner: first do one, then the next. So the composition writes instructions to perform the first action, look up the second action from the result, and then perform the second action, in that order. A Kleisli identity here is just an instruction to do nothing at all and announce the input as the result. So for each of the four motivating examples, we created a new category, the Kleisli category.

These new categories have their own function-like things, and related ideas of composition and identities, that express the unique nature of each specific problem. By using the appropriate notion of composition in the right Kleisli category, you can solve any of these long-standing computer programming problems in a nice composable way.

And that’s why you should care about monads.

Monads?!? Oh yes, I should mention that we’ve just learned about monads. We simply forgot to use the word.

What’s This Got To Do With Monads?

This section is for those of you who want to know how the stuff we said earlier are related to monads as they are understood in mathematics. If you open Wikipedia, or most category theory textbooks, and look up monads, they won’t look very much like what we just did. You’ll see something about an endofunctor, and two natural transformation, and properties about commuting triangles and squares.

We haven’t talked about functors at all, much less natural transformations… so how could we have possibly learned about monads? It turns out there’s more than one way to describe monads. The one we’ve just gone through is an entirely valid one. The shifts we made to the ranges of our functions earlier — Err, Pref, P, and IO — are actually examples of monads. To make sure they are monads in the conventional math way, we’d have to work pretty hard: first, prove that they are functors. Then build two natural transformations called η and µ, and prove that they are natural. Finally, prove the three monad laws.

But wait, there’s an easier way! Heinrich Kleisli, whom we’ve already met from the categories earlier, pointed out that if you can build a category like the ones we did in the last section, whose arrows are just functions with a modified range, then your category is guaranteed to also give you a monad. That’s quite convenient, because as computer programmers, we tend to care a lot more about our Kleisli arrows than we do about a mathematician’s idea of monads. Remember, those Kleisli arrows are exactly the modified notion of functions that we were already using, long before we ever heard a word about category theory! And Kleisli tells us that as long as composition works the way we expect with our Kleisli arrows (namely, that it’s associative and the identities act like identities), then all that other stuff we’re supposed to prove to show we have a monad just happens for us automatically.

Still, it’s an interesting side question to look at the relationship between the two. I won’t give all the details, but I’ll give the structure, and then leave the interested reader with some familiarity with category theory to fill in the proofs of the relevant properties. We’ll use Err as our monad, just to pick a specific example, but nothing here is specific to Err.

We start with Err, which is already a map from objects to objects. But the traditional definition of a monad also requires that it be a functor. That is, given a function f from A to B, I need a way to construct a function Err(f) from Err(A) to Err(B). I do it as follows: in the underlying category (not the Kleisli category, just the category of sets), I find an identity function from Err(A) to Err(A). Then I find a Kleisli identity from B to Err(B). I compose that Kleisli identity in the underlying category with f, and get a function from A to Err(B). I can now do a Kleisli composition of the identity from Err(A) to Err(A) and the function from A to Err(B), and get a function from Err(A) to Err(B). That’s the one I’ll call Err(f).
Next, I need a natural transformation η, from the identity functor to Err. This is easy: the components of η are the Kleisli identities.
Finally, I need a natural transformation µ from Err² to Err. To get the component of µ at A, I take the identity functions in the underlying category from Err (Err A) to Err (Err A), and then from Err A to Err A, and I combine them with Kleisli composition to get a function from Err (Err A) to Err A. This is the component of µ.

The construction in the opposite direction is easier. Given a monad Errwith ? and µ, the Kliesli category is constructed as follows.

The identities are just the components of η.
Given a function f from A to Err(B) and a function g from B to Err(C), I compose the two as µ · Err(g) · f.

Again, the details and the proofs of the appropriate monad and category laws are left to the reader. I hope this brief aside has been useful. I now return to using the word “monad” but talking about monads via Kleisli categories.

Joining The Monadic Revolution

Once again, let’s pause to sum up.

Computer programmers like to work by composing some things together, which we call functions.
They aren’t functions in the obvious way… but they do make up a category.
Actually, they are functions after all, but only if you squint and change the ranges into something weirder.
The category that they form is called a Kleisli category, and it’s basically another way of looking at monads.
These monads / Kleisli categories nicely describe the techniques we use to solve practical problems.

It’s not just about those four examples, either. Those are typical of many, many more ideas about programming models that can be described in the same framework. I think it’s fair to sum up and say, at this point, that someone interested in studying and analyzing programming languages and models should be familiar with some ideas from category theory, and with monads in particular.

But still, what about the humble computer programmer, who is not designing a new language, is not writing research papers analyzing programming languages, but just wants to solve ordinary everyday problems? That’s a fair question. As long as monads remain just a mathematical formalism for understanding what computer programmers mean by functions, the practicing computer programmer has a good claim to not needing to understand them.

It’s becoming clear, though, that monads are on their way into practical programming concerns, too. In the past, these Kleisli arrows, the modified notions of “function” used by computer programmers, were built into our programming languages. Functions in C used a Kleisli arrow, and C++ functions used a different one. The language specification would tell us what is and what is not possible using a function in this language, and if we wanted something different, too bad. Maybe once a decade, we’d make the swap to a brand new programming language, and bask in the warm rays of some new language features for a while.

The Past: Error Handling

Consider the Err monad, which gave us functions that might fail and report their failure in structured ways. Modulo some details and extensions, this is basically structured exception handling. Looking to history, programmers worked without exception handling in their programming languages for many years. Of course, languages like C are all Turing complete, and can solve any possible computational problem, proper error handling included. But we don’t apply categories to think about possible computations; categories are for thinking about composition. Without exception handling in the notion of a “function” that’s provided by languages like C, programmers were left to do that composition by hand.

As a result, any C function that could fail had to indicate that failure using a return value. In many cases, conventional wisdom built up saying things like “return values are for indicating success or failure, not for giving back answers”. Coding conventions called for most if not all function calls to be followed with if statements checking for failure, and the resulting code was borderline unreadable. This was the heyday of flowcharts and pseudo-code, because no one expected to be able to understand real code at a glance! In reality, though, programmers only checked for errors when they thought they was possible, and a lot of errors went undetected. Programs were often unreliable, and likely untold billions of dollars spent on extra development work and troubleshooting.

What was the reason for this? It’s quite simple: the C programming language and others of its time provided an insufficient kind of Kleisli arrow! If their Kleisli arrow had included the functionality from the Err monad we defined above, this could have been avoided. But the notion of what a function means in C is fixed, so the answer was to deal with it, and eventually migrate to a different programming language, rewriting a lot of software, and likely costing another untold billions of dollars.

The Present: Global Variables and Context

What about the Pref monad, and others like it? As discussed earlier, this is about defining computations in a larger context of available information and state of the world.

In the past, we had global variables, the slightly more modern equivalent of just storing information at a known place in computer memory. Quick and dirty, but even 30 years ago, programmers knew they were the wrong answer, and wouldn’t be manageable for larger programs. Object oriented programming tried to alleviate the problem a little, by having functions run in a specific “object” that serves as their context, and that was implicitly passed around at least within the implementation of the object itself. To get this, everyone effectively had to change programming languages to get a better Kleisli arrow again. But even so, object-oriented languages don’t give a perfect answer to this problem.

The Near Future (/ Present): Purity, Effects, Parallelism, Non-Determinism, Continuations, and More!

This point is about the future, but I’ll start out by pointing out that everything here is already possible, but just requires an appropriate choice of programming language!

One current challenge for the computer programming community is finding effective ways to handle parallelism. Ironically, while past examples have focused on the problem of putting too little power into a language’s Kleisli arrow, the problem this time is too much! Plain (also known as “pure”) functions present lots of opportunities for parallelism. When code is executed in parallel, it may run faster, or if the parallelism is poorly designed it may even run slower, but in any case it will certainly still give the same answer. But when the Kleisli arrow incorporates destructive updates, that is no longer the case. Now parallelism is risky, and might give unexpected or incorrect results due to so-called race conditions.

We can’t just remove destructive updates from a language’s Kleisli arrow, though. A program that has no observable effects at all isn’t useful. What is useful is the ability to separate the portions of code that perform destructive update from those that just compute pure functions. So for the first time, we need a language with more than one kinds of Kleisli arrow, in the same language!

There is already at least one language that offers precisely this. Programmers in the Haskell language can build their own monads, and work in the Kleisli category of a monad of their choosing. The programming language offers a nice syntax for making this approach readable and easy to use. If something might fail, you can throw it in Err. If it needs access to the application settings, throw it in Pref. If it needs to do input or output, throw it in IO. Haskell web application frameworks and similar projects start by defining an appropriate monad with the appropriate features for that kind of application.

Another current trend in the computer programming community is toward building more domain-specific programming models. The language Erlang became popular specifically for providing a new programming model with advantages for parallelism. Microsoft’s .NET framework incorporates LINQ, which offers a programming model that’s better for bulk processing and querying of collections of data. Rails popularized domain-specific languages for web applications. Other languages offer continuations as a way to more easily build specify computations in a more flexible way. All of these are examples of working in new and different Kleisli arrows that capture exactly the model appropriate for a given task.

It comes down to this: If we believe that there is one single notion of “function” that is most appropriate for all of computer programming, then as practical programmers we can find a language that defines functions that way, and then forget about the more general idea of monads or Kleisli arrows as a relic of theoreticians. But it’s not looking that way. The programming community is moving quickly toward different notions of what a function means for different contexts, for different tasks, even for different individual applications. So it’s useful to have the language, the tools, and the intuition for comparing different procedural abstractions. That’s what monads give us.

Abstraction Over Monads

Using a language with a choice of monads offers some other advantages here, too. It gives us back our abstraction. In Haskell, for example, it’s possible to write code that is applicable in multiple different monads. A surprising amount of the programming done with one monad in mind actually has meaning in very different monads! For example, consider the following Haskell type:

sequence :: Monad m => [m a] -> m [a]

What this means is that for any monad, which we’ll call M, sequence converts from a list of values of M(A) into M(List(A)), the monad applied to lists themselves. Let’s take a minute to consider what this means for each of our four examples. For Err, it takes a list of results that might be failures, and if any of them are failures, it fails; but if not, then it gives back a list of all the results. It’s basically a convenient way to check a whole list of computations for a failure. For Pref, it takes a single set of application preferences, and distributes that to everything in the list, giving back a list of the results. For the power-set monad, P, it would take a list of sets, and give back a set of all the ways to choose one item from each set. And for IO, it takes a list of instruction cards, and gives back the single card with instructions for doing all of them in turn. Amazingly, this one function, which had only one implementation, managed to make sense and do something useful for all four of our examples of monads!

Along with a choice of monads comes the ability to abstract over that choice, and write meaningful code that works in any monad that you do end up choosing.

Between all of these forces, I predict that within the next ten years, software developers will be expected to discuss monads in the same way that most developers currently have a working vocabulary of design patterns or agile methodologies.

Beyond Monads: More Categorical Programming

While most of this has been about monads, I don’t want to leave anyone with the impression that monads are the only influence of categories in computer programming. All of the following ideas have found their way into programming practice, mostly (so far) within the Haskell programming language community because of its flexibility and a deep academic culture and history.

Monad transformers are a powerful technique for combining the effects of more than one monad to build rich and powerful programming models.
Functors and applicative functors (a.k.a. strong lax monoidal functors for mathematicians) are weaker than monads, but more widely applicable.
Other kinds of categories that are not Kleisli categories can often be defined and composed to solve specific problems. Freyd categories are also useful.

I’ll stop there, but only as an encouragement to look more into the various abstractions from category theory that programmers have found useful. A good starting point is the (Haskell-specific) Typeclassopedia by Brent Yorgey. That’s just a door into the many possibilities of applying category-based ideas and intuitions in computer programming.

But I hope I was able to convey how these ideas aren’t just made up, but are actually the natural extension of what computer programmers have been doing for decades.