Monads

I have been doing some pretty serious Scala coding these days, all of which really comes from working with Apache Spark on this project or that. From the very start, a term that I had not encountered before (in 12 years or so of building things) began to surface everywhere: monads.

Now, researching monads has been a relatively bad experience. Here is a list of sources that I have studied and found insufficient in one way or another:

  1. Wiki Article on Monads
  2. Stack Overflow - Haskell
  3. Haskell Wiki
  4. More Monad and Haskell
  5. Mark Toberdorf’s Paper
  6. Python and Monad
  7. Monads, Haskell and Pictures
  8. Multi-Part Monads
  9. Maybe Monad and Javascript
  10. Monads and C++

And the list goes on, with (10) perfectly summarising the problem at hand:

I don’t know if I’m exaggerating but it seems like every programmer who gets monads posts a tutorial about them. (And each post begins with: There’s already a lot of monad tutorials on the Internet, but...)

Here is the thing: it doesn’t matter how many people who understand monads write about them. At the time of writing this piece, I was still having trouble understanding them. It is not that the above expositions are poorly written. On the contrary, some of them are quite enjoyable and brimming with individual creativity (see 7, which is excellent!). What I find most unsatisfying is that the following questions remained unanswered for me, either because the authors fail to address them or because they were answered sans the authority that comes from unifying the various angles on this subject.

  1. What do people mean when they say X is a monad? They seem to mean things that diverge too much from each other for the word to have a single focused meaning.

  2. What is the connection with the category theoretical monad in mathematics? There is a perfectly good definition in category theory, and if I am worth a damn at what I do, I better be able to explain the connection to this seemingly unrelated concept.

  3. What is the unifying idea that relates all the monad examples? Maybe, List, State, Future, IO, etc. They behave overtly differently as code constructs (esp. IO and List). How are IO or List (as monads) a “programmable semicolon”?

  4. What problem do monads solve, specifically?

More than that, this is intentionally meant to be very mathematical. We want a unified understanding that only comes into focus when we bring more mathematical machinery — that of categories, functors, and natural transformations — into the picture. We apologise here that such a choice may render an otherwise practical note on the subject rather uselessly verbose.

A Note About Notation

Most pseudocode is in Scala syntax. I find that, without its more arcane notations, Scala is rather straightforward to read, and for a typed language that isn’t Haskell (about which there have been numerous complaints), it is the most succinct at dealing with the notational demands of functional programming.

I do accept the criticism that it is meaningless to talk about monads in a language known for its imperative elements. After all, Seq("a").map {s => sideEffect(); s} is not particularly “monadic”, since s => sideEffect(); s is not actually a “morphism” in the functional universe. However, if any language, divorced from its implementations, can be functional in that it is equipped with not only all the functional features but also readable ones, Scala is certainly right for the job, and its support for the functional features is more than sufficient for the purposes of this lengthy exposition: it has clear type delineations and very compact notation for anonymous functions.

Finally, those unfamiliar with Scala should not fear. The following code block demonstrates all the Scala language features essential to understanding the remainder of this post. You may find the embedded comments helpful:

// this is a comment

// foo is a function that accepts two arguments; 
// one of type S and one of type T, returns an 
// instance of type U
def foo(s: S, t: T): U = ...

// bar is a function with the same signature 
// as foo
val bar: (S, T) => U = foo

// currying
val baz: S => (T => U) = {
  // takes some s and returns a function which 
  // takes an instance t of T and returns 
  // foo(s, t)
  s => foo(s, _)
}

// generics/templates: [U] means of type U
// morp is an instance of a List of type U, and
// it is obtained by transforming (via map) a 
// list of a single item t of type T via the 
// curried baz(s)
val morp: List[U] = List[T](t).map[U](baz(s))

What do people mean when they say M is a monad?

Monads as chainable, flattenable wrappers

Most prevalent explanations of monads seem to have one thing in common: monads are wrappers, augmenters, extensions, containers that expose some kind of “flattening” and “chaining”. In essence, a monad is a generic class Monad[?] that exposes the following interface patterns

  • Monad[T].wrap - constructs an instance of Monad[T] from an instance of type T. I will sometimes write this as wrap when the context can be inferred.

  • Monad[T].map[S] - takes a method with signature T => S and returns a method with signature Monad[T] => Monad[S], sometimes written as map, especially if the context can be inferred

  • Monad[Monad[T]].flatten - takes an instance of Monad[Monad[T]] and gets an instance of Monad[T], or flatten if the context is clear

Such that for any foo of signature T => S,

  • Naturality of wrap For any instance t: T:
// wrapping t then mapping foo into it is the
// same as applying foo first, and wrapping that
Monad[T].map[S](foo)(wrap(t)) == 
    Monad[S].wrap(foo(t))
  • Naturality of flatten For any instance dblWrapped: Monad[Monad[T]]:
// the two operations on dblWrapped are the same:
// 1. flatten dblWrapped and then map foo to it
// 2. flatten the result of lifting foo to dblWrapped
map(foo)(flatten(dblWrapped)) == 
    flatten(map(map(foo))(dblWrapped))
  • Associativity of flatten For any instance triWrapped: Monad[Monad[Monad[T]]]:
// the two operations on triWrapped are the same:
// 1. flatten the outer layers, then flatten again
// 2. flatten the inner layers (via map), then 
//    flatten the result
flatten(flatten(triWrapped)) == 
    flatten(map(flatten)(triWrapped))
  • Right inverse of flatten For any instance wrapped: Monad[T]:
// The following are all equal to wrapped itself:
// 1. mapping wrap over wrapped and then flattening 
//    the result
// 2. applying wrap to wrapped directly, and then 
//    flattening
flatten(map(wrap)(wrapped)) == 
    flatten(Monad[Monad[T]].wrap(wrapped)) ==
    wrapped
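
To ground these laws, here is a quick sanity check — a sketch of our own, using Scala’s method syntax rather than the curried map above — with List playing the monad: List(_) is wrap and List’s built-in flatten is flatten:

// a sanity check of the four laws, with List as the monad
def wrap[T](t: T): List[T] = List(t)
def flatten[T](ll: List[List[T]]): List[T] = ll.flatten

val foo: Int => String = _.toString

// naturality of wrap
assert(wrap(1).map(foo) == wrap(foo(1)))

// naturality of flatten
val dblWrapped: List[List[Int]] = List(List(1, 2), List(3))
assert(flatten(dblWrapped).map(foo) ==
       flatten(dblWrapped.map(_.map(foo))))

// associativity of flatten
val triWrapped: List[List[List[Int]]] =
  List(List(List(1), List(2, 3)))
assert(flatten(flatten(triWrapped)) ==
       flatten(triWrapped.map(ll => flatten(ll))))

// wrap is a unit for flatten
val wrapped: List[Int] = List(1, 2, 3)
assert(flatten(wrapped.map(t => wrap(t))) == wrapped)
assert(flatten(wrap(wrapped)) == wrapped)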

In the literature (especially of the Haskell persuasion), the method wrap is often referred to as return, while, for methods foo: T => Monad[S], the derived method

def bind(f: T => Monad[S], m: Monad[T]) = flatten(map(f)(m))

is often referred to as bind. The definition most common in the Haskell literature is that a monad is any type that admits bind and return. This definition is equivalent to our notion, since flatten can be recovered as bind(identity[Monad[T]]).
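
To make the equivalence concrete, here is a minimal sketch of the interface in Scala — the names wrap, map, flatten, and bind are this post’s, not any standard library’s:

trait Monad[M[_]] {
  def wrap[T](t: T): M[T]                  // Haskell's return
  def map[T, S](f: T => S)(m: M[T]): M[S]
  def flatten[T](mm: M[M[T]]): M[T]

  // bind (Haskell's >>=) derived from map and flatten...
  def bind[T, S](f: T => M[S])(m: M[T]): M[S] =
    flatten(map(f)(m))

  // ...and flatten recovered from bind
  def flattenViaBind[T](mm: M[M[T]]): M[T] =
    bind(identity[M[T]])(mm)
}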

As the name wrap may suggest, when I code up monads in practice, I think of them predominantly as wrappers. However, those familiar with the Maybe monad in Haskell may take issue with that, since Nothing is an instance of Maybe that can’t easily be thought of as wrapping anything in particular. Hence “predominantly”. Still, the intuition is mostly correct. Monads conjure up the idea of containers because, in most cases, container types such as List and Set are monads.

However, if one were to ask for the most general understanding, monads are adapters that expose the above methods as a way to substitute a more semantically appropriate type for the original one.

For example, Option[T] (the Scala equivalent of the Haskell Maybe) is used in place of T whenever null may occur during the course of some computation purported to return T. Using Option is in many ways better than checking for null explicitly, because Option allows for compile-time checking that the null case is handled appropriately and promotes flexibility by allowing the null case to be handled at a time of the developer’s choosing.

The flexibility afforded by Option[T] in handling the null case is really attributable to the monadic map. The method handles the null case within the abstraction layer, allowing the developer to work with an instance of Option[T] as if it were one of T, deferring any null-case handling until necessary.
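
A small illustration (the lookup map is hypothetical): the missing case rides along untouched until a value is finally demanded:

// a hypothetical lookup that may come up empty
val users: Map[Int, String] = Map(42 -> "george michael")
def findUser(id: Int): Option[String] = users.get(id)

val greeting: Option[String] =
  findUser(42)
    .map(_.trim)              // work with the String as if present
    .map(n => s"Hello, $n!")

// the missing case is handled once, when a value is needed
val result: String = greeting.getOrElse("Hello, stranger!")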

In general, a more semantically appropriate type is always preferable to a lesser one souped up with additional code, because whenever a library or a language enforces semantics at the syntactic level, the compiler can make semantic verifications that would otherwise only be possible at runtime; compilable code is thereby more robust, or else does not pass muster at the compilation stage.

Understanding monads as semantic modifiers of types makes sense for Maybe, List, and Future. But what about IO and State? In this way, monads in practice actually bear two meanings: one as containers, the other as vessels for functional execution. I must admit that this was one major gap for me to overcome: the term “monad” is really a theoretical construct that unifies two distinct patterns in practice. To see how these two patterns are related requires a theoretical viewpoint. Before writing at length about that, let me talk about the second code pattern, found only in functional programming and attributed — rightly so — to monads.

Monads as “programmable semicolons”

Pure functions have no side effects; that is, pure functions are referentially transparent in that any expression can be replaced by its value without altering the behavior of the programme. For example, the following is referentially transparent:

// the following is the same as 
//
//   "hello world!"
//
({x: List[String] => x.reduce(_ + _)})
    (List[String]("hello", " world", "!"))

while the following is not

// the following is NOT the same as 
//
//   val str = "hello world!"
//
// because of the output to stdout, 
// the HTTP request to google to
// search for "yodel", and writing the 
// response string to stdout
List[String]("hello", " world", "!").reduce {
  x, y => 
    println(x, y)
    Http(url("http://bit.ly/1TfvCeN")
         OK as String).foreach(println)
    x + y
}

How can you modify state that is external to the application (such as via IO functions) and still maintain referential transparency? (Is this even possible for applications that have external dependent processes whose states can also be affected by other sources?) One way to enforce RT is by mandating that a book-keeping object (say, an IOTranscript type) persist for the duration of the application to represent the external state changes effected by the application.

// now we are (sort of) referentially 
// transparent! If you find the following 
// code ugly, you're meant to...
val kitchenSink: Tuple4[String, 
                        IOTranscript, 
                        HTTPCall, 
                        IOTranscript] =
  List[String]("hello", " world", "!").fold(
      ("", 
       IOTranscript.empty, 
       HTTPCall.empty, 
       IOTranscript.empty)) {
    x, y => 
      val stdoutXY: IOTranscript = 
        PrintLn(x._1, y._1)
      val googleCall: HTTPCall = 
        new HTTPCall("http://bit.ly/1TfvCeN")
      val stdoutGoogleResponse: IOTranscript = 
          PrintLn(googleCall.response)
      (x._1 + y, 
       x._2 ++ stdoutXY, 
       x._3 ++ googleCall, 
       x._4 ++ stdoutGoogleResponse)
  }
    
val str: String = kitchenSink._1

Barring the obvious problems — that the code doesn’t handle failures gracefully, that the code is monstrous because of all these extraneous IOTranscript and HTTPCall transcript objects that we don’t much care about, not to mention all the extra code we would have to write to wrap dispatch.Http and println — there is one major flaw: assignments like those to stdoutXY and googleCall above are not strictly referentially transparent!

If one tries to eliminate the assignments by (say) moving their construction into the returned tuple, one would find that we cannot get around the fact that the instance of HTTPCall is needed to create the second PrintLn. While it is possible to eliminate all assignments in the previous code block, some nontrivial refactoring seems to be in order.

However, a much cleaner way to achieve this would be to design an abstraction that:

  1. keeps State during all those IO calls
  2. encapsulates transcript objects except the ones we care about
  3. we can use to maintain referential transparency
  4. looks a lot cleaner

That abstraction layer is IO (and more generally, the State monad):

// how's this for a slice of fried gold?
val strIO: IO[String] =
  List[String]("hello", " world", "!").fold(
      IO[String].wrap("")) {
    x, y => 
      x.flatMap[IOTranscript] {
        // now I'm an IO[IOTranscript]
        _ => x.map(PrintLn(_, y))
      }
      .map[HTTPCall] {
        // now I'm an IO[HTTPCall]
        _ => new HTTPCall(
          "http://bit.ly/1TfvCeN") 
      }
      .map[IOTranscript] {
        // now I'm an IO[IOTranscript] again
        googleCall => 
          PrintLn(googleCall.response)
      }
      .flatMap[String] {
        // and now I'm flattened from 
        // IO[IO[String]]
        _ => x.map(_ + y)
      }
  }

What just happened? And how is that referential transparency? Fried gold or not, the code looks cleaner, and it seems to encapsulate those darn transcript objects almost to the point of obscurity. As for referential transparency, the map and, by extension, flatMap methods of IO know how to keep a record of all the return values of the function arguments. At each iteration of map/flatMap, the IO object is recording state via the transcript objects, and the resulting object strIO is functionally equivalent to each of its composite IO invocations, thereby preserving the functional purity of the program. While the hand-wavy colloquy above is not a “proof” of referential transparency, readers who seek rigor are now equipped with enough to verify this for themselves.
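
For intuition, here is one minimal way such an IO might be modeled — not the transcript-keeping version sketched above, and not any particular library’s IO, but a suspended computation showing how map and flatMap can sequence effects without executing them:

// a toy IO: a suspended computation; effects happen only
// when run() is finally invoked, so building the chain
// stays pure
final case class IO[A](run: () => A) {
  def map[B](f: A => B): IO[B] = IO(() => f(run()))
  def flatMap[B](f: A => IO[B]): IO[B] = IO(() => f(run()).run())
}

object IO {
  def wrap[A](a: => A): IO[A] = IO(() => a)
}

// nothing prints until program.run() is called
val program: IO[String] =
  IO.wrap("hello").flatMap { s =>
    IO(() => { println(s); s + " world" })
  }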

(What if I tell you that Scala out of the box provides sugaring for the above:

val strIO: IO[String] =
  List[String]("hello", " world", "!").foldLeft(
      IO[String].wrap("")) {
    (x, y) => 
      for {
        printXY <- x.map(PrintLn(_, y))
        googleCall <- x.map(_ => 
          new HTTPCall(
            "http://bit.ly/1TfvCeN"))
        printSearchResult <- x.map(_ => 
          PrintLn(googleCall.response))
        result <- x.map(_ + y)
      } 
      yield result
  }

It is quite a neat language.)

In this way, the methods map and flatMap are binding together what would otherwise be imperative statements. Thus originates a second prevalent way of thinking about monads: as programmable semicolons.
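
A small illustration of the slogan, assuming Scala 2.13’s toIntOption: the semicolon in an imperative block merely orders statements, whereas flatMap decides what “and then” means — for Option, it short-circuits the rest of the chain on the first None:

// imperative sequencing: the semicolon just orders statements
//   val a = parse(x); val b = parse(y); a + b

def parse(s: String): Option[Int] = s.toIntOption

// monadic sequencing: flatMap defines "and then"
val sum: Option[Int] =
  parse("1").flatMap { a =>
    parse("2").map { b =>
      a + b   // reached only if both parses succeed
    }
  }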

In many ways, the pattern is quite elegant. However, the elegance is framed in a restrictive programming paradigm. The imperative programmer would ask “why do this to yourself?” The merits of functional programming are well known and deserve no more mention here. Anyone interested in the topic can find plenty of references online and on Amazon.

Is there some conceptual connection between these two views?

The two views: a wrapper and the programmable semicolon. How do we connect them? It is difficult to say that they are part of the same phenomenon or that they share a conceptual ancestor. It has been suggested to me quite vehemently that the semicolon notion is ultimately the unifying idea; that at the root of it, whenever we are using monads, we are really doing one thing, and that is to define the semantic context under which operations are performed. In the case of State or IO, this is especially apparent. In the case of List, this is less so, but a bit of thinking on it would almost convince me that there is a semantic context in this case, and it is that of iteration. Whenever we use List, we are in some sense saying to the executor: “How you execute these functions (via map) is to perform them on each object in this collection.”

In this sense, wrap is a way to embed some object into that context, and flatten is a way to realize nested contexts as a (less nested) context. If we look at the Wikipedia article on this subject, the phrase

to build pipelines that process data in steps, in which each action is decorated with additional processing rules provided by the monad

captures this idea well, especially for monads like List or Option, whose map method enables this construction of sequences of steps by defining how each step should be applied to the underlying objects, if any.
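
In the List-as-iteration-context reading, the two operations read naturally:

// wrap embeds a value into the iteration context
val one: List[Int] = List(1)

// flatten realizes a nested context as a single context
val nested: List[List[Int]] = List(List(1, 2), List(3))
val flat: List[Int] = nested.flatten   // List(1, 2, 3)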

What problem do monads solve specifically?

We have already seen that monads arose naturally as an elegant way to allow for IO operations in pure functional programming languages. However, what about imperative languages?

I cannot take credit for this answer, which is really one that my coworker Brett provided. Paraphrasing: “the monadic pattern is useful for designing libraries.”

Thinking on it, it makes a lot of sense. For languages that support generics (otherwise — in a language like Go, say — one cannot reasonably speak of this pattern), it is really useful to design data structures in the form of a monad. I am merely speculating here, but the notion of a monad must have played a key role in the design and implementation of the Apache Spark project, where SparkContext.parallelize (the wrap equivalent) and the RDD class’s map and flatMap specifically leverage the power of monads.
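
For instance, assuming a SparkContext named sc is in scope, the classic word-count opening reads exactly like the monadic interface above (modulo Spark’s distributed semantics):

// parallelize plays the role of wrap
val lines = sc.parallelize(Seq("hello world", "fried gold"))

// map and flatMap lift element-level functions to the RDD
val words = lines.flatMap(_.split(" "))   // RDD[String]
val lengths = words.map(_.length)         // RDD[Int]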

We have definitely seen how monads simplify code. However, I have more than just succinctness in mind. Instead, I want to speculate and analyze what monads provide along the following dimensions:

Delegating Control

Libraries that strike a good balance between conceding and retaining control are often robust without diminished reuse. Let us look at the Java Collections library, for example, in the following buggy code meant to prune “bad names” and replicate “good names”:

List<String> names = getLotsOfNames(somehow);
for(int i = 0; i < names.size(); ++i) {
    String name = names.get(i);
    if (goodName(name)) {
        // uh oh, infinite loop???
        names.add(name);
    } else if (badName(name)) {
        // removal shifts indices, quietly skipping
        // the next element (an iterator here would
        // throw ConcurrentModificationException)
        names.remove(name);
    }
}

There are two problems: infinite loops and quietly skipped elements (or, had we used an iterator, the infamous concurrent modification exception). But what caused these problems? What we really want to do is operate on the elements themselves, not on the list. Yet mutating the list mid-iteration is exactly what a first pass tends to look like, because the API permits it. If, instead, the library allowed only map and flatten (rolled up as flatMap), then the code might go something like this:

List<String> names = getLotsOfNames(somehow);
names = names.flatMap(
    // using the Java 8 Lambda expression,
    // not as clean as I would have liked
    (String name) -> 
        goodName(name)? 
            new ArrayList<String>(
                Arrays.asList(
                    new String[] {name, name})) :
            badName(name)? 
                new ArrayList<String>() : 
                new ArrayList<String>(
                    Arrays.asList(
                        new String[] {name}));

Efficiency aside, the code is now more robust, because only element-level access is needed, and that level of access is what the API encourages. This doesn’t prevent you from modifying names during the flatMap stage, but in most use cases the developer’s first instinct becomes writing element-level functions rather than attempting to modify the list being iterated.

Promoting readability

As this is my personal speculation, I hesitate to speak without qualification: don’t believe this if you have good evidence not to. Code is more readable if there are fewer forks. Code is more readable if the forks are more local (within the same method, or certainly within the same file). We have already pointed to the fact that Option defers handling of the null case (and even ensures that it is handled), which maximizes readability by allowing the branching to be handled at the most convenient place.

What about other monads? The List monad’s map can be thought of as functionally equivalent to:

// applying func to the collection list: List[T]
func => 
    var result: List[S] = List()
    for (i <- list) {
        result = result :+ func(i)
    }
    result

There isn’t any trouble with this version per se, except that there are lines here that deal specifically with iterating, which can be replaced with a single method call that speaks to the intent of the code.

To moderate this point of view: the map-flatten-wrap pattern often requires familiarity on the developer’s part. The vocabulary is at times subtle — what does it mean to map an Option? In truth, much of the streamlined feeling of monads is due to the developers themselves, who must expend a sometimes not insignificant amount of energy structuring or refactoring their code to fit the pattern. Whether or not this effort is worthwhile is subject to the discretion of the engineer.

Simplifying Reasoning

This is equally as speculative as the last: what makes for ease of reasoning is domain isolation. What I mean is this: if two pieces of code do exactly the same thing, but one requires the developer to keep less of the logic in her mind, then that code is easier to reason about. This definition is rather naïve, but it allows me to make my point: monads simplify reasoning by allowing the developer to substitute a semantically appropriate wrapper for an object, yet still operate on the wrapper as if it were the original object.

The advantage is twofold. First, semantically more appropriate wrappers simplify reasoning because the quirks of working with the original type are now encapsulated within the type information. Second, because the wrapper abstraction layer is thin and allows for a nearly transparent way to work with the original type, one needs only to deal with the logic of processing the underlying object.

As an example, consider the scenario of retrieving an object from a map, first without Option:

// retrieves grades from some kind of record
Grade getGrade(String name, Semester semester) 
    throws NoSemesterGradesException, NoSuchStudentException
{
    Map<String, Record> records = getAllGrades();
    if (records.containsKey(name)) {
        Record record = records.get(name);
        if (record.grades.containsKey(semester)) {
            return record.grades.get(semester);
        }
        throw new NoSemesterGradesException(semester);
    } 
    throw new NoSuchStudentException("No such student: " + name);
}

...
// processing the record
try {
    Grade sonGrade = getGrade("George-Michael Bluth", 
        new Semester(2009, "Fall"));
    process(sonGrade);
} catch (NoSuchStudentException|NoSemesterGradesException ex){
    alertClient();
}

and then with Option:

// this is Scala
def getGrade(name: String, semester: Semester): 
    Option[Grade] = {
  val records: Map[String, Record] = getAllGrades()
  records.get(name)
         .flatMap(_.grades.get(semester))
}

...
getGrade("George-Michael Bluth", 
  new Semester(2009, "Fall")) match {
    case Some(grade) => process(grade)
    case None => alertClient()
  }

The two examples illustrate the main difference between the two approaches to dealing with the possibility of keys being absent from a Map. Without a type like Option, one is forced to handle the different cases explicitly, and the developer must reason not only about how to process a record to retrieve the grade, but also about the various pathological cases that may arise. The choice to throw exceptions adds further complexity, since now the developer must also keep track of how to handle those exceptions whenever getGrade is called. In effect, the pathological cases in getGrade have escaped the method’s encapsulation via the exceptions.

The semantic change to Option addresses these issues perfectly by pushing the book-keeping of the pathological cases to the compiler. Via flatMap, we need only focus on retrieving the grade object from the record, with some syntactic tweaks (made even simpler via for comprehensions). The word “uncomplecting” comes to mind: Option uncomplects the getGrade logic from the pathological cases, and both the author of getGrade and the consumer of the method no longer need to know that the student’s record may be absent or that the student’s grade for a particular semester may be missing.

What is the connection with the monad of category theory?

In these last few frames, I would like to explain, mostly to myself and to those who are curious, what the connection is with the category-theoretic monad. I was pleased to learn that there is actually a rather elegant adaptation of monads to homological algebra, which I will not be going into — though I will allow myself to say that it is an endofunctor on a suitable abelian category exhibiting the monadic laws. The following is, I would say, not technical, but it is more mathematically theoretical and requires some prior familiarity with category theory.

The concept of a category was developed around the mid-20th century by the mathematicians Eilenberg and Mac Lane to formalize the relationship between geometric and topological objects and algebraic structures like modules over rings. They wanted to express the then-nascent idea that geometry and algebra were intricately related, by examining and comparing the properties of maps — functions that respect the underlying mathematical structures — between geometric and algebraic objects respectively. Their discovery would revolutionize mathematics, and the framework they invented to present their results evolved into present-day abstract category theory, an important tool in the repertoire of many research mathematicians today. Category theory was later found to be a natural language for Alonzo Church’s \lambda-calculus — the theoretical beginning of the functional programming paradigm — most notably through Lambek’s correspondence between typed \lambda-calculi and cartesian closed categories.

So what is a category? There are several very good expositions that answer the question better than I could hope to: Wikipedia (free), Mac Lane’s Categories for the Working Mathematician, and Awodey’s Category Theory. Roughly speaking, categories are classifications. A category is composed of objects and relations, where the relations are often interpreted as specific mappings from one object to another, and they satisfy very specific laws: closed under composition, associative, and equipped with units.

For example, one can define the category of numbers (where the objects are real numbers), in which the relation between numbers x and y is the difference between them: x - y. Or the category of words (sequences of letters) on the English alphabet, where the relations between words A and B are words that begin with A and end with B. In the latter example, there are infinitely many relations between “abc” and “cde”: “abcde” and “abccde” are two obvious examples. These examples are rather contrived, and in practice examples abound; however, most of them are beyond our scope.

To formalize the study of categories — how categories relate to each other, and what are the properties of those relations — category theorists introduced the notions of functors and natural transformations. The language is versatile enough to describe very concrete types of objects, but is also capable of capturing the mathematical essence of very abstract concepts to provide the backbone for modern algebraic geometry.

The application of category theory to computer science is in defining a suitable category of data types, i.e. a category whose objects are data structures, and whose morphisms between data structures S and T are functions with signature S \Rightarrow T.

The category of types must satisfy the following criteria:

  1. Terminal Objects - There is a data type \mathbf{Unit} to which every other type maps uniquely.

  2. Closed under finite product - If S and T are types, then there is a product type subsuming S and T. (The word “product type” is actually derived from the notion of products of two objects in category theory.)

  3. Closed under exponents - If S and T are types, then there is a type to represent functions with signature S \Rightarrow T.

The above criteria describe the defining characteristics of the so-called Cartesian Closed Category. The type system we use in practice for typed languages is usually a variation of this, subject to the natural constraints and limitations of compilers, interpreters, and runtime engines. In practice, we want objects like arrays and dictionaries, so modern type systems are also equipped with an “arbitrary coproduct” requirement; that is, type systems for certain languages also come with “sum types” — from the notion of sums or “coproducts”. (For example, an array type of type T is actually an arbitrary sum type of all finite tuples of type T.)
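
Scala’s type system exhibits these ingredients directly; a quick illustration:

// 1. terminal object: every type maps uniquely to Unit
def toUnit[T](t: T): Unit = ()

// 2. finite products: the pair of two types is a type
val pair: (String, Int) = ("answer", 42)

// 3. exponents: functions between two types form a type
val exp: Int => String = _.toString

// and a sum type, e.g. Either[S, T]
val sum: Either[String, Int] = Right(42)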

What is the connection between math monads and code monads?

In the last section, we hand-wavingly discussed the category of types as the category whose objects are data types (a.k.a. classes) and whose morphisms are the single-input functions. To save on exposition, let \mathbf{Types} represent this category.

In category theory, a monad is a (covariant) endofunctor M: \mathcal{C} \to \mathcal{C} together with two natural transformations \eta: \mathrm{Id} \to M and \mu: M^2 \to M such that for any object A in \mathcal{C}:

  1. \mu_{A} \circ \mu_{M(A)} = \mu_{A} \circ M(\mu_{A}), as natural transformations M^3 \to M

  2. \mu_A \circ \eta_{M(A)} = \mu_{A} \circ M(\eta_A) = 1_{M(A)}, where 1_{M(A)} is the identity morphism on M(A) (as natural transformations, these are the identity M \to M)

At the outset, it is unclear how the above relates to the monad commonly discussed in blogs and forums. However, let’s unravel the definition in the context of the category of types.

What is an endofunctor M on \mathbf{Types}? It is an association from each type T to another type M(T) such that for every morphism \varphi: T \to S, there exists a morphism M(\varphi): M(T) \to M(S), preserving composition and identities. In code-speak: a monad is a way to associate a type T with another type M[T] that is equipped with a way to “lift” a function T => S to M[T] => M[S]. This latter requirement is satisfied precisely by map, which is actually a function of type (T => S) => (M[T] => M[S]).
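
This lifting is literally expressible in Scala; for example, with List as the endofunctor (a sketch):

// map, curried: lift a plain function T => S
// to a function List[T] => List[S]
def lift[T, S](f: T => S): List[T] => List[S] =
  (ts: List[T]) => ts.map(f)

val strLen: String => Int = _.length
val lifted: List[String] => List[Int] = lift(strLen)
// lifted(List("ab", "c")) == List(2, 1)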

How about the bit about natural transformations? In \mathbf{Types}, a natural transformation \mathrm{Id} \to M is a family of morphisms — one for each type S — w(S): S \to M(S) such that for each map \varphi: T \to S, the composition

w(S) \circ \varphi: T \to S \to M(S)

is equal to

M(\varphi) \circ w(T): T \to M(T) \to M(S).

Furthermore, a natural transformation M^2 \to M is a family of morphisms f(S): M(M(S)) \to M(S) such that for each morphism \varphi: T \to S, the composition

M(\varphi) \circ f(T): M(M(T)) \to M(T) \to M(S)

is the same as

f(S) \circ M(M(\varphi)): M(M(T)) \to M(M(S)) \to M(S).

So what do these mean in terms of code? As mentioned in the previous section, morphisms are single-parameter functions. As such, saying that there are two natural transformations is the same as saying for each type T, there are specific single-parameter functions with signatures T => M[T] and M[M[T]] => M[T]. These will correspond to the wrap and flatten methods respectively. The fact that these are natural transformations means that the functions must satisfy some additional criteria, namely:

  1. for any given function foo: T => S, then the function map(foo)(wrap(_)) (of signature T => M[S]) is the same as wrap(foo(_)). This is the naturality of wrap.

  2. for any given function foo: T => S, map(foo)(flatten(_)) (of signature M[M[T]] => M[S]) is the same as flatten(map(map(foo))(_)). This is the naturality of flatten.

Finally, it is a matter of interpreting the two laws, which correspond exactly to the criteria that we have mentioned:

  • For any instance triWrapped: Monad[Monad[Monad[T]]]:
flatten(flatten(triWrapped)) == 
    flatten(map(flatten)(triWrapped))
  • For any instance wrapped: Monad[T]:
flatten(map(wrap)(wrapped)) == 
    flatten(Monad[Monad[T]].wrap(wrapped)) ==
    wrapped

Afterword - Why is it so hard to get a straight answer about monads?

Having gone through all this math and code, I thought I’d reclaim a bit of my own space and speculate, for the titular question is a tough one to answer. Monad is a very abstract concept, and people who aren’t used to that level of conceptual abstraction may explain it poorly. Even when experts understand something very well, e.g. \infty-categories, they may not always be able to give a succinct and meaningful answer to the questions explicitly and implicitly asked of them about the subject.

Monads seem tangible because, thanks to Scala and Haskell, they have concrete representations that illustrate the fundamental elements and help demonstrate how specific monads may be implemented and used in practice. However, a fish is not a cod, and a monad is not a Maybe or an IO, no matter how clearly these examples are explained. What the examples seek to illustrate is a general pattern, one that is then obfuscated by one or two odd examples.

The problem is further compounded by the fact that most colloquial understanding of monads seems specific to Haskell, and those who contributed tutorials explaining the subject for Haskell have a different audience in mind: someone who develops in Haskell need not be a type theorist, and definitely need not have studied category theory. Fortunately for me, the classes in Haskell (Monad, Functor, etc.) are consistent with their namesakes in type theory. Unfortunately, the tutorials rely on concepts very different from those familiar to their imperative counterparts.

Having finally gotten some handle on the subject, I believe the acute problem is that monads in practice capture two outwardly different patterns: one of a container/wrapper, and one of semantic binding for functional expressions. There is a tenuous link between these two notions if one moves past what happens to the underlying data structures and focuses instead on the syntactic appearance. The true link is in type theory, which is rarely discussed (if at all) in programming discussions of the lambda calculus. With that bridge conspicuously absent from programming discourse, any hope of a coherent narrative for “monad” is hopelessly lost.