There are plenty of efforts around to teach coding in schools. Most of them focus on standard imperative programming languages: for example, Python, or JavaScript, or even Java (which is a horrible choice, but is entrenched due to its role in the Advanced Placement curriculum and exams). Most of these efforts don’t think much about functional programming.

Regular readers of this blog are probably familiar with functional programming, but for those who aren’t, you should understand that it’s really a rather different paradigm from most typical programming. It’s not just another syntax, with a few different features. Instead, it’s a whole new way of breaking down problems and expressing solutions. Basic ideas taught in the first few weeks of traditional computer programming courses – for example, loops – just don’t exist at all. And other really central ideas, like functions and variables, have a completely different meaning.

I’m not quite alone in teaching functional programming, though. Matthias Felleisen and Shriram Krishnamurthi started a sizable effort to teach Scheme at the K12 level in the 1990s, and Emmanuel Schanzer created a Scheme/Racket based curriculum called Bootstrap, which is heavily based on functional programming. I’ve made the same choice, and for much the same reason.

In the end, while functional programming is very different from the mainstream of computer programming, it is very similar to something else: mathematics. Functions and variables in the functional programming world may mean something different from the same words in Python or JavaScript; but they mean the **same** thing as functions and variables in mathematics.

In fact, I never set out to teach “coding” at all! My goal is to teach mathematics more effectively. But mathematics education suffers from the weakness that students who make a mistake often don’t find out about it until days later! By that time, whatever confusion of ideas led to the error has long been forgotten. CodeWorld began as my attempt to get students to directly manipulate things like functions, expressions, and variables, and get immediate feedback about whether the result makes sense, and whether it does what they intended. For that purpose, a functional programming language is perfect!

Even after the switch to functional programming, I still surprise a lot of people by telling them I teach middle school students in Haskell! Let’s face it: Haskell has a bit of a reputation as a mind-bending and difficult language to learn, and it sometimes even deserves the reputation. This is, after all, the programming language community with more Ph.D. students per capita than any other, and where people hold regular conversations about applying the Yoneda lemma to help solve their coding challenges!

But it doesn’t have to be! Haskell also has some advantages over almost anything else, for someone looking to work with tangible algebra and mathematical notation.

First of all, the language semantics really are comparable to mathematics. Haskell is often called *purely* functional, meaning that it doesn’t just enable the use of functional programming ideas, but in fact embodies them! By contrast, most other widely used functional languages are *impure*. In an *impure* functional language, a function is *actually* the same complicated notion of a procedure or recipe that it is in an imperative language, but it is conventional (and the language offers powerful features to help with this) to stick to a subset that’s consistent with mathematics, most of the time. That’s often a fine trade-off in a software engineering world, where the additional complexity is sometimes needed; but in education, when I tell a student that a function *is really just a set of ordered pairs*, I don’t want to have to later qualify this statement with “… except for this magical function here, which produces a random number.”

Even more importantly, basic syntax looks almost exactly like mathematics (or at least, it can). Bootstrap, for example, gets the semantics right, but looking through sample student workbooks, there’s quite a bit of “here’s how you write this in math; now write it in Racket.” By contrast, when teaching with CodeWorld, we’ve been able to effectively explain the programming language as a set of conventions for typing math **directly** for the computer. There are obviously still some differences – both at the surface level, like using * for multiplication and ^ for exponents, and at a deeper level, like distinguishing between variables and constructors on the left-hand side of equations. But in practice, this has been easily understood by students as limitations and tweaks in which math notation CodeWorld understands. It feels like a dialect, not a new language.

(It’s worth pointing out that Racket also includes a purely functional language subset that’s used by Bootstrap, though the syntax is different. Shriram Krishnamurthi has mentioned Pyret, as well, which among other nice properties closes some of the ground between Scheme and mathematics notation, at least for expressions. You still can’t just write “f(x) = x + 5” to define a function, though.)
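To make the comparison concrete, here is a minimal sketch in plain Haskell (the standard Prelude, not CodeWorld’s modified one). The names `f` and `area` are made up for this illustration. The point is only that the definitions can be typed nearly the way they appear in an algebra textbook.

```haskell
-- Plain Haskell; f and area are illustrative names, not part of CodeWorld.

-- The algebra-class definition f(x) = x + 5, typed almost as-is.
f :: Double -> Double
f(x) = x + 5

-- A function of two variables, written in the f(x, y) style by taking a pair.
area :: (Double, Double) -> Double
area(w, h) = w * h
```

Loaded into GHCi, `f(3)` and `area(2, 3)` can be evaluated directly. The type signatures are optional here and are included only to pin down the number type.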

So what about the mind-bending parts of Haskell? It turns out most of them are optional! It took some effort, but as I’ll mention later, I have removed things like type classes (including the dreaded monads) and many unnecessary uses of higher-order functions. What’s left is a thin wrapper around notation that students are already learning in Algebra anyway.

Of course, a programming language by itself isn’t a complete tool. You also need libraries! The next big decision was to base CodeWorld on the programming model of Ben Lippmeier’s Gloss library.

Gloss is an interesting choice on its own. The programming model is very simple. Everything is a pretty comprehensible mathematical thing. It’s probably too simple for sizable projects, and you could make the case that teaching it is letting down students who want to be able to scale their programming skills up to larger projects. But again, it has two advantages that I believe outweigh this concern.

First, it’s *tangible*. Outside of Gloss, much of the current thinking around building interactive applications in functional programming environments centers around FRP (Functional Reactive Programming). FRP defines a few abstract concepts (“events” and “behaviors”), and then hides what they look like or how they work. Of course, strong abstraction is a foundation of software engineering. But it’s not a foundation of learning, or of mathematics! Indeed, Elm also recently (and probably with even less justification, given its less educational audience) dropped FRP in favor of tangible functions, as well. The advantages of concrete and tangible types that students can get their heads around are hard to overstate.

Second, again, this choice better supports building an understanding of mathematical modeling. In addition to it being easier for a middle school student to understand a value of type Number -> Picture, than the more abstract Behavior Picture from FRP (or the even more obtuse non-terminating while-loop of the imperative world), it also gives them experience with understanding how real phenomena are modeled using simple ideas from mathematics. Later programs are built using initial values and step functions, along with explicitly bundled state. This gently starts to introduce general patterns of thinking about change in ways that will come up again far down the road: in the study of linear algebra, calculus, differential equations, and dynamical systems!
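The two modeling styles mentioned here can be sketched in plain Haskell. This is a toy model, not the Gloss or CodeWorld API: the names `height`, `State`, `initial`, `step`, and `simulate` are assumptions made for the example, and a falling ball stands in for a picture. The first definition treats change as a function of time; the rest express the same phenomenon as an initial value plus a step function over explicitly bundled state.

```haskell
-- A toy model in plain Haskell; none of these names are the CodeWorld or Gloss API.

type Number = Double

-- An "animation" is just a function of time. Here: the height of a ball
-- thrown upward at 10 units per second under gravity of 9.8 units per second squared.
height :: Number -> Number
height(t) = 10 * t - 4.9 * t ^ 2

-- A "simulation" is an initial state plus a step function.
-- The state is bundled explicitly as (position, velocity).
type State = (Number, Number)

initial :: State
initial = (0, 10)

-- Advance the state by dt seconds.
step :: (State, Number) -> State
step((y, v), dt) = (y + v * dt, v - 9.8 * dt)

-- Apply the step function n times with a fixed time step dt.
simulate :: (Int, Number) -> State
simulate(n, dt) = iterate (\s -> step(s, dt)) initial !! n
```

With a small time step, the position reported by `simulate` approximates `height` at the same moment, which is the same relationship students later meet between difference equations and their closed-form solutions.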

Of course, there’s a cost here. I wouldn’t point someone to Gloss for a real-world project. Even something as simple as a single GUI component can be complicated and fragmented, since students have to separately connect the state, initial value, behavior over time, and event handling. But the cost in encapsulation is most keenly felt in larger projects by more experienced programmers who can find this sort of plumbing work tedious. Typical introductory programming students still have a lot to learn from connecting these pieces and understanding how to make them work together.

Once I had Haskell and Gloss in place, the next big choice made by CodeWorld was to replace the Haskell prelude with a customized version. GHC, the most popular Haskell compiler, provides a lot of power to customize the language by making changes to libraries. This extends even to the meaning of literal text and numbers in the source code!

One reason for replacing the Prelude was to keep the complexity of a first working program as low as possible. For students who are just starting out, every word or piece of punctuation is an obstacle. Haskell has always done better on this front than Java, which requires defining a class, and a member function with a variety of options. But adding import statements definitely doesn’t fit the vision articulated above of the programming language as a thin wrapper around mathematical notation. So the modified Prelude puts all of the built-in CodeWorld functions in scope automatically, without the need to import additional modules. As a result, a minimal CodeWorld program is one line long.

A second reason for replacing the Prelude was to remove a lot of the programming jargon and historical accidents in Haskell. Some of this is so entrenched that experienced programmers don’t even notice it any more. For example, even the word “string” to denote a bit of text is a holdout from how computer programmers thought of their work in the mid 20th century. (CodeWorld calls the analogous type Text, instead, and also keeps it separate from lists.) Haskell itself has introduced its own jargon, which is confusing to students as well.

But the most important consequence of replacing the Prelude is that advanced language constructs, like type classes and monads, can be hidden. These features haven’t actually been removed from CodeWorld, but they are not used in the standard library, so that students who don’t intend to use them will not see them at all. This made more changes necessary, such as collapsing Haskell’s numeric type class hierarchy into a single type, called Number. Perhaps the most interesting adaptation was the implementation of the (==) operator for equality comparison, without a type class constraint. This was done by Luite, by inspecting the runtime representation of the values in the GHCJS runtime (see below).

Sometimes, it seems that the dogma of the functional programming language community (and Haskellers in particular) is that programmers are corrupted by imperative languages, and that a programmer learning a functional language for their first experience would have a much easier time. I haven’t found that to be 100% true. Perhaps it’s because even students with no prior programming experience have still been told, for example, to think of a program as a list of instructions. Or perhaps it’s something more intrinsic in the human brain. I don’t know for sure.

But what I do know for sure is that even with no previous experience, middle school students will gravitate toward imperative semantics unless they are carefully held back! Because of this, another choice made by CodeWorld, and one of the main differences from Gloss, is that it makes some changes to intentionally trip up students who try to think of their CodeWorld expressions as an imperative sequence of instructions.

One example of such a change: in Gloss, a list of pictures is overlaid from back to front. In CodeWorld, though, the order is reversed. Combining pictures, whether via the pictures function, or the & operator, is done from front to back. The reason is that as I observed students in my classes, I realized that many of them had devised a subtly wrong understanding of the language semantics: namely, that circle(1) was not a circle, but instead a command to **draw** a circle, and that the & operator simply meant to do one thing, and then the next, and the pictures ended up overlaying each other because of the painter’s algorithm. Because of this misunderstanding, they struggled to apply or understand other operations, like translation or rotation, in a natural way. After swapping the order of parameters, students who form such a hypothesis will immediately have it proven wrong. (The analogous mistake now would be to assume that & means to do the second thing first, and no student I’m aware of has made that error.)
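The difference in ordering can be sketched with a toy model in plain Haskell. Nothing here is the real Gloss or CodeWorld implementation: `Picture`, `shape`, `glossPictures`, and `visible` are hypothetical names, and a picture is reduced to a list of layer names with the front-most layer first.

```haskell
-- Toy model: a picture is a list of named layers, stored front-most first.
-- These types and names are illustrative, not the CodeWorld or Gloss API.
newtype Picture = Picture [String] deriving (Eq, Show)

shape :: String -> Picture
shape(name) = Picture [name]

-- CodeWorld-style combination: the left operand ends up in front.
(&) :: Picture -> Picture -> Picture
Picture a & Picture b = Picture (a ++ b)

-- Gloss-style list semantics for comparison: later pictures land on top.
glossPictures :: [Picture] -> Picture
glossPictures(ps) = Picture (concat [ layers | Picture layers <- reverse ps ])

-- The layer a viewer actually sees where everything overlaps.
visible :: Picture -> Maybe String
visible(Picture layers) = case layers of
  []      -> Nothing
  (l : _) -> Just l
```

In this model, `visible(shape("circle") & shape("square"))` is `Just "circle"`, while `visible(glossPictures([shape("circle"), shape("square")]))` is `Just "square"`. A student reading `&` as "draw this, then draw that" would predict the square on top in both cases.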

A similar situation exists with colors. In Gloss, the color function changes the color only of parts of a picture that don’t already have a color! This means that the semantic model of the Picture type in Gloss is quite complex indeed. Instead of just being a visual shape, a Gloss Picture is a shape where some parts have fixed color, but others have unspecified color, and the color function operates on that value by fixing any unspecified bits to the given color. Indeed, the **most** sensible way to understand these values is in terms of the implementation: that the color function sets a current color in the graphics context, which is used for that subtree, but only if it’s not changed first. This is a leaky implementation! It is fixed by CodeWorld, where applying a color to a picture overrides any existing coloring.
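A rough way to see why the Gloss-style semantics needs a more complicated mental model is to write both down as toy data types. This is a sketch in plain Haskell under simplifying assumptions: a picture is just a list of named shapes, and `GlossPicture`, `glossColor`, `CWPicture`, and `colored` are names invented for the example rather than the libraries’ real definitions.

```haskell
-- Toy models only; these are not the real Gloss or CodeWorld types.
data Color = Red | Blue deriving (Eq, Show)

-- Gloss-like model: each shape's color may still be unspecified.
type GlossPicture = [(String, Maybe Color)]

-- Coloring fills in only the shapes that have no color yet.
glossColor :: (Color, GlossPicture) -> GlossPicture
glossColor(c, pic) = [ (s, Just (maybe c id old)) | (s, old) <- pic ]

-- CodeWorld-like model: every shape always has a definite color.
type CWPicture = [(String, Color)]

-- Coloring overrides whatever was there before.
colored :: (CWPicture, Color) -> CWPicture
colored(pic, c) = [ (s, c) | (s, _) <- pic ]
```

In the first model the innermost coloring wins, so `glossColor(Red, glossColor(Blue, [("a", Nothing)]))` leaves the shape blue. In the second, the outermost coloring wins, and the value never carries a hidden "not yet colored" state.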

Another change that helped a lot with this was to carefully remove the use of verbs for function names in the CodeWorld standard library. I observed verbs misleading students many times. Sometimes, they expected that use of a function would permanently change the value of its parameter. Other times, they even expected a function like rotate to turn a picture into an animation that keeps moving! The key idea they are missing is that functions are not actions, but rather just relations between values. Such relations are better (even if it’s sometimes awkward) described somewhere on a scale between nouns and adjectives, rather than verbs. The way the code reads after this change once again acts as a roadblock to students who try to build on an incorrect understanding.

Beyond the programming language and libraries, another important choice in CodeWorld was to strongly adopt the web as a medium. The first version of the platform in 2010 was a relatively early adopter of web-based programming tools! However, the execution model (using SafeHaskell to run student code in a trusted way on the server and stream frames to the client) was definitely doomed from the start. It was a hack, which worked for one class, but was hardly scalable.

Things got better with the advent of Haskell-to-JavaScript compilers. I built a first prototype of this in 2012 using Fay, but ultimately settled on GHCJS, which is just an amazing project. Now students get very capable code implementing complete games and other applications, all running locally in their browsers with very reasonable performance.

This decision was important for a few reasons. The first is compatibility and universal access. Schools have whatever devices they have access to: Chromebooks, bring-your-own-device plans, etc. Students themselves are constantly switching devices, or leaving theirs at home. Depending on a locally installed application – or saving student projects on a local disk – for a class at the middle school level would be a disaster. Because CodeWorld is all web-based, they can work from any system they wish, and have full access to all of their saved projects.

The second reason a web-based environment was important is that sharing is a huge part of student motivation. Because the CodeWorld server remembers all compiled code by its MD5 hash, students can send projects to each other simply by copying and pasting an appropriate URL into an email, chat message, or text message. It is difficult to express how helpful this has been.

Despite the advantages of the web, though, I am hoping to soon have export of student projects to mobile applications, as well. The development environment will remain web-based, but created applications can be installed as apps. It’s likely that someone will be working on this feature over the summer.

Another big decision made by CodeWorld, and hinted at already, was to often sacrifice traditional computer programming education for better mathematics. This has been done with a hodge-podge of small changes, such as:

- De-emphasizing programming concepts like abstraction, maps and folds, and higher-order functions, in favor of approaches like list comprehensions that look more like mathematics.
- Uncurrying all functions in the standard library. This is easily the most controversial decision I’ve made for the Haskell community, but it’s really just a special case of de-emphasizing higher order functions. After uncurrying, functions can always be written in standard mathematical notation, such as f(x) or f(x, y).
- The coordinate plane uses a mathematical orientation. Gloss’s coordinate plane looks like computer screen coordinates, with (0, 0) in the top left. CodeWorld’s plane puts (0, 0) at the center, and it orients the positive y axis to point up. These just match conventions.
- CodeWorld also rescales the coordinates so that the plane extends from -10 to +10 in both dimensions, rather than counting in pixels. This turns out to have been an amazing choice! It simultaneously allows students to do low-precision placement of shapes on the plane without multi-digit arithmetic, and introduces decimals for added precision. In the end, this combination better supports middle school mathematics than the alternative.
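As an illustration of two of these choices, here is a small sketch in plain Haskell. The 500 by 500 pixel canvas is an assumption made for the example, and `canvasSize`, `toPixels`, and `parabola` are hypothetical names, not CodeWorld functions. The first function shows the arithmetic that the -10 to +10 plane hides from students; the second shows how a list comprehension reads much like set-builder notation.

```haskell
-- Assumed for illustration: a square canvas of 500 by 500 pixels.
canvasSize :: Double
canvasSize = 500

-- Map a point on the -10..10 plane (origin at center, y up)
-- to pixel coordinates (origin at top left, y down).
toPixels :: (Double, Double) -> (Double, Double)
toPixels(x, y) = ((x + 10) * scale, (10 - y) * scale)
  where scale = canvasSize / 20

-- A list comprehension in the style of set-builder notation:
-- the points (x, x^2 / 10) for whole numbers x from -10 to 10.
parabola :: [(Double, Double)]
parabola = [ (x, x ^ 2 / 10) | x <- [-10 .. 10] ]
```

Under these assumptions, the center of the plane `(0, 0)` maps to pixel `(250, 250)`, and the top-left corner `(-10, 10)` maps to pixel `(0, 0)`.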

Another change here was originally an accident. CodeWorld, from the beginning, did not implement using any kind of image file in a program. Originally, this was because I hadn’t bothered to implement a UI for uploading assets to use in the web-based programs! But after teaching with it, I don’t regret it at all. I’ve had other teachers tell me the same thing. By giving students only geometric primitives, not images copied and pasted from the web, as the tools for their projects, they are more creative and work with a lot more mathematics in the process.

The final big decision on my list doesn’t pertain to the web site or tools at all, but is about the organization of classes. There are a lot of efforts out there to encourage students to learn to code. Hour of Code encourages teachers to devote an hour to programming activities and games. Many organizations are running day-long activities in Processing or Scratch or Greenfoot. Bootstrap started with once-a-week after school programs using Racket, and has scaled up from there. I’ve volunteered as a mentor and team lead for weekend hackathons by organizations like Black Girls Code.

These are great! I wouldn’t discourage anyone from jumping in and doing what they can. But in many cases, they seem to miss the opportunity for student creativity. There’s a tendency for a lot of organizations to create very guided activities, or shy away from anything that might get a student off the beaten path. Early versions of the Bootstrap curriculum, for example, encouraged kids to build games, but designed a game from start to finish (in terms of generic words like the “player”, “target”, and “danger”), and gave students limited creative choices in the process. (Bootstrap has since expanded into a more open-ended Bootstrap 2 curriculum, as well.) Hour of Code consists almost entirely of scripted activities that feel more like playing a game than building one, which makes sense because they are intended to be completed in an hour. The BGC hackathon mentioned above was limited to use of a drag-and-drop GUI design tool, and devoted more time to having students sit in presentations about startup business models and UX design than letting them create something impressive of their own.

So one way that CodeWorld has been different from many of these activities is that I’ve tried to plan from the very beginning of the course for students to decide on, design, and implement their own ideas from the ground up. Sometimes that means taking longer, and taking smaller steps. From the very beginning, projects in the class aren’t plugging bits into a designed program, but rather creating things of their own choosing, at the level students are capable of doing creatively from scratch at that point. It means that I don’t even start talking about games until halfway through the class. But I think it’s important to let students dig in at each step and express themselves by creating something that’s deeply and uniquely theirs. Along the way, they spend a lot more time tinkering and trying out things; even trying out different possible overall organizations of their programs!

I think CodeWorld has been very successful at this. When students in CodeWorld create their own games, they really create their **own** games. They work differently, and have different designs.

Here are a few examples from various classes, all written by students between 12 and 14 years old:

- Gnome Maze Use WASD keys to help a gnome navigate the maze and find the gold.
- Donkey Pong One player uses W and S, the other uses the up and down cursor keys. Hit the ball back and forth.
- Dot Grab One player uses WASD, and the other uses the arrow keys. Race to eat the most dots.
- Yo Grandma! Save an old lady in a wheelchair from various hazards by dragging attachments onto her wheelchair.
- Jacob the Fish Help Jacob dodge sushi and eat minnows, and avoid becoming a snack for an even larger fish.
- Knight-Wizard-Archer A twist on rock/paper/scissors, with fantasy characters.
- Popcorn Cat Drop the cat to eat the popcorn, but dodge dogs.


One slot in Summer of Haskell this year will specifically be chosen based on CodeWorld. If you plan to submit a proposal for CodeWorld, please feel free to contact me with any questions, concerns, or for early feedback. I’ll definitely try my best to help you write the best proposal possible. So far, I’m expecting three to four CodeWorld proposals that I’m aware of.

Summer of Haskell is a program by the Haskell.org committee to encourage students to spend the summer contributing to open-source projects that benefit the Haskell community. That encouragement comes in the form of a stipend of US$5500. More details are at http://summer.haskell.org.

The Haskell.org committee will choose a number of student projects based on their impact on the Haskell community. As part of this, one project will be chosen specifically relating to CodeWorld, and funded by CodeWorld maintainers.

It’s up to you, but I believe you should submit a proposal if:

- You are eligible (see the bottom of the Summer of Haskell info page).
- You are willing and available to take on an essentially full-time commitment for the summer.
- You have a realistic idea you’d like to work on to benefit the Haskell community.

Yes! Here are things you should keep in mind:

- Propose a project with immediate impact on real people. “If you build it, they will come” doesn’t work here. Unless you have an extremely good reason, don’t propose to build something speculative and hope people will just like it so much that they adopt it. Point to real people who already want this, and who will already be users and will find their lives better if and when it’s completed.
- Demonstrate that you understand the task. Provide enough detail to convince us that the project is feasible. A reasonable and concrete timeline with at least rough deliverables is a good idea. Poorly defined projects with a low probability of success are often not good fits for this format.
- Show that you are already becoming a part of the community you’ll be working with. Are you familiar with the project you’re proposing to contribute to? Do core people in the project and/or the Haskell community know who you are? Have you discussed your ideas with people already involved in the project? Do you know someone who would be your mentor?

You can browse successful projects from last year. There’s also some good advice by Edward Kmett in an old mailing list thread.


Are you a student, and interested in helping to build a platform for education in expressive mathematics and computer science? Want to work on a project with immediate impact teaching Haskell in multiple schools? Please propose a project at https://summer.haskell.org/ between now and May 6th.

A great source of CodeWorld project ideas is the bug tracker. Less well-defined projects are tagged as proposals, while more defined features are tagged as enhancements. A few big ones to think about are:

- Export of CodeWorld projects as mobile applications
- Better and more language-aware editor support for Haskell in CodeMirror.
- Implementing constructive geometry
- Building social, gallery, and/or showcase features to help student work be more visible.
- Building a purely functional block-based programming environment.
- Implementing visual tools to help students understand substitution, list comprehensions, and more.

I look forward to working with someone this summer building something cool!

By the way, HUGE thanks to Edward Kmett and other Haskell.org committee members for making this happen this year!


Empoder is a non-profit organization established to teach computer science to underprivileged kids. They are looking for volunteers to help with a coding club called Empower Girls Through Code, at Graham Middle School in Mountain View, CA. This will be 25 to 30 girls, of middle school ages. The club is led by a teacher at Graham, and we have some teaching assistants already there to help. Empoder would like a couple more volunteers, to make sure there are enough people to give one-on-one help when it’s needed.

- Who: A teacher, some fellow TAs, you, and 25 to 30 middle school girls excited about learning to code.
- Where: Graham Middle School, Mountain View, CA
- When: Wednesdays, 7:50 to 9:20 am, starting January 27
- Why: Because it’s amazing… easily the most fun thing I have ever done.

The class will use CodeWorld, a web-based programming environment using a dialect of Haskell. But you don’t need to know that to volunteer. We can all learn together.

Hope to see you there! If interested, email marissa.yanez@empoder.org.


Originally, I started this project using Haskell and the excellent gloss package, by Ben Lippmeier. CodeWorld has been moving slowly further and further away from the rest of the Haskell community. This has happened in a sequence of steps:

- Way back in 2011, I started “CodeWorld”, but at the time, I called it Haskell for Kids. At the time, I understood that the reasons I’d chosen Haskell as a language were not about cool stuff like type classes (which I love) and monads and categories and other commonplace uses of solid abstractions (which fascinate me). Instead, I chose Haskell for the simple reason that it looked like math. The rest of Haskell came with the territory. I built the first CodeWorld web site in a weekend, and I had to settle on a language and accept all that came with it.
- From the beginning, I made some changes for pedagogical reasons. For example, gloss defines rotation to be clockwise. I insisted on rotation working in the counter-clockwise direction, because that’s the convention universally used in math. Later, I resized the canvas to 20×20, so that typical programs would need to use fractions and decimals, which is a middle school math education goal. I made these changes, even though they broke compatibility with a widely used package. Sorry to anyone who’s struggled with this.
- I rebranded “Haskell for Kids” as CodeWorld, and stopped explicitly depending on gloss in favor of just reproducing its general approach in a new Prelude. This was a deliberate attempt to get away from focusing on the Haskell language and libraries, and also from the accompanying import statements and such. This hid the ways that Haskell was a general purpose language with uses outside this toy environment. That is unfortunate.
- I rewrote the Haskell Prelude, to remove type classes. Along the way, I collapsed the whole numeric type class hierarchy into a single type, and even got Luite (the author of GHCJS) to help me with some deep black magic to implement equality on arbitrary Haskell types without type classes. This threw away much of the beauty of Haskell… in favor of dramatically improved error messages, and fewer things you need to know to get started. It was a real loss.
- Finally, I committed the unforgivable sin. I dropped curried functions, in favor of defining functions of multiple parameters using tuples. This finally makes CodeWorld feel like a completely different language from Haskell. That really sucks, and I know some people are frustrated.

First, I want to point out some things that are *not* the reason for any of this:

- I did *not* do this because I think there’s something wrong with Haskell. I love type classes. I love currying, and especially love how it’s not just a convenient trick, but sometimes introduces whole new perspectives by viewing tedious functions of multiple parameters as simple, clean, and elegant higher-order functions.
- I also did *not* do this because I think anyone is incapable of learning full-fledged Haskell. In fact, I taught full-fledged Haskell to middle schoolers for a year. I know they can do it.

So why did I do it? Two reasons:

- Teaching mathematics has always been more important to me than teaching Haskell. While Haskell is an awesome programming language, mathematics is just an awesome perspective on life. For every student who benefits from learning an inspiring programming language, many students will benefit from learning that humanity has a method called mathematics for thinking about fundamental truths in a systematic, logical way that can capture things precisely. So any time I have to choose between pushing students further toward their math education or away from it, I’ll choose toward it.
- Details matter. Even though I know kids are capable of a lot, they are capable of a lot more without artificial obstacles in their way. I learned this the hard way teaching this class the first time. The smallest little things, with absolutely no great significance as a language, matter a lot. Having to put parentheses around negative numbers keeps students from reaching leaps of understanding. Confusing error messages mean the difference between a student who spends a weekend learning, and one who gives up on Friday afternoon and doesn’t think about it until the next school day. Different surface syntax means that a lot of kids never fully make the connection that functions here are the same thing as functions there.

In the end, I do think these were the right decisions… despite the frustration they can cause for Haskell programmers who know there’s a better way.

A couple weekends ago, though, I worked on something to hopefully restore some of this loss for Haskellers. You see, all the changes I’ve made, in the end, come from replacing the Prelude module with my own alternative. Specifically:

- I deliberately replaced functions from the Prelude with my modified versions.
- Because I provided an alternative Prelude, I had to hide the *base* package, which made it impossible to import things like Control.Monad. This was *not* a deliberate decision. It just happened.

So I fixed this. I added to the *codeworld-base* package re-exports of all of the modules from *base*. I renamed *Prelude* to HaskellPrelude in the process, so that it doesn’t conflict with my own Prelude. And finally, I added a new module, CodeWorld, that exports all the really *new* stuff from CodeWorld like pictures, colors, and the interpreters for pictures, animations, simulations, etc. The result is that you can now start your programs with the following:

```haskell
import Prelude()
import HaskellPrelude
import CodeWorld -- If you still want to do pictures, etc.

main = putStrLn "Hello, World"
```

At this point, you can write any Haskell you like! You aren’t even constrained to pure code, or safe code. (The exception: TemplateHaskell is still rejected, since the compiler runs on the server, so TH code would execute code on the server.)

In fact, it’s even better! You’re free to use GHCJS JavaScript foreign imports, to interact with the browser environment! See a brief example here. Now you’re out of the sandbox, and are free to play around however you like.

Right now, the CodeWorld module still uses uncurried functions and other CodeWorld conventions like Number for numbers, etc. There’s no reason for this, and it’s something that I should probably change. Anyone want to send a pull request?

]]>

Here are some things I intend to change in the near future. A more complete list is on the project issue tracker, but this is a summary with more details and reasoning about some of the changes.

An important goal of this project is to align with a standards-based U.S. middle school math education, as much as possible. To be clear, I still refuse to add complexity or turn the project into a patchwork of specific lessons that promote a specific narrow path of learning. First and foremost, this should be an environment for tinkering and encountering ideas in a self-motivated way. But given alternative designs that could each be valid on their own, I’ll choose the one that pushes students toward the math standards.

It’s sometimes a tough line to draw. But I’ve become convinced that there are a few places where I can do better. Two of those are going to be major breaking changes, coming soon.

Haskell’s convention of currying functions is the wrong default for CodeWorld. Practically all of mathematics, especially at introductory level, is carried out with the notation *f*(*x*,*y*) = … . The interpretation is that a function of two parameters is a function whose domain is a product – a set of ordered pairs. The Haskell language makes a different choice. Applying a function to two parameters is more like *f*(*x*)(*y*) (the parentheses are optional in Haskell itself), and the interpretation is that *f*(*x*) denotes a partially applied *function* that’s still waiting for its second parameter.

If the goal were to teach about higher-order functions, there would be lots of great arguments for the latter. If the goal were convenience, you could argue for the latter pretty persuasively, as well. I think Haskell’s use of currying is great. But when the goal is to let students encounter and tinker with things they will see in school math, the right choice is to adopt the convention of mathematics.

Luckily, the assumption of curried multi-parameter functions isn’t baked into Haskell too deeply. By changing the standard library, it’s quite possible to write *f*(*x*,*y*) just as well. The parentheses on *f*(*x*) become optional, but this is actually true of mathematics in general (for example, operators in linear algebra are often written without parentheses, as are trig functions). I will adopt the convention of using parentheses around even single function parameters.
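As a rough sketch of the difference between the two conventions in plain Haskell (the names `addCurried`, `addPair`, and `addTen` are made up for illustration and are not part of CodeWorld):

```haskell
-- Illustrative names only; none of these come from the CodeWorld library.

-- Haskell's default, curried style: f(x)(y), usually written "addCurried x y".
addCurried :: Double -> Double -> Double
addCurried x y = x + y

-- The mathematical convention: a single parameter that is an ordered pair.
addPair :: (Double, Double) -> Double
addPair(x, y) = x + y

-- Partial application is only available in the curried style.
addTen :: Double -> Double
addTen = addCurried 10

main :: IO ()
main = do
  print (addCurried 2 3)   -- curried application
  print (addPair(2, 3))    -- application to a pair, as in f(x, y)
  print (addTen 5)         -- the partially applied function
  -- The standard functions curry and uncurry convert between the two styles.
  print (uncurry addCurried (2, 3) == curry addPair 2 3)
```

Both definitions compute the same sums. The only difference is the shape of the domain, which is why a change to the standard library is enough to switch conventions.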

The only big source of awkwardness comes with binary operators. So long as we choose not to teach the notations `foo` (for turning a function into an operator) or (+) (for turning an operator into a function), this doesn’t come up much. Notably, sections still work fine, since they take only one argument.

This choice has a couple of convenient side effects, too:

- Students who routinely write parentheses around function arguments less often find themselves forced to surround negative numbers in parentheses for weird parsing reasons. As trivial as it might seem, this was a very real and significant learning obstacle the last time I taught the class, and I’ll be happy to see it go.
- Getting expression structure wrong sometimes gives much better error messages this way. It’s harder to accidentally mix up precedence between an operator and function application; and passing too few arguments to a function gives a clear error rather than inferring a function type and breaking in an obscure way elsewhere.
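A small sketch of the negative-number point, using a hypothetical function `distanceFromOrigin` that is not part of CodeWorld. With a pair as the parameter, the negative number needs no parentheses of its own:

```haskell
-- Hypothetical example function, written in the f(x, y) style.
distanceFromOrigin :: (Double, Double) -> Double
distanceFromOrigin(x, y) = sqrt(x * x + y * y)

main :: IO ()
main = print (distanceFromOrigin(3, -4))

-- With a curried version (say, "dist x y"), writing "dist 3 -4" would be
-- parsed as "(dist 3) - 4", a subtraction, and rejected with a type error
-- about treating a function as a number. It would have to be "dist 3 (-4)".
```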

The second big change is to resize the canvas from 500×500 to 20×20.

The justification for a 500×500 canvas was generally about confusing pixels – little dots on the screen – with the general idea of a coordinate system. It’s convenient to blur the distinction at first, but it has in the past become a barrier to understanding the full nature of the coordinate plane with real (or even rational) coordinates. Many students were confused when later faced with fractional coordinates. At the same time, developing a full understanding of the rational number system is a big topic in 6th, 7th, and 8th grade mathematics, so it would be great to ask students to do more tinkering with these numbers.

By replacing this with a 20×20 grid (x and y coordinates ranging from -10 to 10), several goals are accomplished:

- Students early in the class are working with numbers in a range they can comprehend better.
- Students routinely work in fractions or decimals to fine-tune their projects.
- The abstract coordinate plane, including fractional coordinates, becomes more familiar.

This is a big win overall.
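To make the distinction between grid coordinates and pixels concrete, here is a minimal sketch of the conversion. It assumes a 500×500 pixel drawing area whose pixel origin is the top-left corner with y increasing downward. The helper `toPixel` is hypothetical, and CodeWorld's actual rendering code may work differently.

```haskell
-- Hypothetical helper, not part of CodeWorld: convert a point on the
-- 20x20 grid (x and y from -10 to 10) to a position on an assumed
-- 500x500 pixel area whose origin is the top-left corner.
toPixel :: (Double, Double) -> (Double, Double)
toPixel(x, y) = ((x + 10) * scale, (10 - y) * scale)
  where
    scale = 500 / 20  -- 25 pixels per grid unit

main :: IO ()
main = do
  print (toPixel(0, 0))        -- center of the canvas
  print (toPixel(-10, 10))     -- top-left corner
  print (toPixel(2.5, -1.5))   -- a fractional coordinate
```

Students only ever see the left-hand side of this mapping, so a point like (2.5, -1.5) is just a point on the coordinate plane rather than a count of dots on the screen.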

On the less controversial side, I’m planning a number of changes to make the site more usable:

- Pervasive auto-complete, based on a pre-populated list of the standard library symbols as well as parsing the student code for declared names.
- More complete documentation, tutorials, and better examples. I admit that the current site is grossly lacking in documentation. I don’t envy anyone who tries to figure it out on their own!
- Better tools for playing around with results. At the very least, students will be given the chance to scroll, pan, and see coordinates of points in pictures, animations, and simulations.

I also have my wish list for things I’d love to see possible, but am not quite ready to build yet. This includes:

- Social features: sharing projects with friends, commenting on or expressing support for other projects.
- Collaborative projects with shared editing or exporting libraries for others to use.
- Better debugging tools, such as easy controls to move forward and back in time, fast-forward, pause, etc. for animations, simulations, and even games.
- Possibly grading features for teachers to grade projects and provide a scoring rubric and comments.

What else would you like to see? Let me know in the comments here, on codeworld-discuss, or by filing a feature request in the issue tracker.

]]>

- Doing their own original, creative work, instead of following instructions or reaching set answers.
- Getting instant feedback 24 hours a day, so they can tinker and learn in a self-directed way.
- Building confidence by working on their own ideas, inspiring pride and excitement.
- Experiencing how concepts from geometry, algebra, and physics can be springboards for creativity.
- Becoming creators, rather than just consumers, of technology.

That’s a lofty set of goals, but it was very successful. In the 2011-2012 school year, I taught a small class of six students, two to three hours per week. We had an awesome time. They built their own computer games throughout the year. We struggled together, worked our way through, and finished the school year with an awesome expo where the students showed off their work to local technology professionals and participated in a question-and-answer panel about their experiences. It was fascinating listening to this, because a few patterns arose:

- Students didn’t really think of what they were doing as math. This remained true, even when the skills they learned involved describing the behavior of systems using equations, functions, and variables; describing complex shapes in terms of geometry, the coordinate plane, and rotations, translations, and scaling; coming to grips with the meaning of probability and randomness; etc.
- The students who entered the year being “good at technology” weren’t necessarily the most likely to succeed. Talking to these students broke all of the stereotypical molds about computers and technology! Students who took to the activity and wildly succeeded were very often girls, and had previously thought they were more the art-and-music type.

At the end of that year, I had plans to teach this program in multiple schools the following school year. Unfortunately, things then got a little sidetracked. I started a new job at Google over the summer, moved to California, and dropped the program. The web site that students had used to build their projects fell into disrepair, and stopped working. I stopped doing anything about it.

Over the last week and a half, though, that’s changed! CodeWorld is back!

The CodeWorld web site is (as always) at http://www.codeworld.info.

Any web browser will do, but you really need to use the latest version of whatever browser you choose. If you’ve been putting off upgrading Internet Explorer, it’s long past time!

You’ll also want a Google account. You can log in using your Google account, and save your programs to Google Drive. Because your programs are saved to the cloud, you can use the web site from any computer you like, even computer labs in a school, and your programs will follow wherever you go.

Using the web site is simple. Type your program on the left. Click Run to see it work on the right. You can sign in to open your existing projects and save your projects. You can also get links to share your projects with others. There are sample projects along the bottom of the screen, including Yo Grandma!, a game written by Sophia, one of my students from the original class.

Unfortunately, instructions on how to write the programs are still mostly missing. If you already know the language, a link to the generated documentation might help. Otherwise, hold on! Once the programming environment is stable, I plan to put together a comprehensive progression of exercises, tutorials, and examples.

Under the hood, I mostly recreated this from scratch, throwing away most of the original project from a few years ago. This new version of the environment has a lot of advantages: it runs your programs on your own computer, so your program runs a lot faster. It’s less restrictive. And I completely customized the language to make a lot of things simpler and easier to understand.

Changes:

- The programming language for CodeWorld is called Haskell. Haskell is an awesomely mathematical language, but parts of it are also notoriously complex. The new incarnation of CodeWorld still uses Haskell, but goes a lot further to hide the rough edges. In particular, you’ll rarely see any classes, and there’s an obvious type for most things (e.g., all text has the type Text, and all numbers have the type Number.)
- Previously, CodeWorld was based on a library called Gloss for the Haskell programming language. Gloss is great, and I saved as many ideas from it as I could. But CodeWorld is now its own library. This let me clean up some terminology, align the meaning of programs more closely with the goals of algebraic thinking and math concepts, and work with the simplified version of the language.
- The biggest change to how the web site works is that your programs now run on your own computer, instead of on the web server. This relies on an awesome project called GHCJS, which converts the Haskell program into JavaScript, the language understood by web browsers.

I’ll try to keep posting here as I have learning material ready to use with this tool. Stay tuned!

]]>

For anyone new to the ideas, I should review a little about what CodeWorld is. CodeWorld is the name for the curriculum I built and taught last school year. It has the goal of teaching abstract and mathematical reasoning skills through computer programming. Throughout the course of the year, students use a web site to write descriptions of pictures, then animations, and finally video games, and run them in the web browser to immediately see the results.

I taught a pilot program last school year in a neighborhood school, and it was a huge success. I have high hopes of getting something going again in the future.

Many people have noticed that CodeWorld stopped in its tracks some time during the summer. What ended up happening was a fairly big life change for me: I’ve changed jobs (now working for Google on YouTube!), moved (to San Francisco from Colorado), and have been focused on personal changes for a while. Unfortunately, that meant I had to cancel the plans to teach CodeWorld at *three schools* in Colorado this school year. I hope, though, this is only a temporary setback! I still have high hopes for the future.

In terms of technology, CodeWorld is built on several things:

- The Haskell programming language. This is a language that’s very well suited for CodeWorld because it expresses things in a way that’s very declarative and consistent with the mental models students should bring into algebra and other abstract mathematics classes.
- The Gloss library. This library, originally developed for introductory college programming classes, is a great fit for many of the same reasons. It’s based on Haskell, and again does everything declaratively. It’s also very concise, not needing a lot of wordy boilerplate to get something quickly on the screen.
- The SafeHaskell feature of GHC. This allows the server to safely compile and run code written by students without worrying that it might delete critical files and such.

In some ways, the set of technology chosen for CodeWorld is perfect. Specifically, it let me get a web site up and running in about a week of work, right before the start of the class last school year. That saved a lot of hassle, and made the class a lot more successful. On the other hand, each of these choices is lacking in a few ways.

- Haskell is a great choice of language, but it’s very difficult to target Haskell to run in web browsers. For this reason, CodeWorld actually runs students’ programs way over in a data center, far from their computers. This worked and was easy to do (great!), but it also means programs run very slowly, and the server gets bogged down if it’s used by too many people at once (not so good).
- Gloss is the closest thing we have to a perfect choice. Even that, though, has its limitations. Because Gloss has a number of users who want to do more advanced things (like use it to show off parallel processing APIs in Haskell), there are some technical pieces in the back end that are difficult to fit in with a web browser as well. Also, mainly due to backward compatibility, there are inconsistencies in the API (circle to draw the outline of a circle versus rectangleWire to do the same with a rectangle, for example), inconsistencies with what students will see in math classes (angles are clockwise, for example, in the official implementation), and lost opportunities to make small tweaks that will more clearly emphasize declarative thinking. We definitely want something **like** Gloss, but there are a few details that it would help to tweak in some incompatible ways.
- SafeHaskell turns out to be problematic for a variety of reasons. Aside from the performance problems from running on the server in the first place, it’s really only a small part of a solution to the problems of running untrusted code. For example, it has no protections against using excessive resources and slowing down the server until it’s unusable.

So I’ve been evaluating a new technical path. This new approach uses Chris Done’s language called Fay, which is a subset of Haskell specifically designed to run in web browsers. It has all of those advantages of Haskell that I mentioned above, but works better in a web browser. I’ve been porting the Gloss library over to Fay, but taking liberties to modify some of the inconsistent behaviors and lost opportunities as I go. SafeHaskell then becomes entirely unnecessary, as the code is no longer running on the server. It’s running in the student’s web browser instead, where JavaScript is already thoroughly sandboxed by companies that put lots of resources into the problem.

This technology migration isn’t done yet, and there’s still a lot of little syntax stuff that’s frustratingly unsupported by Fay… but I’m working on it, helping out the Fay project a little bit, too, and it’ll get there!

In addition to changing technologies, I’ve been putting a lot of thought into what I want the future of the system to look like. I am looking at how to make the platform more appealing when I am not personally living right by the school and able to pop over and help with any students who are having problems.

One side of this is definitely a redesign of the web site. The site needs examples, it needs to look more finished, and it needs documentation! I’m working on a mock-up of the future web site, and while my web design skills are definitely lacking, I think I’m happy with the concept so far. You can take a look, if you want, and let me know what you think.

Another side actually was started over the summer, and that’s putting together a comprehensive teacher’s guide, including worksheets, exercises, and organization of the content. This needs to be more than just a web site; it needs to be an organized approach to presenting the ideas and guiding students through the learning process. I’m strongly considering making a somewhat involved video series to present the concepts, as well.

The third and final part of this is putting the pieces in place for a sustainable community of people working together. I don’t have any specific plans here, yet… but I’ll just say that I see positive things ahead. I’m now working with tens of thousands of brilliant people, many of whom care very much about education, in a company that encourages its employees to experiment with technology in ways that aren’t focused on just one product. I’m also within a short drive of the Khan Academy, and plenty of other groups that care a lot about math education.

Some things I’d like to try here:

- Getting together students who build things with CodeWorld to meet and share their creations.
- Helping students make the transition from developing things for “a school project” to sharing them with friends and others, and possibly even offering their creations in various app stores or markets. (Student creations in CodeWorld actually tend to be fairly similar to the casual game market for smartphones and such…)
- Creating more resources for teachers who want to incorporate this into their schools. I’d love to hold seminars and such for teachers! Please ask me to do this, once the pieces are in place!

So that’s where CodeWorld is, and where it’s hopefully going. I’m still very excited about the future of this effort, and looking forward to making some more progress here soon.

]]>

*Match made in heaven? Or abstraction distraction?*

If you’re a software developer, have you heard about monads and wondered what they were? Have you tried to learn Haskell, and struggled with them? Wondered why people worry so much about them? Have you watched the videos from Microsoft’s “Channel 9” and heard a bunch of researchy Microsoft folk talk about them, but had trouble relating them to your day-to-day programming experience?

Or if you’re interested in mathematics, have you heard murmurs in the past about how category theory interests computer science people? Looked for some clear statement of why we care, and what problems we might be interested in? Wondered if it’s really true *at all*? Perhaps you are like a friend of mine (and a first-rate algebraist, too, so it’s entirely reasonable to have these questions) who asked me about this a year or so ago, remembered hearing a lot of excitement in the early 90s about category theory and computer science, but never heard whether it had really panned out or was a dead end?

These are the kinds of questions I begin with. My goal is to demonstrate for you, with details and examples:

- Where category-based intuition and ideas, and monads in particular, come from in computer programming.
- Why the future of programming *does* lie in these ideas, and their omission in today’s mainstream languages has cost us dearly.
- What the state of the art looks like in applying category-based ideas to problems in computer programming.

If you’re coming into this without a knowledge of category theory, never fear; this may be one of the gentlest introductions to the idea of categories and monads that you will find. But you’ll want to slow down and take a moment to understand the definition of a category and related ideas like function composition; these are absolutely crucial. Then you want to completely skip or just skim through the section called “What’s This Got To Do With Monads?” where I tell you how what we’re talking about here relates to the traditional math meaning of monads. Don’t worry, you don’t need to know that *at all*.

On the other hand, if you’re a mathematician, you may want to skim the bits where I review basic category theory, and just dig in where I am talking about the computer programming perspective. Just be forewarned, my introduction to monads will be via Kleisli categories, so take a minute when we get to that part and make sure you’re familiar with how the relationship works out.

Ready? Here goes!

Quick quiz: Do computer programmers use functions?

Ask any computer programmer you know, and you will hear: YES! Functions are some of the most basic tools computer programmers use. Then you’ll get odd looks, for asking such a silly question. *Of course* computer programmers use functions. That’s like asking if carpenters use nails! Right?

The truth, though, is a bit more complicated. To a mathematician, a function is just an association of input values to output values… and *that is all*! Any two functions that associate the same input values to the same output values are the same. Yes, you can represent functions by formulas (sometimes, anyway), but you can also represent them with just tables of inputs and outputs, or if they are functions between real numbers, as graphs. If you ask computer programmers for examples of functions, though, you will start hearing about some pretty bizarre things. I call these the “I must have skipped that day of calculus” functions. These are things that computer programmers are quite happy referring to as functions, but that to a mathematician are not really functions at all!

- “Functions” that return randomly chosen numbers… and if evaluated several times, will give a different answer each time.
- “Functions” that return one answer on Sundays, but a different answer on Mondays, yet another on Tuesdays, and so on.
- “Functions” that cause words to appear on some nearby computer screen every time you calculate their values.

What’s going on here? Most computer programmers go about their lives happily calling these things functions, but really they are something different. But wait a second! They do have quite a lot in common with functions. Namely, they have: (a) parameters, representing their domain; and (b) return values, representing their range. (Many computer programmers are happy to talk about functions that have no parameters, or no return values… but there’s no need to be overly picky here. We can just regard their domains and ranges as one-element sets, so that no actual information is conveyed, but we can keep up appearances.)

Even more importantly, these “functions” share one more thing with the functions of mathematicians: they are constantly being *composed* by taking the result from one function and passing it along as a parameter to another function. When I say composed, I mean it almost exactly in the basic mathematics sense of function composition: (*f* · *g*)(*x*) = *f*(*g*(*x*)). In fact, the whole reason our “functions” exist at all is to be composed with each other! Once upon a time, in the early days of computers, we liked to keep track of information by just sticking it in known places in the computer’s memory; but all this shared knowledge about where to find information made it hard to write parts separately and fit them together, so we mostly switched to this idea of functions and composition instead.
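For readers who know a little Haskell, here is a minimal sketch of that definition of composition. The functions `double` and `addOne` and the helper `compose` are invented for illustration, while `(.)` is the composition operator from the standard Prelude.

```haskell
-- Two ordinary functions, invented for this example.
double :: Int -> Int
double x = 2 * x

addOne :: Int -> Int
addOne x = x + 1

-- Composition written out by hand: (f . g)(x) = f(g(x)).
compose :: (b -> c) -> (a -> b) -> (a -> c)
compose f g = \x -> f (g x)

main :: IO ()
main = do
  print (compose addOne double 5)  -- first double, then add one
  print ((addOne . double) 5)      -- the same thing with the built-in operator
  print ((double . addOne) 5)      -- the other order gives a different function
```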

Here’s the executive summary so far:

- When computer programmers talk about functions, they do *not* mean exactly what mathematicians do.
- What they *do* mean is the idea of having *inputs* (domains), *outputs* (ranges), and most importantly *composition*.

So in the previous section, we ended up with our hands full of things that sort of look like functions. They have domains and ranges, and they can be composed. But at the same time, they are *not* functions in the mathematics sense. Baffling? No, not really. Mathematicians deal with stuff like that a lot. They have a name for systems of function-esque things of exactly that form. That name is… cue the drumroll, please… *CATEGORIES*!

In math-speak, categories are:

- collections of “objects” (you should think of *sets*),
- and “arrows” (you should think of *functions* between sets),
- where each arrow has a *domain* and a *range*,
- each object has an “identity” arrow (think of the identity function, where *f*(*x*) = *x*)
- and arrows can be *composed* when the domains and ranges match up right.

Before we agree to call something a category, we also throw in a few rules, such as if you compose any function with an identity, it doesn’t actually change, and composing functions obeys the associative property. These should be unsurprising, so if they seem strange to you, please take a moment, grab a pencil, and try working it out using the definition of function composition earlier: (*f* · *g*)(*x*) = *f*(*g*(*x*)), and simplifying.
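If you would rather check these rules on a computer than with a pencil, here is a small Haskell sketch. The functions `f`, `g`, and `h` are arbitrary examples, and `id` is the identity function from the standard Prelude. Note that this only tests the rules at a handful of inputs; it is a sanity check, not a proof.

```haskell
-- Three arbitrary example functions.
f, g, h :: Int -> Int
f x = x + 1
g x = 2 * x
h x = x * x

-- Check the identity and associativity rules at a single input.
lawsHoldAt :: Int -> Bool
lawsHoldAt x =
     (f . id) x == f x                          -- composing with identity on the right
  && (id . f) x == f x                          -- composing with identity on the left
  && ((f . g) . h) x == (f . (g . h)) x         -- associativity

main :: IO ()
main = print (all lawsHoldAt [-10 .. 10])
```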

The nice thing about categories is this: it’s not just some pointless abstraction that a bunch of mathematicians made up. Categories are defined that way because people have looked at literally hundreds of things that all look sort of like functions with domains and ranges and compositions. Things from algebra, like groups and rings and vector spaces; things from analysis, like metric spaces and topological spaces; things from combinatorics, like elements of partially ordered sets and paths in graphs; things from formal logic and foundations, like proofs and propositions. Almost without fail, they can be described using all the ideas we just looked at! In short, categories are the *right* intuition for talking about composing things with domains and ranges, which is exactly the situation we’re in.

Now you can see why categories come into the picture: they are the right intuition for things that maybe aren’t functions, but can be composed like functions. But just because a category exists doesn’t mean it’s worth talking about. What makes this worth talking about is that the category-related ideas aren’t just there, but actually express common concerns for computer programmers.

It’s now time to get a little more specific, and introduce the four examples that will guide us the rest of the way through this exploration. Each example highlights one way that the “functions” used by computer programmers might be different from the functions that mathematicians talk about. These examples represent actual kinds of problems that computer programmers have run into and solved, and we’ll look more at the practical side of them later. For now, we’ll just be happy getting familiar with the general ideas.

The first problem is failure. Computer programmers do lots of things that might fail. Reading from files (they might not exist, or on a computer with more than one user, they might not be set to allow you to read them), talking over the internet (the network might be broken or too slow), even just doing plain old calculations with a large amount of data (you might run out of memory). Because of this, dealing with failure is a constant concern.

In general, in modern computer programming tools, it’s always understood that a function might fail. You may get an answer, but you also may get a reason that the task could not be completed. When that happens, programmers are responsible for dealing with it properly: letting someone know, cleaning up the leftover mess in computer memory from a half-complete task, and just otherwise putting the pieces back together. A major factor in programming techniques or tools is how easy they make it for programmers to cope with the constant possibility of failure.

The second problem is dependence on outside information. While functions of mathematics are nice and self-contained, computer programmers often don’t have that luxury. Computer programs are messes of configuration. Even simple mobile phones have pages and pages of settings. What language does the user speak? How often should you save off a copy of their work? Should you encrypt communication over the network? Rare is the application today that doesn’t have a “Settings” or “Preferences” menu item. In many other contexts, too, computer programs depend on information that is a sort of “common knowledge” throughout the application, or some part of the application.

Ways of dealing with this have progressed through the ages. When everything was stored in well-known memory locations anyway, it was easy enough to just look there for information you need; but that led to problems when different parts of a program needed different information and sections of programs could step on each other’s toes. The massively influential technique known as *object-oriented programming* can be seen as partly an attempt to solve exactly this problem by grouping functions into a context with information that they depend on. The simplest and most flexible answer would be to just pass the information around to all the functions where it is needed… but when that’s a lot of places, passing around all those parameters can be very, very inconvenient.

The third problem is uncertainty, also known as non-determinism. A normal function associates an input to an output. A *non-deterministic* function associates an input to some number of *possible* outputs. Non-determinism is less well-known than the first two problems, but possibly only because it hasn’t yet seen a convincing solution in a general purpose language! Consider:

- Theoretical computer science talks about non-determinism all the time, because it’s the right approach for discussing a lot of computational problems, ranging from parsing to search to verification. That language just hasn’t made its way into the programming practice.
- Non-determinism comes up when querying, searching, or considering many possible answers. These are precisely the places that programmers end up relying on a variety of domain specific languages, ranging from SQL to Prolog, and more recently language-integrated technologies like LINQ.
- Even with specialized languages for heavy-duty querying and search tasks, we still end up writing a *lot* of our own nested looping and control structures for the purpose of looking through possibilities when it’s not worth crossing that language barrier. This kind of thing is responsible for some of the more complex code structures you find these days.

While the first two problems of failure and dependence are at least partly solved by current mainstream programming languages, non-determinism is as yet solved mostly by special-purpose sub-languages, with LINQ as the notable exception.

Finally, the fourth problem is destruction. Evaluating a math-type function is observable only in that you now know the answer. But in computer programming, functions can have permanent effects on the world: displaying information, waiting on responses from other computers or people, printing documents, even quite literally exploding things, if they are running on military systems! Because of this, things that aren’t specified in mathematics, like the order in which evaluation happens, matter quite a lot here.

The destructive nature (by which we just mean having effects that can’t be undone) of computer programming functions has plenty of consequences. It makes programming more error-prone. It makes it harder to divide up a task and work on different parts simultaneously, such as you might want to do with a modern multi-core computer, because doing the parts in the wrong order might be incorrect. But at the same time, these destructive effects are in a sense the whole point of computer programming; a program that has no observable effects would not be worth running! So in practically all mainstream programming languages, our functions do have to cope with the problem of destruction.

Now we’ve seen the faces of some problems we find in the computer programming world. We build software that might fail, has to deal with a ton of extra context, models non-deterministic choice, and sometimes has observable effects on the world that constrain when we can perform the computation.

It may now seem that we’ve left the nice and neat world of mathematical functions far behind. We have not! On closer inspection, we’ll see that if we can just squint hard enough, each of these quasi-functions can actually be seen as true, honest-to-goodness functions after all. There is a cost, though. To turn them into real functions, we need to change the *range* of those functions to something else. Let’s see how it works for each of our function types in turn:

Our first example of pseudo-functions were those that might fail. It’s not hard to see that a function that could fail is really just a function whose results include two things:

- successes, which are the intended possible results; and
- failures, which are descriptions of why the attempt failed.

So for any set *A*, we’ll define a new set called *Err(A)* to be just *A* together with possible reasons we might have failed. Now a possibly failing function from a set *A* to a set *B* is really just an ordinary function from *A* to *Err(B)*.
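To make this concrete, here is a minimal Haskell sketch of the *Err* idea. The constructor names `Failure` and `Success` and the example function `safeRoot` are my own choices for illustration, and representing the "reason we failed" as a plain string is an assumption, not the only option.

```haskell
-- Err a: either a description of why we failed, or a successful result.
-- (Constructor names are illustrative assumptions.)
data Err a = Failure String | Success a
  deriving (Eq, Show)

-- A possibly failing function from Double to Double, written as an
-- ordinary function from Double to Err Double.
-- safeRoot is a hypothetical example, not a library function.
safeRoot :: Double -> Err Double
safeRoot x
  | x < 0     = Failure "negative input"
  | otherwise = Success (sqrt x)
```

In GHCi, `safeRoot 4` evaluates to `Success 2.0`, while `safeRoot (-1)` evaluates to `Failure "negative input"`. Either way, the function always gives back an ordinary value.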

Our second type of pseudo-functions were those that depended on information that they got from the world around them: perhaps preferences or application settings. We play a similar trick here, but for a set *A*, we will define the set *Pref(A)* to be the *set of functions* from application settings to the set *A*. Now watch closely: a function *in context* from *A* to *B* is just an ordinary function from *A* to *Pref(B)*. In other words, you give it a value from the set *A*, and it gives you back *another function* that maps from application settings to the set *B*.

As confusing as that might sound, a function whose range is another function is really just a function of two parameters, except that it takes its parameters one at a time! Take a minute to convince yourself of this. The conversion between these two equivalent ideas is sometimes called “currying”. So by changing the *range* of our function, we actually effectively added a new parameter, and it now receives the application settings as a parameter. Remember that except for being inconvenient (we’ll deal with that later), that’s exactly what we wished for.
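If you'd like to see the currying step spelled out, here is a small Haskell sketch. The `Settings` record and its single `userName` field are invented for the example; a real application would have its own settings type.

```haskell
-- Hypothetical application settings; the field is invented for illustration.
data Settings = Settings { userName :: String }

-- Pref a: a function from application settings to a value of type a.
type Pref a = Settings -> a

-- A function "in context" from String to String, written as an ordinary
-- function from String to Pref String. It returns another function.
greeting :: String -> Pref String
greeting salutation = \settings -> salutation ++ ", " ++ userName settings ++ "!"

-- The same thing viewed as a function of two parameters taken at once.
-- uncurry is the standard Prelude conversion between the two views.
greetingUncurried :: (String, Settings) -> String
greetingUncurried = uncurry greeting
```

Both `greeting "Hello" (Settings "Ada")` and `greetingUncurried ("Hello", Settings "Ada")` evaluate to `"Hello, Ada!"`, which is the sense in which the two forms are equivalent.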

Our third type of pseudo-functions were those that represent non-determinism: instead of one specific answer, they have many possible answers. This is perhaps the most obvious example of all: for each set *A*, define *P(A)* to be the *power set* of *A*, whose members are themselves sets of values of *A*. Then a non-deterministic function from *A* to *B* is just an ordinary function from *A* to *P(B)*.
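As a rough Haskell sketch, we can use a list as a stand-in for a set of possible answers. That is an assumption made for simplicity: lists care about order and duplicates, and real sets don't. The function `intRoots` is a made-up example.

```haskell
-- P a: the possible results. A list stands in for a set here (an assumption).
type P a = [a]

-- A non-deterministic function from Int to Int: every integer whose square
-- is n. There may be two answers, one answer, or none at all.
intRoots :: Int -> P Int
intRoots n = [r | r <- [-n .. n], r * r == n]
```

Here `intRoots 9` evaluates to `[-3,3]`, `intRoots 0` to `[0]`, and `intRoots 2` to `[]`.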

Our final trick is to deal with functions that have destructive effects. Here we’ll need to be a bit more elaborate in constructing a new range: for each set *A*, we define *IO(A)* (standing for *input/output*, which captures the notion of effects that interact with the rest of the world). An element of the set *IO(A)* is a list of *instructions* for obtaining a member of *A*. It is not a member of *A*, merely a way to obtain one, and that procedure might have any number of observable effects.

Now we play the same trick and change the range: a *destructive function* from *A* to *B* is just an ordinary plain old mathematical function from *A* to *IO(B)*. In other words, if you give me an *A*, then as a plain old function I can’t actually do the steps to get a *B*, but I can certainly tell you what they are.
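Haskell's built-in `IO` type behaves much like this description, so it makes a reasonable illustration. In the sketch below, `announce` and `pending` are names I made up; `putStrLn` is the standard Prelude action.

```haskell
-- A destructive function from String to (), written as an ordinary function
-- from String to IO (). Applying it only builds the instructions.
announce :: String -> IO ()
announce name = putStrLn ("Hello, " ++ name)

-- A list of instruction values. Building this list prints nothing; the
-- effects happen only if something later runs the instructions.
pending :: [IO ()]
pending = map announce ["Ada", "Grace"]
```

Asking for `length pending` gives `2` without printing any greeting, because the instructions have been described but not carried out.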

*But what about composition?* It’s great to be back in the world of plain functions, but remember what got us here in the first place? We liked functions because we liked composition; but it seems we’ve now lost composition! If I have a possibly failing function from *A* to *B*, and another from *B* to *C*, well now I’ve turned them into functions from *A* to *Err(B)* and then *B* to *Err(C)*. Those function domains and ranges don’t match up, and I can’t compose them!

Oh no…

Well, all is not lost. I just haven’t yet told you how to compose these “special” functions.

Because some math dude found these things before us, we call our “special” functions by a name: Kleisli arrows. There are two things going on here at once, so keep your eyes open: first, Kleisli arrows are just plain old ordinary functions, but with weird-looking ranges. Since they are just functions, you can compose them *as functions*, and that’s just fine. But at the same time, they are “special”, and we can compose them *as Kleisli arrows*, too.

Remember what we decided earlier? The right way to think about composition is by talking about a category. Sets are a category, and that’s fine if you want plain function composition. But now we want a *new* kind of category, too. It’s called the Kleisli category. If you don’t remember what all the parts of a category are, take a second to review them. To define a category, I need objects, arrows, identities, and composition.

- To keep things simple, the objects in this new category will be the same: they are just sets of things.
- The arrows in this category are, unsurprisingly, the Kleisli arrows.
- I haven’t told you yet what the identities and composition look like, so let’s do that next.

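First, we look at *failure*. We’re given a failure Kleisli arrow *g* from *A* to *Err(B)*, and another, *f*, from *B* to *Err(C)*, and we want to compose them into a Kleisli arrow *f*·*g* from *A* to *Err(C)*.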

The central idea of error handling is that if the first function gives an error, then we should stop and report the error. Only if the first function succeeds should we continue on to the second function, and give the result from that (regardless of whether it’s an error or a success).

To summarize:

- If *g*(*x*) is an error, then (*f*·*g*)(*x*) = *g*(*x*).
- If *g*(*x*) is a success, then (*f*·*g*)(*x*) = *f*(*g*(*x*)).

To complete the definition of a category, we also need to decide about the *identity* Kleisli arrows. These are the ones that don’t do anything, so that if you compose them with any other Kleisli arrow, it doesn’t change the other one. Identities are functions from *A* to *Err(A)*, and it turns out these are just the functions *f*(*x*) = *x*, just like for sets. Notice that means they never return an error; only a successful result.
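Here is how that composition rule and identity might look in Haskell. This is a sketch under my own naming: `composeErr`, `idErr`, `safeRecip`, and `safeRoot` are all hypothetical. One wrinkle is that a Haskell data type has to wrap its successes in a constructor, so the identity *f*(*x*) = *x* shows up as the `Success` constructor itself.

```haskell
data Err a = Failure String | Success a
  deriving (Eq, Show)

-- Kleisli composition for Err: run g first. If it fails, stop and report
-- that failure; otherwise hand its result to f.
composeErr :: (b -> Err c) -> (a -> Err b) -> (a -> Err c)
composeErr f g x = case g x of
  Failure why -> Failure why
  Success y   -> f y

-- Kleisli identity: always succeeds, and changes nothing.
idErr :: a -> Err a
idErr = Success

-- Two hypothetical failing functions to compose.
safeRecip :: Double -> Err Double
safeRecip 0 = Failure "division by zero"
safeRecip x = Success (1 / x)

safeRoot :: Double -> Err Double
safeRoot x
  | x < 0     = Failure "negative input"
  | otherwise = Success (sqrt x)
```

With these definitions, `composeErr safeRoot safeRecip 4` evaluates to `Success 0.5`, and the same composite applied to `0` gives `Failure "division by zero"` without ever calling `safeRoot`.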

I’ll run more briefly through the remaining three examples, but I encourage readers who aren’t yet clear on how this will work to write them out in more detail and use this opportunity to become more comfortable with defining a second category of Kleisli arrows.

Next we have Kleisli arrows for *dependence*, which are functions from *A* to *Pref(B)*.

The Kleisli arrows for *uncertainty*, or non-determinism, are functions from *A* to *P(B)*.

Finally, Kleisli arrows for *destructive effects* are functions from *A* to *IO(B)*.

These new categories have their own function-like things, and related ideas of composition and identities, that express the unique nature of each specific problem. By using the appropriate notion of composition in the right Kleisli category, you can solve any of these long-standing computer programming problems in a nice composable way.
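For readers who'd rather check their work against something, here is one way the remaining three compositions and identities could be written in Haskell. Treat it as a sketch: all the names are mine, the settings type is left as a type variable `s`, lists stand in for sets, and the `IO` version leans on Haskell's built-in `>>=` and `return` because `IO` can't be taken apart by hand.

```haskell
-- Dependence: hand the same settings to both steps.
composePref :: (b -> s -> c) -> (a -> s -> b) -> (a -> s -> c)
composePref f g x settings = f (g x settings) settings

-- Identity for dependence: ignore the settings and return the input.
idPref :: a -> s -> a
idPref x _ = x

-- Uncertainty: try f on every possible result of g and collect everything.
composeP :: (b -> [c]) -> (a -> [b]) -> (a -> [c])
composeP f g x = concatMap f (g x)

-- Identity for uncertainty: exactly one possible answer.
idP :: a -> [a]
idP x = [x]

-- Destruction: follow g's instructions, then f's instructions on the result.
composeIO :: (b -> IO c) -> (a -> IO b) -> (a -> IO c)
composeIO f g x = g x >>= f

-- Identity for destruction: instructions that do nothing but yield the input.
idIO :: a -> IO a
idIO = return
```

For example, `composeP (\y -> [y, y + 1]) (\x -> [x, x * 10]) 1` evaluates to `[1,2,10,11]`: the first step offers `1` and `10`, and the second step branches on each of them.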

And that’s why you should care about monads.

*Monads?!?* Oh yes, I should mention that we’ve just learned about monads. We simply forgot to use the word.

This section is for those of you who want to know how the stuff we said earlier is related to monads as they are understood in mathematics. If you open Wikipedia, or most category theory textbooks, and look up monads, they won’t look very much like what we just did. You’ll see something about an endofunctor, and two natural transformations, and properties about commuting triangles and squares.

We haven’t talked about functors at all, much less natural transformations… so how could we have possibly learned about monads? It turns out there’s more than one way to describe monads. The one we’ve just gone through is an entirely valid one. The shifts we made to the ranges of our functions earlier (*Err*, *Pref*, *P*, and *IO*) are actually examples of monads. To make sure they are monads in the conventional math way, we’d have to work pretty hard: first, prove that they are *functors*. Then build two *natural transformations* called η and µ, and prove that they are natural. Finally, prove the three monad laws.

But wait, there’s an easier way! Heinrich Kleisli, whom we’ve already met from the categories earlier, pointed out that if you can build a category like the ones we did in the last section, whose arrows are just functions with a modified range, then your category is *guaranteed* to also give you a monad. That’s quite convenient, because as computer programmers, we tend to care a lot more about our Kleisli arrows than we do about a mathematician’s idea of monads. Remember, those Kleisli arrows are exactly the modified notion of functions that we were *already* using, long before we ever heard a word about category theory! And Kleisli tells us that as long as composition works the way we expect with our Kleisli arrows (namely, that it’s associative and the identities act like identities), then all that other stuff we’re supposed to prove to show we have a monad just happens for us automatically.

Still, it’s an interesting side question to look at the relationship between the two. I won’t give all the details, but I’ll give the structure, and then leave the interested reader with some familiarity with category theory to fill in the proofs of the relevant properties. We’ll use *Err* as our monad, just to pick a specific example, but nothing here is specific to *Err*.

- We start with *Err*, which is already a map from objects to objects. But the traditional definition of a monad also requires that it be a *functor*. That is, given a function *f* from *A* to *B*, I need a way to construct a function *Err(f)* from *Err(A)* to *Err(B)*. I do it as follows: in the *underlying* category (not the Kleisli category, just the category of sets), I find an identity function from *Err(A)* to *Err(A)*. Then I find a Kleisli identity from *B* to *Err(B)*. I compose that Kleisli identity in the *underlying* category with *f*, and get a function from *A* to *Err(B)*. I can now do a Kleisli composition of the identity from *Err(A)* to *Err(A)* and the function from *A* to *Err(B)*, and get a function from *Err(A)* to *Err(B)*. That’s the one I’ll call *Err(f)*.
- Next, I need a natural transformation η, from the identity functor to *Err*. This is easy: the components of η are the Kleisli identities.
- Finally, I need a natural transformation µ from *Err*² to *Err*. To get the component of µ at *A*, I take the identity functions in the underlying category from *Err (Err A)* to *Err (Err A)*, and then from *Err A* to *Err A*, and I combine them with Kleisli composition to get a function from *Err (Err A)* to *Err A*. This is the component of µ.
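The three steps above can be followed almost word for word in Haskell. In this sketch, `composeErr` and `idErr` are my own names for the Kleisli composition and identity of *Err*, and `mapErr`, `eta`, and `mu` are hypothetical names for *Err(f)*, η, and µ. Haskell's `id` plays the role of the identity in the underlying category.

```haskell
data Err a = Failure String | Success a
  deriving (Eq, Show)

-- Kleisli composition and identity for Err (assumed as the starting point).
composeErr :: (b -> Err c) -> (a -> Err b) -> (a -> Err c)
composeErr f g x = case g x of
  Failure why -> Failure why
  Success y   -> f y

idErr :: a -> Err a
idErr = Success

-- Err(f): compose the Kleisli identity with f in the underlying category,
-- then Kleisli-compose that with the plain identity on Err a.
mapErr :: (a -> b) -> Err a -> Err b
mapErr f = composeErr (idErr . f) id

-- eta: its components are the Kleisli identities.
eta :: a -> Err a
eta = idErr

-- mu: Kleisli-compose the plain identity on Err a with the plain identity
-- on Err (Err a).
mu :: Err (Err a) -> Err a
mu = composeErr id id
```

The type checker does a fair amount of the bookkeeping here: each `id` is forced to the right type by the signature. For instance, `mu (Success (Success 3))` evaluates to `Success 3`, and `mapErr (+ 1) (Success 1)` evaluates to `Success 2`.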

The construction in the opposite direction is easier. Given a monad *Err* with η and µ, the Kleisli category is constructed as follows.

- The identities are just the components of η.
- Given a function *f* from *A* to *Err(B)* and a function *g* from *B* to *Err(C)*, I compose the two as µ · *Err(g)* · *f*.
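The opposite direction can be sketched the same way. This time I assume `mapErr`, `eta`, and `mu` are given (written directly by pattern matching), and build the Kleisli composition from them. The names, including `kleisli`, `safeRecip`, and `safeRoot`, are hypothetical. Note that in this formula *f* is the arrow that runs first.

```haskell
data Err a = Failure String | Success a
  deriving (Eq, Show)

-- The monad structure, assumed given: functor action, eta, and mu.
mapErr :: (a -> b) -> Err a -> Err b
mapErr _ (Failure why) = Failure why
mapErr h (Success x)   = Success (h x)

eta :: a -> Err a
eta = Success

mu :: Err (Err a) -> Err a
mu (Failure why)   = Failure why
mu (Success inner) = inner

-- Kleisli composition built from the monad: mu . Err(g) . f, with f first.
kleisli :: (a -> Err b) -> (b -> Err c) -> (a -> Err c)
kleisli f g = mu . mapErr g . f

-- Two hypothetical failing functions to try it on.
safeRecip :: Double -> Err Double
safeRecip 0 = Failure "division by zero"
safeRecip x = Success (1 / x)

safeRoot :: Double -> Err Double
safeRoot x
  | x < 0     = Failure "negative input"
  | otherwise = Success (sqrt x)
```

Here `kleisli safeRecip safeRoot 4` evaluates to `Success 0.5`, and composing with `eta` on either side leaves an arrow unchanged, as an identity should.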

Again, the details and the proofs of the appropriate monad and category laws are left to the reader. I hope this brief aside has been useful. I now return to using the word “monad” but talking about monads via Kleisli categories.

Once again, let’s pause to sum up.

- Computer programmers like to work by composing some things together, which we call functions.
- They aren’t functions in the obvious way… but they do make up a category.
- Actually, they are functions after all, but only if you squint and change the ranges into something weirder.
- The category that they form is called a Kleisli category, and it’s basically another way of looking at monads.
- These monads / Kleisli categories nicely describe the techniques we use to solve practical problems.

It’s not just about those four examples, either. Those are typical of many, many more ideas about programming models that can be described in the same framework. I think it’s fair to sum up and say, at this point, that someone interested in studying and analyzing programming languages and models should be familiar with some ideas from category theory, and with monads in particular.

But still, what about the humble computer programmer, who is not designing a new language, is not writing research papers analyzing programming languages, but just wants to solve ordinary everyday problems? That’s a fair question. As long as monads remain just a mathematical formalism for understanding what computer programmers mean by functions, the practicing computer programmer has a good claim to not needing to understand them.

It’s becoming clear, though, that monads are on their way into practical programming concerns, too. In the past, these Kleisli arrows, the modified notions of “function” used by computer programmers, were built into our programming languages. Functions in C used a Kleisli arrow, and C++ functions used a different one. The language specification would tell us what is and what is not possible using a function in this language, and if we wanted something different, too bad. Maybe once a decade, we’d make the swap to a brand new programming language, and bask in the warm rays of some new language features for a while.

Consider the *Err* monad, which gave us functions that might fail and report their failure in structured ways. Modulo some details and extensions, this is basically structured exception handling. Looking to history, programmers worked without exception handling in their programming languages for many years. Of course, languages like C are all Turing complete, and can solve any possible computational problem, proper error handling included. But we don’t apply categories to think about possible computations; categories are for thinking about composition. Without exception handling in the notion of a “function” that’s provided by languages like C, programmers were left to do that composition by hand.

As a result, any C function that could fail had to indicate that failure using a return value. In many cases, conventional wisdom built up saying things like “return values are for indicating success or failure, *not* for giving back answers”. Coding conventions called for most if not all function calls to be followed with *if* statements checking for failure, and the resulting code was borderline unreadable. This was the heyday of flowcharts and pseudo-code, because no one expected to be able to understand real code at a glance! In reality, though, programmers only checked for errors when they thought they were possible, and a lot of errors went undetected. Programs were often unreliable, and likely untold billions of dollars were spent on extra development work and troubleshooting.

What was the reason for this? It’s quite simple: the C programming language and others of its time provided an insufficient kind of Kleisli arrow! If their Kleisli arrow had included the functionality from the *Err* monad we defined above, this could have been avoided. But the notion of what a function means in C is fixed, so the answer was to deal with it, and eventually migrate to a different programming language, rewriting a lot of software, and likely costing another untold billions of dollars.

What about the *Pref* monad, and others like it? As discussed earlier, this is about defining computations in a larger context of available information and state of the world.

In the past, we had global variables, the slightly more modern equivalent of just storing information at a known place in computer memory. Quick and dirty, but even 30 years ago, programmers knew they were the wrong answer, and wouldn’t be manageable for larger programs. Object oriented programming tried to alleviate the problem a little, by having functions run in a specific “object” that serves as their context, and that was implicitly passed around at least within the implementation of the object itself. To get this, everyone effectively had to change programming languages to get a better Kleisli arrow again. But even so, object-oriented languages don’t give a perfect answer to this problem.

This point is about the future, though I’ll start by pointing out that everything here is already possible; it just requires an appropriate choice of programming language!

One current challenge for the computer programming community is finding effective ways to handle parallelism. Ironically, while past examples have focused on the problem of putting too little power into a language’s Kleisli arrow, the problem this time is too much! Plain (also known as “pure”) functions present lots of opportunities for parallelism. When code is executed in parallel, it may run faster, or if the parallelism is poorly designed it may even run slower, but in any case it will *certainly* still give the same answer. But when the Kleisli arrow incorporates destructive updates, that is no longer the case. Now parallelism is risky, and might give unexpected or incorrect results due to so-called *race conditions*.

We can’t just remove destructive updates from a language’s Kleisli arrow, though. A program that has no observable effects at all isn’t useful. What is useful is the ability to separate the portions of code that perform destructive update from those that just compute pure functions. So for the first time, we need a language with more than one kind of Kleisli arrow, in the same language!

There is already at least one language that offers precisely this. Programmers in the Haskell language can build their own monads, and work in the Kleisli category of a monad of their choosing. The programming language offers a nice syntax for making this approach readable and easy to use. If something might fail, you can throw it in *Err*. If it needs access to the application settings, throw it in *Pref*. If it needs to do input or output, throw it in *IO*. Haskell web application frameworks and similar projects start by defining an appropriate monad with the appropriate features for *that kind* of application.
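As a small taste of what that looks like, here is a sketch using the standard library's `Either String` as a stand-in for *Err* (the type synonym and the two example functions are my own). The operator `>=>` from `Control.Monad` is Kleisli composition, reading left to right, and `do` notation is the syntax mentioned above.

```haskell
import Control.Monad ((>=>))

-- Either String plays the role of Err here (an assumption for the example).
type Err = Either String

safeRecip :: Double -> Err Double
safeRecip 0 = Left "division by zero"
safeRecip x = Right (1 / x)

safeRoot :: Double -> Err Double
safeRoot x
  | x < 0     = Left "negative input"
  | otherwise = Right (sqrt x)

-- Kleisli composition with the library operator: safeRecip runs first.
recipThenRoot :: Double -> Err Double
recipThenRoot = safeRecip >=> safeRoot

-- The same computation written with do notation.
recipThenRoot' :: Double -> Err Double
recipThenRoot' x = do
  r <- safeRecip x
  safeRoot r
```

Both versions give `Right 0.5` for the input `4` and `Left "division by zero"` for `0`. No explicit failure checks appear in either one, because the monad's composition does the checking.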

Another current trend in the computer programming community is toward building more domain-specific programming models. The language Erlang became popular specifically for providing a new programming model with advantages for parallelism. Microsoft’s .NET framework incorporates LINQ, which offers a programming model that’s better for bulk processing and querying of collections of data. Rails popularized domain-specific languages for web applications. Other languages offer continuations as a way to more easily specify computations in a more flexible way. All of these are examples of working in new and different Kleisli arrows that capture exactly the model appropriate for a given task.

It comes down to this: If we believe that there is one single notion of “function” that is most appropriate for all of computer programming, then as practical programmers we can find a language that defines functions that way, and then forget about the more general idea of monads or Kleisli arrows as a relic of theoreticians. But it’s not looking that way. The programming community is moving quickly toward different notions of what a function means for different contexts, for different tasks, even for different individual applications. So it’s useful to have the language, the tools, and the intuition for comparing different procedural abstractions. That’s what monads give us.

Using a language with a choice of monads offers some other advantages here, too. It gives us back our abstraction. In Haskell, for example, it’s possible to write code that is applicable in multiple different monads. A surprising amount of the programming done with one monad in mind actually has meaning in very different monads! For example, consider the following Haskell type:

*sequence :: Monad m => [m a] -> m [a]*

What this means is that for any monad, which we’ll call *M*, sequence converts from a list of values of *M(A)* into *M(List(A))*, the monad applied to lists themselves. Let’s take a minute to consider what this means for each of our four examples. For *Err*, it takes a list of results that might be failures, and if any of them are failures, it fails; but if not, then it gives back a list of all the results. It’s basically a convenient way to check a whole list of computations for a failure. For *Pref*, it takes a single set of application preferences, and distributes that to everything in the list, giving back a list of the results. For the power-set monad, *P*, it would take a list of sets, and give back a set of all the ways to choose one item from each set. And for *IO*, it takes a list of instruction cards, and gives back the single card with instructions for doing all of them in turn. Amazingly, this one function, which had only one implementation, managed to make sense and do something useful for all four of our examples of monads!
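You can try all four cases with the `sequence` that ships in the Haskell Prelude. The stand-ins below are my assumptions rather than the exact monads from this article: `Either String` for *Err*, plain functions from an `Int` setting for *Pref*, and lists for *P*. All the top-level names are invented for the example.

```haskell
-- Err (via Either String): succeeds only if every item succeeds.
errAllGood :: Either String [Int]
errAllGood = sequence [Right 1, Right 2, Right 3]

errOneBad :: Either String [Int]
errOneBad = sequence [Right 1, Left "second item failed", Right 3]

-- Pref (via functions from a setting): the one setting goes to every item.
prefResults :: Int -> [Int]
prefResults = sequence [(+ 1), (* 2), subtract 3]

-- P (via lists): all the ways to choose one item from each collection.
choices :: [[Int]]
choices = sequence [[1, 2], [3], [4, 5]]

-- IO: one set of instructions that performs each action in turn.
announceAll :: IO [()]
announceAll = sequence [putStrLn "first", putStrLn "second"]
```

Here `errAllGood` is `Right [1,2,3]`, `errOneBad` is `Left "second item failed"`, `prefResults 10` is `[11,20,7]`, and `choices` is `[[1,3,4],[1,3,5],[2,3,4],[2,3,5]]`. Running `announceAll` prints the two lines in order.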

Along with a choice of monads comes the ability to abstract over that choice, and write meaningful code that works in *any* monad that you do end up choosing.

Between all of these forces, I predict that within the next ten years, software developers will be expected to discuss monads in the same way that most developers currently have a working vocabulary of design patterns or agile methodologies.

While most of this has been about monads, I don’t want to leave anyone with the impression that monads are the only influence of categories in computer programming. All of the following ideas have found their way into programming practice, mostly (so far) within the Haskell programming language community because of its flexibility and a deep academic culture and history.

- *Monad transformers* are a powerful technique for combining the effects of more than one monad to build rich and powerful programming models.
- *Functors* and *applicative functors* (a.k.a. strong lax monoidal functors for mathematicians) are weaker than monads, but more widely applicable.
- Other kinds of categories that are not Kleisli categories can often be defined and composed to solve specific problems. Freyd categories are also useful.

I’ll stop there, but only as an encouragement to look more into the various abstractions from category theory that programmers have found useful. A good starting point is the (Haskell-specific) Typeclassopedia by Brent Yorgey. That’s just a door into the many possibilities of applying category-based ideas and intuitions in computer programming.

But I hope I was able to convey how these ideas aren’t just made up, but are actually the natural extension of what computer programmers have been doing for decades.
