I’ve been pouring a lot of effort into CodeWorld lately… and I wanted to write a sort of apology to the Haskell community. Well, perhaps not an apology, because I believe I did the right thing. But at the same time, I realize that decisions I’ve made haven’t been entirely popular among Haskell programmers. I’d like to explain what happened, and try to make it up to you!
Originally, I started this project using Haskell and the excellent gloss package, by Ben Lippmeier. CodeWorld has been moving slowly further and further away from the rest of the Haskell community. This has happened in a sequence of steps:
- Way back in 2011, I started “CodeWorld”, but at the time, I called it Haskell for Kids. At the time, I understood that the reasons I’d chosen Haskell as a language were not about cool stuff like type classes (which I love) and monads and categories and other commonplace uses of solid abstractions (which fascinate me). Instead, I chose Haskell for the simple reason that it looked like math. The rest of Haskell came with the territory. I built the first CodeWorld web site in a weekend, and I had to settle on a language and accept all that came with it.
- From the beginning, I made some changes for pedagogical reasons. For example, gloss defines rotation to be clockwise. I insisted on rotation working in the counter-clockwise direction, because that’s the convention universally used in math. Later, I resized the canvas to 20×20, so that typical programs would need to use fractions and decimals, which is a middle school math education goal. I made thes changes, even though they broke compatibility with a widely used package. Sorry for anyone that’s struggled with this.
- I rebranded “Haskell for Kids” as CodeWorld, and stopped explicitly depending on gloss in favor of just reproducing its general approach in a new Prelude. This was a deliberate attempt to get away from focusing on the Haskell language and libraries, and also to the accompanying import statements and such. This hid the ways that Haskell was a general purpose language with uses outside this toy environment. That is unfortunate.
- I rewrote the Haskell Prelude, to remove type classes. Along the way, I collapsed the whole numeric type class hierarchy into a single type, and even got Luite (the author of GHCJS) to help me with some deep black magic to implement equality on arbitrary Haskell types without type classes. This threw away much of the beauty of Haskell… in favor of dramatically improved error messages, and fewer things you need to know to get started. It was a real loss.
- Finally, I commited the unforgivable sin. I dropped curried functions, in favor of defining functions of multiple parameters using tuples. This finally makes CodeWorld feel like a completely different language from Haskell. That really sucks, and I know some people are frustrated.
Why It Happened?
First, I want to point out some things that are not the reason for any of this:
- I did not do this because I think there’s something wrong with Haskell. I love type classes. I love currying, and especially love how it’s not just a convenient trick, but sometimes introduces whole new perspectives by viewing tedious functions of multiple parameters as simple, clean, and elegant higher-order functions.
- I also did not do this because I think anyone is incapable of learning full-fledged Haskell. In fact, I taught full-fledged Haskell to middle schoolers for a year. I know they can do it.
So why did I do it? Two reasons:
- Teaching mathematics has always been more important to me than teaching Haskell. While Haskell is an awesome programming language, mathematics is just an awesome perspective on life. For every student who benefits from learning an inspiring programming language, many students will benefit from learning that humanity has a method called mathematics for thinking about fundamental truths in a systematic, logical way that can capture things precisely. So any time I have to choose between pushing students further toward their math education or away from it, I’ll choose toward it.
- Details matter. Even though I know kids are capable of a lot, they are capable of a lot more without artificial obstacles in their way. I learned this the hard way teaching this class the first time. The smallest little things, with absolutely no great significance as a language, matter a lot. Having to put parentheses around negative numbers obscures students from reaching leaps of understanding. Confusing error messages mean the difference between a student who spends a weekend learning, and one who gives up on Friday afternoon and doesn’t think about it until the next school day. Different surface syntax means that a lot of kids never fully make the connection that functions here are the same thing as functions there.
In the end, I do think these were the right decisions… despite the frustration they can cause for Haskell programmers who know there’s a better way.
Making Up For It
A couple weekends ago, though, I worked on something to hopefully restore some of this loss for Haskellers. You see, all the changes I’ve made, in the end, come from replacing the Prelude module with my own alternative. Specifically:
- I deliberately replaced functions from the Prelude with my modified versions.
- Because I provided an alternative Prelude, I had to hide the base package, which made it impossible to import things like Control.Monad. This was not a deliberate decision. It just happened.
So I fixed this. I added to the codeworld-base package re-exports of all of the modules from base. I renamed Prelude to HaskellPrelude in the process, so that it doesn’t conflict with my own Prelude. And finally, I added a new module, CodeWorld, that exports all the really new stuff from CodeWorld like pictures, colors, and the interpreters for pictures, animations, simulations, etc. The result is that you can now start your programs with the following:
import CodeWorld -- If you still want to do pictures, etc.
main = putStrLn "Hello, World"
At this point, you can write any Haskell you like! You aren’t even constrained to pure code, or safe code. (The exception: TemplateHaskell is still rejected, since the compiler runs on the server, so TH code would execute code on the server.)
Right now, the CodeWorld module still uses uncurried functions and other CodeWorld conventions like Number for numbers, etc. There’s no reason for this, and it’s something that I should probably change. Anyone want to send a pull request?
I’m continuing work on CodeWorld, my educational programming environment based on geometry and algebra. There are big changes coming! If you’re interested in following the project, please join the new codeworld-discuss mailing list, where I’ll send more regular announcements about significant changes, as well as try to answer questions, and discuss future directions.
Here are some things I intend to change in the near future. A more complete list is on the project issue tracker, but this is a summary with more details and reasoning about some of the changes.
Aligning With Math Education
An important goal of this project is to align with a standards-based U.S. middle school math education, as much as possible. To be clear, I still refuse to add complexity or turn the project into a patchwork of specific lessons that promote a specific narrow path of learning. First and foremost, this should be an environment for tinkering and encountering ideas in self-motivated way. But given alternative designs that could each be valid on their own, I’ll choose the one that pushes students toward the math standards.
It’s sometimes a tough line to draw. But I’ve become convinced that there are a few places where I can do better. Two of those are going to be major breaking changes, coming soon.
1. Death to Currying
Haskell’s convention of currying functions is the wrong default for CodeWorld. Practically all of mathematics, especially at introductory level, is carried out with the notation f(x,y) = … . The interpretation is that a function of two parameters is a function whose domain is a product – a set of ordered pairs. The Haskell language makes a different choice. Applying a function to two parameters is more like f(x)(y) (the parentheses are optional in Haskell itself), and the interpretation is that f(x) denotes a partially applied function that’s still waiting for its second parameter.
If the goal were to teach about higher-order functions, there would be lots of great arguments for the latter. If the goal were convenience, you could argue for the latter pretty persuasively, as well. I think Haskell’s use of currying is great. But when the goal is to let students encounter and tinker with things they will see in school math, the right choice is to adopt the convention of mathematics.
Luckily, the assumption of curried multi-parameter functions isn’t baked into Haskell too deeply. By changing the standard library, it’s quite possible to write f(x,y) just as well. The parentheses on f(x) become optional, but this is actually true of mathematics in general (for example, operators in linear algebra are often written without parentheses, as are trig functions). I will adopt the convention of using parentheses around even single function parameters.
The only big source of awkwardness comes with binary operators. So long as we choose not to teach the notations `foo` (for turning a function into an operator) or (+) (for turning an operator into a function), this doesn’t come up much. Notably, sections still work fine, since they take only one argument.
A couple convenient side effects of this choice are nice, too:
- Students who routinely write parentheses around function arguments less often find themselves forced to surround negative numbers in parentheses for weird parsing reasons. As trivial as it might seem, this was a very real and significant learning obstacle the last time I taught the class, and I’ll be happy to see it go.
- Getting expression structure wrong sometimes gives much better error messages this way. It’s harder to accidentally mix up precedence between an operator and function application; and passing too few arguments to a function gives a clear error rather than inferring a function type and breaking in an obscure way elsewhere.
2. Resizing the Canvas
The second big change is to resize the canvas from 500×500 to 20×20.
The justification for a 500×500 canvas was generally about confusing pixels – little dots on the screen – with the general idea of a coordinate system. It’s convenient to blur the distinction at first, but it has in the past become a barrier to understanding the full nature of the coordinate plane with real (or even rational) coordinates. Many students were confused when later faced with fractional coordinates. At the same time, developing a full understanding of the rational number system is a big topic in 6th, 7th, and 8th grade mathematics, so it would be great to ask students to do more tinkering with these numbers.
By replacing this with a 20×20 grid (x and y coordinates ranging from -10 to 10), several goals are accomplished:
- Students early in the class are working with numbers in a range they can comprehend better.
- Students routinely work in fractions or decimals to fine tune their projects.
- The abstract coordinate plane, including fractional coordinates, becomes more familiar.
This is a big win overall.
Changes to Usability
On the less controversial side, I’m planning a number of changes to make the site more usable:
- Pervasive auto-complete, based on a pre-populated list of the standard library symbols as well as parsing the student code for declared names.
- More complete documentation, tutorials, and better examples. I admit that the current site is grossly lacking in documentation. I don’t envy anyone who tries to figure it out on their own!
- Better tools for playing around with results. At the very least, students will be given the chance to scroll, pan, and see coordinates of points in pictures, animations, and simulations.
Long-Term Wish List
I also have my wish list for things I’d love to see possible, but am not quite ready to build yet. This includes:
- Social features: sharing projects with friends, commenting on or expressing support for other projects.
- Collaborative projects with shared editing or exporting libraries for others to use.
- Better debugging tools, such as easy controls to move forward and back in time, fast-forward, pause, etc. for animations, simulations, and even games.
- Possibly grading features for teachers to grade projects and provide a scoring rubric and comments.
About three years ago, I started work on an idea about technology-based math education. The idea was to get middle school students to work passionately on using mathematics to create things, by:
- Doing their own original, creative work, instead of following instructions or reaching set answers.
- Getting instant feedback 24 hours a day, so they can tinker and learn in a self-directed way.
- Building confidence by working on their own ideas, inspiring pride and excitement.
- Experiencing how concepts from geometry, algebra, and physics can be springboards for creativity.
- Becoming creators, rather than just consumers, of technology.
That’s a lofty set of goals, but it was very successful. In the 2011-2012 school year, I taught a small class of six students, two to three hours per week. We had an awesome time. They built their own computer games throughout the year. We struggled together, worked our way through, and finished the school year with an awesome expo where the students showed off their work to local technology professionals and participated in a question-and-answer panel about their experiences. It was fascinating listening to this, because a few patterns arose:
- Students didn’t really think of what they were doing as math. This remained true, even when the skills they learned involved describing the behavior of systems using equations, functions, and variables; describing complex shapes in terms of geometry, the coordinate plane, and rotations, translations, and scaling; coming to grips with the meaning of probability and randomness; etc.
- The students who entered the year being “good at technology” weren’t necessarily the most likely to succeed. Talking to these students broke all of the stereotypical molds about computers and technology! Students took to the activity and wildly succeeded were very often girls, and had previously thought they were more the art-and-music type.
At the end of that year, I had plans to teach this program in multiple schools the following school year. Unfortunately, things then got a little sidetracked. I started a new job at Google over the summer, moved to California, and dropped the program. The web site that students had used to build their projects fell into disrepair, and stopped working. I stopped doing anything about it.
Over the last week and a half, though, that’s changed! CodeWorld is back!
The CodeWorld web site is (as always) at http://www.codeworld.info.
Any web browser will do, but you really need to use the latest version of whatever browser you choose. If you’ve been putting off upgrading Internet Explorer, it’s long past time!
You’ll also want a Google account. You can log in using your Google account, and save your programs to Google Drive. Because your programs are saved to the cloud, you can use the web site from any computer you like, even computer labs in a school, and your programs will follow where ever you go.
Using the web site is simple. Type your program on the left. Click Run to see it work on the right. You can sign in to open your existing projects and save your projects. You can also get links to share your projects with others. There are sample projects along the bottom of the screen, including Yo Grandma!, a game written by Sophia, one of my students from the original class.
Unfortunately, instructions on how to write the programs are still mostly missing. If you already know the language, a link to the generated documentation might help. Otherwise, hold on! Once the programming environment is stable, I plan to put together a comprehensive progression of exercises, tutorials, and examples.
Behind the Scenes
Under the hood, I mostly recreated this from scratch, throwing away most of the original project from a few years ago. This new version of the environment has a lot of advantages: it runs your programs on your own computer, so your program runs a lot faster. It’s less restrictive. And I completely customized the language to make a lot of things simpler and easier to understand.
- The programming language for CodeWorld is called Haskell. Haskell is an awesomely mathematical language, but parts of it are also notoriously complex. The new incarnation of CodeWorld still uses Haskell, but goes a lot further to hide the rough edges. In particular, you’ll rarely see any classes, and there’s an obvious type for most things (e.g., all text has the type Text, and all numbers have the type Number.)
- Previously, CodeWorld was based on a library called Gloss for the Haskell programming language. Gloss is great, and I saved as many ideas from it as I could. But CodeWorld is now its own library. This let me clean up some terminology, align the meaning of programs more closely with the goals of algebraic thinking and math concepts, and work with the simplified version of the language.
I’ll try to keep posting here as I have learning material ready to use with this tool. Stay tuned!
I’ve been thinking a lot the last few weeks about the future of CodeWorld. I’m writing this to share my vision of where things are going, and why. I’d appreciate feedback from anyone with thoughts on the future, different ideas, and so on.
And Overview of the Past
For anyone new to the ideas, I should review a little about what CodeWorld is. CodeWorld is the name for the curriculum I built and taught last school year. It has the goal of teaching abstract and mathematical reasoning skills through computer programming. Throughout the course of the year, students use a web site to write descriptions of pictures, then animations, and finally video games, and run them in the web browser to immediately see the results.
I taught a pilot program last school year in a neighborhood school, and it was a huge success. I have high hopes of getting something going again in the future.
Many people have noticed that CodeWorld stopped in its tracks some time during the summer. What ended up happening was a fairly big life change for me: I’ve changed jobs (now working for Google on YouTube!), moved (to San Francisco from Colorado), and have been focused on personal changes for a while. Unfortunately, that meant I had to cancel the plans to teach CodeWorld at three schools in Colorado this school year. I hope, though, this is only a temporary setback! I still have high hopes for the future.
Part 1: The Technology
In terms of technology, CodeWorld is built on several things:
- The Haskell programming language. This is a language that’s very well suited for CodeWorld because it expresses things in a way that’s very declarative and consistent with the mental models students should bring into algebra and other abstract mathematics classes.
- The Gloss library. This library, originally developed for introductory college programming classes, is a great fit for many of the same reasons. It’s based on Haskell, and again does everything declaratively. It’s also very concise, not needing a lot of wordy boilerplate to get something quickly on the screen.
- The SafeHaskell feature of GHC. This allows the server to safely compile and run code written by students without worrying that it might delete critical files and such.
In some ways, the set of technology chosen for CodeWorld is perfect. Specifically, it let me get a web site up and running in about a week of work, right before the start of the class last school year. That saved a lot of hassle, and made the class a lot more successful. In other ways, though, each of these choices is lacking in a few ways.
- Haskell is a great choice of language, but it’s very difficult to target Haskell to run in web browsers. For this reason, CodeWorld actually runs students’ programs way over in a data center, far from their computers. This worked and was easy to do (great!), but it also means programs run very slowly, and the server gets bogged down if it’s used by too many people at once (not so good).
- Gloss is the closest thing we have to a perfect choice. Even that, though, has its limitations. Because Gloss has a number of users who want to do more advanced things (like use it to show off parallel processing APIs in Haskell), there are some technical pieces in the back end that are difficult to fit in with a web browser as well. Also, mainly due to backward compatibility, there are inconsistencies in the API (circle to draw the outline of a circle versus rectangleWire to do the same with a rectangle, for example), inconsistencies with what students will see in math classes (angles are clockwise, for example, in the official implementation), and lost opportunities to make small tweaks that will more clearly emphasize declarative thinking. We definitely want something like Gloss, but there are a few details that it would help to tweak in some incompatible ways.
- SafeHaskell turns out to be problematic for a variety of reasons. Aside from the performance problems from running on the server in the first place, it’s really only a small part of a solution to the problems of running untrusted code. For example, it has no protections against using excessive resources and slowing down the server until it’s unusable.
This technology migration isn’t done yet, and there’s still a lot of little syntax stuff that’s frustratingly unsupported by Fay… but I’m working on it, helping out the Fay project a little but, too, and it’ll get there!
Part 2: The Presentation
In addition to changing technologies, I’ve been putting a lot of thought into what I want the future of the system to look like. I am looking at how to make the platform more appealing when I am not personally living right by the school and able to pop over and help with any students who are having problems.
One side of this is definitely a redesign of the web site. The site needs examples, it needs to look more finished, and it needs documentation! I’m working on a mock-up of the future web site, and while my web design skills are definitely lacking, I think I’m happy with the concept so far. You can take a look, if you want, and let me know what you think.
Another side actually was started over the summer, and that’s putting together a comprehensive teacher’s guide, including worksheets, exercises, and organization of the content. This needs to be more than just a web site; it needs to be an organized approach to presenting the ideas and guiding students through the learning process. I’m strongly considering making a somewhat involved video series to present the concepts, as well.
Part 3: Community
The third and final part of this is putting the pieces in place for a sustainable community of people working together. I don’t have any specific plans here, yet… but I’ll just say that I see positive things ahead. I’m now working with tens of thousands of brilliant people, many of whom care very much about education, in a company that encourages its employees to experiment with technology in ways that aren’t focused on just one product. I’m also within a short drive of the Khan Academy, and plenty of other groups that care a lot about math education.
Some things I’d like to try here:
- Getting together students who build things with CodeWorld to meet and share their creations.
- Helping students make the transition from developing things for “a school project” to sharing them with friends and others, and possibly even offering their creations in various app stores or markets. (Student creations in CodeWorld actually tend to be fairly similar to the casual game market for smartphones and such…)
- Creating more resources for teachers who want to incorporate this into their schools. I’d love to hold seminars and such for teachers! Please ask me to do this, once the pieces are in place!
So that’s where CodeWorld is, and where it’s hopefully going. I’m still very excited about the future of this effort, and looking forward to making some more progress here soon.
(A Side Note: I’ve been formulating the final thoughts on this post for about a week now. In an entirely unrelated coincidence, a good friend of mine and fellow Haskell programmer, Doug Beardsley, ended up writing two posts about monads over the weekend as well. Weird! But don’t fret; this isn’t really the same thing at all. I’m not writing to teach Haskell programmers how to use monads. I’m writing about a kind of intuition about why these concepts turn out to matter in the first place. You won’t find much here by way of how to program in Haskell.)
Category Theory for Software Development?
Match made in heaven? Or abstraction distraction?
If you’re a software developer, have you heard about monads and wondered what they were? Have you tried to learn Haskell, and struggled with them? Wondered why people worry so much about them? Have you watched the videos from Microsoft’s “Channel 9″ and heard a bunch of researchy Microsoft folk talk about them, but had trouble relating them to your day-to-day programming experience?
Or if you’re interested in mathematics, have you heard murmurs in the past about how category theory interests computer science people? Looked for some clear statement of why we care, and what problems we might be interested in? Wondered if it’s really true at all? Perhaps you are like a friend of mine (and a first-rate algebraist, too, so it’s entirely reasonable to have these questions) who asked me about this a year or so ago, remembered hearing a lot of excitement in the early 90s about category theory and computer science, but never heard whether it had really panned out or was a dead end?
These are the kinds of questions I begin with. My goal is to demonstrate for you, with details and examples:
- Where category-based intuition and ideas, and monads in particular, come from in computer programming.
- Why the future of programming does lie in these ideas, and their omission in today’s mainstream languages has cost us dearly.
- What the state of the art looks like in applying category-based ideas to problems in computer programming.
If you’re coming into this without a knowledge of category theory, never fear; this may be one of the gentlest introductions to the idea of categories and monads that you will find. But you’ll want to slow down and take a moment to understand the definition of a category and related ideas like function composition; these are absolutely crucial. Then you want to completely skip or just skim through the section called “What’s This Got To Do With Monads?” where I tell you how what we’re talking about here relates to the traditional math meaning of monads. Don’t worry, you don’t need to know that at all.
On the other hand, if you’re a mathematician, you may want to skim the bits where I review basic category theory, and just dig in where I am talking about the computer programming perspective. Just be forewarned, my introduction to monads will be via Kleisli categories, so take a minute when we get to that part and make sure you’re familiar with how the relationship works out.
Ready? Here goes!
Computer Programming and Functions: A Tenuous Relationship
Quick quiz: Do computer programmers use functions?
Ask any computer programmer you know, and you will hear: YES! Functions are some of the most basic tools computer programmers use. Then you’ll get odd looks, for asking such a silly question. Of course computer programmers use functions. That’s like asking if carpenters use nails! Right?
The truth, though, is a bit more complicated. To a mathematician, a function is just an association of input values to output values… and that is all! Any two functions that associate the same input values to the same output values are the same. Yes, you can represent functions by formulas (sometimes, anyway), but you can also represent them with just tables of inputs and outputs, or if they are functions between real numbers, as graphs. If you ask computer programmers for examples of functions, though, you will start hearing about some pretty bizarre things. I call these the “I must have skipped that day of calculus” functions. These are things that computer programmers are quite happy referring to as functions, but that to a mathematician are not really functions at all!
- “Functions” that return randomly chosen numbers… and if evaluated several times, will give a different answer each time.
- “Functions” that return one answer on Sundays, but a different answer on Mondays, yet another on Tuesdays, and so on.
- “Functions” that cause words to appear on some nearby computer screen every time you calculate their values.
What’s going on here? Most computer programmers go about their lives happily calling these things functions, but really they are something different. But wait a second! They do have quite a lot in common with functions. Namely, they have: (a) parameters, representing their domain; and (b) return values, representing their range. (Many computer programmers are happy to talk about functions that have no parameters, or no return values… but there’s no need to be overly picky here. We can just regard their domains and ranges as one-element sets, so that no actual information is conveyed, but we can keep up appearances.)
Even more importantly, these “functions” share one more thing with the functions of mathematicians: they are constantly being composed by taking the result from one function and passing it along as a parameter to another function. When I say composed, I mean it almost exactly in the basic mathematics sense of function composition: (f · g)(x) = f(g(x)). In fact, the whole reason our “functions” exist at all is to be composed with each other! Once upon a time, in the early days of computers, we liked to keep track of information by just sticking it in known places in the computer’s memory; but all this shared knowledge about where to find information made it hard to write parts separately and fit them together, so we mostly switched to this idea of functions and composition instead.
Here’s the executive summary so far:
- When computer programmers talk about functions, they do not mean exactly what mathematicians do.
- What they do mean is the idea of having inputs (domains), outputs (ranges), and most importantly composition.
Along Came The Category…
So in the previous section, we ended up with our hands full of things that sort of look like functions. They have domains and ranges, and they can be composed. But at the same time, they are not functions in the mathematics sense. Baffling? No, not really. Mathematicians deal with stuff like that a lot. They have a name for systems of function-esque things of exactly that form. That name is… cue the drumroll, please… CATEGORIES!
In math-speak, categories are:
- collections of “objects” (you should think of sets),
- and “arrows” (you should think of functions between sets),
- where each arrow has a domain and a range,
- each object has an “identity” arrow (think of the identity function, where f(x) = x)
- and arrows can be composed when the domains and ranges match up right.
Before we agree to call something a category, we also throw in a few rules, such as if you compose any function with an identity, it doesn’t actually change, and composing functions obeys the associative property. These should be unsurprising, so if they seem strange to you, please take a moment, grab a pencil, and try working it out using the definition of function composition earlier: (f · g)(x) = f(g(x)), and simplifying.
The nice thing about categories is this: it’s not just some pointless abstraction that a bunch of mathematicians made up. Categories are defined that way because people have looked at literally hundreds of things that all look sort of like functions with domains and ranges and compositions. Things from algebra, like groups and rings and vector spaces; things from analysis, like metric spaces and topological spaces; things from combinatorics, like elements of partially ordered sets and paths in graphs; things from formal logic and foundations, like proofs and propositions. Almost without fail, they can be described using all the ideas we just looked at! In short, categories are the right intuition for talking about composing things with domains and ranges, which is exactly the situation we’re in.
The Four Horsemen of the Catapocalypse
Now you can see why categories come into the picture: they are the right intuition for things that maybe aren’t functions, but can be composed like functions. But just because a category exists doesn’t mean it’s worth talking about. What makes this worth talking about is that the category-related ideas aren’t just there, but actually express common concerns for computer programmers.
It’s now time to get a little more specific, and introduce the four examples that will guide us the rest of the way through this exploration. Each example highlights one way that the “functions” used by computer programmers might be different from the functions that mathematicians talk about. These examples represent actual kinds of problems that computer programmers have run into and solved, and we’ll look more at the practical side of them later. For now, we’ll just be happy getting familiar with the general ideas.
The First Horseman: Failure
The first problem is failure. Computer programmers do lots of things that might fail. Reading from files (they might not exist, or on a computer with more than one user, they might not be set to allow you to read them), talking over the internet (the network might be broken or too slow), even just doing plain old calculations with a large amount of data (you might run out of memory). Because of this, dealing with failure is a constant concern.
In general, in modern computer programming tools, it’s always understood that a function might fail. You may get an answer, but you also may get a reason that the task could not be completed. When that happens, programmers are responsible for dealing with it responsibly: letting someone know, cleaning up the leftover mess in computer memory from a half-complete task, and just otherwise putting the pieces back together. A major factor in programming techniques or tools is how easy they make it for programmers to cope with the constant possibility of failure.
The Second Horseman: Dependence
The second problem is dependence on outside information. While functions of mathematics are nice and self-contained, computer programmers often don’t have that luxury. Computer programs are messes of configuration. Even simple mobile phones have pages and pages of settings. What language does the user speak? How often should you save off a copy of their work? Should you encrypt communication over the network? Rare is the application today that doesn’t have a “Settings” or “Preferences” menu item. In many other contexts, too, computer programs depend on information that is a sort of “common knowledge” throughout the application, or some part of the application.
Ways of dealing with this have progressed through the ages. When everything was stored in well-known memory locations anyway, it was easy enough to just look there for information you need; but that led to problems when different parts of a program needed different information and sections of programs could step on each other’s toes. The massively influential technique known as object-oriented programming can be seen as partly an attempt to solve exactly this problem by grouping functions into a context with information that they depend on. The simplest and most flexible answer would be to just pass the information around to all the functions where it is needed… but when that’s a lot of places, passing around all those parameters can be very, very inconvenient.
The Third Horseman: Uncertainty
The third problem is uncertainty, also known as non-determinism. A normal function associates an input to an output. A non-deterministic function associates an input to some number of possible outputs. Non-determinism is less well-known than the first two problems, but possibly only because it hasn’t yet seen a convincing solution in a general purpose language! Consider:
- Theoretical computer science talks about non-determinism all the time, because it’s the right approach for discussing a lot of computational problems, ranging from parsing to search to verification. That language just hasn’t made its way into the programming practice.
- Non-determinism comes up when querying, searching, or considering many possible answers. These are precisely the places that programmers end up relying on a variety of domain specific languages, ranging from SQL to Prolog, and more recently language-integrated technologies like LINQ.
- Even with specialized languages for heavy-duty querying and search tasks, we still end up writing a lot of our own nested looping and control structures for the purpose of looking through possibilities when it’s not worth crossing that language barrier. This kind of thing is responsible for some of the more complex code structures you find these days.
While the first two problems of failure and dependence are at least partly solved by current mainstream programming languages, non-determinism is as yet solved mostly by special-purpose sub-languages, with LINQ as the notable exception.
The Fourth Horseman: Destruction
Finally, the fourth problem is destruction. Evaluating a math-type function is observable only in that you now know the answer. But in computer programming, functions can have permanent effects on the world: displaying information, waiting on responses from other computers or people, printing documents, even quite literally exploding things, if they are running on military systems! Because of this, things that aren’t specified in mathematics, like the order in which evaluation happens, matter quite a lot here.
The destructive nature (by which we just mean having effects that can’t be undone) of computer programming functions has plenty of consequences. It makes programming more error-prone. It makes it harder to divide up a task and work on different parts simultaneously, such as you might want to do with a modern multi-core computer, because doing the parts in the wrong order might be incorrect. But at the same time, these destructive effects are in a sense the whole point of computer programming; a program that has no observable effects would not be worth running! So in practically all mainstream programming languages, our functions do have to cope with the problem of destruction.
Back To The Function
Now we’ve seen the faces of some problems we find in the computer programming world. We build software that might fail, has to deal with a ton of extra context, models non-deterministic choice, and sometimes has observable effects on the world that constrain when we can perform the computation.
It may now seem that we’ve left the nice and neat world of mathematical functions far behind. We have not! On closer inspection, we’ll see that if we can just squint hard enough, each of these quasi-functions can actually be seen as true, honest-to-goodness functions after all. There is a cost, though. To turn them into real functions, we need to change the range of those functions to something else. Let’s see how it works for each of our function types in turn:
Functioning With Failure
Our first example of pseudo-functions were those that might fail. It’s not hard to see that a function that could fail is really just a function whose results include two things:
- successes, which are the intended possible results; and
- failures, which are descriptions of why the attempt failed.
So for any set A, we’ll define a new set called Err(A) to be just A together with possible reasons we might have failed. Now a possibly failing function from a set A to a set B is really just an ordinary function from A to Err(B).
Functioning With Dependence
Our second type of pseudo-functions were those that depended on information that they got from the world around them: perhaps preferences or application settings. We play a similar trick here, but for a set A, we will define the set Pref(A) to be the set of functions from application settings to the set A. Now watch closely: a function in context from A to B is just an ordinary function from A to Pref(B). In other words, you give it a value from the set A, and it gives you back another function that maps from application settings to the set B.
As confusing as that might sound, a function whose range is another function is really just a function of two parameters, except that it takes its parameters one at a time! Take a minute to convince yourself of this. The conversion between these two equivalent ideas is sometimes called “currying”. So by changing the range of our function, we actually effectively added a new parameter, and it now receives the application settings as a parameter. Remember that except for being inconvenient (we’ll deal with that later), that’s exactly what we wished for.
Functioning With Uncertainty
This is perhaps the most obvious example of all. Our third type were those that represent non-determinism: instead of one specific answer, they have many possible answers. This is easy enough: for each set A, define P(A) to be the power set of A, whose members are themselves sets of values of A. Then a non-deterministic function from A to B is just an ordinary function from A to P(B).
Functioning With Destruction
Our final trick is to deal with functions that have destructive effects. Here we’ll need to be a bit more elaborate in constructing a new range: for each set A, we define IO(A) (standing for input/output, which captures the notion of effects that interact with the rest of the world). An element of the set IO(A) is a list of instructions for obtaining a member of A. It is not a member of A, merely a way to obtain one, and that procedure might have any number of observable effects.
Now we play the same trick and change the range: a destructive function from A to B is just an ordinary plain old mathematical function from A to IO(B). In other words, if you give me an A, then as a plain old function I can’t actually do the steps to get a B, but I can certainly tell you what they are.
But what about composition? It’s great to be back in the world of plain functions, but remember what got us here in the first place? We liked functions because we liked composition; but it seems we’ve now lost composition! If I have a possibly failing function from A to B, and another from B to C, well now I’ve turned them into functions from A to Err(B) and then B to Err(C). Those function domains and ranges don’t match up, and I can’t compose them!
Hold Your Horses, Heinrich Kleisli to the Rescue!
Well, all is not lost. I just haven’t yet told you how to compose these “special” functions.
Because some math dude found these things before us, we call our “special” functions by a name: Kleisli arrows. There are two things going on here at once, so keep your eyes open: first, Kleisli arrows are just plain old ordinary functions, but with weird-looking ranges. Since they are just functions, you can compose them as functions, and that’s just fine. But at the same time, they are “special”, and we can compose them as Kleisli arrows, too.
Remember what we decided earlier? The right way to think about composition is by talking about a category. Sets are a category, and that’s fine if you want plain function composition. But now we want a new kind of category, too. It’s called the Kleisli category. If you don’t remember what all the parts of a category are, take a second to review them. To define a category, I need objects, arrows, identities, and composition.
- To keep things simple, the objects in this new category will be the same: they are just sets of things.
- The arrows in this category are, unsurprisingly, the Kleisli arrows.
- I haven’t told you yet what the identities and composition look like, so let’s do that next.
First, we look at failure. We’re given a failure Kleisli arrow from A to B, and one from B to C. We want to compose them into a Kleisli arrow from A to C. In other words, we have an ordinary function from A to Err(B), and a function from B to Err(C), and we want one from A to Err(C). Take a minute to think about what to do.
The central idea of error handling is that if the first function gives an error, then we should stop and report the error. Only if the first function succeeds should we continue on to the second function, and give the result from that (regardless of whether it’s an error or a success).
- If g(x) is an error, then (f · g)(x) = g(x).
- If g(x) is a success, then (f · g)(x) = f(g(x)).
To complete the definition of a category, we also need to decide about the identity Kleisli arrows. These are the ones that don’t do anything, so that if you compose them with any other Kleisli arrow, it doesn’t change the other one. Identities are functions from A to Err(A), and it turns out these are just the functions f(x) = x, just like for sets. Notice that means they never return an error; only a successful result.
I’ll run more briefly through the remaining three examples, but I encourage readers who aren’t yet clear on how this will work to write them out in more detail and use this opportunity to become more comfortable with defining a second category of Kleisli arrows.
Next we have Klesli arrows for dependence, which are functions from A to Pref(B). Recall that adding the Pref to the range is equivalent to adding a new parameter for the application preferences. The key idea here is that if I have two functions that both need to know the application preferences, I should give the same preferences to both. Then composing two of these Kleisli arrows just builds a new function that gets the extra preferences parameter, and passes the same one along to the two component parts. And identities? A Kleisli identity will get that extra preferences parameter, but will ignore it and just return its input anyway.
The Kleisli arrows for uncertainty, or non-determinism, are functions from A to P(B), the power set of B. The key idea for non-determinism is that at each stage, we want to try all possible values that might exist at this point, and collect the results from all of them. So the composition calculates the second function for each possible result of the first, then the resulting possibilities are merged together with a set union. The identities, of course, aren’t really non-deterministic at all, and just return one-element sets containing their input.
Finally, Kleisli arrows for destructive effects are functions from A to IO(B). The key idea here is to combine instructions by following them in a step-by-step manner: first do one, then the next. So the composition writes instructions to perform the first action, look up the second action from the result, and then perform the second action, in that order. A Kleisli identity here is just an instruction to do nothing at all and announce the input as the result. So for each of the four motivating examples, we created a new category, the Kleisli category.
These new categories have their own function-like things, and related ideas of composition and identities, that express the unique nature of each specific problem. By using the appropriate notion of composition in the right Kleisli category, you can solve any of these long-standing computer programming problems in a nice composable way.
And that’s why you should care about monads.
Monads?!? Oh yes, I should mention that we’ve just learned about monads. We simply forgot to use the word.
What’s This Got To Do With Monads?
This section is for those of you who want to know how the stuff we said earlier are related to monads as they are understood in mathematics. If you open Wikipedia, or most category theory textbooks, and look up monads, they won’t look very much like what we just did. You’ll see something about an endofunctor, and two natural transformation, and properties about commuting triangles and squares.
We haven’t talked about functors at all, much less natural transformations… so how could we have possibly learned about monads? It turns out there’s more than one way to describe monads. The one we’ve just gone through is an entirely valid one. The shifts we made to the ranges of our functions earlier — Err, Pref, P, and IO – are actually examples of monads. To make sure they are monads in the conventional math way, we’d have to work pretty hard: first, prove that they are functors. Then build two natural transformations called η and µ, and prove that they are natural. Finally, prove the three monad laws.
But wait, there’s an easier way! Heinrich Kleisli, whom we’ve already met from the categories earlier, pointed out that if you can build a category like the ones we did in the last section, whose arrows are just functions with a modified range, then your category is guaranteed to also give you a monad. That’s quite convenient, because as computer programmers, we tend to care a lot more about our Kleisli arrows than we do about a mathematician’s idea of monads. Remember, those Kleisli arrows are exactly the modified notion of functions that we were already using, long before we ever heard a word about category theory! And Kleisli tells us that as long as composition works the way we expect with our Kleisli arrows (namely, that it’s associative and the identities act like identities), then all that other stuff we’re supposed to prove to show we have a monad just happens for us automatically.
Still, it’s an interesting side question to look at the relationship between the two. I won’t give all the details, but I’ll give the structure, and then leave the interested reader with some familiarity with category theory to fill in the proofs of the relevant properties. We’ll use Err as our monad, just to pick a specific example, but nothing here is specific to Err.
- We start with Err, which is already a map from objects to objects. But the traditional definition of a monad also requires that it be a functor. That is, given a function f from A to B, I need a way to construct a function Err(f) from Err(A) to Err(B). I do it as follows: in the underlying category (not the Kleisli category, just the category of sets), I find an identity function from Err(A) to Err(A). Then I find a Kleisli identity from B to Err(B). I compose that Kleisli identity in the underlying category with f, and get a function from A to Err(B). I can now do a Kleisli composition of the identity from Err(A) to Err(A) and the function from A to Err(B), and get a function from Err(A) to Err(B). That’s the one I’ll call Err(f).
- Next, I need a natural transformation η, from the identity functor to Err. This is easy: the components of η are the Kleisli identities.
- Finally, I need a natural transformation µ from Err² to Err. To get the component of µ at A, I take the identity functions in the underlying category from Err (Err A) to Err (Err A), and then from Err A to Err A, and I combine them with Kleisli composition to get a function from Err (Err A) to Err A. This is the component of µ.
The construction in the opposite direction is easier. Given a monad Errwith ? and µ, the Kliesli category is constructed as follows.
- The identities are just the components of η.
- Given a function f from A to Err(B) and a function g from B to Err(C), I compose the two as µ · Err(g) · f.
Again, the details and the proofs of the appropriate monad and category laws are left to the reader. I hope this brief aside has been useful. I now return to using the word “monad” but talking about monads via Kleisli categories.
Joining The Monadic Revolution
Once again, let’s pause to sum up.
- Computer programmers like to work by composing some things together, which we call functions.
- They aren’t functions in the obvious way… but they do make up a category.
- Actually, they are functions after all, but only if you squint and change the ranges into something weirder.
- The category that they form is called a Kleisli category, and it’s basically another way of looking at monads.
- These monads / Kleisli categories nicely describe the techniques we use to solve practical problems.
It’s not just about those four examples, either. Those are typical of many, many more ideas about programming models that can be described in the same framework. I think it’s fair to sum up and say, at this point, that someone interested in studying and analyzing programming languages and models should be familiar with some ideas from category theory, and with monads in particular.
But still, what about the humble computer programmer, who is not designing a new language, is not writing research papers analyzing programming languages, but just wants to solve ordinary everyday problems? That’s a fair question. As long as monads remain just a mathematical formalism for understanding what computer programmers mean by functions, the practicing computer programmer has a good claim to not needing to understand them.
It’s becoming clear, though, that monads are on their way into practical programming concerns, too. In the past, these Kleisli arrows, the modified notions of “function” used by computer programmers, were built into our programming languages. Functions in C used a Kleisli arrow, and C++ functions used a different one. The language specification would tell us what is and what is not possible using a function in this language, and if we wanted something different, too bad. Maybe once a decade, we’d make the swap to a brand new programming language, and bask in the warm rays of some new language features for a while.
The Past: Error Handling
Consider the Err monad, which gave us functions that might fail and report their failure in structured ways. Modulo some details and extensions, this is basically structured exception handling. Looking to history, programmers worked without exception handling in their programming languages for many years. Of course, languages like C are all Turing complete, and can solve any possible computational problem, proper error handling included. But we don’t apply categories to think about possible computations; categories are for thinking about composition. Without exception handling in the notion of a “function” that’s provided by languages like C, programmers were left to do that composition by hand.
As a result, any C function that could fail had to indicate that failure using a return value. In many cases, conventional wisdom built up saying things like “return values are for indicating success or failure, not for giving back answers”. Coding conventions called for most if not all function calls to be followed with if statements checking for failure, and the resulting code was borderline unreadable. This was the heyday of flowcharts and pseudo-code, because no one expected to be able to understand real code at a glance! In reality, though, programmers only checked for errors when they thought they was possible, and a lot of errors went undetected. Programs were often unreliable, and likely untold billions of dollars spent on extra development work and troubleshooting.
What was the reason for this? It’s quite simple: the C programming language and others of its time provided an insufficient kind of Kleisli arrow! If their Kleisli arrow had included the functionality from the Err monad we defined above, this could have been avoided. But the notion of what a function means in C is fixed, so the answer was to deal with it, and eventually migrate to a different programming language, rewriting a lot of software, and likely costing another untold billions of dollars.
The Present: Global Variables and Context
What about the Pref monad, and others like it? As discussed earlier, this is about defining computations in a larger context of available information and state of the world.
In the past, we had global variables, the slightly more modern equivalent of just storing information at a known place in computer memory. Quick and dirty, but even 30 years ago, programmers knew they were the wrong answer, and wouldn’t be manageable for larger programs. Object oriented programming tried to alleviate the problem a little, by having functions run in a specific “object” that serves as their context, and that was implicitly passed around at least within the implementation of the object itself. To get this, everyone effectively had to change programming languages to get a better Kleisli arrow again. But even so, object-oriented languages don’t give a perfect answer to this problem.
The Near Future (/ Present): Purity, Effects, Parallelism, Non-Determinism, Continuations, and More!
This point is about the future, but I’ll start out by pointing out that everything here is already possible, but just requires an appropriate choice of programming language!
One current challenge for the computer programming community is finding effective ways to handle parallelism. Ironically, while past examples have focused on the problem of putting too little power into a language’s Kleisli arrow, the problem this time is too much! Plain (also known as “pure”) functions present lots of opportunities for parallelism. When code is executed in parallel, it may run faster, or if the parallelism is poorly designed it may even run slower, but in any case it will certainly still give the same answer. But when the Kleisli arrow incorporates destructive updates, that is no longer the case. Now parallelism is risky, and might give unexpected or incorrect results due to so-called race conditions.
We can’t just remove destructive updates from a language’s Kleisli arrow, though. A program that has no observable effects at all isn’t useful. What is useful is the ability to separate the portions of code that perform destructive update from those that just compute pure functions. So for the first time, we need a language with more than one kinds of Kleisli arrow, in the same language!
There is already at least one language that offers precisely this. Programmers in the Haskell language can build their own monads, and work in the Kleisli category of a monad of their choosing. The programming language offers a nice syntax for making this approach readable and easy to use. If something might fail, you can throw it in Err. If it needs access to the application settings, throw it in Pref. If it needs to do input or output, throw it in IO. Haskell web application frameworks and similar projects start by defining an appropriate monad with the appropriate features for that kind of application.
Another current trend in the computer programming community is toward building more domain-specific programming models. The language Erlang became popular specifically for providing a new programming model with advantages for parallelism. Microsoft’s .NET framework incorporates LINQ, which offers a programming model that’s better for bulk processing and querying of collections of data. Rails popularized domain-specific languages for web applications. Other languages offer continuations as a way to more easily build specify computations in a more flexible way. All of these are examples of working in new and different Kleisli arrows that capture exactly the model appropriate for a given task.
It comes down to this: If we believe that there is one single notion of “function” that is most appropriate for all of computer programming, then as practical programmers we can find a language that defines functions that way, and then forget about the more general idea of monads or Kleisli arrows as a relic of theoreticians. But it’s not looking that way. The programming community is moving quickly toward different notions of what a function means for different contexts, for different tasks, even for different individual applications. So it’s useful to have the language, the tools, and the intuition for comparing different procedural abstractions. That’s what monads give us.
Abstraction Over Monads
Using a language with a choice of monads offers some other advantages here, too. It gives us back our abstraction. In Haskell, for example, it’s possible to write code that is applicable in multiple different monads. A surprising amount of the programming done with one monad in mind actually has meaning in very different monads! For example, consider the following Haskell type:
sequence :: Monad m => [m a] -> m [a]
What this means is that for any monad, which we’ll call M, sequence converts from a list of values of M(A) into M(List(A)), the monad applied to lists themselves. Let’s take a minute to consider what this means for each of our four examples. For Err, it takes a list of results that might be failures, and if any of them are failures, it fails; but if not, then it gives back a list of all the results. It’s basically a convenient way to check a whole list of computations for a failure. For Pref, it takes a single set of application preferences, and distributes that to everything in the list, giving back a list of the results. For the power-set monad, P, it would take a list of sets, and give back a set of all the ways to choose one item from each set. And for IO, it takes a list of instruction cards, and gives back the single card with instructions for doing all of them in turn. Amazingly, this one function, which had only one implementation, managed to make sense and do something useful for all four of our examples of monads!
Along with a choice of monads comes the ability to abstract over that choice, and write meaningful code that works in any monad that you do end up choosing.
Between all of these forces, I predict that within the next ten years, software developers will be expected to discuss monads in the same way that most developers currently have a working vocabulary of design patterns or agile methodologies.
Beyond Monads: More Categorical Programming
While most of this has been about monads, I don’t want to leave anyone with the impression that monads are the only influence of categories in computer programming. All of the following ideas have found their way into programming practice, mostly (so far) within the Haskell programming language community because of its flexibility and a deep academic culture and history.
- Monad transformers are a powerful technique for combining the effects of more than one monad to build rich and powerful programming models.
- Functors and applicative functors (a.k.a. strong lax monoidal functors for mathematicians) are weaker than monads, but more widely applicable.
- Other kinds of categories that are not Kleisli categories can often be defined and composed to solve specific problems. Freyd categories are also useful.
I’ll stop there, but only as an encouragement to look more into the various abstractions from category theory that programmers have found useful. A good starting point is the (Haskell-specific) Typeclassopedia by Brent Yorgey. That’s just a door into the many possibilities of applying category-based ideas and intuitions in computer programming.
But I hope I was able to convey how these ideas aren’t just made up, but are actually the natural extension of what computer programmers have been doing for decades.
Just sharing what we’ve been talking about in my classes at LSV (Little School on Vermijo) the past few days. It all started when a math teacher I follow on Google+ posted a link to this video on the mathematics of juggling, by Cornell professor Allen Knutson.
So we all watched the video, and talked about it… and then about midnight last night, I realized this would make a perfect introduction to Gloss simulations, which we’re learning about in our Haskell programming class! Fast forward about 20 minutes of hacking, and we have this short program:
import Graphics.Gloss thePattern = [5,2,5,1,2] initial g = (-1, 0.0, , cycle thePattern) step dt (hand, time, balls, pattern) = (newhand, newtime, newballs, newpattern) where (throw, newtime) = properFraction (time + dt) newhand = if throw == 1 then -hand else hand thrown = if throw == 1 then [ (hand, newtime, head pattern) ] else  newpattern = if throw == 1 then tail pattern else pattern newballs = [ (bhand, btime + dt, height) | (bhand, btime, height) <- balls, btime + dt < fromIntegral height ] ++ thrown draw (hand, time, balls, pattern) = pictures [ juggler, pictures [ ball b | b <- balls ] ] ball (bhand, btime, height) = translate (50*x) (50*y) (circleSolid 10) where t = 1 - 2 * (btime / fromIntegral height) x = if even height then bhand else bhand * t y = if height < 3 then 0 else fromIntegral (height - 1) * (1 - t^2) juggler = pictures [ line [(-50, 0), (0, 25), (50, 0)], line [(-30, -100), (0, -50), (30, -100)], line [( 0, 25), (0, -50)], translate 0 50 (circle 25) ]
Feel free to check it out.
This isn’t the best code in the world, because I deliberately write it to avoid as many ideas as I can that we haven’t talked about in my Haskell class. Among those, for example, are any kind of user-defined data types, or any structured data other than tuples and lists! I’ve also used list comprehensions when maps and filters might be clearer, for the same reason: we’ve done list comprehensions in class! Those are coming soon, but for the time being, it’s not too hard to see what’s going on here. A couple notes about the world type:
- The state of world comprises four pieces of information: which hand will throw next, how much time has passed since the last throw, what balls are in the air, and what pattern is coming up next. Since we don’t have user-defined types and it makes the math easy, I’ve described hands as -1 for left, and +1 for right.
- The balls in the air form a list, but each element of that list is another tuple, containing the hand the ball was thrown from, how long it has been in the air, and its height.
So like all Gloss simulations, we need to specify three things:
- The initial state. This is the left hand, no time since the last throw, no balls in the air, and the pattern sitting ahead of us. The cycle function turns a finite list into an infinite repeating one.
- The step function. This is where most of the logic sits… there are a lot of bits, but it does what you’d expect! Every second, a ball is thrown, and the hand switches sides. We update the balls to add to their time in the air, and keep only the ones that haven’t landed. When a ball is thrown, we add it. And each time we drop a number from the pattern.
- The draw function. This draws the juggler and all of the balls. The balls are drawn with some faux-physicsy thing that, frankly, is just the result of fiddling and isn’t an accurate physics simulation. I can get away with that when teaching middle schoolers! Hey, at least they follow parabolas; just not all with the same acceleration. In class I just skimmed over the math for where the ball appears, since the class right now is about the state type, and drawing is something they’ve been doing all year anyway.
Definitely a lot of fun. I encourage you to copy and paste the code above into http://dac4.designacourse.com:8000/sim and fiddle with the patterns and see what they look like!
Liam O’Connor got me thinking about the best way to explain the idea of a pseudo-random number generator to new programmers. This post is my answer. If you already understand them, there won’t be anything terribly new here. That said, I enjoy clean examples even for easy ideas, so if you do too, then read on!
Note: The title may have caused some confusion. I’m not suggesting you use the trivial algorithms provided here for any purpose. Indeed, they are intentionally over-simplified to make them more understandable. You should read this as an explanation of the idea of how generating random numbers works, and then use the random number generators offered by your operating system or your programming language, which are far better than what’s provided here.
Suppose you’re writing a puzzle game, and you need to choose a correct answer. Or suppose you are writing a role-playing game, and need to decide if the knight’s attack hits the dragon or deflects off of its scales. Or you’re writing a tetris game, and you need to decide what shape is going to come next. In all three of these situations, what you really want is a random number. Random numbers aren’t the result of any formula or calculation; they are completely up to chance.
Well, here’s the sad truth of the matter: computers can’t do that. Yes, that’s right. Picking random numbers is one of those tasks that confound even the most powerful of computers. Why? Because computers are calculating machines, and we just said that random numbers aren’t the result of any calculation!
Of course, you’ve probably played games on a computer before that seem to pick numbers at random, so you may not believe me. What you’re seeing, though, aren’t really random numbers at all, but rather pseudo-random numbers. Pseudo-random numbers are actually the result of a mathematical formula, but one designed to be so complicated that it would be hard to recognize any pattern in its results!
Writing a Pseudo Random Number Generator
A lot of smart people actually spend a lot of time on good ways to pick pseudo-random numbers. They try a bunch of different complicated formulas, and try to make sure that patterns don’t pop up. But we can build a simple one pretty easily to pick pseudo-random numbers from 1 to 10. Here it is, in the programming language Haskell:
random i = 7 * i `mod` 11
Since it’s a function, it needs to have an input. It then multiplies that input by 7, and then finds the remainder when dividing by 11. We’ll give it the previous number it picked as input, and it will give us back the next one. Suppose we start at 1. Then we get the following:
random 1 -> 7 random 7 -> 5 random 5 -> 2 random 2 -> 3 random 3 -> 10 random 10 -> 4 random 4 -> 6 random 6 -> 9 random 9 -> 8 random 8 -> 1
Let’s look at the range of answers. Since the answer is always a remainder when dividing by 11, it’ll be somewhere between 0 and 10. But it should be pretty easy to convince ourselves that if the number we give as input is between 1 and 10, then 0 isn’t a possibile answer: if it were, then we’d have found two numbers, both less than 11, that multiply together to give us a multiple of 11. That’s impossible because…. 11 is prime. So we’re guaranteed that this process picks numbers between 1 and 10. It seems to pick them in a non-obvious order with no really obvious patterns, so that’s good. We appear to have at least a good start on generating random numbers.
Notice a couple things:
- We had to pick somewhere to start. In this case, we started out by giving an input of 1. That’s called the seed. If you use the same seed, you’ll always get the exact same numbers back! Why? Because it’s really just a complicated math problem, so if you do the same calculation with the same numbers, you’ll get the same result.
- To get the next number, we have to remember something (in our case, the last answer) from the previous time. That’s called the state. The state is important, because it’s what makes the process give you different answers each time! If you didn’t remember something from the last time around, then you’d again be doing the same math problem with the same numbers, so you’d get the same answer.
Doing Better By Separating State
Unfortunately, our random number generator has a weakness: you can always predict what’s coming next, based on what came before. If you write tetris using the random number generator from earlier, your player will soon discover that after a line, they always get an L shape, and so on. What you really want is for your game to occasionally send them a line followed by a T, or even pick two lines in a row from time to time!
How do we do this? Well, the next answer that’s coming depends on the state, so our mistake before was to use the previous answer as the state. The solution is to use a state that’s bigger than the answer. We’ll still be looking for random numbers from 1 to 10, but let’s modify the previous random number generator to remember a bigger state. Now, since state and answer are different things, our random function will have two results: a new state, and an answer for this number.
random i = (j, ans) where j = 7 * i `mod` 101 ans = (j - 1) `mod` 10 + 1 -- just the ones place, but 0 means 10
That says take the input, multiply by 7, and find the remainder mod 101. Since 101 is still prime, this will always give answers from 1 to 100. But what we really wanted was a number from 1 to 10, just like the one we had before. That’s fine: we’ll just take the ones place (which is between 0 and 9) and treat 0 as 10. The tens place doesn’t really change the answer at all, but we keep it around to pass back in the next time as state.
Let’s see how this works:
random 1 -> ( 7, 7) random 7 -> (49, 9) random 49 -> (40, 10) random 40 -> (78, 8) random 78 -> (41, 1) random 41 -> (85, 5) random 85 -> (90, 10) random 90 -> (24, 4) random 24 -> (67, 7) random 67 -> (65, 5) random 65 -> (51, 1)
Excellent! Now instead of going in a fixed rotation, some numbers are picked several times, and some haven’t been picked yet at all (but they will be, if we keep going), and you can no longer guess what’s coming next just based on the last number you saw. In this random number generator, the seed was still 1, and the state was a number from 1 to 100.
People who are really interested in good random numbers sometimes talk about the period of a pseudo-random number generator. The period is how many numbers it picks before it starts over again and gives you back the same sequence. Our first try had a period of 10, which is rather poor. Our second try did much better: the period was 100. That’s still pretty far off, though, from the random number generators in most computers, the period of which can be in the millions or billions.
Real World Pseudo-Random Number Generators
Our two toy pseudo-random number generators were fun, but you wouldn’t use them in real programs. That’s because operating systems and programming languages already have plenty of ways to generate pseudo-random numbers. And those were created by people who probably have more time to think about random numbers than you do! But some of the same ideas come up there. For example, consider this (specialized) type signature for the random function in the Haskell programming language:
random :: StdGen -> (Int, StdGen)
Look familiar? StdGen is the state, and choosing a random Int gives you back the Int, and a new StdGen that you can use to get more pseudo-random numbers! Many programming languages, including Haskell, also have “global” random number generators that remember their state automatically (in Haskell, that is called randomIO), but under the covers, it all comes down to functions like the ones we’ve written here… except a lot more complex.
Where To Get a Seed
We’ve still left one question unanswered: where does the seed come from? So far, we’ve always been using 1 for the seed, but that means that each time the program runs, it will get the same numbers back. So we end up with a similar situation to what we saw before, where players will realize that a game starts with the same sequence of random events each time.
To solve this problem, the seed should come from somewhere that won’t be the same each time. Here are two different ways to seed a random number generator.
- Mostly, pseudo-random number generators are seeded from a clock. Imagine if you looked at the second hand on a clock, used it to get a number from 1 to 60, and used that for your seed. Then the game would only act the same if it started at the same number of seconds. Even better, you could take the number of seconds since some fixed time in the past, so you’d get an even bigger difference in seeds. (Entirely by coincidence, computers often use the number of seconds since January 1, 1970.)
- You might try to get a good seed from details of the way the user uses your program. For example, you can look at the exact place the user first clicks the mouse, or exactly how much time passes between pressing keys. They will most likely not be exact, and click a few pixels off or type ever so slightly slower, even if they are trying to do exactly the same thing. So, again, you get a program that acts differently each time. This is called using entropy.
Most of the time, using the computer’s built-in clock is okay. But suppose you’re making up a code word. It would be very bad if someone could guess your code word just by knowing when you picked it! (They would also need to know how your computer or programming language picks random numbers, but that’s not normally kept secret; they can probably find that out pretty easily.) Computer security and privacy often depends on picking unpredictable random numbers — ones that people snooping on you won’t be able to guess. In that case, it’s important that you use some kind of entropy, and not just the clock. In fact, when security is at stake, you can use entropy to modify the state as well, to make sure things don’t get too predictable. Most operating systems have special ways of getting “secure” random numbers that handle this for you.
Another example of entropy:
If you play the game Dragon Warrior for the Nintendo, but use an emulator instead of a real Nintendo, then you can save a snapshot of your game before you fight a monster, memorize what the monsters are going to do, and figure out exactly the right way to respond. When you load the game from the snapshot and try again, as long as you do the same things, the monster will respond in exactly the same way! That’s because the snapshot saves the state of the random number generator, so when you go back and load from the snapshot, the computer picks the same random numbers. So if a fight against a monster is going well but you make a disastrous move at the end, you can load your snapshot and repeat the exact same fight up to that point.
The same trick doesn’t work in Dragon Warrior 2 (or later ones), though! Why not? Because the company that makes the game started using entropy in their sequel. So now little things like exactly how long you wait between pressing buttons will change the game. Since you can’t possibly time everything exactly the same down to hundredths or thousandths of a second, the task is hopeless, and you have to just take your chances and trust to luck.
So as you can see, random numbers can become a very tricky topic. But ultimately it’s all just a complicated formula, a seed, and a state.