Skip to content
August 13, 2011 / cdsmith

Arrow = Category + Applicative? (Part IIa)

I’m labeling this part (II a), because I haven’t got everything I wanted to finish for this piece, but there are enough interesting ideas that it’s worth posting something, and enough time has passed that I’m getting people asking about the next part!

Summary So Far

In the previous installment of this series, we established a one-way relationship between Applicative and Arrow, while assuming a common Category instance.  In particular, we assumed a data type

data a :~> b = ...

and instances

instance Category (:~>) where
    id    = ...
    (.)   = ...
instance Arrow (:~>) where
    arr   = ...
    first = ...

and satisfying the axioms

[C1] f                            = f . id
                                  = id . f
[C2] f . (g . h)                  = (f . g) . h
[A1] arr   id                     = id
[A2] arr   (f . g)                = arr f   . arr g
[A3] first (f . g)                = first f . first g
[A4] first (arr f)                = arr (f `cross` id)
[A5] first f . arr (id `cross` g) = arr (id `cross g) . first f
[A6] arr fst . first f            = f       . arr fst
[A7] arr assoc . first (first f)  = first f . arr assoc

And we showed that we can always write the following Applicative instance, universally quantified over any domain for the Category:

instance Applicative ((:~>) a) where
    pure x  = arr (const x)
    f <*> x = arr (uncurry (flip ($)))
            . first x
            . arr swap
            . first f
            . arr dup

And, furthermore, that it will automatically satisfy all four of the axioms for Applicative functors.

[F1] v                 = pure id <*> v
[F2] u <*> (v <*> w)   = pure (.) <*> u <*> v <*> w
[F3] pure f <*> pure x = pure (f x)
[F4] u <*> pure y      = pure ($ y) <*> u

This demonstrates that Arrow is at least as specific as the combination of Category and a universally quantified Applicative instance: if you have the first, then you automatically have the second as well.  We now want to explore the opposite direction…

The Inverse Correspondence

We will need a specific Arrow instance now to work with, defined in terms of an arbitrary Applicative instance.  Here’s the one we’ll be using:

instance Arrow (:~>) where
    arr f   = pure f <*> id
    first f = pure (,) <*> f . arr fst <*> arr snd

I’ll again try to convince you that these definitions make logical sense.  Keep in mind the following,

pure f <*> x = f <$> x = fmap f x

So the definition of arr is very general: it doesn’t even really need Applicative — just Functor.  That ought to be comforting, since in a sense all that Functor says is that we can naturally translate from pure Haskell functions to a different context.  It’s not surprising that there would be a strong connection between the lifting of functions given by fmap, and that given by Category and Applicative.  Indeed, arr is that connection.  The Category “id” value on the right of the fmap serves to tie the input from the Category with the input to the function f, giving precisely what we want: a lifting of f into the Category.

Another way to understand arr’s definition is to recall that in the ((->) t) functor, whose objects are just Haskell functions from a fixed domain type, fmap is precisely function composition.  The functor we have here looks very close to that one, and indeed, fmap has a similar meaning.  The difference is that in our applicative functor, fmap composes pure Haskell functions not with other functions, but with morphisms in our Category, giving a morphism as the result.  Lifting a function is done by composing it with the identity morphism.

The definition of first is basically self-evident if you’re familiar with Applicative.  If you’re confused, note that (.) binds more tightly than <*>… we’ll be using that a lot.  So this just unpacks a tuple, applies the given Category element on the left, and packs it back up using the tuple constructor.


Our stated goal was that by defining the correspondence above, things will just work out and we’ll have established that all of the axioms for Arrow hold.  This should be easy to check!  The Applicative laws give us a really nice tool to reason about equivalence of two expressions.  You may note that I’ve stated them slightly differently here, versus the previous post, by swapping the sides of a few of them.  You should read the four axioms [F1] through [F4] as having a natural left to right direction to them, and by applying each of them as appropriate, we can take any Applicative expression and obtain a normal form:

pure f <*> x_1 <*> x_2 <*> ... <*> x_n

Here, none of x_1 through x_n contain any applicative building blocks (pure or <*>, or any of the constants built from them like fmap, etc.)  Category also has such a normal form, where any expression from a category can be written as either a single identity, or a sequence of composed morphisms, with no other identities or composition occurring in the pieces:

f_1 . f_2 . ... . f_n

What about mixtures of the two?  We have no laws at all that allow us manipulate expressions across the two types, so we’re stuck with the pieces as they are, simplifying within each piece. So to verify the axioms, we simply substitute the definitions of arr and first from the previous section, reduce the Applicative and Category pieces to their respective normal forms, and compare.  Easy, right?

It’s left as an exercise for the reader to try this, but the result is that only the trivial axiom [A1] can be verified in this way.  The remaining Arrow laws do not follow from anything we’ve seen so far. It’s not just that we aren’t clever enough to do it: because both Applicative and Category have normal forms, and there’s no mixing of the two possible, it’s simply not possible that any more of the Arrow laws might be shown to hold all the time from those laws alone!  (This in spite of the fact that numerous source, such as here, explicitly claim otherwise!)

Dodging a Dead End Sign: New Laws!

In a strict sense sense, this is the end of the line: our original goal has conclusively failed.  That said, it’s not so surprising that it failed, and we can still learn a lot by redefining our goal.  We now seek to decide precisely what the difference is.

That is, can we state a (hopefully minimal) set of additional laws we can apply to the Applicative/Category combination, such that we can get all of the Arrow axioms? We’re looking for things that obviously ought to hold for all types that are simultaneously instances of Category and Applicative.  Of course, we’ll also have to go back and show that they follow from the Arrow axioms as well, or we risk losing the bidirectional correspondence.  It’s worth noting that Ross Patterson speculated about such axioms, and that while my axioms look a bit different, if you dig a bit they turn out to very very similar)

Here are the set of axioms I’ll propose:

[G1] u = pure const <*> u <*> id
[G2] u <*> id <*> id = pure join <*> u <*> id
[G3] arr f . u = pure f <*> u
[G4] (u <*> v) . arr f = u . arr f <*> v . arr f

A couple quick notes.  First of all, you might notice that the Arrow term, arr, appears in a couple of them.  That’s okay, though: we don’t mean arr in the Arrow sense.  We mean it in the sense of the definition earlier, as an fmap applied to the identity from the Category.  It turns out that arr plays a unique role in the structure we get from combining Category and Applicative.  It’s a sort of intermediate level of purity between “pure” elements and arbitrary elements.  The elements in the image of arr are pure in the sense of acting like plain functions in the Category, but they depend on the domain type from the Category, making them not quite “pure” in the Applicative sense.  It’s useful to make statements about elements with that half-purity property.

Second, you might notice a use of “join” in [G2], which is of course a Monad term!  Never fear, though, we mean join specialized to the ((->) r) monad, which is just the function:

join f x = f x x

So no worries, we haven’t let monads creep in quite yet!  The connection here is very strong though, and is strengthened by noting that “const” is similarly just “return” specialized to the ((->) r) monad, and together, return and join completely determine the monad structure on a type!  So in a sense, laws [G1] and [G2] are relating the structure we’ve got to monad terms… but the monad is the standard function monad instead of one defined on this type.

Having gained a little insight into [G1] and [G2], we now look at the laws [G3] and [G4].  These laws address the question of what happens when we compose an arr value, either on the left or right.  Essentially, arr composed on the left can be rewritten in terms of application, while on the right it distributes.

Let’s Prove Some Arrow Laws

One of the Arrow laws is so obvious, there’s no point in putting off its proof any longer; it turns out to be exactly equivalent to the first Applicative axiom, and is the one Arrow law that didn’t require our new assumptions:

arr id = pure id <*> id = id

Now, before we set out to methodically prove that Arrow axioms hold, it pays to look around and take stock of the situation with the new laws one at a time.

We can start out by generalizing [G1] and [G2] a bit.  As stated, these laws require identities on the right-hand side, but it turns out they actually only need semi-pure (that is, arr) values.  To see this, we’ll first prove a little lemma, which actually only requires the Applicative laws:

u <*> arr f = u <*> (pure f <*> id)
            = pure (.) <*> u <*> pure f <*> id
            = pure ($ f) <*> (pure (.) <*> u) <*> id
            = pure (.) <*> pure ($ f) <*> pure (.) <*> u <*> id
            = pure (($ f) . (.)) <*> u <*> id
            = pure (. f) <*> u <*> id

With that in our toolkit, we generalize [G1] and [G2] in the obvious way, just by moving arr values from the right to the left, and then moving the parentheses left.  It’s a tad cumbersome, but pretty obvious when you see what’s going on.

pure const <*> u <*> arr f
    = pure (. f) <*> (pure const <*> u) <*> id
    = pure (.) <*> pure (. f) <*> pure const <*> u <*> id
    = pure ((. f) . const) <*> u <*> id 
    = pure const <*> u <*> id
    = u
u <*> arr f <*> arr f
    = pure (. f) <*> (u <*> arr f) <*> id
    = pure (. f) <*> (pure (. f) <*> u <*> id) <*> id
    = pure (.) <*> pure (. f) <*> (pure (. f) <*> u) <*> id <*> id
    = pure ((.) (. f) <*> (pure (. f) <*> u) <*> id <*> id
    = pure (.) <*> pure ((.) (. f)) <*> pure (. f) <*> u <*> id <*> id
    = pure ((.) (. f) . (. f)) <*> u <*> id <*> id
    = pure join <*> (pure ((.) (. f) . (. f)) <*> u) <*> id
    = pure (.) <*> pure join <*> pure ((.) (. f) . (. f)) <*> u <*> id
    = pure (join . (.) (. f) . (. f)) <*> u <*> id
    = pure ((. f) . join) <*> u <*> id
    = pure (.) <*> pure (. f) <*> pure join <*> u <*> id
    = pure (. f) <*> (pure join <*> u) <*> id)
    = pure join <*> u <*> arr f

These identities can be very useful: they avoid the need to explicitly “stash away” semi-pure values on the right when applying the first two laws.  Note that the values on the right obviously need to be semi-pure, since otherwise we could be removing or duplicating effects (where “effects” here has the appropriate meaning for the particular instance you’re looking at).

Another convenient thing to note is that anywhere we use “arr” to indicate a semi-pure value, of course a pure value works too!  The following mini-identity makes this explicit:

pure x = pure const <*> pure x <*> id
       = pure (const x) <*> id
       = arr (const x)

We can now turn to the [G3] law, and see what insights it has to offer.  The first is actually one of the Arrow laws, but it’s also quite useful in reasoning about other things as well:

arr (f . g) = pure (f . g) <*> id
            = pure (.) <*> pure f <*> pure g <*> id
            = pure f <*> (pure g <*> id)
            = pure f <*> arr g
            = arr f . arr g

As an immediate consequence, we can simplify compositions involving pure and semi-pure values.

pure x . arr f = arr (const x) . arr f
               = arr (const x . f)
               = arr (const x)
               = pure x

A more broadly applicable set of identities lets us work with applications between pure values and compositions (of any values at all):

pure f <*> u . v = arr f . (u . v)
                 = (arr f . u) . v
                 = (pure f <*> u) . v
u . v <*> pure f = pure ($ f) <*> u . v
                 = (pure ($ f) <*> u) . v
                 = (u <*> pure f) . v

So applying a pure value to a composition, or a composition to a pure value, is the same as doing the application only to the first of the two values that are composed.

We also have the tools now to easily establish another of the Arrow laws.  Since we’ve done it so many times by now, I’ll start applying the composition law from Applicative and combining the resulting pure expressions on the left in one step.

arr fst . first f
    = pure fst <*> first f
    = pure fst <*> (pure (,) <*> f . arr fst <*> arr snd)
    = pure ((.) fst) <*> (pure (,) <*> f . arr fst) <*> arr snd
    = pure (((.) fst) . (,)) <*> f . arr fst <*> arr snd
    = pure const <*> f . arr fst <*> arr snd
    = f . arr fst

For the next step, it helps to consider what happens when you apply (with <*>) one semi-pure value to another.  The answer is reassuringly logical: Applicative’s <*> combinator can be specialized to functions, and the result goes by several names… Monad’s “ap”, for example, and combinatory logic’s S combinator.  Let’s use “ap” to describe the function we’re looking for:

(f `ap` g) x = f x (g x)

Then we have

arr f <*> arr g = arr f <*> (pure g <*> id)
                = pure (.) <*> arr f <*> pure g <*> id
                = arr (.) . arr f <*> pure g <*> id
                = pure ($ g) <*> arr (.) . arr f <*> id
                = arr ($ g) . arr (.) . arr f <*> id
                = arr (($ g) . (.) . f) <*> id
                = pure (($ g) . (.) . f) <*> id <*> id
                = pure join <*> pure (($ g) . (.) . f) <*> id
                = pure (f `ap` g) <*> id
                = arr (f `ap` g)

That turns out to be the hard work in establish our fourth Arrow law.

first (arr f) = pure (,) <*> arr f . arr fst <*> arr snd
              = arr (,) . arr f . arr fst <*> arr snd
              = arr ((,) . f . fst) <*> arr snd
              = arr (((,) . f . fst) `ap` snd)
              = arr (f `cross` id)

Not bad!  Four Arrow laws down, three to go, and we haven’t even used our shiny new [G4] law yet.  We can fix that, though, and prove a fifth Arrow law in the process:

first f . arr (id `cross` g)
    = (pure (,) <*> f . arr fst <*> arr snd) . arr (id `cross` g)
    = pure (,) . arr (id `cross` g)
      <*> f . arr fst . arr (id `cross` g)
      <*> arr snd . arr (id `cross` g)
    = pure (,) <*> f . arr (fst . (id `cross` g)) <*> arr (snd . (id `cross` g))
    = pure (,) <*> f . arr fst <*> arr (g . snd)
    = pure (,) <*> f . arr fst <*> arr g . arr snd
    = pure (,) <*> f . arr fst <*> (pure g <*> arr snd)
    = pure (.) <*> (pure (,) <*> f . arr fst) <*> pure g <*> arr snd
    = pure ((.) . (,)) <*> f . arr fst <*> pure g <*> arr snd
    = pure ($ g) <*> (pure ((.) . (,)) <*> f . arr fst) <*> arr snd
    = pure (($ g) . (.) . (,)) <*> f . arr fst <*> arr snd
    = pure ((.) (id `cross` g) . (,)) <*> f . arr fst <*> arr snd
    = pure ((.) (id `cross` g)) <*> (pure (,) <*> f . arr fst) <*> arr snd
    = pure (id `cross` g) <*> (pure (,) <*> f . arr fst <*> arr snd)
    = pure (id `cross` g) <*> first f
    = arr (id `cross` g) . first f

Of course, this series just wouldn’t be itself if we didn’t have a long ugly proof in there somewhere, so here it is: the proof of [A7].

arr assoc . first (first f)
    = pure assoc <*> first (first f)
    = pure assoc <*> (pure (,) <*> first f . arr fst <*> arr snd)
    = pure ((.) assoc) <*> (pure (,) <*> first f . arr fst) <*> arr snd
    = pure ((.) assoc . (,)) <*> first f . arr fst <*> arr snd
    = pure (. snd) <*> (pure ((.) assoc . (,)) <*> first f . arr fst) <*> id
    = pure ((. snd) . (.) assoc . (,)) <*> first f . arr fst <*> id
    = pure ((. snd) . (.) assoc . (,))
      <*> (pure (,) <*> f . arr fst <*> arr snd) . arr fst <*> id
    = pure ((. snd) . (.) assoc . (,))
      <*> (pure (,) . arr fst <*> f . arr fst . arr fst
      <*> arr snd . arr fst) <*> id
    = pure ((. snd) . (.) assoc . (,))
      <*> (pure (,) . arr fst <*> arr fst . arr fst . first (first f)
      <*> arr snd . arr fst) <*> id
    = pure ((. snd) . (.) assoc . (,))
      <*> (pure (,) <*> arr (fst . fst) . first (first f)
      <*> arr (snd . fst)) <*> id
    = pure ((.) ((. snd) . (.) assoc . (,)) . (,))
      <*> arr (fst . fst) . first (first f) <*> arr (snd . fst) <*> id
    = pure (. (snd . fst)) <*> (pure ((.) ((. snd) . (.) assoc . (,)) . (,))
      <*> arr (fst . fst) . first (first f)) <*> id <*> id
    = pure (.) <*> pure (. (snd . fst))
      <*> pure ((.) ((. snd) . (.) assoc . (,)) . (,))
      <*> arr (fst . fst) . first (first f) <*> id <*> id
    = pure ((. (snd . fst)) . (.) ((. snd) . (.) assoc . (,)) . (,))
      <*> arr (fst . fst) . first (first f) <*> id <*> id
    = pure (\x y z -> (x, (snd (fst y), snd z)))
      <*> arr (fst . fst) . first (first f) <*> id <*> id
    = pure join <*> (pure (\x y z -> (x, (snd (fst y), snd z)))
      <*> arr (fst . fst) . first (first f)) <*> id
    = pure (\((a,b),c) ((d,e),f) -> (a, (e, f))) <*> first (first f) <*> id
    = pure (. (snd `cross` id))
      <*> (pure (,) <*> (pure (fst . fst) <*> first (first f))) <*> id
    = pure (. (snd `cross` id))
      <*> (pure (,) <*> arr (fst . fst) . first (first f)) <*> id
    = pure (. (snd `cross` id))
      <*> (pure (,) <*> f . arr (fst . fst)) <*> id
    = pure (,) <*> f . arr (fst . fst) <*> arr (snd `cross` id)
    = pure (,) . arr assoc <*> f . arr fst . arr assoc <*> arr snd . arr assoc
    = (pure (,) <*> f . arr fst <*> arr snd) . arr assoc
    = first f . arr assoc

Not pretty, by any stretch of the imagination, but it’s done.

To be continued…

Quite a bit remains here: we still have one Arrow law remaining, we need to show that our four new Category+Applicative laws follow from the Arrow axioms in the other direction, and we need to show that the maps we’ve defined between Applicative and Arrow are inverse to each other.  With luck, this and more will come in the next exciting installment.

August 7, 2011 / cdsmith

An Invitation for Kids to Learn Programming

I wrote in my last entry about the class I’m teaching this school year for kids learning programming in the Haskell programming language.  After the response there, I talked it over with people involved, and we decided to extend an invitation to other children to be “virtually” part of the class!

If you:

  1. Know a child who would be interested in learning programming, and
  2. Know something about computer programming, and know or are willing to learn the basics of Haskell

Then I’m inviting you and the child you know to join our journey for the year!  You’d be volunteering to do the teaching, but at least we can have a group of kids all learning at the same time.

Here’s what that means.

The Projects

We’ll work throughout the year on a sequence of increasingly involved projects involving computer graphics, ranging from drawing pictures to creating computer games.  I went into a lot more detail about the plan in the previous entry, so I encourage you to go back and read it if you didn’t before.

The key point here is that we want to provide several opportunities for kids:

  • The thrill of building something they are proud of, and getting a computer do what they want, rather than the other way around.
  • Practice in abstract thinking that will prepare them for mathematics, logic, and life in general.

It’s not really a goal at all to prepare students for a computer programming career, nor to teach the programming language we’ll be using (Haskell) in any comprehensive sense.

The Community

First of all, everything we do in the class will be summarized on this blog, in posts in the “Haskell for Kids” category.  My students will be assigned to come leave comments on each post with questions, comments, or thoughts they have about what we did that week.  We encourage you and the child you’re working with to do the same.  Please respond to each other’s posts, and talk about what you find interesting as well.  Our hope is that not only will everyone learn programming, but that we’ll also be able to build a community of kids who know each other and care about each other’s accomplishments.

One of the things we’ll be doing the first week (which is next week.., wow!) is recording videos where students introduce themselves and talk about what they hope to get out of this.  We’ll put these on YouTube, and I’ll embed them in the first blog post.  Any other children participating are encouraged to do the same.  If you leave a link to your own video, I can embed it in that post as well!

I tend to be in favor of letting people have awesome experiences, rather than obsessing about safety… but a few common-place restrictions are needed here.  Please don’t post comments in the Haskell for Kids category with profanity, insults, etc.  Let’s be a supportive community.  Please also make sure kids don’t post phone numbers, addresses, or other contact information that really could compromise their safety.  I’ll be watching comments on these posts a little more closely than the rest of this blog, and  I will try to actively remove comments that are problematic, more than I do elsewhere.

Sharing Projects

One of the really cool aspects of this is that we’ll be building stuff — in the beginning pictures, then animations, and finally games!  It’s way more fun to share the stuff you’re building with others.  So I’ve set up a version control repository for all the kids’ projects.  The goal here is to let people easily get, compile, and try out the projects of others.  I’ll be happy to add anyone else that’s participating as a committer, so that you can add your students’ projects to the group, too; just ask in a reply to your child’s comment introducing themselves, and ask.

For anyone that doesn’t want to figure out how to set up darcs or use version control, you’re also welcome to email me your child’s source code in a zip file, and I’ll handle getting it checked in; but there might be delays of a day or two before others can get your code if you do it that way.


Every kid that’s participating will have a teacher: for the kids in the physical class, that’s me.  For the kid you know, that would be you!  I considered trying to set up a way for all of us teachers to talk to each other about lesson plans, feedback on what to do and what to spend more time on… but then I realized that there’s no real reason to be secretive about it!  We can use blog comments as well, and if kids read it and find it interesting, so much the better!

For rare (hopefully nonexistent) occasions where we really do need adult-only conversation, a lot of the people involved in this are already using Google+, so we can coordinate building up a mutual circle for adults involved in this project, and communicate there… if you want to be part of this, we’ll coordinate it after kids do introductory posts next week.

What Ages?

The students I’m teaching are in the 7th grade in U.S. terms, so about 12 years old, turning 13 during the school year.  Kids younger than that are always welcome to participate, as long as you (“you” meaning the responsible adult) feel like they will be able to keep up, or at least that it’s worth a try.  Kids older than that are welcome up to a point…

That point is defined by the idea that we want this to be a group of peers, and something that kids are doing for their personal development and fun.  I don’t want to set an age limit, but if I did, maybe it would be 14-15.  In the end, though, it depends on personality: some teens are happy to learn alongside younger kids and interact as peers, and some are not so happy about it.  And, if an older teen is learning programming as a possible job prospect, I wish them well but they are also not likely to find this the best way to do that.  You presumably know the child you’re thinking about, so feel free to make a judgement call!

Of course, children of any age — even those that are middle-aged, or collecting pensions — are welcome to read the blog, and pipe in with helpful comments and encouragement!

What Next?

My first in-person class will meet a week from Tuesday.  Sadly, a fair amount of the first two weeks is likely to be taken up with getting software installed: the Haskell Platform, for example, and a text editor, and becoming accustomed to using the console, changing directories, and typing commands.  We’ll also talk about the high-level ideas of compilers and programming languages and how they fit together, and play around with using GHCi as a calculator and with entering and running simple programs.

You can expect the first “real” blog post (the first one that follows an actual class) to appear Wednesday of next week, and that’s when we’ll ask kids to start commenting.  See you then!

August 3, 2011 / cdsmith

Haskell For Kids! Introduction

In a couple weeks, I’ll be posting a blog entry once a week called “Haskell for Kids”.  This is an introduction to what that’s all about.

What Is It, and Why?

I have the excellent chance this school year to teach Haskell to a group of children aged 12 to 13 years old!  I’m very excited about it.  I’m teaching the class as part of the Little School on Vermijo, which is a neighborhood school taught by my neighbor, Sue.  I love this concept: the school meets in her living room, it’s a very small class, and a lot of people in the neighborhood are pitching in with subjects here and there.  Last year, I taught a once-a-week mathematics class, which was really great.  This year, I’m trying programming for two hours per week.

Haskell?  Really?

Yes, I picked Haskell for a programming language.  That will be controversial, of course, but I think I have good reasons.

  • It’s something I’m excited about!  I think that’s bigger than a bunch of technical details: it’s important for everyone to teach what you’re excited about, and pass that excitement to others.
  • Ben Lippmeier’s Gloss library is an excellent educational tool.  It’s a library for building graphics-related programs with a high-level, composable, and semantically meaningful API.
  • I think Haskell is the best environment to learn and practice abstract patterns of thinking that will be useful in mathematics, logic, and a clear understanding of the world.  You see, I don’t care whether anyone comes out of this class headed for a computer programming career.  If they’ve got their heart set on being a poet, I hope to have contributed something valuable to their lives, too.

The objections I’ve heard when I’ve discussed the plan with others come down to Haskell being “hard”.  I think that’s an oversimplification.  If you look at the reasons Haskell is considered “hard”, a lot of it comes down to: (a) it’s different from other languages that people are used to, and (b) people are doing hard things with it.  Neither objection applies in this class.

That’s not to say that Haskell ought to be everyone’s choice.  If you’re looking at teaching programming to children, I think the most important thing is to avoid incidental complexity.  Any language that lets you get right to talking about what you want to do is suitable.  If I weren’t doing this in Haskell, I’d be choosing Python.  But excitement matters, so Haskell it is.

What’s Blogs Got To Do With It?

The actual class will be happening two mornings a week on Tuesday and Wednesday.  In the process of planning, though, I ran into the inevitable problem that there aren’t really any textbooks for teaching Haskell programming to 12 year olds!  So in lieu of a textbook, here’s my plan:

  1. Of course, I’ve done some preparation: made an outline of several dozen topics, and mapped out the dependencies between them, and ideas for exercises in Gloss that might accompany each one.  I don’t intend to share this, though, because I want to avoid locking in to a pace or syllabus before we even get started.
  2. Each week, directly after that week’s two classes, I’ll write up my notes of what we talked about, and publish them in that weeks’ Haskell for Kids blog entry as a review.
  3. My students will be assigned to come comment with specific questions they have, or with comments on things they found interesting, additional ideas that came to mind, or what they were able to do with their programs that seemed cool.

I hope to accomplish a couple things with that structure.

The first, and least interesting, is to have a written source for anyone in the class that wants to go back and review topics from the past.  To be honest, I hope this isn’t very common.  If I’ve got people needing to constantly go back and copy code from past lessons, then I think that’s a pretty good indication I’m going too fast!  That said, though, it will be nice to have the notes available, especially for students who are absent or just find it useful to go back and read it again.

The second, and more important, thing I hope to accomplish is to connect students in this class with a larger community.  Anyone else reading this blog, whether you’re learning Haskell or a part of the community already, is encouraged to jump in the conversation and participate.  I know I’ve talked to at least three or four people, from when I originally asked around for experiences with this through now, who wanted to be kept up to date on how it goes, or even follow along.  This is your chance!

A Preview of the Class

I’m organizing the class around the structure of Ben’s Gloss library, which happens to be related to a nice progression of programming ideas as well!  So the best way to explain the structure will be to mention the four entry points in Gloss:


This is the simplest of the four Gloss entry points: you give it a picture, and it displays the picture in a window.  We’ll use it to look at basic types of data (numbers, boolean values, etc.), structured types (for example, lists and tuples), expressions, precedence and order of operations, variables, and list comprehensions.


This Gloss entry point displays an animation, represented as a function that maps the current time to a picture.  We’ll use it to talk about defining functions, conditionals, pattern matching, and type declarations.


The third Gloss entry point runs a simulation, which differs from an animation in that it maintains a state.  We will use it to discuss defining and operating on our own data types.


The fourth and final Gloss entry point runs a simulation with user input, which is suggestively called a game.  We’ll use this to talk about issues arising in larger-scale programs: recursion in both code and data, modular programming, simple data structures, and building complex data types.  At this point, I also expect to transition even more toward providing time for practice, and encouraging everyone to create something of their own.

The exact time allocations for each part will be determined over the course of the year, so we’ll see how it goes!  If you’re interested, I hope you stick around.

July 30, 2011 / cdsmith

Arrow = Category + Applicative? (Part I)

It’s been said a few times that Haskell’s Arrow type class is really just the combination of Category and Applicative.  Frankly, I’ve never really “got” the arrow type class, and I’ve always thought it looked rather  convoluted and ugly — with it’s explicit use of tuple types and related weird contortions that remind me of the monstrosities that sometimes come out of lambdabot’s point-free converter.  Reading papers about whether the originally stated axioms are even all necessary reinforces that idea.  But Category and Applicative are very simple, easy, and intuitive.  So this is very good news… supposing it’s true.

Before just crossing arrows off of the list of things to learn, though, we should understand the statement and verify that it’s true.  Note that being true means not only that you can construct terms with the appropriate types, but that terms can be constructed in a way that gives inverse maps between the two concepts, and that the axioms on both sides follow from the other.  That’s what we set out to do here.

In this post, part I, we talk about what this means, and establish the easy direction: that there is an obvious Applicative corresponding to any Arrow.

What Does It Even Mean?

First, we run into the difficulty that the original statement isn’t very precise.  After all, Category and Arrow types have kind  (* -> * -> *), while Applicative types have kind (* -> *).  So clearly the same type can’t be an instance of all three type classes.  So what we really mean is that Arrow is equivalent to a Category instance, plus a universally quantified applicative instance for any fixed domain (where I mean domain in the Category sense; the first type parameter of the Category type).  That will be clearer if we write some declarations.

So, for the Arrow side, we have:

instance Category (:~>) where
    id    = ...
    (.)   = ...

instance Arrow (:~>) where
    arr   = ...
    first = ...

Since Arrow is already a subclass of Category, the Arrow axioms relate all of these terms together, and here I’ll slightly reorganize the Arrow axioms from their usual presentation to emphasize the role of the Category instance.

[C1] f                            = id . f
                                  = f  . id
[C2] f . (g . h)                  = (f . g) . h)
[A1] arr id                       = id
[A2] arr   (f . g)                = arr f   . arr g
[A3] first (f . g)                = first f . first g
[A4] first (arr f)                = arr (f `cross` id)
[A5] first f . arr (id `cross` g) = arr (id `cross` g) . first f
[A6] arr fst . first f            = f . arr fst
[A7] first f . arr assoc          = arr assoc . first (first f)

We used the following functions in the plumbing above:

(f `cross` g) (x,y) = (f x, g y)
assoc ((x,y),z)     = (x,(y,z))

On the Applicative side, we want the following

instance Category (:~>) where
    id    = ...
    (.)   = ...

instance Applicative ((:~>) a) where
    pure  = ...
    (<*>) = ...

Notice that the Applicative instance fixes the domain.  It’s universally quantified, sure, but within any use of the Applicative terms, we’ll need to keep the domain the same.  This is how we relate the instances for different kinds.

These are also subject to axioms:

[C1] f               = id . f
                     = f  . id
[C2] f . (g . h)     = (f . g) . h
[F1] pure id <*> v   = v
[F2] u <*> (v <*> w) = pure (.) <*> u <*> v <*> w
[F3] pure (f x)      = pure f <*> pure x
[F4] u <*> pure y    = pure ($ y) <*> u

The Category instance, and axioms [C1] and [C2], are given.  The question is:

  1. Given an Arrow instance and axioms [A1] through [A7], can we write an Applicative instance which satisfies axioms [F1] through [F4]?
  2. Given an Applicative instance and axioms [F1] through [F4], can we write an Arrow instance which satisfies axioms [A1] through [A7]?
  3. Are the maps between the two inverse to each other?  That is, can we deduce from the relevant axioms that the result of substituting back and forth in either order gives back the original constants?

If we can do this, then Arrow becomes not particularly interesting, since it’s just an overly complicated way of expressing computations with Category and Applicative.  This is what we’re trying to show.

Arrow to Applicative

In this part, we get from the Arrow axioms to the Applicative axioms.  Assuming Arrow (and, of course, its superclass Category), we’ll define the following Applicative instance.

instance Applicative ((:~>) a) where
    pure x  = arr (const x)
    f <*> x = arr (uncurry (flip ($))) . first x . arr swap . first f . arr dup

We made use of two auxiliary functions

dup   x    = (x,x)
swap (x,y) = (y,x)

What we’re saying here is pretty straightforward.

First, Arrow’s arr and Applicative’s pure are both notions of purity, but Applicative’s notion is just a bit stronger.  A map constructed with arr is effectively just a function, but at least it can depend on its input, if nothing else.  The Applicative pure, on the other hand, considers dependence on the input, too, as if it were a sort of effect (keep in mind that having the input around at all is part of what we’re wrapping up in the applicative functor), so purity is a combination for Arrow purity, and also starting from a constant functor.

Second, the application part tells us that our notion of application is to split the “input” (in the categorical sense) and let both f and x process it, and then apply the resulting function to the resulting value.  The order of processing matters here, since arrows can have side effects!  The convention with most Applicative instances is that effects are performed left to right, so we arrange for f to be processed before x.  It’s interesting, though, and will come up again later, that we could put the effects in the opposite order and still have a perfectly valid, though counter-intuitive, Applicative instance.

To verify that the Applicative instance defined here is a valid one, we’ll need to verify the axioms.  We have four of them to check.  In the following, I’ll apply the Category axioms freely without mention… in particular, I never indicate the grouping of composition, since it doesn’t matter, and I’ll collapse occurrences of id freely as well.

[F1] pure id <*> v = v
pure id <*> v
= arr (uncurry (flip ($))) . first v . arr swap . first (pure id) . arr dup            -- def. of <*>
= arr (uncurry (flip ($))) . first v . arr swap . first (arr (const id)) . arr dup     -- def. of pure
= arr (uncurry (flip ($))) . first v . arr swap . arr (const id `cross` id) . arr dup  -- [A4]
= arr (uncurry (flip ($))) . first v . arr (swap . (const id `cross` id)) . arr dup    -- [A2]
= arr (uncurry (flip ($))) . first v . arr (id `cross` const id) . arr dup             -- simplification
= arr (uncurry (flip ($))) . arr (id `cross` const id) . first v . arr dup             -- [A5]
= arr (uncurry (flip ($)) . (id `cross` const id)) . first v . arr dup                 -- [A2]
= arr fst . first v . arr dup                                                          -- simplification
= v . arr fst . arr dup                                                                -- [A6]
= v . arr (fst . dup)                                                                  -- [A2]
= v . arr id                                                                           -- simplification
= v . id = v                                                                           -- [A1]

I’ll skip the longer one for now and move to [F3].

[F3] pure (f x) = pure f <*> pure x
pure f <*> pure x
= arr (uncurry (flip ($))) . first (pure x) . arr swap . first (pure f) . arr dup                     -- def. of <*>
= arr (uncurry (flip ($))) . first (arr (const x)) . arr swap . first (arr (const f)) . arr dup       -- def. of pure
= arr (uncurry (flip ($))) . arr (const x `cross` id) . arr swap . arr (const f `cross` id) . arr dup -- [A4]
= arr (uncurry (flip ($)) . (const x `cross` id) . swap . (const f `cross` id) . dup)                 -- [A2]
= arr (const (f x))                                                                                   -- simplification
= pure (f x)                                                                                          -- def. of pure

That one was easy.  So emboldened, let’s move on to [F4].

[F4] u <*> pure y = pure ($ y) <*> u
u <*> pure y
= arr (uncurry (flip ($))) . first (pure y) . arr swap . first u . arr dup               -- def. of <*>
= arr (uncurry (flip ($))) . first (arr (const y)) . arr swap . first u . arr dup        -- def. of pure
= arr (uncurry (flip ($))) . arr (const y `cross` id) . arr swap . first u . arr dup     -- [A4]
= arr (uncurry (flip ($)) . (const y `cross` id) . swap) . first u . arr dup             -- [A2]
= arr (uncurry (flip ($)) . (id `cross` const ($ y))) . first u . arr dup                -- function identities
= arr (uncurry (flip ($))) . arr (id `cross` const ($ y)) . first u . arr dup            -- [A2]
= arr (uncurry (flip ($))) . first u . arr (id `cross` const ($ y)) . arr dup            -- [A5]
= arr (uncurry (flip ($))) . first u . arr ((id `cross` const ($ y)) . dup)              -- [A2]
= arr (uncurry (flip ($))) . first u . arr (swap . (const ($ y) `cross` id) . dup)       -- function identities
= arr (uncurry (flip ($))) . first u . arr swap . arr (const ($ y) `cross` id) . arr dup -- [A2]
= arr (uncurry (flip ($))) . first u . arr swap . first (arr (const ($ y))) . arr dup    -- [A4]
= arr (uncurry (flip ($))) . first u . arr swap . first (pure ($ y)) . arr dup           -- def. of pure
= pure ($ y) <*> u                                                                       -- def. of <*>

Not quite so easy, but we made it through.  Now face the real monster, axiom [F2].  Hopefully we’re familiar enough by now with applying [A2] and rewriting the functions that result that we can combine those into a single step now.  Before we dive in, we’ll need one auxiliary function: the inverse of the assoc function from Arrow axiom [A7]:

unassoc (x,(y,z)) = ((x,y),z)

Now, let’ start.  Because the proof is so long, I’ve taken the time to bold the parts that change in each line, so you can follow more easily.

[F2] u <*> (v <*> w) = pure (.) <*> u <*> v <*> w
pure (.) <*> u <*> v <*> w
= arr (const (.)) <*> u <*> v <*> w                                                      -- def. of pure
= arr (uncurry (flip ($))) . first w . arr swap . first (arr (uncurry (flip ($)))
 . first v . arr swap . first (arr (uncurry (flip ($))) . first u . arr swap
 . first (arr (const (.))) . arr dup) . arr dup) . arr dup                              -- def. of <*>
= arr (uncurry (flip ($))) . first w . arr swap . first (arr (uncurry (flip ($))))
  . first (first v) . first (arr swap) . first (first (arr (uncurry (flip ($)))))
  . first (first (first u)) . first (first (arr swap)) . first (first (first (arr (const (.)))))
  . first (first (arr dup)) . first (arr dup) . arr dup                                  -- [A3]
= arr (uncurry (flip ($))) . first w . arr swap . arr (uncurry (flip ($)) `cross` id)
  . first (first v) . arr (swap `cross` id) . arr ((uncurry (flip ($)) `cross` id) `cross` id)
  . first (first (first u)) . arr ((swap `cross` id) `cross` id)
  . arr (((const (.) `cross` id) `cross` id) `cross` id) . arr ((dup `cross` id) `cross` id)
  . arr (dup `cross` id) . arr dup                                                       -- [A4]
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ ((a,b),(c,d)) -> ((c, b a), d)) . arr assoc . first (first (first u))
  . arr unassoc . arr (id `cross` dup) . arr (\ x -> ((x, (.)), x))                      -- [A2] and function identities
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ ((a,b),(c,d)) -> ((c, b a), d)) . first (first u)
  . arr assoc . arr unassoc . arr (id `cross` dup) . arr (\ x -> ((x, (.)), x))          -- [A7]
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ ((a,b),(c,d)) -> ((c, b a), d)) . first (first u)
  . arr (id `cross` dup) . arr (\ x -> ((x, (.)), x))                                    -- [A2] and [A1]
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ ((a,b),(c,d)) -> ((c, b a), d)) . arr (id `cross` dup) . first (first u)
  . arr (\ x -> ((x, (.)), x))                                                           -- [A5]
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ (a,(b,c)) -> ((c, b a), c)) . arr assoc . first (first u)
  . arr unassoc . arr (id `cross` ((.),)) . arr dup                                      -- [A2] and function identities
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ (a,(b,c)) -> ((c, b a), c)) . first u
  . arr assoc . arr unassoc . arr (id `cross` ((.),)) . arr dup                          -- [A7]
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ (a,(b,c)) -> ((c, b a), c)) . first u
  . arr (id `cross` ((.),)) . arr dup                                                    -- [A2] and [A1]
= arr (uncurry (flip ($))) . first w
  . arr (\ ((a,b),c) -> (c, b a)) . first (first v)
  . arr (\ (a,(b,c)) -> ((c, b a), c)) . arr (id `cross` ((.),)) . first u
  . arr dup                                                                              -- [A5]
= arr (uncurry (flip ($))) . first w
  . arr (\ (a,(b,c)) -> (c, b a)) . arr assoc . first (first v)
  . arr unassoc . arr (id `cross` \ (b,c) -> ((.) c, b)) . arr assoc
 . arr (\ (a,b) -> ((b,b),a)) . first u . arr dup                                        -- [A2] and function identities
= arr (uncurry (flip ($))) . first w
  . arr (\ (a,(b,c)) -> (c, b a)) . first v
  . arr assoc . arr unassoc . arr (id `cross` \ (b,c) -> ((.) c, b)) . arr assoc
  . arr (\ (a,b) -> ((b,b),a)) . first u . arr dup                                        -- [A7]
= arr (uncurry (flip ($))) . first w
  . arr (\ (a,(b,c)) -> (c, b a)) . first v
  . arr (id `cross` \ (b,c) -> ((.) c, b)) . arr assoc
  . arr (\ (a,b) -> ((b,b),a)) . first u . arr dup                                        -- [A2] and [A1]
= arr (uncurry (flip ($))) . first w
  . arr (\ (a,(b,c)) -> (c, b a)) . arr (id `cross` \ (b,c) -> ((.) c, b)) . first v
  . arr assoc . arr (\(a,b) -> ((b,b),a)) . first u . arr dup                            -- [A5]
= arr (uncurry (flip ($))) . first w
  . arr (\ (a,(b,c)) -> (b, c . a)) . first v . arr assoc
  . arr (\ (a,b) -> ((b,b),a)) . first u . arr dup                                        -- [A2] and function identities
= arr (uncurry (flip ($))) . first w . arr (id `cross` uncurry (flip (.))) . arr assoc
  . arr (\ ((a,b),c) -> ((b,a),c)) . first (first v)
  . arr (\ (a,b) -> ((b,b),a)) . first u . arr dup                                       -- [A7] and [A2] and identities
= arr (uncurry (flip ($))) . arr (id `cross` uncurry (flip (.))) . arr assoc . first (first w)
  . arr (\ ((a,b),c) -> ((b,a),c)) . first (first v)
  . arr (\ (a,b) -> ((b,b),a)) . first u . arr dup                                       -- [A5] and [A7]
= arr (uncurry (flip ($))) . arr (uncurry (flip ($)) `cross` id) . first (first w)
  . arr (swap `cross` id) . first (first v)
  . arr (dup `cross` id) . arr swap . first u . arr dup                                  -- [A2] and function identities
= arr (uncurry (flip ($))) . first (arr (uncurry (flip ($)))) . first (first w)
  . first (arr swap) . first (first v)
  . first (arr dup) . arr swap . first u . arr dup                                       -- [A4]
= arr (uncurry (flip ($))) . first (arr (uncurry (flip ($))) . first w
  . arr swap . first v . arr dup) . arr swap . first u . arr dup                         -- [A3]
= u <*> (v <*> w)                                                                        -- def. of <*>

That was… interesting.  Perhaps there’s a short, concise, elegant way to do this, but I don’t see it, so I just clobbered at it until it worked out in the end.  At least we’re done.

I will point out that I initially wrote a much longer proof, but had an error in the definition of the Applicative instance, and when I fixed that, I rewrote this proof from scratch and it came out a good bit shorter.  The technique is basically the same throughout: it’s all about applying the [A7] axiom to pull out assoc and unassoc pairs around the major parts and move them back and forth.  It’s also interesting that the [A6] axiom from Arrow did appear in the earlier proof, but didn’t turn out to be necessary this time.  In fact, the only place that axiom appears at all is in the proof of [F1], which is essentially that the functor preserves identity functions.  Very curious…


Okay, so we’ve now shown that arrows naturally give rise to Applicative instances: if you have an Arrow, then for any fixed input type, the arrows from that input type constitute an applicative functor.  That’s half (the easier half) of what we set out to show, but it’s a decent stopping point.  Coming next: what about getting Arrow from Applicative?  Are the Applicative axioms powerful enough?  (Hint: no)  What other axioms do we need to add, and are they natural things to expect of any type with both Applicative and Category instances?  Find out the answers to these exciting questions in the next installment!

July 20, 2011 / cdsmith

On fibonacci numbers as an example of asymptotic complexity

Most people have seen the elegant recursive definition of fibonacci numbers:  In Haskell,

fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

and the fact that this performs horridly for higher values of n is well-known.  But there are a couple cute aspects of this example that are less known, so I thought I’d just take a moment to point them out.

First of all, what is the asymptotic complexity of the earlier implementation?  It’s exponential, sure, but let’s take a closer look at the analysis.  Let’s define a function fibtime to approximate the number of time steps needed to calculate the result of fib.  Since it’s straight-forward recursion where each individual function application performs constant work, and since we’re assuming arithmetic can be done in constant time (this isn’t actually true!  See comments later…) we can measure fibtime with the number of function applications needed to get the answer; all of the other work goes into the constant factor.  So it takes only one function application to evaluate the base cases; but for the inductive case, we need to evaluate two smaller instances of the problem as well:

fibtime 0 = 1
fibtime 1 = 1
fibtime n = fibtime (n-1) + fibtime (n-2) + 1

Look familiar?  Well, almost… there’s an extra “+1” in there.  Solve the recurrence relation, though, and that comes out in the constant factor, too… so the time complexity of computing fib n is O(fib n).  Cute, huh?

Okay, so we want something faster.  The normal answer is to convert to tail recursion (or write the obvious imperative algorithm; they amount to the same thing):

Edit 3: Thanks to Raphael in the comments for the comments that led to writing this code in a clearer style, and unifying it with the later version.

fib n = fst . fibpair
    where fibpair 0 = (0,1)
          fibpair n = (b,a+b) where (a,b) = fibpair (n-1)

The asymptotic analysis here is easy, and the running time is O(n).  Equivalently, Haskell allows a nifty definition using lazy lists:

fib n = fibs !! n
    where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

While this looks substantially different, it turns out to be the same algorithm written in a more declarative style.  So great, we’ve gone from exponential to linear.  Most people stop there.

But can we do better?  Yes, it turns out we can!  A key observation here is that

\left( \begin{tabular}{cc} b-a & a \\ a & b \end{tabular}\right) \times \left( \begin{tabular}{cc} 0 & 1 \\ 1 & 1 \end{tabular}\right) = \left(\begin{tabular}{cc} (a+b)-b & b \\ b & a+b \end{tabular}\right)

In other words, you can encode the state (the a and b from the tail recursive version above) into a matrix, such that making progress involves just multiplying by a constant matrix.  But matrix multiplication is expensive, and even multiplication of constant 2×2 matrices takes some time, so this would still be linear with a somewhat worse constant factor… right?

Wrong!  The key point is that matrix multiplication is associative, and that means that we can rearrange the multiplications to avoid duplicating work.   How do I compute the 16th power of a matrix?  The best way is not to multiply the matrix by itself 16 times, but rather to compute the 8th power, and then square that.  If the power I want is odd, then I will just compute the power one below, and multiply by the original matrix once, but crucially, this can only happen once before I have an even power again!  So overall, computing the nth power of a matrix can be done in O(log n) time.

We don’t really want to represent matrices explicitly, so we instead just encode the first matrix above as the ordered pair (a,b), as we did earlier.  (Note that in this encoding, the identity matrix, which is the zero’th power of any matrix, is just (0,1), which was the same base case earlier… in fact we’re just adding one special case to the earlier code.)  Doing some basic algebra, the result turns out to look like this:

fib n = fst . fibpair
    fibpair 0          = (0,1)
    fibpair n | even n = (2*a*b-a^2, a^2+b^2) where (a,b) = fibpair (n `quot` 2)
              | odd  n = (b, a+b)             where (a,b) = fibpair (n - 1)

So, ever wanted to know the ten millionth fibonacci number?  No problem!  In case you were curious; the answer is 2,089,877 decimal digits long.

Wait a second here… at this point, you should be questioning whether our code really runs in logarithmic time!  Dear reader, I’ve been lying to you.  Just the length of the output from the fibonacci function is actually linear in the value of n!  So it’s outright impossible to produce that output in better than O(n) time.  So was this all a waste?  No, not really.  We made a simplifying assumption up front that all arithmetic can be done in constant time.  For most reasonable values of input, we make that assumption all the time, and it’s useful in practice.  However, it does fail when we hit very large values that exceed normal fixed size data types.  This algorithm doesn’t really run in logarithmic time… but in the exact same sense, neither was the previous version really linear.

Okay, we’ve got that down.  So…. can we do better?

Not really, if we need precise answers.  The fibonacci numbers actually have a closed-form solution involving the golden ratio, but it too requires computing nth powers – this time of some exact representation of the algebraic reals if we don’t want to get rounding error – so it’s not going to be asymptotically better, and is likely to have much worse constant factors.  That said, if all you want is an approximation of results well within the range of a double precision floating point number, you can get that using the closed form, in constant time!

approxfib n = round (phi ** x / sqrt 5)
    where x   = fromIntegral (n+1)
          phi = (1 + sqrt 5) / 2

Edit: Thanks to mathnerd for pointing out that the psi part of Binet’s formula is always less than 1/2, so just rounding the answer is enough without actually subtracting it.

Using double precision floating point, this is exact up to the 70th fibonacci number (approximately 300 trillion), so it does pretty well for small numbers!  The rounding error is very low (as in, less than a billionth of a percent) up through the 1473rd number; but then we hit the upper end of the range of double precision floating point numbers, and things go bad.  If you want an exact answer or arbitrarily large numbers, it won’t do the job; but for reasonable ranges, it gives a decent approximation.

Edit 2: It’s also interesting to note that the closed form solution, known as Binet’s formula, can be derived from the matrix approach mentioned above.  The approach is to decompose the constant matrix from earlier into a diagonal matrix of eigenvalues via a spectral decomposition.  Then its powers can be computed by powers of the diagonal matrix of eigenvalues, which then leads to Binet’s formula.  Conversely, if you do decide to look for an efficient exact representation for computing Binet’s formula with arbitrary size numbers, you end up deciding to work in the field \mathbb{Q}(\varphi), which is an extension field of dimension 2 over the rationals.  If you then derive multiplication formulas for that field, they look very familiar: you end up doing exactly the same math on pairs of numbers that we did for the matrix solution.  Thanks to brandonpelfrey, poulson, and fredrihj from Reddit for the conversation leading to that observation.

And there you have it, more than you ever wanted to know about computing fibonacci numbers.

July 17, 2011 / cdsmith

“Industry-Driven” Computer Science Education

A comment I heard this weekend, from an honest-to-goodness professor in a (mediocre, but not laughable) computer science program, went something like this.  I’m leaving the source anonymous, to protect the guilty.

Students are making more money selling iPhone apps than others who have graduated and have jobs […] Companies that are hiring our graduates are moving away from plain old C++ and Java to Objective C, and we, as a computer science department, are scrambling to keep up.

In other words, they are trying to figure out how to teach students to build iPhone apps, because their students might be able to make some quick money, and potential employers are supposedly moving to Objective C.  (Okay, that second one is probably just imaginary.  I should mention at this point that the conversation was sparked by this professor promoting the book he’s selling on building iPhone apps, so this isn’t an unbiased source.)

Now, in general, computer science education is always going to be defined by a tension between the theoretical and the practical, between helping students to understand the nature and scope and limitations of computation in general, and giving them specific skills that will help land paying jobs as computer programmers.  It would be nice to pretend that the two goals aren’t in tension at all, that achieving the first is the best approach to the second.  But taken to an extreme, that would be a lie.

Most people, I think, would probably argue for some kind of balance between the two goals.  Sure, we ought to give students practice in the day to day craft of computer programming as part of their education.  After all, for better or worse, the majority of computer science students are going to graduate and look for jobs in a world where functional programming and automata and anything beyond the bare basics of complexity theory are considered niche subjects.

Still, aren’t universities still different from technical training programs?  Surely we have some higher purpose in mind than to teach students the details of the Spring framework in Java?  Surely we aim for an education that applies no matter your choice of operating system, or target device, or web framework?

Apparently not.

So why is this attitude wrong?  Okay, there’s the obvious things… the ones that initially got me a bit fired up when I heard this.  There’s the measurement of success in dollars.  There’s the complete dismissal of a rich and interesting field of human knowledge in favor of getting people quick jobs.  But even if we assume that the goal of computer science education is to produce larger salaries among the program’s graduates, this is a deeply flawed attitude toward the educational system, for three reasons.

1) A career is 30 years long!

Thirty years ago was the year 1981.  None of the programming languages that are popular for new software today existed in 1981.  Object oriented programming was confined to small and largely research-based communities (and meant something rather different than C++!)  So with that in mind, what makes us expect that even the paradigms of today’s world are likely to be dominant in the middle of students’ careers in 15 years, much less for 30 years until they are preparing to retire?

Now we’re going even further; the person quoted above is talking about rearranging their entire computer science education around a platform and device that didn’t even exist four years ago, and may or may not still be a significant development platform in another three or four!  Remember, undergraduate CS education is a once in a lifetime experience… let’s not make it a crap shoot in terms of which two or three year period you happen to hit that age.

2) This is misunderstanding the reasons for even the short-term profits made in new fields.

This idea – that lots of people made a lot of money developing iPhone apps, so colleges ought to start teaching it – is doubly ironic because none of those people were taught how to make iPhone apps in college.  Indeed, but the time today’s college freshmen are finishing up, it’s likely that iPhone apps won’t be the kind of free for all gold mine they were in early days.  Just like no one made money on flimsy dot-com ideas after 2001, these students will be graduating into a world where mobile apps are saturated and money is made by incremental improvements and careful market research and adaptation.

If your goal really is to help your students get rich on a software fad, you should be preparing them to be able to capitalize on the next software fad, not the one that’s already well known to the world!  Of course, we don’t know what the next fad will be… but we do know a little bit about giving people the general skills to adapt to a variety of software platforms and paradigms because they understand what they are doing and what the landscape of computation looks like.

3) You can’t teach a class in how to drop out of college and become a millionaire.

Worst of all, though, is that the motivation given here is ridiculous on its face.  Many of our students, the claim goes, are making more money selling iPhone apps than they would be finishing their educations and getting jobs.  But is that a statement about the iPhone apps, or is it a statement about those students?

It’s hard to deny that occasionally students either drop out of college or have side projects in college, and accomplish really great things.  Most of the time, their college education has little to do with it.  From Bill Gates to Linus Torvalds to the kids writing successful mobile phone apps today, there are always those who have a remarkable drive to accomplish something – whether it’s to make a fortune, or to create something awesome for the world – and do it.  But does that mean that all college students would strike it rich if only we’d taught them the skills they needed to do what Bill Gates did?

No!  There are a lot of factors involved there; ambition, motivation, personality, and a lot of luck… but if there’s one thing we can be pretty sure of, it’s that the sort of student who can’t be bothered to read a book on their own and learn Objective C and the iPhone SDK is probably not cut out to be the next wizz kid that strikes it rich.  That requires a bit of independence and self-motivation no matter how you look at it.

What should we do about the students that drop out and succeed?  Be happy for them!  But if a student wants to leave and do something on their own, they are not the target audience for a university computer science program.  The program is wasting its breath tailoring itself for those students.  They aren’t going to stay anyway!  Even if they do eventually come back and finish up their university studies, it won’t be to help them keep up on the latest fad; it’ll be because they recognize the value of a broader education.

Okay, enough ranting for now.

June 28, 2011 / cdsmith

I’m using Snap in production!

Okay, it’s a tad less impressive than I might have hoped, but we’ve now officially deployed Snap into production in my job at Brindle Waye!

The funny part is, we didn’t deploy a web application.  Instead, we provide offline preview in your choice of web browser for our web-based training authoring tool.  We build the page and save it, but various web browsers don’t like running a lot of JavaScript stuff in a web browser running from a file-scheme URL.  So in the latest version released a couple days ago, we actually spawn a Snap-based web server to serve the pages from some unused high port number on the loopback interface.  Works great!

Why’d we use Snap for a static web server?  Initially,

  • We wanted to bundle the thing with our application and not fiddle with configuration files, registry settings, and the like.
  • After a lot of searching, we could find no suitable, free, portable, simple web servers that don’t require excessive amounts of configuration.
  • Snap was handy, and it was no problem to throw together a two line application to serve a directory.  (Two because we read the directory path from the command line on the first!)

Turns out that as we solidified and idiot-proofed the feature before release, it was nice that we used Snap for other reasons, too.  We ran into some quirky requirements that would have been tough to satisfy with another server, like:

  • If the initially chosen port is already in use, we want to walk up port numbers one by one (we’re already in the dynamic/private range) trying to bind to different ports until we find one.  Of course we then want to change the port number on the provided URL, so we launch the browser from the Snap-based server following a successful bind.  Not only would these be tough to accomplish with a static web server; I don’t even know how I’d get something like Tomcat to do it!  Being able to write your own main in the Haskell web programming world is nice.
  • We want to have the local web server shut down if there’s x minutes of inactivity.  (I’ve honestly forgotten what x is here.)  We normally terminate it on our own, but in case the main application crashes or there’s some bug where it’s left running, it’s nice to have that safeguard.  I’m not sure how we could have measured time since the last incoming request using a static server.
  • To prevent some weird cache issues (related to the fact that if you swap between courses, we serve serving files from different physical directories with the same URL), we wanted to add Cache-control: no-store to prevent caching.  Normally that requires yet more configuration files in Apache, and in a servlet container I might have had to lose the default file serving and write the file contents out by hand to the relevant OutputStream… but with Snap I just toss in a line right before the call to serveDirectory and add the header!

It definitely feels good that something like this — a really quirky set of requirements, for serving files in a setting that was completely unexpected to the people who wrote the server — was difficult to imagine with any other software we found, but easy in Haskell!  Granted, we didn’t look at too many other languages’ application servers, but we did search the Java world (because that’s the language the authoring tool is written in, and we already install a JVM) for a while.  Haskell was a refreshing change from tools where just copying all the relevant files to the right places was a daunting task.

This isn’t specific to Snap, either; I could have done this just as easily with Happstack, and probably Yesod (or bare WAI) as well.  This just tells me that as a Haskell web programming community, we’re doing something right.