Skip to content
July 20, 2011 / cdsmith

On fibonacci numbers as an example of asymptotic complexity

Most people have seen the elegant recursive definition of fibonacci numbers:  In Haskell,

fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

and the fact that this performs horridly for higher values of n is well-known.  But there are a couple cute aspects of this example that are less known, so I thought I’d just take a moment to point them out.

First of all, what is the asymptotic complexity of the earlier implementation?  It’s exponential, sure, but let’s take a closer look at the analysis.  Let’s define a function fibtime to approximate the number of time steps needed to calculate the result of fib.  Since it’s straight-forward recursion where each individual function application performs constant work, and since we’re assuming arithmetic can be done in constant time (this isn’t actually true!  See comments later…) we can measure fibtime with the number of function applications needed to get the answer; all of the other work goes into the constant factor.  So it takes only one function application to evaluate the base cases; but for the inductive case, we need to evaluate two smaller instances of the problem as well:

fibtime 0 = 1
fibtime 1 = 1
fibtime n = fibtime (n-1) + fibtime (n-2) + 1

Look familiar?  Well, almost… there’s an extra “+1” in there.  Solve the recurrence relation, though, and that comes out in the constant factor, too… so the time complexity of computing fib n is O(fib n).  Cute, huh?

Okay, so we want something faster.  The normal answer is to convert to tail recursion (or write the obvious imperative algorithm; they amount to the same thing):

Edit 3: Thanks to Raphael in the comments for the comments that led to writing this code in a clearer style, and unifying it with the later version.

fib n = fst . fibpair
    where fibpair 0 = (0,1)
          fibpair n = (b,a+b) where (a,b) = fibpair (n-1)

The asymptotic analysis here is easy, and the running time is O(n).  Equivalently, Haskell allows a nifty definition using lazy lists:

fib n = fibs !! n
    where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

While this looks substantially different, it turns out to be the same algorithm written in a more declarative style.  So great, we’ve gone from exponential to linear.  Most people stop there.

But can we do better?  Yes, it turns out we can!  A key observation here is that

\left( \begin{tabular}{cc} b-a & a \\ a & b \end{tabular}\right) \times \left( \begin{tabular}{cc} 0 & 1 \\ 1 & 1 \end{tabular}\right) = \left(\begin{tabular}{cc} (a+b)-b & b \\ b & a+b \end{tabular}\right)

In other words, you can encode the state (the a and b from the tail recursive version above) into a matrix, such that making progress involves just multiplying by a constant matrix.  But matrix multiplication is expensive, and even multiplication of constant 2×2 matrices takes some time, so this would still be linear with a somewhat worse constant factor… right?

Wrong!  The key point is that matrix multiplication is associative, and that means that we can rearrange the multiplications to avoid duplicating work.   How do I compute the 16th power of a matrix?  The best way is not to multiply the matrix by itself 16 times, but rather to compute the 8th power, and then square that.  If the power I want is odd, then I will just compute the power one below, and multiply by the original matrix once, but crucially, this can only happen once before I have an even power again!  So overall, computing the nth power of a matrix can be done in O(log n) time.

We don’t really want to represent matrices explicitly, so we instead just encode the first matrix above as the ordered pair (a,b), as we did earlier.  (Note that in this encoding, the identity matrix, which is the zero’th power of any matrix, is just (0,1), which was the same base case earlier… in fact we’re just adding one special case to the earlier code.)  Doing some basic algebra, the result turns out to look like this:

fib n = fst . fibpair
    fibpair 0          = (0,1)
    fibpair n | even n = (2*a*b-a^2, a^2+b^2) where (a,b) = fibpair (n `quot` 2)
              | odd  n = (b, a+b)             where (a,b) = fibpair (n - 1)

So, ever wanted to know the ten millionth fibonacci number?  No problem!  In case you were curious; the answer is 2,089,877 decimal digits long.

Wait a second here… at this point, you should be questioning whether our code really runs in logarithmic time!  Dear reader, I’ve been lying to you.  Just the length of the output from the fibonacci function is actually linear in the value of n!  So it’s outright impossible to produce that output in better than O(n) time.  So was this all a waste?  No, not really.  We made a simplifying assumption up front that all arithmetic can be done in constant time.  For most reasonable values of input, we make that assumption all the time, and it’s useful in practice.  However, it does fail when we hit very large values that exceed normal fixed size data types.  This algorithm doesn’t really run in logarithmic time… but in the exact same sense, neither was the previous version really linear.

Okay, we’ve got that down.  So…. can we do better?

Not really, if we need precise answers.  The fibonacci numbers actually have a closed-form solution involving the golden ratio, but it too requires computing nth powers – this time of some exact representation of the algebraic reals if we don’t want to get rounding error – so it’s not going to be asymptotically better, and is likely to have much worse constant factors.  That said, if all you want is an approximation of results well within the range of a double precision floating point number, you can get that using the closed form, in constant time!

approxfib n = round (phi ** x / sqrt 5)
    where x   = fromIntegral (n+1)
          phi = (1 + sqrt 5) / 2

Edit: Thanks to mathnerd for pointing out that the psi part of Binet’s formula is always less than 1/2, so just rounding the answer is enough without actually subtracting it.

Using double precision floating point, this is exact up to the 70th fibonacci number (approximately 300 trillion), so it does pretty well for small numbers!  The rounding error is very low (as in, less than a billionth of a percent) up through the 1473rd number; but then we hit the upper end of the range of double precision floating point numbers, and things go bad.  If you want an exact answer or arbitrarily large numbers, it won’t do the job; but for reasonable ranges, it gives a decent approximation.

Edit 2: It’s also interesting to note that the closed form solution, known as Binet’s formula, can be derived from the matrix approach mentioned above.  The approach is to decompose the constant matrix from earlier into a diagonal matrix of eigenvalues via a spectral decomposition.  Then its powers can be computed by powers of the diagonal matrix of eigenvalues, which then leads to Binet’s formula.  Conversely, if you do decide to look for an efficient exact representation for computing Binet’s formula with arbitrary size numbers, you end up deciding to work in the field \mathbb{Q}(\varphi), which is an extension field of dimension 2 over the rationals.  If you then derive multiplication formulas for that field, they look very familiar: you end up doing exactly the same math on pairs of numbers that we did for the matrix solution.  Thanks to brandonpelfrey, poulson, and fredrihj from Reddit for the conversation leading to that observation.

And there you have it, more than you ever wanted to know about computing fibonacci numbers.

July 17, 2011 / cdsmith

“Industry-Driven” Computer Science Education

A comment I heard this weekend, from an honest-to-goodness professor in a (mediocre, but not laughable) computer science program, went something like this.  I’m leaving the source anonymous, to protect the guilty.

Students are making more money selling iPhone apps than others who have graduated and have jobs […] Companies that are hiring our graduates are moving away from plain old C++ and Java to Objective C, and we, as a computer science department, are scrambling to keep up.

In other words, they are trying to figure out how to teach students to build iPhone apps, because their students might be able to make some quick money, and potential employers are supposedly moving to Objective C.  (Okay, that second one is probably just imaginary.  I should mention at this point that the conversation was sparked by this professor promoting the book he’s selling on building iPhone apps, so this isn’t an unbiased source.)

Now, in general, computer science education is always going to be defined by a tension between the theoretical and the practical, between helping students to understand the nature and scope and limitations of computation in general, and giving them specific skills that will help land paying jobs as computer programmers.  It would be nice to pretend that the two goals aren’t in tension at all, that achieving the first is the best approach to the second.  But taken to an extreme, that would be a lie.

Most people, I think, would probably argue for some kind of balance between the two goals.  Sure, we ought to give students practice in the day to day craft of computer programming as part of their education.  After all, for better or worse, the majority of computer science students are going to graduate and look for jobs in a world where functional programming and automata and anything beyond the bare basics of complexity theory are considered niche subjects.

Still, aren’t universities still different from technical training programs?  Surely we have some higher purpose in mind than to teach students the details of the Spring framework in Java?  Surely we aim for an education that applies no matter your choice of operating system, or target device, or web framework?

Apparently not.

So why is this attitude wrong?  Okay, there’s the obvious things… the ones that initially got me a bit fired up when I heard this.  There’s the measurement of success in dollars.  There’s the complete dismissal of a rich and interesting field of human knowledge in favor of getting people quick jobs.  But even if we assume that the goal of computer science education is to produce larger salaries among the program’s graduates, this is a deeply flawed attitude toward the educational system, for three reasons.

1) A career is 30 years long!

Thirty years ago was the year 1981.  None of the programming languages that are popular for new software today existed in 1981.  Object oriented programming was confined to small and largely research-based communities (and meant something rather different than C++!)  So with that in mind, what makes us expect that even the paradigms of today’s world are likely to be dominant in the middle of students’ careers in 15 years, much less for 30 years until they are preparing to retire?

Now we’re going even further; the person quoted above is talking about rearranging their entire computer science education around a platform and device that didn’t even exist four years ago, and may or may not still be a significant development platform in another three or four!  Remember, undergraduate CS education is a once in a lifetime experience… let’s not make it a crap shoot in terms of which two or three year period you happen to hit that age.

2) This is misunderstanding the reasons for even the short-term profits made in new fields.

This idea – that lots of people made a lot of money developing iPhone apps, so colleges ought to start teaching it – is doubly ironic because none of those people were taught how to make iPhone apps in college.  Indeed, but the time today’s college freshmen are finishing up, it’s likely that iPhone apps won’t be the kind of free for all gold mine they were in early days.  Just like no one made money on flimsy dot-com ideas after 2001, these students will be graduating into a world where mobile apps are saturated and money is made by incremental improvements and careful market research and adaptation.

If your goal really is to help your students get rich on a software fad, you should be preparing them to be able to capitalize on the next software fad, not the one that’s already well known to the world!  Of course, we don’t know what the next fad will be… but we do know a little bit about giving people the general skills to adapt to a variety of software platforms and paradigms because they understand what they are doing and what the landscape of computation looks like.

3) You can’t teach a class in how to drop out of college and become a millionaire.

Worst of all, though, is that the motivation given here is ridiculous on its face.  Many of our students, the claim goes, are making more money selling iPhone apps than they would be finishing their educations and getting jobs.  But is that a statement about the iPhone apps, or is it a statement about those students?

It’s hard to deny that occasionally students either drop out of college or have side projects in college, and accomplish really great things.  Most of the time, their college education has little to do with it.  From Bill Gates to Linus Torvalds to the kids writing successful mobile phone apps today, there are always those who have a remarkable drive to accomplish something – whether it’s to make a fortune, or to create something awesome for the world – and do it.  But does that mean that all college students would strike it rich if only we’d taught them the skills they needed to do what Bill Gates did?

No!  There are a lot of factors involved there; ambition, motivation, personality, and a lot of luck… but if there’s one thing we can be pretty sure of, it’s that the sort of student who can’t be bothered to read a book on their own and learn Objective C and the iPhone SDK is probably not cut out to be the next wizz kid that strikes it rich.  That requires a bit of independence and self-motivation no matter how you look at it.

What should we do about the students that drop out and succeed?  Be happy for them!  But if a student wants to leave and do something on their own, they are not the target audience for a university computer science program.  The program is wasting its breath tailoring itself for those students.  They aren’t going to stay anyway!  Even if they do eventually come back and finish up their university studies, it won’t be to help them keep up on the latest fad; it’ll be because they recognize the value of a broader education.

Okay, enough ranting for now.

June 28, 2011 / cdsmith

I’m using Snap in production!

Okay, it’s a tad less impressive than I might have hoped, but we’ve now officially deployed Snap into production in my job at Brindle Waye!

The funny part is, we didn’t deploy a web application.  Instead, we provide offline preview in your choice of web browser for our web-based training authoring tool.  We build the page and save it, but various web browsers don’t like running a lot of JavaScript stuff in a web browser running from a file-scheme URL.  So in the latest version released a couple days ago, we actually spawn a Snap-based web server to serve the pages from some unused high port number on the loopback interface.  Works great!

Why’d we use Snap for a static web server?  Initially,

  • We wanted to bundle the thing with our application and not fiddle with configuration files, registry settings, and the like.
  • After a lot of searching, we could find no suitable, free, portable, simple web servers that don’t require excessive amounts of configuration.
  • Snap was handy, and it was no problem to throw together a two line application to serve a directory.  (Two because we read the directory path from the command line on the first!)

Turns out that as we solidified and idiot-proofed the feature before release, it was nice that we used Snap for other reasons, too.  We ran into some quirky requirements that would have been tough to satisfy with another server, like:

  • If the initially chosen port is already in use, we want to walk up port numbers one by one (we’re already in the dynamic/private range) trying to bind to different ports until we find one.  Of course we then want to change the port number on the provided URL, so we launch the browser from the Snap-based server following a successful bind.  Not only would these be tough to accomplish with a static web server; I don’t even know how I’d get something like Tomcat to do it!  Being able to write your own main in the Haskell web programming world is nice.
  • We want to have the local web server shut down if there’s x minutes of inactivity.  (I’ve honestly forgotten what x is here.)  We normally terminate it on our own, but in case the main application crashes or there’s some bug where it’s left running, it’s nice to have that safeguard.  I’m not sure how we could have measured time since the last incoming request using a static server.
  • To prevent some weird cache issues (related to the fact that if you swap between courses, we serve serving files from different physical directories with the same URL), we wanted to add Cache-control: no-store to prevent caching.  Normally that requires yet more configuration files in Apache, and in a servlet container I might have had to lose the default file serving and write the file contents out by hand to the relevant OutputStream… but with Snap I just toss in a line right before the call to serveDirectory and add the header!

It definitely feels good that something like this — a really quirky set of requirements, for serving files in a setting that was completely unexpected to the people who wrote the server — was difficult to imagine with any other software we found, but easy in Haskell!  Granted, we didn’t look at too many other languages’ application servers, but we did search the Java world (because that’s the language the authoring tool is written in, and we already install a JVM) for a while.  Haskell was a refreshing change from tools where just copying all the relevant files to the right places was a daunting task.

This isn’t specific to Snap, either; I could have done this just as easily with Happstack, and probably Yesod (or bare WAI) as well.  This just tells me that as a Haskell web programming community, we’re doing something right.

June 8, 2011 / cdsmith

Finding the longest path in a maze

Well, I asked for it… my last post on building mazes in Haskell led to the follow-on question: how do you pick a starting and ending point for the maze?

Defining the Question

There’s not an obvious answer, because we need a set of criteria for choosing starting and ending points.  Depending on the application, the following might all be requirements:

  • Perhaps the starting and ending points need to be at the edges of the maze (e.g., for printed mazes in children’s game books).  Perhaps they need to be on opposite sides of the maze (e.g., for hedge mazes protecting a wizard’s castle).  Or perhaps they can be anywhere at all (e.g., locations of staircases for a rogue-like game).
  • Are we looking to build a path with the longest total length?  With the most turns?  With the most choices?

I’ll answer the question as follows:

Suppose that an entrance and an exit can be placed anywhere.  How can you find the entrance and exit points that lead to the longest path through the maze?

Turning the Maze Into a Graph

My representation for mazes in the previous installment wasn’t ideal for this task.  I chose to consider a maze to be a list of walls: nice for drawing it, but less nice for deciding whether there’s a wall between two adjacent cells.  So the first task will be to convert the list of walls into an adjacency list representation of a graph.  For efficiency’s sake, I’ll go via a Set data structure, which I’ve imported qualified with the prefix ‘S’.

mazeExits :: Int -> Int -> [Wall] -> Array Cell [Cell]
mazeExits w h walls = array rng [ (c, exits c) | c <- range rng ]
    rng         = ((0,0), (w-1,h-1))
    wallSet     = S.fromList walls
    exits (x,y) = [ (x-1,y) | x > 0, not (V (x-1,y) `S.member` wallSet) ]
               ++ [ (x+1,y) | x < w-1, not (V (x,y) `S.member` wallSet) ]
               ++ [ (x,y-1) | y > 0, not (H (x,y-1) `S.member` wallSet) ]
               ++ [ (x,y+1) | y < h-1, not (H (x,y) `S.member` wallSet) ]

I also derived an Ord instance for Wall, so it can fit in the Set.  The list comprehensions are a little unusual there, having no binding operations… but it certainly looks nicer than the alternative with if statements!

Doing the Search

Now we need a search algorithm to find the longest path in the graph.  Note that the graph is a tree, which helps a lot, since we don’t need to worry about cycles, and any node can be chosen as a root.  Depth first search works here, but the details are a bit tricky.  The longest path in a subtree rooted at c is one of the following two things:

  1. The longest subpath in one of its subtrees, or
  2. The concatenation of the longest two paths from c to a leaf of two distinct subtrees.

At each step of the depth first search, we want to find two things: the longest paths to leaves in each of the subtrees, and the longest paths contained entirely within each of the subtrees.  We can then consider all of the possibilities, and calculate the same two for the larger tree.  The result looks like this:

longestPath :: Array Cell [Cell] -> [Cell]
longestPath exits = fst (go (0,0) (0,0))
    -- First result of go is the longest path entirely within the subtree.
    -- Second result of go is the longest path from the root to a leaf.
    go :: Cell -> Cell -> ([Cell], [Cell])
    go p c = let results    = map (go c) [ c' | c' <- exits ! c, c' /= p ]
                 rootPaths  = map snd results ++ [ [], [] ]
                 sorted     = sortBy (compare `on` (negate . length)) rootPaths
                 localSub   = let (a:b:_) = sorted in reverse a ++ [c] ++ b
                 allSubs    = localSub : map fst results
                 longestSub = maximumBy (compare `on` length) allSubs
             in  (longestSub, c : head sorted)

I’ve arbitrarily chosen (0,0) as my starting point, and then implemented the DFS algorithm described earlier.  This gives us, as desired, the longest path between any points in the map.

Drawing the Path

The drawing code is straightforward.  After the obvious plumbing to get the path to rendering code, we modify drawMaze to add this to the end.

    setSourceRGB 1 0 0
    moveTo (15 + 10 * fromIntegral x0) (15 + 10 * fromIntegral y0)
    forM_ path $ \(x,y) -> do
        lineTo (15 + 10 * fromIntegral x) (15 + 10 * fromIntegral y)

Looking at the Result

Here are some mazes with the longest paths marked.

June 6, 2011 / cdsmith

Mazes in Haskell, My Version

Earlier today, I read this article by Mihai Maruseac about generating mazes in Haskell.  It’s very interesting, but it turns out I’m a maze generation algorithm bigot.  As far as I’m concerned, there’s only one right way to generate mazes, and it’s this:

  1. Construct the maze with all possible walls (so each cell is isolated).
  2. Pick a random wall such that there’s no way to get from one side to the other.
  3. Tear down that wall.
  4. Repeat until the entire maze is navigable.

Sadly, this isn’t how Mihai decided to do it, so I was forced (forced, I tell you!) to spend some time writing my own maze generator.

Equivalence Relations

It turns out that the tricky step here is #2: how can you efficiently tell whether there’s a way to get from one side of a wall to the other.  A maze solver would do it, of course, but solving the maze once per wall isn’t the most scalable answer out there.  Instead, you want to keep track of the separate “rooms” in your maze as you go.

An efficient data structure for this task, called “union find”, has been known to the imperative world for decades: Tarjan showed back in the 70s that it runs in time that’s very nearly linear (with some additional factor that grows like the inverse of the Ackerman function… that is, ludicrously slowly).  Sadly, this seems to be one of those data structures that’s confined to the imperative world.  There’s not an obvious translation into the functional world.  I could just use an imperative implementation, in IO or ST… but that would be too easy.  Instead, I decided to find an implementation of an externally-pure, internally-stateful version of that algorithm.  I failed to find one, so I wrote one instead.  Here it is: persistent-equivalence.

I based the general technique off of one by Conchon and Filliatre.  They used a persistent array under the hood, and threw in a bit of unsafe mutation to implement path compression.  Well, the path compression bit is easy: it’s just an atomicModifyIORef.  It’s perfectly safe, since no exposed function can ever give a different result depending on whether it’s been modified or not.  In fact, I’m not even certain it needs to atomic, but I’ve played it safe for now.  The second major change made by Conchon and Filliatre is sadly less safe: they arranged to make changes to the persistent array (naively a DiffArray) to “reroot” it when old versions are accessed.  However, their code for doing this is clearly a minefield in the presence of multithreading… I was up for tackling the task, until I realized that STM is forbidden inside unsafePerformIO, and the interactions between various locks are mind-boggling…

Rather than enter the depths of thread-safety hell, or else potentially expose an API that claims it’s safe but really isn’t, I instead gave up.  DiffArray is good enough for us anyway, since we won’t be doing any backtracking.

Just as a side comment, this structure is a widely known example of an imperative structure that’s hard to translate into the functional world… but when you *do* translate at least its public interface, the result is rather beautiful.  I’ve never seen a specification of “union find” that I’d consider particularly enlightening… but when it’s converted to a functional interface, it’s immediately clear what you’re dealing with: equivalence relations.  Instead of talking about some operation names that were made up for this specific purpose, we’re looking at a very simple idea from mathematics.  The imperative viewpoint, though, obscured this by encouraging you to only speak in terms of the operations, and avoid ever talking about a specific equivalence relation you’ve got.  I’m much happier with the functional interface.

Generating the Maze

Now that I’ve got an implementation of equivalence relations, I’m well on my way to having a maze.  I declare a few data types for cells and walls:

-- Vertical walls are to the right of their cell (so the x component
-- must be less than width - 1), and horizontal walls are to the top
-- of their cell (so the y component must be less than height - 1).

type Cell = (Int, Int)
data Wall = H Cell | V Cell deriving (Eq, Show)

And I write the code to generate a maze, which works out in a nice recursive style.

process rooms []     = []
process rooms (H (x,y) : ws)
    | equiv rooms (x,y) (x,y+1) = H (x,y) : process rooms ws
    | otherwise                 = process (equate (x,y) (x,y+1) rooms) ws
process rooms (V (x,y) : ws)
    | equiv rooms (x,y) (x+1,y) = V (x,y) : process rooms ws
    | otherwise                 = process (equate (x,y) (x+1,y) rooms) ws

genMaze :: RandomGen gen => Int -> Int -> gen -> [Wall]
genMaze w h gen = finalWalls
  where allWalls = [ H (x,y) | x <- [0 .. w-1], y <- [0 .. h-2] ]
                ++ [ V (x,y) | x <- [0 .. w-2], y <- [0 .. h-1] ]
        startRooms = emptyEquivalence ((0,0), (w-1, h-1))
        startWalls = shuffle' allWalls (length allWalls) gen
        finalWalls = process startRooms startWalls

To generate a maze, you make a list of all the walls in the chosen size, shuffle them (using the random-shuffle package from Hackage), build an initial empty equivalence relation between cells (that is, each cell is its own separate room), and start considering the walls one by one in the random order chosen.  For each wall, if its two sides are already connected (the cells are in the same equivalence class), you keep the wall.  Else, you knock it down, and proceed with a new equivalence relation considering those cells connected now.  This is exactly the algorithm given at the start.

Showing the Result

I turned to Gtk2Hs for a quick GUI displaying the resulting maze.  The code is not terribly interesting, so I’ll just drop it in for the sake of completeness.

First, we have Cairo code for drawing the maze (for the sake of simplicity, I’ve hard-coded each cell to 30×30 pixels):

drawMaze :: Int -> Int -> [Wall] -> Render ()
drawMaze w h walls = do
    rectangle 10 10 (30 * fromIntegral w) (30 * fromIntegral h)
    forM_ walls $ \wall -> case wall of
        H (x,y) -> do
            moveTo (10 + 30 * fromIntegral x) (40 + 30 * fromIntegral y)
            lineTo (40 + 30 * fromIntegral x) (40 + 30 * fromIntegral y)
        V (x,y) -> do
            moveTo (40 + 30 * fromIntegral x) (10 + 30 * fromIntegral y)
            lineTo (40 + 30 * fromIntegral x) (40 + 30 * fromIntegral y)

Next we’ll build a window to display it:

displayMaze :: Int -> Int -> [Wall] -> IO ()
displayMaze w h walls = do
    wnd <- windowNew
    wnd `on` deleteEvent $ liftIO mainQuit >> return False
    set wnd [
        windowDefaultWidth  := 20 + 30 * w,
        windowDefaultHeight := 20 + 30 * h ]

    da <- drawingAreaNew
    containerAdd wnd da
    da `on` exposeEvent $ do
        dw <- eventWindow
        liftIO $ renderWithDrawable dw (drawMaze w h walls)
        return False

    widgetShowAll wnd

Finally, we pull that all together…

main = do
    [read -> w, read -> h] <- getArgs
    gen <- newStdGen
    displayMaze w h (genMaze w h gen)

There you are, mazes in Haskell, my way!


Just for the fun of it, I present a couple mazes:

April 26, 2011 / cdsmith

Composing state with functions and lenses

Here’s the scenario: you’re writing some stateful code.  Maybe it’s threaded state (a State monad), or maybe it’s just fixed shared state (a Reader monad).  So you’ve got a lot of types flying around like:

Reader X a

Reader X a -> Reader X b

State X a

State X a -> State X b

But then you take your stateful code, and try to compose it with someone else’s stateful code, and their state is different.  That is, they have:

Reader Y a

Reader Y a -> Reader Y b

State Y a

State Y a -> State Y b

Question: how do you get these pieces of code to work together?

Clearly you’ll need some kind of relationship between the types X and Y, or you have no hope.  But what kind of relationship do you need here?  We’ll consider each of the types in turn.

Case 1: Reader X a / Reader Y a

In this case, you’ve got a Reader X a, and a Reader Y a, and you want the combine them.  It turns out all you need here is a function from one to the other, and you can turn these into compatible types to compose them nicely.  The following is in the mtl package already.

withReader :: (p -> q) -> Reader q a -> Reader p a

That’s not surprising, actually.  After all, Reader x y is conceptually just a newtype wrapper around x -> y, so withReader is a fancy name for function composition!

withReader f r = reader (runReader r . f)

Note the contravariance there… you pass in a function p -> q, but what you get back is Reader q a -> Reader p a, in the opposite order.  That makes a lot of sense, though, if you think it through.  (Exercise for the reader: think it through.)

Case 2: (Reader X a -> Reader X b) / (Reader Y a -> Reader Y b)

Another situation that comes up is that we’ve got a way of wrapping reader monads.  This happens particularly often if you’re building up values by making changes to other values.  For example, one of the two primitives from the MonadReader class, local, gives you precisely this kind of map between Reader monads.

The first thing we notice here is that a function from one state type to the other cannot possibly be good enough, because a conversion doesn’t even have any clear meaning on those types.  What turns out to work for us, though, is a lens.  A lens can be thought of as a getter/setter pair, and I’ll use the definition from the excellent fclabels package.  Here’s what you need to know:

data a :-> b
lens :: (a -> b) -> (b -> a -> a) -> (a :-> b)
getL :: (a :-> b) -> (a -> b)
setL :: (a :-> b) -> (b -> a -> a)

In other words, a :-> b (note the colon) is the type of lenses from a to b.  You construct them by providing a getter and a setter to the lens function, and you can extract the getter and setter from getL and setL.  They can also be composed like functions, and have identities (in other words, they form the arrows of a category).

With both getters and setters in mind, we can set out to compose the types earlier.

wrapWithReader :: (x :-> y)
               -> (Reader y a -> Reader y b)
               -> (Reader x a -> Reader x b)
wrapWithReader l f r = reader (\x ->
    runReader (f (reader (\y -> runReader r (setL l y x)))) (getL l x))

This may look complex, but mostly the complexity is in constructing and deconstructing the Reader monad newtype.  The definition is straight-forward aside from that: to turn the Reader x a into a corresponding Reader x b, you simply consider the Reader y a that results from fixing the non-y bits of the input, transform it, and then map it back.

Case 3: State X a / State Y a

The third case is where we have a state monad rather than a reader monad.  Since changes of state in the state monad are almost interchangeable with a lens, it turns out a lens is what we need here, too.  We can implement this without too much trouble.

withState :: (x :-> y) -> State y a -> State x a
withState l s = state (\x -> let (a,y) = runState s (getL l x)
                             in  (a, setL l y x))

In other words, we pull the y out of the x, run the state computation with it, and then push the resulting y back into the x to get a modified x.  Works like a charm.

Case 4: (State X a -> State X b) / (State Y a -> State Y b)

The final case, and the most complicated one yet, arises if you have a function to modify state types, and need to change the type of the state.  Sadly, even a lens is not sufficient to assign a reasonable meaning to this conversion.  To make sense of such a transformation, you need to know something even stronger: we’ll do it where there is an isomorphism between the types X and Y.  Then the composition can be seen as transforming the functions by simply treating one as the other.

Fortunately, fclabels has the types we need still!

data x :<->: y
(:<->:) :: (x -> y) -> (y -> x) -> (x :<->: y)
fw :: (x :<->: y) -> (x -> y)
bw :: (x :<->: y) -> (y -> x)

An isomorphism is just a pair of inverse functions between the types, meaning they are essentially interchangeable.  Then it’s easy to build the state wrapper converter, which interchanges them:

wrapWithState :: (x :<->: y)
              -> (State y a -> State y a)
              -> (State x a -> State x a)
wrapWithState iso f = t (bw iso) (fw iso) . f . t (fw iso) (bw iso)
    where t fw bw s = state (second fw . runState s . bw)

And voila, composable state types!

(Side note: It is possible to build something with the type given above for wrapWithState but using just a lens instead of an isomorphism.  Sadly, it doesn’t act as you’d expect for doing the composition.  Also, Gregory Collins pointed out to me that you can implement wrapWithState with the expected behavior and just a lens, if you give it a rank 2 type and require that the wrapper function be universally quantified on the result type.  Neither of these quite what we’re looking for, though, and the isomorphism is needed to get something with the obvious meaning.)

(Second side note: I’ve done this with pure state and reader monads for simplicity; but it’s easy to generalize to StateT and ReaderT, if that’s what you want.)

April 15, 2011 / cdsmith

A Correspondence Involving Getters and Setters

One of the reasons that I love Haskell is that it leads you to fascinating thought experiments.  Here’s one of mine.  The conclusions aren’t particularly earth-shattering, but they are interesting.

One of the most common things to do in an imperative programming language is to build getters and setters for the properties of an object.  In Java, they may look like this:

public X getFoo();

public void setFoo(X val);

The obvious mapping from there into a purely functional approach gives you this, for a record type R and a field type F:

getFoo :: R -> F

setFoo :: F -> R -> R

The fact that we have two separate functions here is unpleasing to me, though.  Without being quite able to explain why, I’d really like to have just one type that completely describes the property “foo”.  A product type is definitely cheating… but this would definitely satisfy me, if it works:

foo :: forall t. (F -> (F,t)) -> (R -> (R,t))

I’m interested in this type for two reasons: first, because it’s fairly easy to embed both a getter and a setter together into such a type.  Suppose you give me the functions getFoo and setFoo.  Then I can certainly embed them into a foo, in such a way that they can be recovered.

foo g r = let (f,v) = g (getFoo r) in (setFoo f r, v)

getFoo’ = snd . foo (\x -> (x,x))

setFoo’ v = fst . foo (\x -> (v,()))

It’s a straight-forward matter of substitution to see that getFoo’ and setFoo’ are identical to their original counterparts.  So one can construct a value of the form of foo given any getter and setter combination, and given any such value of the type of foo, one can extract a getter and a setter.  The second reason I care about that type, though, is that has a natural meaning aside from just embedding a getter/setter pair.  Recall that the State monad (with state type S, for example) is a newtype wrapper around (forall t. S -> (S,t)).  So this can be seen as a state transformer.  It takes a stateful transformation, and changes the type of the state.

Now, the rather more involved question is whether there exist state transformers (values of the type of foo) that do not arise in that way as the straightforward embedding of getter and setter functions.  In other words, could foo be something more than just the encoding of a getter and a setter into a function?

Alas, the answer is yes.  It would be nice if the product type of getters and setters were isomorphic to the type of state transformers, and that is very nearly true… but not quite.  To see the reasoning work, first note that the type (a -> (b,c)) is isomorphic to (a -> b, a -> c).  (This is the type isomorphism version of distributing an exponent over a product).  This lets use split up foo into two parts as follows:

foo1 :: forall t. (F  -> F) -> (F -> t) -> R -> R

foo2 :: forall t. (F  -> F) -> (F -> t) -> R -> t

We can simplify a little by arguing based on the universal quantification.  Note that foo1 is given as a parameter a function of type (F -> t), but it cannot possibly make any use of the value, since it does not know anything about the type t.  Furthermore, foo2 must produce a value of type t, and can do so only through its parameter of type (F -> t), which can only be used for that purpose.  So these turn out to be equivalent to the following simpler types:

modifyFoo :: (F -> F) -> R -> R

filteredGetFoo :: (F -> F) -> R -> F

I’ve named them suggestively, because I have a bit of intuition for what these things tend to mean.  Let’s now look at what happens to the getFoo and setFoo functions that we were able to define from the original foo:

setFoo v = modifyFoo (const v)

getFoo = filteredGetFoo id

This all looks as you might expect… but remember that the question is whether modifyFoo and filteredGetFoo are completely determined by the getter / setter pair arising in that way.  Clearly they are not.  In particular, note that you can iterate a constant function 1 or more times, and always get the same answer no matter the number of iterations; and the identity function similarly for zero or more times.  So some interesting logic can be built into modifyFoo or filteredGetFoo with respect to iterating the function passed as a parameter (a constant number of times, or maybe until some predicate holds, or perhaps something more complex), and though this would change the behavior of the modify and filteredGet operations for some inputs, it would have no effect on the get and put operations.

Still, we’ve got something interesting here.  I wonder if there are interesting “non-standard” definitions of modify and filteredGet for some common record type.  If so, then they would lead to interesting transformations on values of the State monad, which don’t arise from get and set in the normal way.  Makes you wonder, doesn’t it?


Get every new post delivered to your Inbox.

Join 86 other followers