Sententia cdsmithus

March 7, 2008

Functions and Partial Orders

Filed under: Haskell, math — cdsmith @ 4:03 pm

In a few other blogs (especially here), there has been some discussion of the mathematical foundations of functions in programming languages, starting from topology. Just for fun, I will approach from the other direction, describing functions that can be defined in various languages in terms of partially ordered sets. Of course, for those of you who know this already, then you can go on and define topologies on these posets a la Dana Scott, so it amounts to the same thing. But this way has more pretty pictures.

Functions and Partial Information

Suppose we want to look at functions from the natural numbers to the natural numbers. (By the natural numbers, I mean non-negative integers.) There are, of course, infinitely many such functions. Let’s pick a few that are interesting:

  1. The constant function f(x) = 5.
  2. The identity function g(x) = x.
  3. The successor function h(x) = x + 1.
  4. The doubling function p(x) = 2x.
  5. The square function q(x) = x2.

That should give us a good start. These functions are all quite well-defined over the integers. Indeed, I can implement them all in my favorite programming language, and I’ll get exactly that (note: only because my favorite programming language does arbitrary precision integers, of course.)

The next question is this: what happens if I don’t know everything about the input to the function? What can I infer about the output given partial information about the input? Here’s some of the things I might ask of that sort:

x x = 2 x is even x is at least 5 x could be anything
f(x) f(x) = 5 f(x) = 5 f(x) = 5 f(x) = 5
g(x) g(x) = 2 g(x) is even g(x) is at least 5 g(x) could be anything
h(x) h(x) = 3 h(x) is odd h(x) is at least 6 h(x) is at least 1
p(x) p(x) = 4 p(x) is even p(x) is at least 10
p(x) is even
p(x) is even
q(x) q(x) = 4 q(x) is even
q(x) is a perfect square
q(x) is at least 25
q(x) is a perfect square
q(x) is a perfect square

Getting back this sort of partial information is trickier than implementing the function over totally known arguments. That’s what this blog post is about.

Partial Orders and Up-Sets

The trick is to build a partial order on the domain of the function. A partial order is some relation, ≤, that has three properties:

  • Reflexive: For all x, xx.
  • Anti-symmetric: If xy and yx, then x = y.
  • Transitive: If xy and yz, then xz.

Everyone is familiar with the ≤ that means “is no larger than.” But there are other perfectly good definitions as well. For example “is a factor of” is a perfectly good meaning for ≤. Now here’s where the pretty pictures come in. We can draw a diagram for a partial order, where there’s an upward path from x to y precisely when xy.

For example, here is part of the diagram for the familiar partial order “is no larger than” on the natural numbers:

Familiar Partial Order on the Naturals

And here is part of the diagram for “is a factor of”:

Divisibility Order on the Naturals

Each of these partial orders contains some information about the relationships between natural numbers. In particular, one can identify certain properties of the natural numbers as belonging to up-sets: sets that have the property that if they contain something, they also contain everything that it greater than it. So “at least 5″ is the up-set of naturals in the familiar order that contains 5 and everything above it. We can call this the up-set generated by 5. In the second (divisibility) order, the up-set generated by 5 can be described as “divisible by 5″.

To use these orders to get information about partial inputs to functions, we need two things:

  1. Up-sets that capture the partial information we have.
  2. Functions that preserve that order.

Let’s shoot for the first one. Looking at the kinds of partial information we had:

  • x is even. This is the up-set generated by 2 in the divisibility order.
  • x is at least 5. This is the up-set generated by 5 in the familiar order.
  • x could be anything. This is the up-set generated by the bottom element of any partial order we like, so long as it has a bottom element.
  • x is a perfect square. We’ll need a quite complicated order for this one. We can order the natural numbers by the divisibility order on the least common divisor of the exponents of their prime factorizations. It turns out this isn’t even a valid partial order (it violates anti-symmetry), but I can treat it as a partial order on equivalence classes, and things work out. Then perfect squares are the up-set generated by [2] (the equivalence class of numbers whose gcd of the exponents in their prime factorization is precisely 2) in this order.

Once we’ve chosen a partial order whose up-sets capture the information we want, we have to verify that the function actually preserves the order. That is, if we take an up-set generated by x, and pass everything in it through the function, will the result live in the up-set generated by f(x). The successor function h(x) = x + 1 that we defined earlier has this property in the familiar order. If I have a set of all numbers bigger than 4, and I add one to them all, I get all the numbers bigger than 5. However, h(x) does not preserve up-sets in the divisibility order. If I have all numbers divisible by 3, and I add one to them all, there’s no guarantee that the resulting numbers will be divisible by 4! So I have to be careful about using an order-preserving function, if I want any kind of partial information from up-sets.

What a mess! Sure, if I can define the right partial order, and then prove that the function preserves up-sets on the partial order, and sometimes pull this trick with equivalence classes, and then I get to say something about the behavior of my function on partial information. But that seems less than fulfilling somehow. It feels like I’m working too hard, and now I can’t even define some functions, depending on which order I choose.

Sorting Out the Mess

So what do we do with all of this?

The first thing to do is to pick an order and stick with it. We’ll lose some partial information, but we’ll get to keep other partial information. The logical choice for natural numbers is the familiar order. It’s fairly common for me to want to know that a number is at least 3. It’s somewhat less common for me to want to know that a number is a multiple of 3. But I won’t assume any choice, and in a future blog post, I’ll demonstrate how to build numeric data types in Haskell that parallel several possible choices, and exhibit different laziness characteristics as a result.

The next thing we want to do is recover the ability to define all functions again. The problem with defining some functions is that there’s something “greater than” the number we’re looking at that won’t stay that way once we’re done. The obvious solution is to just make sure there’s nothing “greater than” a number. We can invent a whole set of pseudo-numbers: ~0, ~1, ~2, and so on. Then we’ll put the numbers themselves above that. Here’s the new set of natural numbers with the familiar order:

The Lazy Naturals by the Familiar Order

And voila! I can define all my favorite functions with the actual numbers. What remains is to extend my function onto the pseudo-numbers as well. In this order, the pseudo-number ~n means “at least n“. There’s always one safe way to do this: just map everything to ~0, the minimal element. However, if I still want to get partial information about the function, I need to do better.

Here’s the result for the divisibility order:

Lazy Naturals with Divisibility Order

Here we have the same thing, except that “~n” now means “divisible by n“. Again, I need to extend my functions to the pseudo-numbers. I can still map everything to the bottom element (in this case, “~1″), but this throws away any partial information I may have been keeping.

Finally, one more option. This corresponds to a partial order on an equivalence class, sort of like what came out of the perfect squares example. This one, though, is the trivial equivalence class that lumps everything together. The result is the strict natural numbers, in which I must know the entire number to get any information out of any (non-constant) function.

Strict Natural Numbers

Here only a constant function could do anything except map ~0 back to ~0. Therefore, unless your function is constant, the only way to get any information back from your input is to know the precise value of the input. This is an important arrangement of the natural numbers because for efficiency reasons, in the particular case of integers, this is precisely what all real programming languages do. We’re just using natural numbers as an example, though; the ideas here apply equally well to more complex structures where laziness exists.

It turns out, now, that once we’ve chosen a partial order for the pseudo-numbers, there is sometimes a best possible way to extend any function over the naturals to include pseudo-numbers. To determine the value of any function f at ~n, we’d like to find the greatest lower bound of f(x) for all x~n. Now if the partial order we chose happens to be something called a complete meet-semilattice (all of the orders we’ve talked about are), then that greatest lower bound exists.

Connections to Programming Languages

As you may have guessed by now by my occasional references, this is all connected to laziness of functions in programming languages. I had intended to write about that now, but this has gotten too long already, so I am instead going to stop here and finish the other half at a later time. Hope you enjoyed it!

July 17, 2007

A Neat Problem

Filed under: Haskell, math — cdsmith @ 11:41 pm

Here’s a neat game, with a really neat solution.  I’ll post the solution later, but this post shows a Haskell implementation of the game.  I first encountered this game as a puzzle in a game (an otherwise unremarkable one) called Keepsake by The Adventure Company.

The game looks like this.  There are five displays, which display the letters A through E.  The letters cycle through that sequence, wrapping back to A at the end.  There are also five buttons, which are numbered 1 through 5.  Each of the button changes each display my some random amount (from zero to four steps forward).  Pressing a button five times consecutively will always get things back to where they were before it was pressed. The goal is to get from the (randomly chosen) starting point to the (randomly chosen) ending point.

For example, if the starting point were ABCDE, the target were EDBDE, and button 3 advanced the first and third dials by two and the second by 1 (what I’ll call a “signature” of [2, 1, 2, 0, 0]), then the correct answer could be obtained by pushing button 3 twice.  The first press would change it to CCEDE, and the second to EDBDE (because the third dial wraps).

module Main where

import Control.Monad (replicateM)
import System.Random
import System.IO

Three fundamental pieces of information are used to define instances of this problem.  There is a starting display, a target display, and the actions of each of the five buttons.  The program chooses random values for these three, prints the target, and then starts processing moves by the user.

main :: IO ()
main = do   mat  <- buildMat
            disp <- replicateM 5 $ randomRIO ('A', 'E')
            tgt  <- replicateM 5 $ randomRIO ('A', 'E')
            putStrLn $ "Target = " ++ tgt
            continue mat disp tgt

The actions of each of the five buttons are represented in a matrix of sorts.  Each button has a row in the matrix; and the columns represent the effect on each of the letters in the display.

type Matrix = [[Int]]

Random values for the matrix are chosen as follows.

randomMat :: IO Matrix
randomMat = replicateM 5 $ replicateM 5 $ randomRIO (0,4)

There is one quirk, though.  We want the problem to have a solution.  Clearly there are some random matrices for which the problem has no solution.  (For a trivial example, imagine a matrix of all zeros; then it’s impossible to change the display.)  It turns out that, for reasons I will explain when I post the solution, it is sufficient for the matrix to have a non-zero (mod 5) determinant.  So here’s the code to calculate the determinant (by cofactor expansion along the first column)

det :: Matrix -> Int
det mat | length mat == 1 = head (head mat)
        | otherwise       = sum $ map det' [0 .. length mat - 1]
    where det' n = let x      = head (mat !! n)
                       submat = map tail (take n mat ++ drop (n + 1) mat)
                       sign   = if n `mod` 2 == 0 then 1 else -1
                   in  x * sign * det submat

The choice of a matrix, then, is performed by continuing to generate matrices until we find one with a non-zero (mod 5) determinant.

buildMat :: IO Matrix
buildMat = do m <- randomMat
              if det m `mod` 5 == 0 then buildMat else return m

Once the problem is generated, the next step is to keep asking for which button to press until the user accomplishes the goal.

continue :: [[Int]] -> [Char] -> [Char] -> IO ()
continue mat disp tgt | disp == tgt = do
    putStrLn "Congratulations!  You win."

                      | otherwise   = do
    putStrLn disp
    putStr "Enter 1 through 5: "
    hFlush stdout
    s        <- getLine
    let n     = read s
    let disp' = apply mat (n - 1) disp
    continue mat disp' tgt

Only two more functions are needed.  First, apply takes the matrix and a button number, and calculates the result of pressing that button.

apply :: Matrix -> Int -> [Char] -> [Char]
apply mat n vals = zipWith ($) (map add (mat !! n)) vals

Finally, the add function rotates a single letter of the display by a given amount.

add :: Int -> Char -> Char
add n s = iterate nxt s !! n
    where nxt 'A' = 'B' ; nxt 'B' = 'C'
          nxt 'C' = 'D' ; nxt 'D' = 'E'
          nxt 'E' = 'A'

Putting all of this together gives an implementation of the puzzle.  The puzzle can be fun to play with, but there’s also a neat straight-forward mathematical technique for finding the solution, which I’ll post in a few days.  Until then, have fun!

July 5, 2007

Learning Haskell and Number Theory: The End of GCD

Filed under: Haskell, math — cdsmith @ 10:41 pm

This is the fifth installment of my series “Learning Haskell and Number Theory”, in which I work through Gareth and Mary Jones’ text, Elementary Number Theory, and translate its ideas and algorithms into Haskell.  In the last installment, I used the Haskell foldl function to convert a binary greatest common denominator into one that operates on a list.  This time, I fix a glitch in that implementation, the dreaded “space leak,” and look at strict and lazy evaluation in Haskell.

The implementation of gcdmany from last time looked like this:

gcdmany :: forall a. Integral a => [a] -> a
gcdmany = foldl gcd1 0

This is a correct implementation, but not a good one.  Before we can understand why, though, we need a diversion through tail recursion.  Everyone who’s done much functional programming knows the importance of tail recursion.  It turns O(n) space into O(1) space.  For example, this implementation of factorial:

fact 0 = 1
fact n = n * fact (n - 1)

uses O(n) space because it’s not tail recursive.  The evaluation of fact (in a strict language; I’ll talk about Haskell’s quirks later) would look like this:

fact 6
6 * (fact 5)
6 * (5 * (fact 4))
6 * (5 * (4 * (fact 3)))
6 * (5 * (4 * (3 * (fact 2))))
6 * (5 * (4 * (3 * (2 * (fact 1)))))
6 * (5 * (4 * (3 * (2 * (1 * (fact 0))))))
6 * (5 * (4 * (3 * (2 * (1 * 1)))))
6 * (5 * (4 * (3 * (2 * 1))))
6 * (5 * (4 * (3 * 2)))
6 * (5 * (4 * 6))
6 * (5 * 24)
6 * 120
720

Each line represents the set of computations that needs to be stored in computer memory at any stage of execution.  You can see that the memory needed grows with the value of n.  (It doesn’t matter much for factorial, because merely storing the answer also uses space that’s O(n) in the size of the input.  That’s beside the point.)

An alternative implementation is:

fact n = fact' 1 n
    where fact' a 0 = a
          fact' a n = fact' (a * n) (n - 1)

This one is made tail recursive, by following a very common technique of keeping an accumulator as an explicit parameter (in this case, a) with an outer setup function.  If you look back at the fourth installment, you’ll see that this is exactly what I did to turn the first implementation of gcdmany into the second one.  I introduced an explicit accumulator parameter, and then used it to convert the routine into a tail recursive one with setup.  In a strict language (like, say, the ML family of languages) that would be sufficient to turn the fact function into O(1) space.

The execution of this one (again, in a strict language) looks like this:

fact 6
fact'   1 6
fact'   6 5
fact'  30 4
fact' 120 3
fact' 360 2
fact' 720 1
fact' 720 0
720

This is much better!  In our hypothetical strict language, we’ve solved the problem.  In Haskell, though, it’s not quite good enough.  Haskell, being a lazy language, won’t even do simple multiplication until we ask for the result.  The execution, therefore, looks like this:

fact 6
fact' 1 6
fact' (1 * 6) 5
fact' (1 * 6 * 5) 4
fact' (1 * 6 * 5 * 4) 3
fact' (1 * 6 * 5 * 4 * 3) 2
fact' (1 * 6 * 5 * 4 * 3 * 2) 1
fact' (1 * 6 * 5 * 4 * 3 * 2 * 1) 0
1 * 6 * 5 * 4 * 3 * 2 * 1
6 * 5 * 4 * 3 * 2 * 1
30 * 4 * 3 * 2 * 1
120 * 3 * 2 * 1
360 * 2 * 1
720 * 1
720

And, oops!  We’re back to using O(n) space again.  This is the dreaded “space leak”.

Back to our gcdmany function.  It, too, contains such a space leak.  Instead of computing intermediate greatest common divisors as it applies arguments one by one, it actually accumulates the GCD operations for the whole list, before it starts to evaluate any of them.

We’d like to fix this.  Haskell provides a way to do so: selective strict evaluation.  In other words, we can tell Haskell that certain things must be evaluated strictly rather than put off for later.  Doing this never changes the answer to a successful computation, but it can do two things:

  1. Make a computation run faster, and with less memory.
  2. Make a computation fail or run forever, when it might otherwise have succeeded.

The first effect is what we want here.  The second, though, is interesting as well.  Consider the following code:

mylist = [1, 2, 3, undefined]
main   = print (length mylist == 4)

Running this program will print True.  In a strict language, this program would fail because the fourth element of the list is undefined.  (To elaborate more, you might design a language to avoid failing if the term is known to be undefined.  But what if teh function that computes it goes into an infinite loop?  In Haskell, it wouldn’t matter because we never use that value.  In a strict language, it would matter quite a lot!)

In the case of gcdmany, though, I can verify that I am not doing anything that might cause the system to go into an infinite loop or fail.  It is, therefore, safe, to replace it with a strict version.

The first way of doing this is standard Haskell 98.  It looks like this:

gcdmany :: forall a. Integral a => [a] -> a
gcdmany xs = gcdmany' 0 xs
    where gcdmany' p []     = p
          gcdmany' p (x:xs) = (gcd1 p x) `seq` gcdmany' (gcd1 p x) xs

The magic function seq makes this all work.  If you read the description of seq in the standard library, it’s quite confusing for beginners to Haskell.  What it really means is that is evaluates the first bit of code (so, as a side effect, if the first bit goes into an infinite loop or throws an exception, it’ll do that too), and then goes on with the second one as if the first one didn’t exist.  However, evaluating the first one means that instead of it being work put off until later, the result is already known.  That gets us back to the nice O(1) memory usage.

The second way of doing this does the same thing, but is shorter and only works in GHC with the -fallow-bang-patterns option.

gcdmany :: forall a. Integral a => [a] -> a
gcdmany xs = gcdmany' 0 xs
    where gcdmany' !p []     = p
          gcdmany' !p (x:xs) = gcdmany' (gcd1 p x) xs

The ! in front of the argument p does the same thing as seq but is much nicer to read.

Both of these require that we write out gcdmany by hand, though.  Before, we were able to make it a lot nicer using foldl.  The third option fixes that.  It’s not part of the standard Haskell 98 prelude, but it does exist in GHC.

gcdmany :: forall a. Integral a => [a] -> a
gcdmany = foldl' gcd1 0

The foldl’ function is just a strict version of foldl.  It does exactly what we want.  So with all those words, the solution was achieved by adding one small character.  (Why, though?  Because of the ability to use higher order functions to do functional abstraction!)

In general, one wants to use foldl’ when folding over an operation that produces a distinct result, and foldr when folding with a data constructor or an operation that builds more complex data structures.  In the first case, you avoid space leaks.  In the second case, you can use lazy evaluation to only build a piece at a time of the data structure you’re operating on, and thus save even more memory through lazy evaluation.

That’s all for now.  Next time, I’ll look at some cool connections between Euclid’s GCD algorithm and the Fibonacci numbers, and then move on to chapter 2.

June 24, 2007

Learning Haskell and Number Theory: GCD and Higher Order Functions

Filed under: Haskell, math — cdsmith @ 9:24 pm

This is the fourth installment of my series “Learning Haskell and Number Theory”, in which I work through Gareth and Mary Jones’ text book, Elementary Number Theory, and translate its ideas and algorithms into Haskell.  In the last installment, I wrote a function to compute the greatest common divisor of two numbers.  In this installment, we will play with higher order functions, and culminate with the extension of greatest common divisors to an arbitrary number of inputs.

The second of the two GCD implementations in the previous installment looked like this:

gcd2 0 0 = undefined
gcd2 a b = let next (x,y)    = (y, x `rem` y)
               vals          = iterate next (abs a, abs b)
               Just (ans, _) = find ((== 0) . snd) vals
           in  ans

The expression iterate next (abs a, abs b) is interesting, in that its first parameter (next) is itself a function. This general idea can be extended to all sorts of tasks, and is one of the cornerstones of functional programming.  It goes by a name: higher order functions.  (The general intuition here is that a first order function is one that operates on basic pieces of data, such as integers or strings; a second order function is one that operates on first order functions, and so on.  I don’t like this intuition because it ignores the fact that what’s chosen as “simple” data in any given language is arbitrary; but I don’t get to pick these names.)  A higher order function can take other function as parameters, and it can evaluate to a function as its result.  If it does both of these, it can be seen as a sort of transformer of functions.

I intend to write a higher-order function, and then talk about some of them that are provided in the Haskell Prelude.  Before I can get there, though, I need to clean up something that many of you pointed to in comments earlier, and that’s the meaning of the greatest common divisor of zero and zero.  The Jones text calls it undefined, which makes very literal sense.  Everything divides zero, and the set of integers has no greatest element.  Therefore, there can be no greatest divisor of zero.  However, as many of you pointed out, defining gcd(0, 0) = 0 has some nice benefits.

One of those benefits is that it allows me to continue where I want to go.  (I’m still following the book, but in fact the book is wrong here.  As an exercise, if you’ve got the book, see if you can find the error in the book’s answer to exercise 1.9.  There must be an error, but the statement for which it wants a proof is not true!)

Therefore, we’ve got:

gcd1 :: forall a. Integral a => a -> a -> a
gcd1 a b = gcd1' (abs a) (abs b)
    where gcd1' a 0 = a
          gcd1' a b = gcd1' b (a `rem` b)

Suppose we want the greatest common divisor of more than two numbers.  Following example 1.9 of the Jones text, we could define:

gcd' :: forall a. Integral a => a -> a -> a -> a
gcd' a b c = gcd (gcd a b) c

gcd'' :: forall a. Integral a => a -> a -> a -> a -> a
gcd'' a b c d = gcd (gcd (gcd a b) c) d

gcd''' :: forall a. Integral a => a -> a -> a -> a -> a -> a
gcd''' a b c d e = gcd (gcd (gcd (gcd a b) c) d) e

Oops… looks like we’re going on forever here. We could define gcd'''' or even worse.  Better is to exploit this fact:

*NumTheory> :qc \a -> gcd1 0 a == abs a
+++ OK, passed 100 tests.

Great!  So that gives us a starting point, 0, into which we can add numbers one by one until we have the greatest common divisor of them all.  We need a function that converts a list of integral values into a single integral value representing the GCD.

gcdmany :: forall a. Integral a => [a] -> a
gcdmany [] = 0
gcdmany xs = let x = init xs
                 y = last xs
             in  gcd1 (gcdmany x) y

The new functions init and last do what they say. The init function returns the initial part of a list — more precisely, all but the last element. The last function returns the last element of the list. Note that x is again a list!  There is, of course, one more thing that may be tricky about this code: it is recursive, like most useful things in Haskell.  Some careful thought, though, should convince you that this does exactly what the book suggests: from left to right, it applies the greatest common divisor algorithm to each pair of numbers until it gets to the end.

It’s quite inefficient, though.  At turns out that in haskell, it’s much faster to pick the first element from a list than the last one, because lists are singly linked in the forward dimension.  A more complicated, but faster, implementation is:

gcdmany :: forall a. Integral a => [a] -> a
gcdmany xs = gcdmany' 0 xs
    where gcdmany' p []     = p
          gcdmany' p (x:xs) = gcdmany' (gcd1 p x) xs

If you’ve got the hang of Haskell, you can check that this works.  The only thing I haven’t mentioned yet is the list constructor, :, which is basically LISP’s cons function if you know other functional languages.  If the code remains a mystery to you (which would be normal if you’re new to the language) you can study it, or just trust me.  The problem is that this is a little more complicated than we’d like to write again and again, and it turns out that you want to do exactly this lots and lots in Haskell code!  A higher order function would be handy, and the standard library provides one, called foldl, that does exactly what we’re looking for.

gcdmany :: forall a. Integral a => [a] -> a
gcdmany xs = foldl gcd1 0 xs

The last thing we can do, just be fancy, is called eta reduction.  This is the observation that if f(x) = g(x) for all x, then f = g.  This property is sometimes known as extensional equivalence, and it’s a property of pure functions, by definition. The xs at the end of the left and right-hand sides of the Haskell equation match up just like the xs in the math equation, and we can remove them.

gcdmany :: forall a. Integral a => [a] -> a
gcdmany = foldl gcd1 0

And that’s pretty simple.

June 6, 2007

Learning Number Theory and Haskell: Greatest Common Divisor

Filed under: Haskell, math — cdsmith @ 3:12 pm

This is part three of my series, “Learning Number Theory and Haskell”, in which I work through Gareth and Mary Jones’ Elementary Number Theory and translate the ideas and concepts into Haskell. In the previous two parts, I used QuickCheck to verify some theorems and translate some ideas into Haskell code.

It is widely speculated that the first real algorithm worthy of the name was due to Euclid, when he described how to calculate the greatest common divisor of two terms. What better place to start, then in writing a first serious algorithm in Haskell? (Besides, the textbook does this next.) In the process, I’ll use “equational reasoning” (a popular word in functional programming) to improve the code. Then I’ll try something completely different to finish off

First, let’s write a type signature. Greatest common divisors can be calculated with integral values only, and maps two numbers to a result. The signature is:

gcd1 :: forall a. Integral a => a -> a -> a

For a first version of the algorithm, I plan to transcribe the proof from the text as code. There are a few trivial cases involving zero gotten near the top of page 6. We can write those separately using pattern matching.

gcd1 0 0 = undefined
gcd1 a 0 = abs a
gcd1 0 b = abs b

There are a few more special cases for non-zero arguments.

gcd1 a b | a <  0    = gcd1 (-a) b
         | b <  0    = gcd1 a (-b)
         | a == b    = a
         | a <  b    = gcd1 b a
                                 -- Phew!
         | otherwise = gcd1 b (a `rem` b)

Let’s test against the built-in gcd function from the prelude. The only subtlety here is that since gcd 0 0 is undefined, we should avoid calling it.

*NumTheory> :qc \a b -> (a /= 0 || b /= 0) ==> gcd1 a b == gcd a b
+++ OK, passed 100 tests.

We’ve got a working gcd1. However, it is terribly unsatisfying. By following the textbook to the letter, we ended up with a function defined with eight separate cases. Yuck!

The problem is that our number theory textbook handled a lot of stuff as special cases to get it out of the way easily for the proof. Many of those special cases aren’t all that special after all, and we can clean this up by finding equivalent cases and combining them. This is really easy to do and get right in Haskell, once you’ve got the hang of it, because Haskell functions can be read and substituted just like algebra equations. This concept is often called “equational reasoning”, and is cited as an advantage of pure functional programming. The ease of doing this is a direct consequence of not having side effects.

The last two cases cover when a < b, and a > b, respectively. However, the last case subsumes the next to last one, because if a < b (with a and b both positive), then a `rem` b is equivalent to a. That eliminates one case. Similarly, the third from last case is unnecessary. If a = b, then gcd1 a (a `rem` b) evaluates to gcd1 a 0, which in turn evaluates to abs a. Since we know a > 0 if the evaluation gets this far, this is the same as a. This case is also unnecessary. A similar argument proves that gcd1 0 b = b is unnecessary. That leaves:

gcd1 0 0 = undefined
gcd1 a 0 = abs a
gcd1 a b | a <  0    = gcd1 (-a) b
         | b <  0    = gcd1 a (-b)
         | otherwise = gcd1 b (a `rem` b)

This is better, but it’s even shorter to promote the check for negative values of a and b and for gcd1 0 0 to an outer function. The where keyword can be used to define an inner function that is available only within a particular top-level function. The result looks like this:

gcd1 0 0 = undefined
gcd1 a b = gcd1' (abs a) (abs b)
    where gcd1' a 0 = a
          gcd1' a b = gcd1' b (a `rem` b)

Eliminating all of these special cases could be interpreted as an indication that we didn’t need them to begin with.  As an exercise, try rewriting the proof in the text without assuming anything except that a and b are non-negative and are not both zero.

Rerunning the test suite to verify:

*NumTheory> :qc \a b -> (a /= 0 || b /= 0) ==> gcd1 a b == gcd a b
+++ OK, passed 100 tests.

Another implementation of the greatest common divisor is interesting (though, ultimately, it doesn’t perform as well.) One way to look at the algorithm is that we are inductively defining a list of number pairs which all have the same divisors. We start with (|a|, |b|) as the first element in the list, and then define a function to get from there to the next element. That looks like this:

gcd2 0 0 = undefined
gcd2 a b = let next (x,y)    = (y, x `rem` y)
               vals          = iterate next (abs a, abs b)
               Just (ans, _) = find ((== 0) . snd) vals
           in  ans

We first define a next function that takes us from one pair of numbers to the next. Then we use a Haskell standard library function, iterate, to build a list of values resulting from successive applications of next to the pair of numbers. Finally, we look for the first entry from that list that has 0 for its second number. The result is the first number of that pair. You can use QuickCheck to verify that it gives the same answer as the other two algorithms above.

This introduces an explicit data structure to represent the structure of the algorithm. Sometimes, doing this can make algorithms clearer. In this case, it’s just an interesting alternative. It involves some new concepts: an infinitely long list, a higher-order function, and another interesting use of pattern matching. If you don’t understand it yet, take some time to poke around the standard library documentation. Don’t despair, though; I’ll talk about more of these ideas in future installments.

June 3, 2007

Learning Number Theory and Haskell: More QuickCheck and Divisors

Filed under: Haskell, math — cdsmith @ 9:44 am

This is part two of my series Learning Number Theory and Haskell. Last time, we implemented a division algorithm according to an exact specification from Jones Elementary Number Theory. We used the QuickCheck testing library to make ourselves fairly confident that the code was actually correct (in other words, behaved according to certain expectations).

More QuickCheck

Since QuickCheck may be feeling pretty new right now, let’s play around with it for a bit. Exercises 1.1 and 1.2 of the Jones text give some interesting theorems about the remainders of perfect squares. We will verify these theorems with a hundred or so random test cases each. (Note that I’ve now stopped using the jonesQuot and jonesRem functions defined earlier. It was nice to be able to reproduce that behavior, but it’s nicer to become skilled at using the standard library functions instead of inventing our own replacements. We’ll just be careful about negative numbers from here on.)

Prelude> :qc \\n -> (n^2 `rem` 4) `elem` [0, 1]
+++ OK, passed 100 tests.
Prelude> :qc \\n -> (n^2 `rem` 3) `elem` [0, 1]
+++ OK, passed 100 tests.
Prelude> :qc \\n -> (n^2 `rem` 5) `elem` [0, 1, 4]
+++ OK, passed 100 tests.
Prelude> :qc \\n -> (n^2 `rem` 6) `elem` [0, 1, 3, 4]
+++ OK, passed 100 tests.

Yep, they all check out. Of course, you should still write proofs because, as you’re probably quite familiar by now, QuickCheck is suitable for contradicting things or providing evidence in favor, but not for proving things.

Generalizing the Proof Method

I was interested in this concept, so I extended it to arbitrary divisors with a simple Haskell program that generalizes the method of the proofs in the text. If you’re following along to learn Haskell, this makes use of two shorthands for creating lists. The innermost list [0 .. d - 1] is an arithmetic sequence, and hopefully it’s fairly obvious what that does. The outer list is built using a list comprehension. The vertical bar should still be read as “such that”, and in this context, <- is read as “element of”. So the outermost list is the list of all (n^2) `rem` d such that n is an element of [0 .. d - 1]. These techniques make a lot of mathematical code in Haskell very concise, and keep it closer to math syntax. Finally, nub just removes the duplicates from a list. You can :edit NumTheory.hs again, and add the following:

import Data.List (nub) -- Add this at the top of the file, by the other import

squareRems d = nub $ [(n^2) `rem` d | n <- [0 .. d - 1] ]

We can check this, but we’ll need to be careful about negative values of the divisor since I used rem instead of jonesRem.

Prelude> :l NumTheory
[1 of 1] Compiling NumTheory        ( NumTheory.hs, interpreted )
Ok, modules loaded: NumTheory.
*NumTheory> :qc \\n d -> (d > 0) ==> ((n^2) `rem` d) `elem` squareRems d
+++ OK, passed 100 tests.

This algorithm can be supported by a proof that is not given in the text. The theorem is that the remainder of a2 when divided by d is equal to the remainder of some k2, where 0 <= k < d. This is proven by writing a as qd + r by the division algorithm. The a2 = q2d2 + 2qdr + r2. In turn, r2 can be written as pd + s by the division algorithm. Then a2 = q2d2 + 2qdr + pd + s, which in turn is equal to d(q2d + 2qr + p) + s. Thus, a2 gives a remainder of s, but so does r2, and we know that 0 <= r < d. It should, therefore, be no surprise that the QuickCheck test succeeds.

Divisors

The next set of theorems from the text deal with divisors.  If we’d like to verify these theorems using QuickCheck, we need a way to decide if one number divides another one or not. Following Jones, we’ll say (b | a), read as “b divides a,” iff there is some integer q such that a = qb. Recall that for any pair of numbers a and b, with b not equal to 0, there is a unique pair of integers (q, r) such that a = qb + r and 0 <= r < b. In the special case that b is not zero, it’s clear that b | a iff r = 0 for the unique values of q and r given by the division algorithm.

b `divides` a = a `rem` b == 0

This leaves out the possibility that b = 0. The function above will fail if that occurs, because rem a b is undefined when b = 0. So we go back to the definition, which is that b divides a iff there is a q so that a = qb. If b = 0, that equation becomes a = 0b, which is only true if a = 0. Adding that as a special case, the function looks like this:

0 `divides` a  = a == 0
b `divides` a  = a `rem` b == 0

(This is a definition by pattern matching, which is one of the nicest syntax aspects of the Haskell language. The order is significant here, as the second version of the function will match all possible parameters that aren’t matched beforehand.)

I’ve convinced myself that the function is correct, but it’s time for some good old QuickCheck. The text gives a few corner cases explicitly at the top of page 4, so we can test them.

*NumTheory> :qc \\n -> n `divides` 0
+++ OK, passed 100 tests.
*NumTheory> :qc \\n -> 1 `divides` n
+++ OK, passed 100 tests.
*NumTheory> :qc \\n -> n `divides` n
+++ OK, passed 100 tests.

Excellent! Now we’d like to test the general case so we can be really sure that divides is correct… but how? All we have is that b divides a iff for some q, a = qb. We need to know something more about q than it merely being an integer to make this work. A convenient property is given in exercise 1.3 (d) from the Jones text. If a satisfactory q exists, it must divide a itself, so it is between -a and a. The case not covered by this reasoning is when a = 0, but then q = 0 will definitely do the trick! Let’s write a test.

*NumTheory> :qc \\a b -> (b `divides` a) == any (\\q -> a == q * b) [(- abs a) .. abs a]
+++ OK, passed 100 tests.

Here, I build a list of all possible values for q, and then use the built-in any function to see if any of them work. Notice that the first parameter to any is another function, which I’ve written using the anonymous lambda syntax. This is far too slow to be used for an implementation, but it works fine as a test.

Testing a Property of Divisors

Let us now make use of our new function. Exercise 1.4 of the text asks whether it is the case that if a | b and c | d, then (a + c) | (b + d). If you’re following along with me in the text, which I hope you are, you should convince yourself of the answer ahead of time by more mundane means. QuickCheck easily gives the answer.

(Here’s a comprehension quiz. The mere fact that QuickCheck can provide a definite answer should tell you what the answer is. Why?)

*NumTheory> :qc \\a b c d -> ((a `divides` b) && (c `divides` d)) ==> (a + c) `divides` (b + d)
*** Failed! Falsifiable (after 5 tests and 7 shrinks):
1
1
1
0

So not only do we get an answer (no, the conjecture is false!), but we get a counter-example. If a = 1, b = 1, c = 1, and d = 0, then it is true that a | b because b = (-2)a and c | d because d = (-1)c, but it is not true that (a + c) | (b + d).  a + c is 2, and b + d is 1, and clearly there is no integer q such that 1 = 2q.

June 2, 2007

Learning Number Theory and Haskell: The Division Algorithm

Filed under: Haskell, math — cdsmith @ 1:06 pm

This series of blog posts is a chronicle of my working my way through Gareth and Mary Jones’ Elementary Number Theory and translating the ideas into the Haskell programming language. This is the first installment of the series, which I’m calling “Learning Number Theory and Haskell.”

(As a quick side note, I know many readers already know Haskell. I hope this will get interesting for you soon, but my stated goal is to write for beginners… so you may want to skim the first few entries, or treat this as an opportunity to comment on whether I’ve chosen the best way to present ideas for new Haskellers.)

The Jones text begins with a description of the “division algorithm.” (If you’re following along in the text, which I hope you are, I’m combining theorem 1.1 with corollary 1.2 here.) This is seemingly an odd name for a theorem, but here it is. For all integers a and b, where b is not 0, there is a unique q and r such that a = qb + r and 0 <= r < |b|. The text calls these numbers the quotient and remainder. I call them an excuse to go hunting through the Haskell standard library a bit.

There are a few candidates in the standard library for the divison of numbers into a quotient and remainder. Haskell has quot and rem functions, which look promising. It also has div and mod functions, though. Let’s play around a bit and see which (if any) do what we want.

Since I said I’d start from the beginning, here’s the beginning. There are two commands to run GHC: ghc and ghci. We will start out in GHCi, which allows you to type lines of Haskell code and have them evaluated immediately. In some social circles, this is known as an REPL, which stands for “Read-Eval-Print Loop”. It makes poking around a lot easier, since you don’t need to save, recompile, or write a main function before seeing results. Later on, we can use the ghc command to compile programs or libraries that can be run on their own.

$ ghci -fglasgow-exts

(I added an option there that I haven’t mentioned yet. That option enables some language extensions that I don’t intend to use. I added it because it causes GHCi to use a slightly different form of type signatures that I think are actually easier to understand than the default.)

Finally, it’s time to ask GHCi for basic information about those functions. Specifically, we will ask for their types. One of the major keys to Haskell is that if you know the type of something, you are three fourths of the way to understanding it.

Prelude> :t quot
quot :: forall a. (Integral a) => a -> a -> a
Prelude> :t rem
rem :: forall a. (Integral a) => a -> a -> a
Prelude> :t div
div :: forall a. (Integral a) => a -> a -> a
Prelude> :t mod
mod :: forall a. (Integral a) => a -> a -> a

First of all, Prelude> is the basic GHCi prompt. Anything after that on a line is something I typed. We’re more interested in the responses, which are all about the same. Those are type signatures, and types are fundamental to Haskell. I think it might be helpful, though, to work our way up from simpler types toward the more complicated types above.

Prelude> :t not
not :: Bool -> Bool

The response to the :t command uses the :: syntax, which provides type annotations in Haskell. You should read :: as “has type”, so that statement is: not has type Bool -> Bool. That, of course, raises the question of what Bool -> Bool means as a type. First, Bool is the Boolean data type, with values of True or False. Second, -> describes a function mapping from one type to another. The not function, then, takes either True or False as an argument, and returns either True or False as a result. Simple enough.

Prelude> :t length
length :: forall a. [a] -> Int

There are two extra elements here. The first, which we won’t be using yet, is that [Bool] denotes a list whose elements are type Bool. The important thing is that in Haskell, you can’t just have a list. You must have a list of Bool or a list of Integer, or some other type. The length function, though, wants to operate on all possible types of lists. Therefore, it uses polymorphism. The keyword forall introduces a generic name for a type. It’s basically a logic quantifier, which in other contexts is often written as an upside-down A. The type here says that for all possible types a, the length function can calculate an Int from a list of a.

I like the -fglasgow-exts command line argument to ghci because it makes this explicit. Without that argument, the type above would have just been written as [a] -> Int. This is a valid abbreviation for the type given above; if some name in a type signature begins with a lower-case letter, it is assumed to be a type variable, even if there is no explicit forall clause. On the other hand, types beginning with upper-case letters are specific type names, like Int or Bool.

Prelude> :t negate
negate :: forall a. (Num a) => a -> a

The new concept introduced here is that of type classes. There are many different types that all represent numbers in Haskell. It would be painful if things like negate (or +, for that matter) could only work on one of those many types. (As a side note, I suppose it’s not universally agreed that this would be painful, because ML does it that way.) On the other hand, it makes no sense for negate to operate on all possible types, either! How do you keep a programmer from negating a banana warehouse? (Hmm, I feel like there should be a punchline here.) The solution is type classes. A type class, which starts with an upper-case letter much like a type, defines a group of many types (called instances) for which some set of operations is possible. You can read the => operator as an if/then statement, so the statement is: negate has a type so that for all types a, if a belongs to the type class Num, then negate can take an argument of type a and return a result of type a. Whew, that’s a mouthful!  It says what we want, though.  Now back to the beginning.

Prelude> :t quot
quot :: forall a. (Integral a) => a -> a -> a
Prelude> :t rem
rem :: forall a. (Integral a) => a -> a -> a
Prelude> :t div
div :: forall a. (Integral a) => a -> a -> a
Prelude> :t mod
mod :: forall a. (Integral a) => a -> a -> a

The only new thing here is that these functions take two parameters of type a, and return a result of type a. The Integral type class is exactly what you’d expect: it describes types whose values are mathematical integers. The two interesting types in this type class are Integer, which is an arbitrary precision type (in other words, it never overflows), and Int, which stores at least 30-bit signed values, depending on the implementation.

We now know the types, so let’s see how these functions behave. In other words, let’s play! (Any function of two parameters can be used as an infix operator by enclosing it in back-ticks, which are probably above the tilde on your keyboard. That’s what I’m doing below.)

Prelude> 6 `div` 4
1
Prelude> 6 `mod` 4
2
Prelude> 6 `quot` 4
1
Prelude> 6 `rem` 4
2

So far, so good. It looks like both pairs of functions behave like the Jones text’s quotient and remainder.  However, a single test case does not make a correct function! We could try a few more test cases.  Let’s try 100 of them instead! A nice systematic way to try lots of test cases is with a nifty library called QuickCheck. This library will verify stuff with bunches of random test cases. It tells us whether a property is false, or likely to be true. The wording there is very deliberate. It will definitely not tell us that the statement is true. That would require a general proof! QuickCheck will just try some random inputs to look for counter-examples.

I get tired of typing long commands, so I’ve added the following line to my GHCi configuration file, ~/.ghci:

:def qc \\c -> return $ ":m + Test.QuickCheck.Property \\nTest.QuickCheck.quickCheck (" ++ c ++ ") \\n:m - Test.QuickCheck.Property"

To use QuickCheck, we need functions that take parameters (values for a and b) and evaluate to either True (success) or False (failure). That’s convenient, because the Jones text gives us two of them. Specifically, the theorem asks for: a = qb + r, and 0 <= r < |b|. We can define these as anonymous functions using the lambda syntax in Haskell. Here’s a simple example of lambda syntax:

Prelude> (\\ x -> x * x) 4
16

The anonymous function begins with a backslash (\) and a list of parameters. The arrow (->) introduces the body of the function. This example defines an anonymous function to calculate the square of a number, and then applies it to 4 to get 16. We’ll do the same thing for our test cases. We are given that b is not zero, and QuickCheck defines a handy operator ==> to express that. Let’s try this with quot and rem.

Prelude> :qc \\a b -> (b /= 0) ==> (a `quot` b) * b + (a `rem` b) == a
+++ OK, passed 100 tests.
Prelude> :qc \\a b -> (b /= 0) ==> (0 <= a `rem` b) && (a `rem` b < abs b)
*** Failed! Falsifiable (after 9 tests and 5 shrinks):
-5
-3

Oops! The first test case succeeded, but the second test case failed. Specifically, it failed with values of a = -5 and b = -3. QuickCheck generates random test cases, so you will probably see a different test case fail. Hopefully your test did fail, though! Clearly, then, the quot and rem functions don’t satisfy the requirements of the division algorithm, because the result of rem is out of range for this input! Looking closer at the test case, we find that using quot and rem gives q and r of 1 and -2, while we wanted 2 and 1.

Let’s try div and mod.

Prelude> :qc \\a b -> (b /= 0) ==> (a `div` b) * b + (a `mod` b) == a
+++ OK, passed 100 tests.
Prelude> :qc \\a b -> (b /= 0) ==> (0 <= a `mod` b) && (a `mod` b < abs b)
*** Failed! Falsifiable (after 7 tests and 3 shrinks):
1
-2

So neither of the Haskell standard library functions agree with the spec that we got from the text. Both pairs of functions satisfy the first equality from the text, a = qb + r, all the time. If you’re wondering how they differ, the answer is that when the signs of a and b are different, rem gives the remainder the same sign as a, while mod gives the remainder the same sign as b. Thus, quot and rem fail our purposes precisely when a is negative and r is non-zero, while div and mod fail precisely when b is negative and r is non-zero.

We now know that we have two different incorrect functions. What about a correct one? We’ll have to write it; but fortunately it’s given in the Jones text. We’re in GHCi, and I’d prefer not to write a lot of code at a command prompt, so use GHCi’s :edit command.

Prelude> :edit NumTheory.hs

That should open up a text editor. (If you prefer a different editor, then set your EDITOR environment variable.) Now enter the following code, save the file, and exit the editor.

module NumTheory where

import Data.Ratio

jonesQuot, jonesRem :: Integral a => a -> a -> a

a `jonesQuot` b | b < 0 = ceiling (a % b)
                | b > 0 = floor   (a % b)

a `jonesRem` b = a - b * (a `jonesQuot` b)

Here’s what’s going on. The module statement defines this as a module with the name NumTheory. The import statement lets us use the Rational type, and the % operator to build it. I chose the Rational type to avoid the rounding error that would occur with floating point numbers. Next, there’s a type signature. Haskell could infer the type signature, but there are two advantages to specifying it ourselves. First, the signature inferred by Haskell is too general and is confusing. Second, the explicit type signature makes it abundantly clear how to use the function without having to ask GHCi for help. Next, jonesQuot has two cases corresponding to the main theorem and the corollary in the Jones text. (The vertical bar can be read as “such that” or “for” and is called a guard clause.) In one case, a floor is needed, while the other case requires a ceiling (this is in the book, if you’re reading it.) The % operator constructs a Rational from two Integral numbers.

Use the GHCi :l command to load the module, and you have precisely the functions we went searching for. We have QuickCheck to help us be more certain.

Prelude> :l NumTheory
[1 of 1] Compiling NumTheory        ( NumTheory.hs, interpreted )
Ok, modules loaded: NumTheory.
*NumTheory> :qc \\a b -> (b /= 0) ==> (a `jonesQuot` b) * b + (a `jonesRem` b) == a
+++ OK, passed 100 tests.
*NumTheory> :qc \\a b -> (b /= 0) ==> (0 <= a `jonesRem` b) && (a `jonesRem` b < abs b)
+++ OK, passed 100 tests.

Ah, the feeling of success! Of course, QuickCheck didn’t prove anything except that failure probably happens less than one out of a hundred inputs. Proofs of the theorem and simple algebra to establish that the algorithm is correct are in the Jones text, and you’re encouraged to read them. Just remember that there is no (okay, only a little) competition between testing and proof. Knuth’s famous quote comes to mind: “Beware of bugs in the above code; I have only proved it correct, not tried it.” Proofs don’t always catch boneheaded coding mistakes.

It’s worth a brief mention that, in this case, the interaction between testing and proof is particularly appealing. The text proves that there is one and only one function for which those two QuickCheck tests would always succeed. In other words, we actually have a proof that our test cases test everything of importance! That’s pretty cool; we have proven the completeness of the test suite. (The passing tests still don’t prove the correctness of the code, though. Why not?)

Learning Number Theory and Haskell: The Plan

Filed under: Haskell, math — cdsmith @ 9:42 am

Now that I’ve been added to planet.haskell.org, I feel like I should write a lot about Haskell. I also need to learn number theory over the summer. I thought, why not combine the two? Thus, I have hatched the idea of this “series” of blog posts, which I am calling Learning Number Theory and Haskell. My plan for this summer is to structure a set of writings on Haskell around my reading of the textbook Elementary Number Theory by Gareth and Mary Jones.

My writing will be aimed at beginners in both Haskell and number theory. My goal will be to:

  • Talk about all of the basic ideas of Haskell, using an exploratory approach.
  • Try for as much “text coverage” of the number theory book as I can.
  • Respond to your feedback on what you find interesting and worthwhile.

If you share similar interests, I’d encourage you to follow along. It may help if you actually purchase a copy of the textbook; if not, you will just miss out on much of the number theory part, as I don’t intend to rewrite the book in the course of my blog.

By way of acknowledgement, I partially got this idea from Rydeheard and Burstall’s book Computational Category Theory, which is available online here. I should point out that I haven’t actually read more than the first chapter of that book, and it is really not at all like what I’m going to do.  However, Rydeheard and Burstall’s work does raise the intriguing idea that it may be possible to build a software library that, in a sense, embodies the theory of a mathematical field.  I find that concept fascinating.

I may or may not create a library at all. The first few entries, which I’ve already sketched out, don’t involve writing anything permanent. I will be somewhat limited by the fact that I’m working at a beginner’s level in Haskell. The series will start with some hacking around in GHCi, and grow from there. Again, I hope to get some feedback as I go along on which directions to pursue, and which to abandon.

So, I hope you’re with me! I’ll be posting the first installment later today.

Blog at WordPress.com.