June 6, 2007 / cdsmith

Learning Number Theory and Haskell: Greatest Common Divisor

This is part three of my series, “Learning Number Theory and Haskell”, in which I work through Gareth and Mary Jones’ Elementary Number Theory and translate the ideas and concepts into Haskell. In the previous two parts, I used QuickCheck to verify some theorems and translate some ideas into Haskell code.

It is widely speculated that the first real algorithm worthy of the name was due to Euclid, when he described how to calculate the greatest common divisor of two terms. What better place to start, then in writing a first serious algorithm in Haskell? (Besides, the textbook does this next.) In the process, I’ll use “equational reasoning” (a popular word in functional programming) to improve the code. Then I’ll try something completely different to finish off

First, let’s write a type signature. Greatest common divisors can be calculated with integral values only, and maps two numbers to a result. The signature is:

gcd1 :: forall a. Integral a => a -> a -> a

For a first version of the algorithm, I plan to transcribe the proof from the text as code. There are a few trivial cases involving zero gotten near the top of page 6. We can write those separately using pattern matching.

gcd1 0 0 = undefined
gcd1 a 0 = abs a
gcd1 0 b = abs b

There are a few more special cases for non-zero arguments.

gcd1 a b | a <  0    = gcd1 (-a) b
         | b <  0    = gcd1 a (-b)
         | a == b    = a
         | a <  b    = gcd1 b a
                                 -- Phew!
         | otherwise = gcd1 b (a `rem` b)

Let’s test against the built-in gcd function from the prelude. The only subtlety here is that since gcd 0 0 is undefined, we should avoid calling it.

*NumTheory> :qc \a b -> (a /= 0 || b /= 0) ==> gcd1 a b == gcd a b
+++ OK, passed 100 tests.

We’ve got a working gcd1. However, it is terribly unsatisfying. By following the textbook to the letter, we ended up with a function defined with eight separate cases. Yuck!

The problem is that our number theory textbook handled a lot of stuff as special cases to get it out of the way easily for the proof. Many of those special cases aren’t all that special after all, and we can clean this up by finding equivalent cases and combining them. This is really easy to do and get right in Haskell, once you’ve got the hang of it, because Haskell functions can be read and substituted just like algebra equations. This concept is often called “equational reasoning”, and is cited as an advantage of pure functional programming. The ease of doing this is a direct consequence of not having side effects.

The last two cases cover when a < b, and a > b, respectively. However, the last case subsumes the next to last one, because if a < b (with a and b both positive), then a `rem` b is equivalent to a. That eliminates one case. Similarly, the third from last case is unnecessary. If a = b, then gcd1 a (a `rem` b) evaluates to gcd1 a 0, which in turn evaluates to abs a. Since we know a > 0 if the evaluation gets this far, this is the same as a. This case is also unnecessary. A similar argument proves that gcd1 0 b = b is unnecessary. That leaves:

gcd1 0 0 = undefined
gcd1 a 0 = abs a
gcd1 a b | a <  0    = gcd1 (-a) b
         | b <  0    = gcd1 a (-b)
         | otherwise = gcd1 b (a `rem` b)

This is better, but it’s even shorter to promote the check for negative values of a and b and for gcd1 0 0 to an outer function. The where keyword can be used to define an inner function that is available only within a particular top-level function. The result looks like this:

gcd1 0 0 = undefined
gcd1 a b = gcd1' (abs a) (abs b)
    where gcd1' a 0 = a
          gcd1' a b = gcd1' b (a `rem` b)

Eliminating all of these special cases could be interpreted as an indication that we didn’t need them to begin with. As an exercise, try rewriting the proof in the text without assuming anything except that a and b are non-negative and are not both zero.

Rerunning the test suite to verify:

*NumTheory> :qc \a b -> (a /= 0 || b /= 0) ==> gcd1 a b == gcd a b
+++ OK, passed 100 tests.

Another implementation of the greatest common divisor is interesting (though, ultimately, it doesn’t perform as well.) One way to look at the algorithm is that we are inductively defining a list of number pairs which all have the same divisors. We start with (|a|, |b|) as the first element in the list, and then define a function to get from there to the next element. That looks like this:

gcd2 0 0 = undefined
gcd2 a b = let next (x,y)    = (y, x `rem` y)
               vals          = iterate next (abs a, abs b)
               Just (ans, _) = find ((== 0) . snd) vals
           in  ans

We first define a next function that takes us from one pair of numbers to the next. Then we use a Haskell standard library function, iterate, to build a list of values resulting from successive applications of next to the pair of numbers. Finally, we look for the first entry from that list that has 0 for its second number. The result is the first number of that pair. You can use QuickCheck to verify that it gives the same answer as the other two algorithms above.

This introduces an explicit data structure to represent the structure of the algorithm. Sometimes, doing this can make algorithms clearer. In this case, it’s just an interesting alternative. It involves some new concepts: an infinitely long list, a higher-order function, and another interesting use of pattern matching. If you don’t understand it yet, take some time to poke around the standard library documentation. Don’t despair, though; I’ll talk about more of these ideas in future installments.