Catching a Mathematical Error Using Haskell’s Type System
I found this worth sharing. It’s the story of an error in a mathematics paper. Perhaps not too terribly surprising, you might think. But it really is surprising. The paper in question is quite well-written and clear. It’s central to its field, and has been read in detail by, I’m sure, hundreds if not thousands of people. It’s 25 years old. It doesn’t involve a lot of references to other results, but rather builds its entire argument right there in this one paper. In other words, it’s exactly the sort of mathematics that you’d expect to be correct. (The error, by the way, is not major and is easily overcome, and the result is still true. So I’m not claiming that 25 years of research is invalid or anything like that!)
Here’s the story. John Franks published a central result in the theory of subshifts of finite type, in 1983. Here’s everything you need to know about that field:
Subshifts of finite type can be described by directed graphs (loops and parallel edges allowed.) There are some interesting operations that we can perform on these graphs: in-splitting, out-splitting, in-amalgamation, out-amalgamation, expansion, and contraction. These operations arise in a really cool way, but you don’t need to worry about that. The main point, for the moment, is that although they come about quite naturally, they are not easy to work with directly.
Franks noticed that a certain matrix of integers seems to come up quite a bit in understanding the structure of these subshifts: that matrix is A – I, where A is the adjacency matrix of the graph, and I is the identity matrix of the right size. (In other words, the (i,j) entry of A is the number of edges from vertex i to vertex j.) He also noticed that he can perform row and column operations on that matrix — adding or subtracting a row or column to/from another — by doing those other fancy operations above in a certain way. This was huge. Now if I want to get from one graph G to another graph H, using those splits and amalgamations and expansions and such, it actually suffices to get from the matrix A – I to the matrix B – I (where A and B are the adjacency matrices of G and H) by using elementary row and column operations!
(There are a few technicalities, but skip this paragraph if you don’t care about them. You first have to arrange for G and H to have a loop at every vertex, so that A – I and B – I are non-negative. Then you have to make sure that your row and column operations don’t leave you with negative entries in your matrices. These are handled correctly in the paper, but are not necessary for the point I’m making, so I won’t mention them again.)
Then Franks sets out to do this.
First, he shows that he can arrange so that the entire first column of the matrix A – I is equal to the greatest common divisor of all the entries. Starting from this, he uses an elimination algorithm similar to what you often see in introductory linear algebra courses: you clear out some row and column except for the single cell where they coincide, and then ignore that row and column and recursively perform the same operation on the remaining smaller matrix. Franks chooses, in particular, to clear out the first column and the second row.
But here’s the error. Elementary row and column operations on this sub-matrix carry over to the larger matrix… but when arranging for the first column to be equal to the g.c.d. of all of the entries, Franks accidentally used some different operation — namely, conjugation by a permutation matrix. A seemingly trivial operation, since it corresponds to a graph isomorphism. In a sense, it doesn’t even change the graph at all, so it’s hardly even an operation. However, when you delete the first column and second row, and then conjugate the resulting matrix by a permutation matrix, this does not correspond to any kind of good operation on the larger graph. The proof of this central result in the theory of subshifts of finite type, therefore, contains an error.
How did I find this? Actually, I found it because I was implementing Franks’ construction in Haskell, and Haskell has a type system that was remarkably helpful in reasoning about the result. I started by writing a type class (called FlowEquiv) for representations of graphs on which I can perform those weird operations: splits, amalgamations, etc. As I described in my last blog entry, I was able to capture a lot of higher level concepts by writing wrapper types that are instances of this type class. For example, these operations can be performed on the transpose of a graph, and when they are, they correspond to other operations on the original graph. I was able to write down that correspondence explicitly by having a wrapper type called Transposed implement the same type class.
Naturally, when I came to the point of needing to delete the first column and second row, I looked for a way to do the same thing. I couldn’t do it. I poked at this for a couple days, and reached the conclusion that, in fact, there is no obvious correspondence between these operations (splits and amalgamations and such) on a sub-matrix, and the corresponding operations on the larger matrix. Clearly, a new abstraction was needed, so I defined a new type class, ReducibleMatrix, defining just those operations that carry over nicely to sub-matrices: namely, the row and column operations. Then the huge insight of Franks, the ability to compose splits and amalgamations and expansions and the like to form elementary row and column operations, corresponds to this line of code:
instance FlowEquiv a => ReducibleMatrix a where
and Franks proof is realized by the definitions of the functions from ReducibleMatrix. So then I set out to implement those operations of Franks that need to be performed on submatrices in terms of the ReducibleMatrix type class. And that’s when I saw it. I could not convert my implementation of that one piece (turning the first column of A – I into the gcd of the entries) on the ReducibleMatrix type class in the way that Franks described it. Why? Because of that conjugation by a permutation matrix. In a sense, Haskell’s type system found the error in Franks’ proof.
Epilogue: it turns out there’s an easy modification of Franks’ proof that works around this issue. That corresponded to implementing the g.c.d. piece in Haskell in a different way. The correction is minor, and doesn’t really have an effect on any other results, so it will appear as a brief remark in an upcoming paper I’m writing on Leavitt path algebras.
Anyway, I thought that was worth sharing. In this case, Haskell’s type system did catch not just a programming error, but an error in a mathematical result that had to the best of my knowledge gone undetected by likely hundreds of people for 25 years.