Skip to content
April 20, 2009 / cdsmith

Code for Manipulating Graphs in Haskell

Here’s a bunch of code I wrote, most of it about a year ago, for doing things with the graphs from the Data.Graph module in Haskell.  The choice of functions, from among the many generally useful functions acting on graphs, comes from a specific project.  The actually functionality is pretty generic, though.  So I’m just throwing this out there. If someone else wanted to package it and throw it on Hackage (likely with a different module name), they would be welcome to do so.

    Module containing utility functions for operating on Data.Graph.
module GraphUtil where

import Data.Graph
import Data.List
import Data.Tree
import Data.Array
import Data.Maybe
import Data.Function (on)
import qualified Data.Map as M
import Data.Map (Map)

    Computes all possible (directed) graphs over n vertices.  Allows
    loops and antiparallel edges, but does not allow parallel edges.
allGraphs n = map (buildG (1,n))
                  (subsequences [ (a,b) | a <- [1..n], b <- [1..n] ])

    Finds all cycles in a graph.  A cycle is given as a finite list of
    the vertices in order of occurrence, where each vertex only appears

    The point where the cycle is broken to form a linear list is arbitrary, so
    each cycle is merely one representative of the equivalence class
    generated by the relation identifying (v:vs) with vs++[v].  In
    particular, this function chooses the representative with the lowest
    numbered starting vertex.
cycles g = concatMap cycles' (vertices g)
    where cycles' v   = build [] v v
          build p s v =
            let p'         = p ++ [v]
                local      = [ p' | x <- (g!v), x == s ]
                good w     = w > s && not (w `elem` p')
                ws         = filter good (g ! v)
                extensions = concatMap (build p' s) ws
            in  local ++ extensions

    Computes a table of the number of loops at each vertex.
loopdegree g = array (bounds g) [ (v, count v (g!v)) | v <- vertices g]
    where count v [] = 0
          count v (x:xs) | x == v    = 1 + count v xs
                         | otherwise = count v xs

    Tests two graphs to see if they are isomorphic.  This is a worst-case
    exponential algorithm, but should perform acceptably in practice on
    small graphs.  It uses backtracking, associating vertices one by one
    until it either finds a complete isomorphism, or reaches a vertex
    that can't be matched with a remaining vertex in the other graph.
isIsomorphic g1 g2 = let v1   = vertices g1
                         v2   = vertices g2
                     in  (length v1 == length v2) && test [] v1 v2

    where {-
            Takes the first vertex v of g1, looks for vertices w of g2
            that work "so far", and tries to construct an isomorphism
            that maps v to w.
          test m []     [] = True
          test m (v:vs) ws =
            let cs = filter (similar m v) ws
            in  any (\w -> test ((v,w):m) vs (delete w ws)) cs

            Tests whether a given mapping v -> w for v in g1, w in g2
            works "so far".  In order for the mapping to work, the
            vertices must agree on their in-degree, out-degree, and
            number of loops; and already mapped adjacent vertices via
            in- and out-edges must correspond.
          similar m v w    =    (in1   ! v) == (in2   ! w)
                             && (out1  ! v) == (out2  ! w)
                             && (loop1 ! v) == (loop2 ! w)
                             && match ((v,w):m) (g1 !v) (g2 !w)
                             && match ((v,w):m) (g1'!v) (g2'!w)

            Tests whether a list of vertices agrees in those edges that
            are already mapped to each other via the association list m.
          match m vs ws =
            let kvs = mapMaybe (\v -> find ((== v) . fst) m) vs
                kws = mapMaybe (\w -> find ((== w) . snd) m) ws
            in  sort kvs == sort kws

            Some global information about the graphs that can be
            calculated only once to save time.  This includes the
            degrees and number of loops at each vertex, and the
            transpose of the graphs (used to find in-edges).
          g1'   = transposeG g1
          g2'   = transposeG g2
          in1   = outdegree  g1'
          out1  = outdegree  g1
          in2   = outdegree  g2'
          out2  = outdegree  g2
          loop1 = loopdegree g1
          loop2 = loopdegree g2

    Returns the degree sequence of a given graph.  The degree sequence is
    a sorted sequence of tuples representing the in-degree, out-degree,
    and loop-degree, respectively, of each vertex in the graph.  The
    degree sequence has the desirable properties that it is:

        (a) a graph property (that is, invariant under isomorphism)

        (b) relatively cheap to compute, and

        (c) classifies non-isomorphic graphs very effectively into small
degsequence g = sort (zip3 (elems (indegree g))
                           (elems (outdegree g))
                           (elems (loopdegree g)))

    Given a list of graphs, removes the duplicates up to isomorphism.

    The implementation takes advantage of the fact that the degree
    sequence is the same for any pair of isomorphic graphs.  Therefore,
    the process maintains a map from degree sequences found so far to
    their respective graphs.  This removes the need to compare most sets
    of graphs in a normal list.  If the list contains only graphs with
    the same degree sequence, then this function will be very slow, as
    it will perform O(n^2) isomorphism tests, each of which are worst-case
isonub :: [Graph] -> [Graph]
isonub gs = go M.empty gs
    where go hs []     = []
          go hs (g:gs) =
            let dseq = degsequence g
                poss = M.findWithDefault [] dseq hs
            in  if any (isIsomorphic g) poss
                    then     go hs                          gs
                    else g : go (M.insert dseq (g:poss) hs) gs

    Finds the shortest paths from the given vertex to any other vertex
    of the graph.  The paths returned are lists of vertices, so if there
    are parallel edges in the graph, each result may actually correspond
    to multiple paths.

    The algorithm is essentially breadth first search, but done in bulk
    for each increase in depth, ensuring that all shortest paths are
    found for each vertex.
shortestPaths g v = go (M.singleton v [[v]]) [v]
    where go ans []  = (map reverse) ans
          go ans ws  = let new  = map (step ans) ws
                           next = foldl (M.unionWith (++)) ans new
                       in  go next (M.keys next \\ M.keys ans)
          step ans w = let new   = filter (not . (`M.member` ans)) (g!w)
                           soFar = ans M.! w
                           paths = map (\x -> map (x:) soFar) new
                       in  M.fromList (zip new paths)

    Computes all possible directed subforests of a given graph.  A
    subforest is a subgraph with the property that there is at most one
    edge entering any vertex.

    The approach is to first choose the set of vertices, which may be
    any subset of the vertices of the given graph, and then choose the
    edge (if any) entering each vertex.  Parallel edges are ignored,
    since they would lead to isomorphic subforests.
subforests :: Graph -> [Graph]
subforests g = filter ((== []) . cycles)
                $ concatMap subsAt
                $ subsequences (vertices g)
    where g'        = transposeG g
          subsAt vs = let vmap      = zip vs [1..]
                          tr v      = fromJust (lookup v vmap)
                          tre (v,w) = (tr v, tr w)
                          inAt v    = filter (`elem` vs) $ nub $ (g' ! v)
                      in  [ buildG (1, length vs) fixedEdges
                                | edges <- [ [] : [ [(w,v)] | w <- inAt v ] | v <- vs ],
                                  let fixedEdges = map tre (concat edges) ]


Leave a Comment
  1. Ivan Miljenovic / Apr 21 2009 6:32 pm

    Would you know how your isIsomorphic and isonub functions compare to Jean-Philippe Bernardy’s HGAL package (available via Hackage)? And how do you define “small” graphs? (I’m needing to remove isomorphic duplicates from potentially millions of graphs of various orders…)

  2. cdsmith / Apr 21 2009 6:58 pm

    I have little doubt that the HGAL implementation of isIsomorphic is considerably faster in most cases, though you could give it a shot. I don’t see anything like isonub in the hgal package, though if I understand correctly, you could map canonicGraph from the Data.Graph.Automorphism package over the list, and then just sort the list and remove consecutive duplicates using the standard Ord and Eq instances.

    I was unable to get hgal installed quickly when I wrote this (and again now, I get errors from cabal install, related to building an old version of the array package)… and since my application only had to deal with tens of thousands of graphs of at most about 5 vertices, this worked just as well.

  3. Ivan Miljenovic / Apr 21 2009 7:12 pm

    I’ve been using HGAL basically by doing “map fst . nubBy ((==) `on` snd) . map (\f -> (f,canonicGraph . toGraph f))” (where toGraph is function on my data type I’m using). I’m probably going to try re-implementing HGAL on my own directly from Brendan McKay’s paper to see if I can improve performance somehow, as my app spends more time doing canonicGraph than anything else :s

  4. cdsmith / Apr 22 2009 4:00 pm

    Ivan, if you have a large enough number of graphs, using nubBy is unwise. You’re actually better off sorting (Graph is an instance of Ord), and then removing consecutive duplicates. (map head . groupBy ((==) `on` snd . sort) would replace your nubBy in that case.

  5. Ivan Miljenovic / Apr 22 2009 4:33 pm

    Actually, I’m using a Set to store the current canonical graphs found, after much urging by the people on #haskell.

    Even so, however, I beg to differ that nubBy is unwise: about 2/3 of my time is spent running canonicGraph (which I do _before_ removing duplicates). Considering that for one test I ran, I had over 2.5 million values out of which only around 140 were unique, nubBy isn’t that much of a problem.

    The big advantage of nubBy as opposed to using “map head . group . sort” or “Set.toList . Set.fromList” is that nubBy is _lazy_, so that my program can start spitting out values as soon as it finds them as opposed to having to sort the entire list (which gets _very_ large) before it can start printing values.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: