Skip to content
June 21, 2007 / cdsmith

A Hack for Approaching Haskell Libraries

One of the most difficult problems I find in learning new libraries and modules is where to start.  If you start in the obvious places, there are inevitably zillions of abstractions that just don’t make sense yet.  As a quick kludge, I write the following Haskell code; it examines a set of modules in a directory, and sorts them by the number of dependencies on other modules.  The result is that if module A depends on module B, then B will be listed before A in the result (except for circular dependencies).  If you read the source code in the order given here, there’s more chance of understanding it all.

So here’s the code to build such a list:

import System
import System.Directory
import IO
import Data.List
import Data.Char
import Data.Maybe
import Data.Ord

{-
    Read a Haskell *.hs file and get a list of all the modules
    that it imports.
-}
findDeps base pkg = do
        let hi = base ++ (map dotToSlash pkg) ++ ".hs"
        ex <- doesFileExist hi
        if not ex then return [] else do
            src <- readFile hi
            let imps = filter isImport (lines src)
            return $ catMaybes $ map getTargetModule imps

    where dotToSlash '.' = '/'
          dotToSlash x   = x

          isImport (' ':t)  = isImport t
          isImport ('\\t':t) = isImport t
          isImport t        = "import" `isPrefixOf` t

          getTargetModule s = let pre = takeWhile (/= '(') s in
                              find (isUpper . head) (words pre)

{-
    Find the transitive, reflexive closure of the relation defined
    by the findDeps function.  This returns a list of ALL modules
    that this one uses or depends on, directly or indirectly.
-}
allDeps base mod = allDeps' [mod] [mod] where
    allDeps' (m:ms) full = do d <- findDeps base m
                              let d' = filter (not . flip elem full) d
                              t <- allDeps' (d' ++ ms) (d' ++ full)
                              return (m : t)
    allDeps' [] _        = return []

{-
    Usage: OrderByComplexity  

        = directory where source code is found.  This MUST
                end in '/'
     = file that lists the modules you're interested in,
                one per line.  This is often taken from a .cabal
-}
main = do [ base, pkgFile ] <- getArgs
          pkgStr            <- readFile pkgFile
          let pkgs           = lines pkgStr
          mods              <- mapM (allDeps base) pkgs
          let deps           = zip pkgs mods
          let deps'          = sortBy (comparing fst) deps
          mapM_ print $ map fst $ sortBy (comparing $ length . snd) deps'

4 Comments

Leave a Comment
  1. sorear / Jun 21 2007 3:39 pm

    Hmm, were you aware of the UNIX ‘tsort’ utility? It would be able to automate most of this.

  2. Bill Mill / Jun 24 2007 9:16 pm

    Just out of curiousity as a person who’s getting better at reading Haskell, but not writing it, why did you structure the code such that dotToSlash, getTargetModule, and isImport are inside a “where” clause, instead of as their own functions?

    It confused me a bit when reading your code; I scanned toplevel for two of those, before looking inside the findDeps block.

  3. cdsmith / Jun 24 2007 9:43 pm

    Bill, I did so only because they are functions that have little intrinsic value except in that context. I felt that writing the code that way brought out the structure of what I was doing better than having six different top-level functions.

  4. Bill Mill / Jun 25 2007 5:54 am

    Thanks. Other than that one surprising bit, I thought it was very readable code.

Leave a reply to cdsmith Cancel reply