Sententia cdsmithus

June 21, 2007

A Hack for Approaching Haskell Libraries

Filed under: Haskell — cdsmith @ 10:55 am

One of the most difficult problems I find in learning new libraries and modules is where to start.  If you start in the obvious places, there are inevitably zillions of abstractions that just don’t make sense yet.  As a quick kludge, I write the following Haskell code; it examines a set of modules in a directory, and sorts them by the number of dependencies on other modules.  The result is that if module A depends on module B, then B will be listed before A in the result (except for circular dependencies).  If you read the source code in the order given here, there’s more chance of understanding it all.

So here’s the code to build such a list:

import System
import System.Directory
import IO
import Data.List
import Data.Char
import Data.Maybe
import Data.Ord

{-
    Read a Haskell *.hs file and get a list of all the modules
    that it imports.
-}
findDeps base pkg = do
        let hi = base ++ (map dotToSlash pkg) ++ ".hs"
        ex <- doesFileExist hi
        if not ex then return [] else do
            src <- readFile hi
            let imps = filter isImport (lines src)
            return $ catMaybes $ map getTargetModule imps

    where dotToSlash '.' = '/'
          dotToSlash x   = x

          isImport (' ':t)  = isImport t
          isImport ('\\t':t) = isImport t
          isImport t        = "import" `isPrefixOf` t

          getTargetModule s = let pre = takeWhile (/= '(') s in
                              find (isUpper . head) (words pre)

{-
    Find the transitive, reflexive closure of the relation defined
    by the findDeps function.  This returns a list of ALL modules
    that this one uses or depends on, directly or indirectly.
-}
allDeps base mod = allDeps' [mod] [mod] where
    allDeps' (m:ms) full = do d <- findDeps base m
                              let d' = filter (not . flip elem full) d
                              t <- allDeps' (d' ++ ms) (d' ++ full)
                              return (m : t)
    allDeps' [] _        = return []

{-
    Usage: OrderByComplexity  

        = directory where source code is found.  This MUST
                end in '/'
     = file that lists the modules you're interested in,
                one per line.  This is often taken from a .cabal
-}
main = do [ base, pkgFile ] <- getArgs
          pkgStr            <- readFile pkgFile
          let pkgs           = lines pkgStr
          mods              <- mapM (allDeps base) pkgs
          let deps           = zip pkgs mods
          let deps'          = sortBy (comparing fst) deps
          mapM_ print $ map fst $ sortBy (comparing $ length . snd) deps'

4 Comments »

  1. Hmm, were you aware of the UNIX ‘tsort’ utility? It would be able to automate most of this.

    Comment by sorear — June 21, 2007 @ 3:39 pm | Reply

  2. Just out of curiousity as a person who’s getting better at reading Haskell, but not writing it, why did you structure the code such that dotToSlash, getTargetModule, and isImport are inside a “where” clause, instead of as their own functions?

    It confused me a bit when reading your code; I scanned toplevel for two of those, before looking inside the findDeps block.

    Comment by Bill Mill — June 24, 2007 @ 9:16 pm | Reply

  3. Bill, I did so only because they are functions that have little intrinsic value except in that context. I felt that writing the code that way brought out the structure of what I was doing better than having six different top-level functions.

    Comment by cdsmith — June 24, 2007 @ 9:43 pm | Reply

  4. Thanks. Other than that one surprising bit, I thought it was very readable code.

    Comment by Bill Mill — June 25, 2007 @ 5:54 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a comment

Blog at WordPress.com.