January 16, 2011 / cdsmith

Haskell’s Own DLL Hell

I touched on this issue in a more positive way in this recent entry. But now I’m going to be more negative about it. You see, here’s the thing: Significant Haskell software projects are struggling under the weight of the Haskell equivalent of “DLL hell”.

If you’re not familiar with the term, here’s a definition of “DLL hell”, a concept painfully familiar to many Windows system administrators:

“In computing, DLL hell is a colloquial term for the complications that arise when working with dynamic link libraries (DLLs) used with Microsoft Windows operating systems.”

The idea is that various applications on the computer share libraries. These libraries have different versions, and different programs often need different versions. The “hell” starts when some programs overwrite the libraries with other, incompatible, versions, or when one program somehow turns out to need both of two incompatible sets of libraries.

To be sure, Haskell is better equipped to handle the problem than historical Windows executables. In old versions of Windows, programs often blindly copied the DLLs they needed into a system-wide location (Windows\System) without regard to any versioning at all. However, in Haskell, we are dealing with the problem on a different order of magnitude. Whereas a DLL on Windows is generally a pretty substantial project in its own right, many Haskell package on Hackage consist of just a few lines of code (put in a positive light, they do one thing, and do it well!), and as a result, many other projects depend on dozens of different packages, either directly or indirectly.

This has consequences: my experience is that it’s currently near-impossible to embark on most nontrivial Haskell software projects without spending a substantial percentage of one’s time fighting dependency version issues. This is especially true of “real world” sorts of projects: those that often involve pulling together a lot of different functionality, rather than solving specific, well-defined computational problems.

That’s where we are. The question is what we do about it. I’d like to propose some questions for discussion:

1. Is it a good idea to follow the Package Versioning Policy? Should the PVP be modified?

This one is certainly going to be controversial. For background, the PVP is a set of guidelines for things like specifying the dependencies of a Haskell package, and how to manage your own package’s version number. The short version is that packages shuld bump at least the minor version number (the x in version 1.x) every time they make a potentially breaking change, such as whenever any entity is removed or renamed or their types are changed. Furthermore, the PVP suggests that dependencies should have have upper bounds on their versions. The goal here is that if you make a change that might break someone else’s package, you should create a new version and their package will continue to build against the old version.

There are two possible effects of following the PVP by adding upper bounds on dependencies:

Someone might try to install some package, and because of an upper bound, Cabal builds it against an older version of some library. This causes the build to succeed, where it otherwise would have failed because you removed or renamed something.
Someone might try to install some package, and because of an upper bound, Cabal fails to find the right combination of dependencies, and refuses to build it at all.

Just to be argumentative, I’ll mention that it’s pretty clear to me from personal experience that #2 happens a lot more often than #1. People following the package versioning policy by specifying upper bounds is far more likely to prevent ‘cabal install’ from succeeding, than to allow it to succeed. Upper bounds, on balance, make it harder to get Haskell libraries installed successfully. When (again, from personal experience) attempts to build any nontrivial Haskell application have a less than 30% chance or succeeding anyway, should we be all that worried about the theoretical chance that a build might fail?

That’s just one side of the story, though. It’s true that an error due to failed dependencies makes it clearer what is going on than a random failure involving an unresolved symbol or type mismatch during a compile. So this is an open question in my mind.

Perhaps the less contentious way to ask the question would be this: should Cabal be modified to give a warning instead of an error for upper bounds when they would prevent the package from building at all? (And if so, perhaps it should get a new strong upper bound, which indicates someone actually knows that the build fails.)

2. How close are we to the goal of getting GHC and Cabal to tell the difference between exported and non-exported dependencies?

It would be one thing if the problem here were actual incompatibilities in code. If I’m using libraries that rely on different versions of the same package, and they both export things that rely on types or instances from that package, then I should expect the build to fail. But a lot of the time (I’d guess a majority of the time!) that’s not the case. One place this comes up a lot is with network‘s dependency on parsec. But it doesn’t actually export any parsers; its use of parsec is an implementation detail.

Similar issues arise in many other situations. Many library dependencies are a matter of implementation, not public interface. Even where it’s not currently the case, if this is fixed, it will change the community’s best practice to include using a lot more newtype wrappers rather than re-exporting other package’s types, or splitting packages if they provide substantial functionality that does not need the re-exports.

I mentioned earlier that Haskell deals with this problem on a whole different scale than other languages: part of the reason is this lack of distinction between implementation detail and exposed interfaces.

3. Can we stop cabal-install from breaking the package database?

A very special case of this problem happens in a particularly disturbing way. It goes like this:

Package foo depends on bar-0.5 and baz-1.2.
Package flox depends on bar-0.5 and baz-1.1.
Package bar-0.5 depends on baz, but has no preference between versions 1.1 and 1.2.

The way this works now is as follows: When I cabal install foo, Cabal first builds baz-1.2, and then bar-0.5, and then foo. But if I later cabal install flox, then Cabal will build baz-1.1 (this is fine, since multiple package versions can exist at once), then it will rebuild bar-0.5, linking against baz-1.1. (This does not coexist; because bar has the same version number, it gets overwritten.). At this point, foo is broken. Running ghc-pkg check will complain about it being broken because of a missing dependency.

I’m not sure what the right thing to do is here; I suppose that if bar-0.5 re-exports types from baz, then its “version” needs to include the version information from baz somewhere. In any case, this is an extremely confusing, and extremely common, issue to run into, and it results in an inconsistent package database without so much as a warning. Something really needs to be done.

4. What is the best way to deal with this in the interim?

Yackage is a great idea. (Or any other way to maintain a local Hackage; I’m aware of discussion about whether it might be better to just use the new hackage server code eventually, and I don’t think the implementation particularly matters.) Michael Snoyman’s goal was really more about maintaining collections of packages that he’s maintaining… but for a real-world software project that otherwise won’t build, it sounds like a great way to keep track of local modifications to other people’s packages.

Another idea that might work really well is to just be able to ask cabal-install to remember a modification to various packages’ build-depends’ fields persistently. So instead of having to download, build, and manually install these packages just to change their package.cabal files, cabal-install would continue updating from Hackage, but reconcile your existing build-depends requests (“relax fizbuzz’s build-depends to build with foobar-0.5”) automatically. Even better, make it easy to get a list of which of these local changes are still at odds with public packages, to report to either the maintainer.

All of these are options for mitigating the problem. But first, I think we need to realize that this is a serious problem. I’m afraid there’s a bit of sampling bias here; I have to believe that these problems aren’t getting solved because many established Haskellers tend to work on projects with very narrowly tailored scope… and this is true because many people who want to work on more general (“real world” by my definition above) projects have often fled to languages that don’t make it a week-long job to get the dependencies for a project to all build at once.