Skip to content
October 3, 2009 / cdsmith

Playing With Records

Haskell has an existing record syntax that works for some purposes, and doesn’t work very well for some others.  For this blog entry today, I’ll try to build a better alternative as a library within the language.  It will have some advantages (mainly in flexibility), and some disadvantages (mainly in difficulty of type checking, and in having poorer syntactic niceties).  I’m also about 100% certain that this isn’t the first time someone has done these things; but perhaps it’s the first time someone has described them in this way, or perhaps you can discuss existing systems in the comments.

The Current System

Here’s an example of the existing record system:

data Employee = Employee { name :: String, salary :: Double }

That defines a new type for employees, and two named fields called name and salary, each of which has a different type.  I can define an employee in one of two ways:

chris = Employee "Chris" 0.01
bob   = Employee { name = "Bob", salary = 1000000 }

The first one uses positional values, and the second uses named values.  Both define exactly the same sort of value, and I can use the field names to access their properties

name chris {- The result is "Chris" -}
salary bob {- The result is 1000000 -}

The property names can also be used to set values.  For example, I can double my salary.

chris' = chris { salary = 0.02 }

The important thing to note here is that currently, the field names for a record are used in two ways: first, they operate as labels in the syntax that uses braces; and second, they work as functions to get the value of a field.

Building the Alternative

We’ll get started with a few language extensions:

{-# LANGUAGE MultiParamTypeClasses     #-}
{-# LANGUAGE FunctionalDependencies    #-}
{-# LANGUAGE FlexibleInstances         #-}
{-# LANGUAGE FlexibleContexts          #-}
{-# LANGUAGE ExistentialQuantification #-}

To start, we’ll need some data type to represent a property of a record.  The type will depend on the types of the record, and the field.  A property consists of a way to retrieve the value, and a way to change the value.

data Property rec a = Property { set :: a -> rec -> rec,
                                 get :: rec -> a }

Yes, I’m aware of the irony of using the existing record syntax to define a replacement!

At least for backward compatibility purposes, I’d like to retain the ability to use the property name alone as the getter function.  I’ll present this in the order I thought when I wrote it, so hopefully the thought process can be duplicated.  I started with an example.

data Employee = Employee {- name   :: -} String
                         {- salary :: -} Double

First, I know that I want a type class, because that’s the only way that one name can stand for both a function and a named data type like Property.  That type class needs to encode the type of the record, and the type of the field, and also needs a type for the property name to map to.

name   :: Accessor Employee String prop => prop
salary :: Accessor Employee Double prop => prop

When I use one of those variables, I want it to possibly represent a value of the appropriate Property type, or possibly a value of the getter function type.  To define them, though, I want to just start with the property.  So the purpose of this type class is to convert from a Property to either itself, or else a getter function.  Then, as is often needed with multi-parameter type classes, I need some functional dependencies to help the type checker.

class Accessor rec a prop | prop -> rec, prop -> a where
    fromProperty :: Property rec a -> prop

instance Accessor rec a (Property rec a) where fromProperty = id
instance Accessor rec a (rec -> a)       where fromProperty = get

I can now define the variables I declared types for earlier.

name   = fromProperty $ Property setter getter
    where setter n' (Employee n s) = Employee n' s
          getter (Employee n s)    = n

salary = fromProperty $ Property setter getter
    where setter s' (Employee n s) = Employee n s'
          getter (Employee n s)    = s

Using the New Records

Using the new record types starts to look somewhat similar to the old ones.  I can either use “get”, or I can just use the field name as an accessor function.

name chris {- The result is "Chris" -}
salary bob {- The result is 1000000 -}
get name chris {- The result is also "Chris" -}
get salary bob {- The result is also 1000000 -}

Setting a value is a little different.

chris' = set salary 0.02 chris

Not too bad.  This also has the advantage, versus the current record syntax, that it can be partially applied; in other words, it’s a first class function.

rewardLoyalty = set salary 1000000

If I want to set multiple fields at once, though, it gets a little uglier.

bobette = set name "Bobette" (set salary 2000000 bob)

Hmm, definitely not pretty.  Fortunately, since properties are a first-class value, we can fix this now.  I’ll exploit the very convenient fact that := is a constructor in Haskell.  I’ll also need existential types to contain the variation in types between different properties of an object.

data NameValuePair rec = forall a. Property rec a := a

setAll :: [NameValuePair rec] -> rec -> rec
setAll pairs r = foldl setOne r pairs
    where setOne r (p := v) = set p v r

and now the earlier example looks much nicer.

bobette = setAll [ name := "Bobette", salary := 20000 ] bob

It’s a nice property that we were able to solve the problem within the language just by virtue of having a first-class representation of the record in the language.

Definition of a new record type is a tad more complex.  The code above to do so for Employee was very ugly, to say the least.  However, this is the one area where we’d simply expect some kind of language support to help out.  We could build such a thing using Template Haskell.  In fact, it would very nearly not cause any kind of incompatibility if the existing record syntax were just modified so that instead of producing the field names as functions, it produced them as polymorphic Accessor types as described above.

Limitations

This is certainly not perfect, as record systems go.  The Accessor type class requires all manner of type system extensions (namely, MPTCs, fundeps, and flexible instances and contexts), and even so, I can’t currently convince it to fully realize all of the types.  For example, take this exchange with GHCi:

*Main> :t name chris
name chris :: (Accessor Employee String (Employee -> t)) => t

Even though the only type that could possibly be used there is String, GHCi fails to recognize this, and gives us a long type involving the Accessor type class.  Hopefully, there’s a way around this within the type system, which I’ve merely missed.  Without caution, though, the code above could lead to lots of unintended polymorphism that could result in a serious performance hit.  In fact, there’s perhaps quite a strong argument to be made that if a new record system is being defined, perhaps one ought to eschew backward compatibility, or treat it as deprecated and eventually to be removed… and instead require that everyone write “get name” instead of merely “name”.  This eliminates most of the really weird type stuff that’s going on here, and thereby improves the error messages considerably.  On the other hand, it feels a but unusual to write verbs like that in a functional programming language.

Pattern matching is another very serious limitation.  Since the individual values in a record can’t be accessed except by evaluating functions, they can’t be referred to in a pattern matching definition of a function.  While I tend to never do this with the existing record syntax anyway, it remains a serious limitation.  It’s worth mentioning, though, that view patterns alleviate this difficulty when pattern matching on just one field value.  With multiple field values, view patterns can be used in conjunction with &&& from the Control.Arrow module to accomplish the same thing; it’s just ugly.  It would be nice to see a minimal extension of the view pattern feature to get around this by allowing multiple views to be applied side-by-side to the same parameter.  Perhaps something like (and no, I haven’t checked what this does to grammar conflicts):

displayName :: Employee -> String
displayName (name -> n ; salary -> s) | s > 100000 = map toUpper n
                                      | otherwise  = map toLower n

GHC also provides language extensions for treating record field names as unambiguous in places where the intended field can be determined by the types of values in pattern matching.  Indeed, there have been other proposals for type-directed resolution of ambiguous types; but I’ve always seen it mixed in with dot-record syntax.  This would need to be a language extension still.  Record punning and wildcards, similarly, don’t look to be definable within a library.  Whether they are worth recovering is a different discussion on which I know people have varying opinions.  Perhaps, though, punning at least could be viewed as a feature of view patterns rather than of records.

More Fun With the Property Type

There are two more interesting things we can do with the Property type defined earlier.

One of the features of record syntax in Haskell is that data types with multiple data constructors can still be records, and different constructors can provide fields by the same names.  It’s not hard to see that the same can happen here.

data Person = Customer {- name :: -} String {- , creditRating :: -} Double
            | Employee {- name :: -} String {- , salary :: -} Double

name = fromProperty $ Property setter getter
    where setter n' (Customer n c) = Customer n' c
          setter n' (Employee n s) = Employee n' s
          getter (Customer n c)    = n
          getter (Employee n s)    = n

creditRating = fromProperty $ Property getter setter
    where setter c' (Customer n c) = Customer n c'
          getter (Customer n c)    = c

salary = fromProperty $ Property getter setter
    where setter s' (Employee n s) = Employee n s'
          getter (Employee n s)    = s

This behaves the same way as the corresponding instance would in normal Haskell record syntax.  The treatment of properties as first-class language-definable values, though, opens up additional possibilities.  For example, take this definition of a complex number:

data MyComplex = MyComplex {-  real :: -} Double {- , imag :: -} Double

real = fromProperty $ Property setter getter
    where setter r' (MyComplex r i) = MyComplex r' i
          getter (MyComplex r i)    = r

imag = fromProperty $ Property setter getter
    where setter i' (MyComplex r i) = MyComplex r i'
          getter (MyComplex r i)    = i

That’s all the same so far… but, one can also manipulate the magnitude and argument of a complex number as properties.

magnitude = fromProperty $ Property setter getter
    where setter m c             = let a = get argument c
                                   in MyComplex (m * cos a) (m * sin a)
          getter (MyComplex r i) = sqrt (r^2 + i^2)

argument = fromProperty $ Property setter getter
    where setter a c             = let m = get magnitude c
                                   in MyComplex (m * cos a) (m * sin a)
          getter (MyComplex r i) = atan2 i r

We’ve now defined virtual properties of this record type.  (Yes, it seems silly to do it this way, but mainly just because it’s silly to define complex numbers as a record type to begin with.)

5 Comments

Leave a Comment
  1. Gregory Collins / Oct 3 2009 4:03 pm
  2. brian / Oct 3 2009 11:10 pm

    Thanks for writing this up. I’ll experiment with the code.

    I’m mainly writing to say that the problem I’ve had with data-accessor is that I define my ADTs in separate modules so that I can name the fields like ‘salary’ and import them qualified and use them like ‘E.salary’ instead of having to name the fields, e.g., ’employeeSalary’. But this leads to code like e^.E.salary, not too nice.

    • cdsmith / Oct 3 2009 11:39 pm

      It turns out that this is, in principal, very similar to both data-accessor and fclabels. I wasn’t aware of them when I wrote this, and I strongly recommend using an existing package unless there’s a compelling reason for the alternative. I don’t see one yet. If you prefer using prefix functions instead of ^. and ^=, then data-accessor defines getVal and setVal, while fclabels defines get and set. On the other hand, people coalescing around existing libraries that meet their needs is a very good thing in the long term.

      There are, undoubtedly, some problems to be solved though, before a library-based record system can adequately replace the built-in system. These include: (a) pattern matching — which I’m now somewhat convinced will be solved with some minor extension to view patterns; (b) ambiguity among field names, which is the driving factor in the problem you’ve run into above; and (c) well, I was going to say type-safe construction of values with a static guarantee that you provided a value for all the fields… but then I experimented and discovered that even Haskell’s existing record system doesn’t give you that. Amazing!

Leave a comment