Sententia cdsmithus

October 4, 2009

View Patterns as Pattern Matching for Records

Filed under: Uncategorized — cdsmith @ 1:31 am

I’ve learned a lot in the last day about record systems for Haskell built as libraries.  I came up with what seemed like an obvious answer, and indeed it was essentially the same as two existing packages on Hackage — fclabels and data-accessor — the second of which is among the more popular downloads out there.  That’s always a good sign, that a significant number of people needed the same thing, and liked the same answer.  But there are some things, still, that are provided by the built-in package system, either as standard features or GHC extensions, that it’s unclear how to accomplish as a library.

The one that’s intriguing me most right now is pattern matching, because I think it almost got inadvertently solved a couple years ago.

Right now, I can write this for a record type in Haskell.

foo :: MyRecord -> Int
foo (MyRecord { field1 = Just k, field2 = 4, field3 = (x:_) }) = 4*k - x

That’s a pattern match that disassembles the record, by named fields, and matches each field using the full variety of patterns available in any other context.  Pretty powerful stuff.  Unfortunately, it’s built in to Haskell’s rather limited record system, and is among the casualties incurred when switching to a different system.  One needs to fall back to the underlying “native” record.  In the case of the fclabels and data-accessor packages, for example, this means using a whole separate set of names involving underscores, further polluting the namespace with symbols related only by similarity of name.  Not a purely practical disadvantage, perhaps, but one that I’m sure grates against the nerves of anyone looking for neat, elegant answers.

Fortunately, as I mentioned briefly in my last post, the answer already exists, and has already been added to GHC.  The answer here is view patterns.  In data-accessor package language, one can actually write the following to express part of the above.

foo ((^. field1) -> Just k) = k

(Admittedly, one does obtain a warning about pattern matching overlaps when doing this.  I’m unsure why.)

Unfortunately, this very nice approach doesn’t appear to currently extend as nicely to pattern matching on several fields at the same time.  A view pattern has one view, which matches an argument; a second view pattern matches a second argument, which isn’t what we want.  The best we can do is to define a view that selects several fields…

foo ((^. field1) &&& (^. field3) -> (Just k, (x:_))) = 4*k

Barely passable for two fields… but add a third field, and suddenly you have to know about the associativity of &&& (which, by the way, we took from the Control.Arrow package).  Besides, the distance and way that conditions on different fields are mixed together isn’t terribly appealing.  We’d really like to just be able to pattern match the same argument multiple times, requiring that all of them succeed.

Ironically, this is hardly a new problem.  It’s actually the same thing encountered in @-patterns.  One wants to pattern match twice there, too — once to bind a name to the entire value, and again to deconstruct it and bind names or match pieces.  Without view patterns, that was the only situation in which one might want to pattern match the same value in several ways; the only choice was to either open up a data constructor and match on it, or else don’t.  With view patterns, though, the possibility arises any number of ways.  (Data.FingerTree contains both the functions viewl and viewr, for instance; I’m not personally aware of an application that would like to pattern match on both at once, but it doesn’t seem terribly hard to imagine that it might occur.)

It makes sense to propose, in lieu of trying to invent a way to extend record pattern-matching syntax to a new kind of records, to simply provide a mechanism for simultaneously pattern-matching the same argument in several ways.  What we would obtain from this is to generalize the problems solved by @-patterns and record syntax at once, as well as potential future as-yet-unidentified problems.

In other words, the real feature we want is simultaneous patterns — multiple patterns that are matched side-by-side with a single value.  The semantics are that the pattern match succeeds only if all of them do, in which case all of the variables occurring in the pattern are appropriately bound.

In my last post, I proposed using a semicolon to separate the patterns.  Here’s the previous example in this form, using data-accessor syntax.

foo :: MyRecord -> Int
foo ((^.field1) -> Just k ; (^.field2) -> 4 ; (^.field3) -> (x:_)) = 4*k - x

With that, we’d have solved one of the outstanding issues in building a library replacement for Haskell’s distinguished record syntax.  Essentially, there’d just be no more need for a “special” record syntax for pattern matching.  We’d have done it not by adding a new distinguished record syntax, but rather by opening up the existing pattern matching features (including view patterns, of course) to accomodate a rather straight-forward extension.

If I don’t see an obvious reason not to pursue this, I may try to determine how difficult it would be to implement this as a GHC extension.

October 3, 2009

Playing With Records

Filed under: Uncategorized — cdsmith @ 1:44 pm

Haskell has an existing record syntax that works for some purposes, and doesn’t work very well for some others.  For this blog entry today, I’ll try to build a better alternative as a library within the language.  It will have some advantages (mainly in flexibility), and some disadvantages (mainly in difficulty of type checking, and in having poorer syntactic niceties).  I’m also about 100% certain that this isn’t the first time someone has done these things; but perhaps it’s the first time someone has described them in this way, or perhaps you can discuss existing systems in the comments.

The Current System

Here’s an example of the existing record system:

data Employee = Employee { name :: String, salary :: Double }

That defines a new type for employees, and two named fields called name and salary, each of which has a different type.  I can define an employee in one of two ways:

chris = Employee "Chris" 0.01
bob   = Employee { name = "Bob", salary = 1000000 }

The first one uses positional values, and the second uses named values.  Both define exactly the same sort of value, and I can use the field names to access their properties

name chris {- The result is "Chris" -}
salary bob {- The result is 1000000 -}

The property names can also be used to set values.  For example, I can double my salary.

chris' = chris { salary = 0.02 }

The important thing to note here is that currently, the field names for a record are used in two ways: first, they operate as labels in the syntax that uses braces; and second, they work as functions to get the value of a field.

Building the Alternative

We’ll get started with a few language extensions:

{-# LANGUAGE MultiParamTypeClasses     #-}
{-# LANGUAGE FunctionalDependencies    #-}
{-# LANGUAGE FlexibleInstances         #-}
{-# LANGUAGE FlexibleContexts          #-}
{-# LANGUAGE ExistentialQuantification #-}

To start, we’ll need some data type to represent a property of a record.  The type will depend on the types of the record, and the field.  A property consists of a way to retrieve the value, and a way to change the value.

data Property rec a = Property { set :: a -> rec -> rec,
                                 get :: rec -> a }

Yes, I’m aware of the irony of using the existing record syntax to define a replacement!

At least for backward compatibility purposes, I’d like to retain the ability to use the property name alone as the getter function.  I’ll present this in the order I thought when I wrote it, so hopefully the thought process can be duplicated.  I started with an example.

data Employee = Employee {- name   :: -} String
                         {- salary :: -} Double

First, I know that I want a type class, because that’s the only way that one name can stand for both a function and a named data type like Property.  That type class needs to encode the type of the record, and the type of the field, and also needs a type for the property name to map to.

name   :: Accessor Employee String prop => prop
salary :: Accessor Employee Double prop => prop

When I use one of those variables, I want it to possibly represent a value of the appropriate Property type, or possibly a value of the getter function type.  To define them, though, I want to just start with the property.  So the purpose of this type class is to convert from a Property to either itself, or else a getter function.  Then, as is often needed with multi-parameter type classes, I need some functional dependencies to help the type checker.

class Accessor rec a prop | prop -> rec, prop -> a where
    fromProperty :: Property rec a -> prop

instance Accessor rec a (Property rec a) where fromProperty = id
instance Accessor rec a (rec -> a)       where fromProperty = get

I can now define the variables I declared types for earlier.

name   = fromProperty $ Property setter getter
    where setter n' (Employee n s) = Employee n' s
          getter (Employee n s)    = n

salary = fromProperty $ Property setter getter
    where setter s' (Employee n s) = Employee n s'
          getter (Employee n s)    = s

Using the New Records

Using the new record types starts to look somewhat similar to the old ones.  I can either use “get”, or I can just use the field name as an accessor function.

name chris {- The result is "Chris" -}
salary bob {- The result is 1000000 -}
get name chris {- The result is also "Chris" -}
get salary bob {- The result is also 1000000 -}

Setting a value is a little different.

chris' = set salary 0.02 chris

Not too bad.  This also has the advantage, versus the current record syntax, that it can be partially applied; in other words, it’s a first class function.

rewardLoyalty = set salary 1000000

If I want to set multiple fields at once, though, it gets a little uglier.

bobette = set name "Bobette" (set salary 2000000 bob)

Hmm, definitely not pretty.  Fortunately, since properties are a first-class value, we can fix this now.  I’ll exploit the very convenient fact that := is a constructor in Haskell.  I’ll also need existential types to contain the variation in types between different properties of an object.

data NameValuePair rec = forall a. Property rec a := a

setAll :: [NameValuePair rec] -> rec -> rec
setAll pairs r = foldl setOne r pairs
    where setOne r (p := v) = set p v r

and now the earlier example looks much nicer.

bobette = setAll [ name := "Bobette", salary := 20000 ] bob

It’s a nice property that we were able to solve the problem within the language just by virtue of having a first-class representation of the record in the language.

Definition of a new record type is a tad more complex.  The code above to do so for Employee was very ugly, to say the least.  However, this is the one area where we’d simply expect some kind of language support to help out.  We could build such a thing using Template Haskell.  In fact, it would very nearly not cause any kind of incompatibility if the existing record syntax were just modified so that instead of producing the field names as functions, it produced them as polymorphic Accessor types as described above.

Limitations

This is certainly not perfect, as record systems go.  The Accessor type class requires all manner of type system extensions (namely, MPTCs, fundeps, and flexible instances and contexts), and even so, I can’t currently convince it to fully realize all of the types.  For example, take this exchange with GHCi:

*Main> :t name chris
name chris :: (Accessor Employee String (Employee -> t)) => t

Even though the only type that could possibly be used there is String, GHCi fails to recognize this, and gives us a long type involving the Accessor type class.  Hopefully, there’s a way around this within the type system, which I’ve merely missed.  Without caution, though, the code above could lead to lots of unintended polymorphism that could result in a serious performance hit.  In fact, there’s perhaps quite a strong argument to be made that if a new record system is being defined, perhaps one ought to eschew backward compatibility, or treat it as deprecated and eventually to be removed… and instead require that everyone write “get name” instead of merely “name”.  This eliminates most of the really weird type stuff that’s going on here, and thereby improves the error messages considerably.  On the other hand, it feels a but unusual to write verbs like that in a functional programming language.

Pattern matching is another very serious limitation.  Since the individual values in a record can’t be accessed except by evaluating functions, they can’t be referred to in a pattern matching definition of a function.  While I tend to never do this with the existing record syntax anyway, it remains a serious limitation.  It’s worth mentioning, though, that view patterns alleviate this difficulty when pattern matching on just one field value.  With multiple field values, view patterns can be used in conjunction with &&& from the Control.Arrow module to accomplish the same thing; it’s just ugly.  It would be nice to see a minimal extension of the view pattern feature to get around this by allowing multiple views to be applied side-by-side to the same parameter.  Perhaps something like (and no, I haven’t checked what this does to grammar conflicts):

displayName :: Employee -> String
displayName (name -> n ; salary -> s) | s > 100000 = map toUpper n
                                      | otherwise  = map toLower n

GHC also provides language extensions for treating record field names as unambiguous in places where the intended field can be determined by the types of values in pattern matching.  Indeed, there have been other proposals for type-directed resolution of ambiguous types; but I’ve always seen it mixed in with dot-record syntax.  This would need to be a language extension still.  Record punning and wildcards, similarly, don’t look to be definable within a library.  Whether they are worth recovering is a different discussion on which I know people have varying opinions.  Perhaps, though, punning at least could be viewed as a feature of view patterns rather than of records.

More Fun With the Property Type

There are two more interesting things we can do with the Property type defined earlier.

One of the features of record syntax in Haskell is that data types with multiple data constructors can still be records, and different constructors can provide fields by the same names.  It’s not hard to see that the same can happen here.

data Person = Customer {- name :: -} String {- , creditRating :: -} Double
            | Employee {- name :: -} String {- , salary :: -} Double

name = fromProperty $ Property setter getter
    where setter n' (Customer n c) = Customer n' c
          setter n' (Employee n s) = Employee n' s
          getter (Customer n c)    = n
          getter (Employee n s)    = n

creditRating = fromProperty $ Property getter setter
    where setter c' (Customer n c) = Customer n c'
          getter (Customer n c)    = c

salary = fromProperty $ Property getter setter
    where setter s' (Employee n s) = Employee n s'
          getter (Employee n s)    = s

This behaves the same way as the corresponding instance would in normal Haskell record syntax.  The treatment of properties as first-class language-definable values, though, opens up additional possibilities.  For example, take this definition of a complex number:

data MyComplex = MyComplex {-  real :: -} Double {- , imag :: -} Double

real = fromProperty $ Property setter getter
    where setter r' (MyComplex r i) = MyComplex r' i
          getter (MyComplex r i)    = r

imag = fromProperty $ Property setter getter
    where setter i' (MyComplex r i) = MyComplex r i'
          getter (MyComplex r i)    = i

That’s all the same so far… but, one can also manipulate the magnitude and argument of a complex number as properties.

magnitude = fromProperty $ Property setter getter
    where setter m c             = let a = get argument c
                                   in MyComplex (m * cos a) (m * sin a)
          getter (MyComplex r i) = sqrt (r^2 + i^2)

argument = fromProperty $ Property setter getter
    where setter a c             = let m = get magnitude c
                                   in MyComplex (m * cos a) (m * sin a)
          getter (MyComplex r i) = atan2 i r

We’ve now defined virtual properties of this record type.  (Yes, it seems silly to do it this way, but mainly just because it’s silly to define complex numbers as a record type to begin with.)

Blog at WordPress.com.