This is a summary of what I’ve been playing with in case people find it interesting.

In general, there are three ways to find the derivative of a function:

1. Do the symbolic manipulation of formulae that we all learned in school when we were 16 years old.  This assumes that one has the function as an algebraic expression of some kind in terms of known quantities.
2. Do it numerically, by computing (f(x + h) – f(x)) / h for some small value of h.  This suffers from numerical inaccuracies, and will generally cause rises in the blood pressure of numerical analysts.
3. Calculate the value of the function and its derivative simultaneously at a given point of evaluation.

The last option seems to have a lot of names: automatic differentiation, algorithmic differentiation, and a few more.  Just for fun, I wrote an implementation of a very simplified version of it in Haskell.

### The Implementation

```module AD where

data AD a = AD a a

instance Eq a => Eq (AD a) where
(AD x dx) == (AD y dy)    = x == y

instance Show a => Show (AD a) where
show (AD x dx) = show x ++ " + " ++ show dx ++ " eps"

instance Num a => Num (AD a) where
(AD x dx) + (AD y dy)       = AD (x + y)          (dx + dy)
(AD x dx) - (AD y dy)       = AD (x - y)          (dx - dy)
(AD x dx) * (AD y dy)       = AD (x * y)          (dx * y + x * dy)
negate (AD x dx)            = AD (negate x)       (negate dx)
abs (AD 0 _)                = error "not differentiable: |0|"
abs (AD x dx)               = AD (abs x)          (dx * signum x)
signum (AD 0 _)             = error "not differentiable: signum(0)"
signum (AD x dx)            = AD (signum x)       0
fromInteger i               = AD (fromInteger i)  0

instance Fractional a => Fractional (AD a) where
(AD x dx) / (AD y dy)       = AD (x / y)          ((dx * y - x * dy) / y^2)
recip (AD x dx)             = AD (1 / x)          ((-dx) / x^2)
fromRational x              = AD (fromRational x) 0

instance Floating a => Floating (AD a) where
pi                          = AD pi        0
exp (AD x dx)               = AD (exp x)   (dx * exp x)
sqrt (AD x dx)              = AD (sqrt x)  (dx / (2 * sqrt x))
log (AD x dx)               = AD (log x)   (dx / x)
(AD x dx) ** (AD y dy)      = AD (x ** y)  (dx * y * (x ** (y-1)) +
dy * (x ** y) * log x)
sin (AD x dx)               = AD (sin x)   ( dx * cos x)
cos (AD x dx)               = AD (cos x)   (-dx * sin x)
asin (AD x dx)              = AD (asin x)  ( dx / sqrt (1 - x^2))
acos (AD x dx)              = AD (acos x)  (-dx / sqrt (1 - x^2))
atan (AD x dx)              = AD (atan x)  (dx / (1 + x^2))
sinh (AD x dx)              = AD (sinh x)  (dx * cosh x)
cosh (AD x dx)              = AD (cosh x)  (dx * sinh x)
asinh (AD x dx)             = AD (asinh x) (dx / sqrt (x^2 + 1))
acosh (AD x dx)             = AD (acosh x) (dx / sqrt (x^2 - 1))
atanh (AD x dx)             = AD (atanh x) (dx / (1 - x^2))

diff           :: Num a        => (AD a -> AD t)                     -> a -> t
diffNum        :: Num b        => (forall a. Num a        => a -> a) -> b -> b
diffFractional :: Fractional b => (forall a. Fractional a => a -> a) -> b -> b
diffFloating   :: Floating b   => (forall a. Floating a   => a -> a) -> b -> b

diff f x           = let AD y dy = f (AD x 1) in dy
diffNum f x        = let AD y dy = f (AD x 1) in dy
diffFractional f x = let AD y dy = f (AD x 1) in dy
diffFloating f x   = let AD y dy = f (AD x 1) in dy```

Essentially, this declares a data type to represent the pair of a number and its first derivative.  It then defines how to perform primitive math operations on that data type, for three of Haskell’s numerical type classes.  Note that basically the entire implementation consists of talking about how derivatives fare in the face of certain operations.  In fact, you may recognize a lot of it from the algebra of differentials in calculus.  For example, if z = xy, then dz = x dy + y dx, and so forth.  (One quick note, because someone else asked me about this.  The exponentiation operator ** is complicated because both the base and the exponent could be non-constant.  So it must use both the power rule and the exponentiation rule.)

### My Argument With the Type System

The last section is certainly weird.  It defines four identical functions, because the type system gets in the way of doing what I really want.  What I would like is for the diff function to map functions over the numeric typeclasses to other functions over the same type classes.  For example, the derivative of a function with type (Num a => a -> a) will itself have type (Num a => a -> a).  The derivative of a function with type (Floating a => a -> a) will itself have type (Floating a => a -> a).  But the only way that works is for diff to get a type that depends on AD.  That’s confusing, because the type AD is an implementation detail; and I’d rather not even export it.  The user of this module won’t think they’ve written a function for the type AD; they’ve just written a function over any type that is an instance of, say, Floating.

To illustrate something closer to what would make me happy, I’ve also given the same function three better types, in diffNum, diffFractional, and diffFloating.  These rank 2 types provide the right level of abstraction; but unfortunately, it’s impossible to write only one function that works for all three numeric type classes.  Here’s what I’d really like to be able to say:

`diff :: forall A a. (A :> Floating, A a, Num a) => (forall b. (A b) => b -> b) -> a -> a`

Here, I’m using `:>` to mean “is a superclass of”, and A is meant to quantify over type classes. In other words, the type says that given any type class that’s a superclass of Floating, I can pass in a function that’s polymorphic over that type class, and get back its derivative (interpreting its domain and range as being, at the very least, numbers). But that’s not possible, so the functions above all have some of the advantages, and I’ve written all of them.

### Does It Work?

Here’s the yes part.

```\$ ghci -XRankNTypes autodiff.hs
> let f x = 3 * x^2 + 2 * x - 3
> let f' = diff f
> f' 3
20
> let g x = 1 / sqrt (exp x - asinh x)
> let g' = diff g
> g' 3
-0.12660691544665373```

So far, so good.

The “no” part of the answer is because there are a few specific situations in which the code will give the wrong answer.  First, the code I’ve written often gives a derivative when the derivative is really undefined.  Sometimes this is me being sloppy; for example, the log of -1 is undefined, but this code provides a derivative anyway.  I could fix this if I were willing to make my code a little uglier.  Second, uses of fromRational and fromInteger must be constants, because the code assumes they are.  It was pointed out by Luke Palmer that I can define a function of the correct type (Num a => a -> a) by using the Show superclass to convert a value to a string, then read it as an Integer, do all sorts of things to it, and then convert the result back to the type a by using fromInteger.  Doing so will give a derivative of zero, even if your function really has a different derivative.

A more nefarious example is this:

`f x = if x == 1 then 1 else x^2`

This is the same function as f x = x^2.  However, asking for the derivative at x = 1 will return 0, because the function returns the constant 1, whose first derivative is 0.  In particular, the technique runs into problems right on the boundaries of intervals where the function is defined piecewise.  It doesn’t appear to me that there’s a particularly good way of dealing with this, except to process the source code and modify if statements to refuse to calculate derivatives at those locations.  That’s far beyond the scope of a little playing with overloading, so I have no intention of doing it.

### Extending to Other Cool Stuff

If we just got functions computing values, this wouldn’t be very visual.  In the effort to avoid introducing GUI libraries, I’ll play the old trick of graphing functions sideways in ASCII art.  Let’s look at the second function in my example above… the weird one with square roots, exponentials, and inverse hyperbolic sines.

Here’s a test program.

```import AD

graph f ymin ystep xs = mapM_ (putStrLn . flip replicate '*'
. round
. (/ ystep)
. (subtract ymin))
(map f xs)

main = do let g x = 1 / sqrt (exp x - asinh x)
graph g               0       (1/75)  [0.0, 0.1 .. 4.0]
putStrLn "------------------------------------"
graph (diff g)        (-0.5)  (1/175) [0.0, 0.1 .. 4.0]
putStrLn "------------------------------------"
graph (diff (diff g)) (-0.75) (1/75)  [0.0, 0.1 .. 4.0]```

That just defines our function g again, and graphs it and its first two derivatives using ASCII art.  Here’s the result.

```***************************************************************************
***************************************************************************
**************************************************************************
*************************************************************************
***********************************************************************
*********************************************************************
*******************************************************************
****************************************************************
*************************************************************
**********************************************************
*******************************************************
****************************************************
*************************************************
***********************************************
********************************************
*****************************************
***************************************
*************************************
***********************************
*********************************
*******************************
*****************************
***************************
**************************
************************
***********************
**********************
*********************
********************
*******************
******************
*****************
****************
***************
**************
*************
*************
************
***********
***********
**********
------------------------------------
****************************************************************************************
******************************************************************************
*******************************************************************
********************************************************
*********************************************
***********************************
***************************
**********************
******************
*****************
*****************
******************
********************
***********************
**************************
******************************
*********************************
*************************************
****************************************
*******************************************
**********************************************
************************************************
***************************************************
*****************************************************
*******************************************************
*********************************************************
***********************************************************
*************************************************************
**************************************************************
****************************************************************
*****************************************************************
*******************************************************************
********************************************************************
*********************************************************************
**********************************************************************
***********************************************************************
************************************************************************
*************************************************************************
**************************************************************************
**************************************************************************
***************************************************************************
------------------------------------
*******************
************
********
********
************
******************
***************************
*************************************
**********************************************
******************************************************
************************************************************
****************************************************************
*******************************************************************
*********************************************************************
**********************************************************************
***********************************************************************
***********************************************************************
**********************************************************************
**********************************************************************
*********************************************************************
********************************************************************
*******************************************************************
*******************************************************************
******************************************************************
*****************************************************************
*****************************************************************
****************************************************************
***************************************************************
***************************************************************
**************************************************************
**************************************************************
**************************************************************
*************************************************************
*************************************************************
*************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
***********************************************************
***********************************************************
```

1. Cale Gibbard / Nov 29 2007 3:30 pm

One solution would be to not export the data constructor for the AD type. Then the only things you can do with values of type AD are to use the provided functions anyway.

2. augustss / Nov 29 2007 4:01 pm

Perhaps you’ll find this interesting. It just the same as Jerzy described in his paper, but it provides all the derivatives, not just the first. http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-2.html

The code is in Hackage, in a package called numbers.

3. Andy / Dec 1 2007 6:46 pm

Hi! Very nice code – it’s great to see an elegant language (Haskell) being used in the most elegant area of mathematics (calculus). ( Btw, what license is the code under – is it P.D.? )
– Andy

4. cdsmith / Dec 3 2007 10:49 pm

Andy, sure. Do whatever you like with it. Please pay attention to the warnings that it doesn’t always work.

5. Barak Pearlmutter / Dec 13 2007 9:50 am

But what if you define f using diff, then apply diff to f?
Turns out you can get the wrong answer, given the way you’ve
written the code. Jeff Siskind and I went nuts on this issue,
concluding that although diff is referentially transparent,
it cannot be implemented in pure Haskell: you need to use
some dirty trick. We exhibit a bunch of such dirty tricks
in Scheme, but can’t do it in Haskell.

One big question is whether some new pure mechanism could
be added that would allow this in Haskell. Existential
types might do the trick, but so far I have not figured out
how…

(details on my publications page, HOSC paper and IFL paper)

6. cdsmith / Dec 13 2007 12:17 pm

Barak,

That’s interesting. Do you have an easy example of how this can give incorrect results?

7. Barak Pearlmutter / Dec 15 2007 8:23 pm

Sure! This gives the wrong answer:

diff (\x -> (diff (x*) 2)) 1

(0 instead of 1) and this fails to type check:

diff (\x -> x*(diff (x*) 2)) 1

If you do some unpleasant manual coercion you can get these to give you the right answers,

diff (\x -> (diff ((AD x 0)*) 2)) 1

diff (\x -> x*(diff ((AD x 0)*) 2)) 1

As far as we can tell, that cannot be automated in Haskell.

8. Barak Pearlmutter / Dec 18 2007 3:51 pm

PS Although insertion of the lift invocations (lift x = AD x 0) cannot be automated, detection of missing invocations can be. This uses branding, see http://thread.gmane.org/gmane.comp.lang.haskell.cafe/22308/ for details.

PPS I am currently putting together the best of the forward-mode AD Haskell I’ve seen to try to make an actual usable version. Of course, this is all basically an exercise before the hard but important part: REVERSE-MODE AD!