# Some Playing with Derivatives

This is a summary of what I’ve been playing with in case people find it interesting.

In general, there are three ways to find the derivative of a function:

- Do the symbolic manipulation of formulae that we all learned in school when we were 16 years old. This assumes that one has the function as an algebraic expression of some kind in terms of known quantities.
- Do it numerically, by computing (f(x + h) – f(x)) / h for some small value of h. This suffers from numerical inaccuracies, and will generally cause rises in the blood pressure of numerical analysts.
- Calculate the value of the function and its derivative simultaneously at a given point of evaluation.

The last option seems to have a lot of names: automatic differentiation, algorithmic differentiation, and a few more. Just for fun, I wrote an implementation of a very simplified version of it in Haskell.

### The Implementation

module AD where data AD a = AD a a instance Eq a => Eq (AD a) where (AD x dx) == (AD y dy) = x == y instance Show a => Show (AD a) where show (AD x dx) = show x ++ " + " ++ show dx ++ " eps" instance Num a => Num (AD a) where (AD x dx) + (AD y dy) = AD (x + y) (dx + dy) (AD x dx) - (AD y dy) = AD (x - y) (dx - dy) (AD x dx) * (AD y dy) = AD (x * y) (dx * y + x * dy) negate (AD x dx) = AD (negate x) (negate dx) abs (AD 0 _) = error "not differentiable: |0|" abs (AD x dx) = AD (abs x) (dx * signum x) signum (AD 0 _) = error "not differentiable: signum(0)" signum (AD x dx) = AD (signum x) 0 fromInteger i = AD (fromInteger i) 0 instance Fractional a => Fractional (AD a) where (AD x dx) / (AD y dy) = AD (x / y) ((dx * y - x * dy) / y^2) recip (AD x dx) = AD (1 / x) ((-dx) / x^2) fromRational x = AD (fromRational x) 0 instance Floating a => Floating (AD a) where pi = AD pi 0 exp (AD x dx) = AD (exp x) (dx * exp x) sqrt (AD x dx) = AD (sqrt x) (dx / (2 * sqrt x)) log (AD x dx) = AD (log x) (dx / x) (AD x dx) ** (AD y dy) = AD (x ** y) (dx * y * (x ** (y-1)) + dy * (x ** y) * log x) sin (AD x dx) = AD (sin x) ( dx * cos x) cos (AD x dx) = AD (cos x) (-dx * sin x) asin (AD x dx) = AD (asin x) ( dx / sqrt (1 - x^2)) acos (AD x dx) = AD (acos x) (-dx / sqrt (1 - x^2)) atan (AD x dx) = AD (atan x) (dx / (1 + x^2)) sinh (AD x dx) = AD (sinh x) (dx * cosh x) cosh (AD x dx) = AD (cosh x) (dx * sinh x) asinh (AD x dx) = AD (asinh x) (dx / sqrt (x^2 + 1)) acosh (AD x dx) = AD (acosh x) (dx / sqrt (x^2 - 1)) atanh (AD x dx) = AD (atanh x) (dx / (1 - x^2)) diff :: Num a => (AD a -> AD t) -> a -> t diffNum :: Num b => (forall a. Num a => a -> a) -> b -> b diffFractional :: Fractional b => (forall a. Fractional a => a -> a) -> b -> b diffFloating :: Floating b => (forall a. Floating a => a -> a) -> b -> b diff f x = let AD y dy = f (AD x 1) in dy diffNum f x = let AD y dy = f (AD x 1) in dy diffFractional f x = let AD y dy = f (AD x 1) in dy diffFloating f x = let AD y dy = f (AD x 1) in dy

Essentially, this declares a data type to represent the pair of a number and its first derivative. It then defines how to perform primitive math operations on that data type, for three of Haskell’s numerical type classes. Note that basically the entire implementation consists of talking about how derivatives fare in the face of certain operations. In fact, you may recognize a lot of it from the algebra of differentials in calculus. For example, if z = xy, then dz = x dy + y dx, and so forth. (One quick note, because someone else asked me about this. The exponentiation operator ** is complicated because both the base and the exponent could be non-constant. So it must use *both* the power rule *and* the exponentiation rule.)

### My Argument With the Type System

The last section is certainly weird. It defines four identical functions, because the type system gets in the way of doing what I really want. What I would like is for the diff function to map functions over the numeric typeclasses to other functions over the same type classes. For example, the derivative of a function with type (Num a => a -> a) will itself have type (Num a => a -> a). The derivative of a function with type (Floating a => a -> a) will itself have type (Floating a => a -> a). But the only way that works is for diff to get a type that depends on AD. That’s confusing, because the type AD is an implementation detail; and I’d rather not even export it. The user of this module won’t think they’ve written a function for the type AD; they’ve just written a function over any type that is an instance of, say, Floating.

To illustrate something closer to what would make me happy, I’ve also given the same function three better types, in diffNum, diffFractional, and diffFloating. These rank 2 types provide the right level of abstraction; but unfortunately, it’s impossible to write only one function that works for all three numeric type classes. Here’s what I’d really like to be able to say:

diff :: forall A a. (A :> Floating, A a, Num a) => (forall b. (A b) => b -> b) -> a -> a

Here, I’m using `:>`

to mean “is a superclass of”, and A is meant to quantify over type classes. In other words, the type says that given *any* type class that’s a superclass of Floating, I can pass in a function that’s polymorphic over that type class, and get back its derivative (interpreting its domain and range as being, at the very least, numbers). But that’s not possible, so the functions above all have some of the advantages, and I’ve written all of them.

### Does It Work?

Glad you asked. Yes and no.

Here’s the yes part.

$ ghci -XRankNTypes autodiff.hs > let f x = 3 * x^2 + 2 * x - 3 > let f' = diff f > f' 3 20 > let g x = 1 / sqrt (exp x - asinh x) > let g' = diff g > g' 3 -0.12660691544665373

So far, so good.

The “no” part of the answer is because there are a few specific situations in which the code will give the wrong answer. First, the code I’ve written often gives a derivative when the derivative is really undefined. Sometimes this is me being sloppy; for example, the log of -1 is undefined, but this code provides a derivative anyway. I could fix this if I were willing to make my code a little uglier. Second, uses of fromRational and fromInteger **must** be constants, because the code assumes they are. It was pointed out by Luke Palmer that I can define a function of the correct type (Num a => a -> a) by using the Show superclass to convert a value to a string, then read it as an Integer, do all sorts of things to it, and then convert the result back to the type a by using fromInteger. Doing so will give a derivative of zero, even if your function really has a different derivative.

A more nefarious example is this:

f x = if x == 1 then 1 else x^2

This is the same function as f x = x^2. However, asking for the derivative at x = 1 will return 0, because the function returns the constant 1, whose first derivative is 0. In particular, the technique runs into problems right on the boundaries of intervals where the function is defined piecewise. It doesn’t appear to me that there’s a particularly good way of dealing with this, except to process the source code and modify if statements to refuse to calculate derivatives at those locations. That’s far beyond the scope of a little playing with overloading, so I have no intention of doing it.

### Extending to Other Cool Stuff

If we just got functions computing values, this wouldn’t be very visual. In the effort to avoid introducing GUI libraries, I’ll play the old trick of graphing functions sideways in ASCII art. Let’s look at the second function in my example above… the weird one with square roots, exponentials, and inverse hyperbolic sines.

Here’s a test program.

import AD graph f ymin ystep xs = mapM_ (putStrLn . flip replicate '*' . round . (/ ystep) . (subtract ymin)) (map f xs) main = do let g x = 1 / sqrt (exp x - asinh x) graph g 0 (1/75) [0.0, 0.1 .. 4.0] putStrLn "------------------------------------" graph (diff g) (-0.5) (1/175) [0.0, 0.1 .. 4.0] putStrLn "------------------------------------" graph (diff (diff g)) (-0.75) (1/75) [0.0, 0.1 .. 4.0]

That just defines our function g again, and graphs it and its first two derivatives using ASCII art. Here’s the result.

*************************************************************************** *************************************************************************** ************************************************************************** ************************************************************************* *********************************************************************** ********************************************************************* ******************************************************************* **************************************************************** ************************************************************* ********************************************************** ******************************************************* **************************************************** ************************************************* *********************************************** ******************************************** ***************************************** *************************************** ************************************* *********************************** ********************************* ******************************* ***************************** *************************** ************************** ************************ *********************** ********************** ********************* ******************** ******************* ****************** ***************** **************** *************** ************** ************* ************* ************ *********** *********** ********** ------------------------------------ **************************************************************************************** ****************************************************************************** ******************************************************************* ******************************************************** ********************************************* *********************************** *************************** ********************** ****************** ***************** ***************** ****************** ******************** *********************** ************************** ****************************** ********************************* ************************************* **************************************** ******************************************* ********************************************** ************************************************ *************************************************** ***************************************************** ******************************************************* ********************************************************* *********************************************************** ************************************************************* ************************************************************** **************************************************************** ***************************************************************** ******************************************************************* ******************************************************************** ********************************************************************* ********************************************************************** *********************************************************************** ************************************************************************ ************************************************************************* ************************************************************************** ************************************************************************** *************************************************************************** ------------------------------------ ******************* ************ ******** ******** ************ ****************** *************************** ************************************* ********************************************** ****************************************************** ************************************************************ **************************************************************** ******************************************************************* ********************************************************************* ********************************************************************** *********************************************************************** *********************************************************************** ********************************************************************** ********************************************************************** ********************************************************************* ******************************************************************** ******************************************************************* ******************************************************************* ****************************************************************** ***************************************************************** ***************************************************************** **************************************************************** *************************************************************** *************************************************************** ************************************************************** ************************************************************** ************************************************************** ************************************************************* ************************************************************* ************************************************************* ************************************************************ ************************************************************ ************************************************************ ************************************************************ *********************************************************** ***********************************************************

One solution would be to not export the data constructor for the AD type. Then the only things you can do with values of type AD are to use the provided functions anyway.

Perhaps you’ll find this interesting. It just the same as Jerzy described in his paper, but it provides all the derivatives, not just the first. http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-2.html

The code is in Hackage, in a package called numbers.

Hi! Very nice code – it’s great to see an elegant language (Haskell) being used in the most elegant area of mathematics (calculus). ( Btw, what license is the code under – is it P.D.? )

– Andy

Andy, sure. Do whatever you like with it. Please pay attention to the warnings that it doesn’t always work.

But what if you define f using diff, then apply diff to f?

Turns out you can get the wrong answer, given the way you’ve

written the code. Jeff Siskind and I went nuts on this issue,

concluding that although diff is referentially transparent,

it cannot be implemented in pure Haskell: you need to use

some dirty trick. We exhibit a bunch of such dirty tricks

in Scheme, but can’t do it in Haskell.

One big question is whether some new pure mechanism could

be added that would allow this in Haskell. Existential

types might do the trick, but so far I have not figured out

how…

(details on my publications page, HOSC paper and IFL paper)

Barak,

That’s interesting. Do you have an easy example of how this can give incorrect results?

Sure! This gives the wrong answer:

diff (\x -> (diff (x*) 2)) 1

(0 instead of 1) and this fails to type check:

diff (\x -> x*(diff (x*) 2)) 1

If you do some unpleasant manual coercion you can get these to give you the right answers,

diff (\x -> (diff ((AD x 0)*) 2)) 1

diff (\x -> x*(diff ((AD x 0)*) 2)) 1

As far as we can tell, that cannot be automated in Haskell.

PS Although insertion of the lift invocations (lift x = AD x 0) cannot be automated, detection of missing invocations can be. This uses branding, see http://thread.gmane.org/gmane.comp.lang.haskell.cafe/22308/ for details.

PPS I am currently putting together the best of the forward-mode AD Haskell I’ve seen to try to make an actual usable version. Of course, this is all basically an exercise before the hard but important part: REVERSE-MODE AD!