Specialization as an optimization strategy in functional programming languages

In functional programming languages such as Haskell, we place great emphasis on splitting our programs into small, reusable parts. As a result, we end up writing functions which, taken literally, would do a great deal of duplicate or useless work when composed with other functions. Consider the following set of functions as a motivating example:

f :: Maybe Int -> Int
f x = case x of
    Just a  -> a
    Nothing -> 0

g :: Int
g = f (Just 5)

h :: Int
h = f Nothing

In functions g and h we reuse the function f because it’s succinct to do so – but at the cost of a runtime pattern match. Let’s imagine that f is too large to inline directly. In such a situation we might naively assume that there is nothing else we can do to optimize this snippet of code, but that’s not actually the case.

Here we can introduce our first kind of specialization: specialization on arguments (or, equivalently, introducing constraints). Consider what happens if we break f up into parts – one for each branch of the case expression – like so:

f_Just :: Int -> Int
f_Just a = a

f_Nothing :: Int
f_Nothing = 0

f :: Maybe Int -> Int
f x = case x of
    Just a  -> f_Just a
    Nothing -> f_Nothing

Naively this doesn’t look like it achieves anything at all, but let’s consider for a moment what has happened: f is now small enough to inline, because the right-hand side of each branch in the case expression (the large part we couldn’t inline) is no longer part of f. After an inline pass – followed by case-of-known-constructor reduction, since the scrutinees in g and h are visibly Just 5 and Nothing – we would be left with the following code for g and h:

g :: Int
g = f_Just 5

h :: Int
h = f_Nothing

We have managed to prevent the original pattern match from occurring by moving it to compile time, thus making our code more performant at runtime without forcing us to duplicate the original code using excessive inlining.
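To make the “moved to compile time” claim concrete, here is the reduction for g written out step by step. The intermediate forms appear as comments; this is a sketch of ordinary inlining followed by case-of-known-constructor reduction, not literal compiler output:

```haskell
f_Just :: Int -> Int
f_Just a = a

f_Nothing :: Int
f_Nothing = 0

-- g = f (Just 5)
--   ==> { inline f }
-- g = case Just 5 of { Just a -> f_Just a; Nothing -> f_Nothing }
--   ==> { case-of-known-constructor: the scrutinee is visibly Just 5 }
g :: Int
g = f_Just 5
```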

This method of specializing on arguments can be applied in reverse to specialize on return types and reduce the amount of boxing and unboxing that needs to occur. You may already know that an Int in Haskell is a box: the constructor I# wrapped around an unboxed Int#. Primitive operations are only exposed for unboxed integers, so for simplicity let’s imagine the (+) operator is really a function which breaks apart the outer boxes and uses (+#) (the unboxed primitive equivalent), followed by a prompt reboxing.

(+) :: Int -> Int -> Int
(I# a) + (I# b) = I# (a +# b)
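To see why this is costly, here is a sketch of what x + x + x becomes if we inline the boxed (+) directly: the inner sum is boxed with I# only to be scrutinized and unboxed again a moment later. This uses GHC’s MagicHash extension; the nesting is illustrative, not literal compiler output:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- x + x + x, with the boxed (+) inlined twice
tripleNaive :: Int -> Int
tripleNaive x =
  case x of
    I# a -> case x of
      I# b ->
        -- the intermediate result a +# b is boxed here...
        case I# (a +# b) of
          -- ...and unboxed again immediately
          I# c -> case x of
            I# d -> I# (c +# d)
```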

This has some pretty nasty performance characteristics – a chain of additions would repeatedly construct boxes, only to open them again a moment later! We can do better using specialization. By pulling inner applications of constructors out into separate functions, we can ensure we only inline operations which would be eliminated as duplicate effort, without having to inline the full expression body. Consider:

-- Specialization of return

plus_Retn :: Int -> Int -> Int#
plus_Retn (I# a) (I# b) = a +# b

(+) :: Int -> Int -> Int
a + b = I# (plus_Retn a b)

-- Specialization of arguments

plus_Retn2 :: Int# -> Int# -> Int#
plus_Retn2 a b = a +# b

plus_Retn1 :: Int# -> Int -> Int#
plus_Retn1 a b = case b of { I# b' -> plus_Retn2 a b' }

plus_Retn :: Int -> Int -> Int#
plus_Retn a b = case a of { I# a' -> plus_Retn1 a' b }

(+) :: Int -> Int -> Int
a + b = I# (plus_Retn a b)

Now the entire body of (+) can be inlined directly at any usage site. In subsequent passes plus_Retn, plus_Retn1 and plus_Retn2 will also be inlined.

-- Before optimization
f :: Int -> Int
f x = x + x + x

-- After inlining (+), plus_Retn and plus_Retn1
f :: Int -> Int
f x = case x of { I# z -> I# (plus_Retn2 (plus_Retn2 z z) z) }

We can also inline plus_Retn2, eventually reaching the optimal implementation of the function f – all without the programmer needing to know a single thing about what’s going on behind the scenes! Complex boxing and unboxing scenarios can be systematically reduced to produce optimal code after inlining, all without having to inline the main body of the original expression.

f :: Int -> Int
f (I# x) = I# (x +# x +# x)

Nowadays you can safely expect that your Haskell compiler will already perform both of these excellent optimizations on your behalf.
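If you want to see this for yourself, GHC can dump the optimized Core it produces. Compiling a module like the one below with the (real) flags -O2 -ddump-simpl shows triple operating directly on the unboxed Int#, exactly as derived above – the precise Core output varies by GHC version:

```haskell
-- Triple.hs: compile with  ghc -O2 -ddump-simpl Triple.hs
module Main where

triple :: Int -> Int
triple x = x + x + x

main :: IO ()
main = print (triple 14)
```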
