Previous Up Next

Chapter 3  Automated Transformations

This chapter will present an overview with examples covering the most common techniques applied in transformation of functional programs1.

We will start out with partial evaluation2, a basic technique for optimising programs, on which most other techniques can build. It exploits information that allows (partial) evaluation of programs before input is available. The implementation of our system mainly focuses on this technique, implementing it in a very general manner.

This will be followed by a presentation of a general strategy for transforming programs which are described by recursive equations: the fold/unfold method. Some of its special cases like deforestation (used for eliminating redundant intermediate datastructures) and the tupling transformation (can lead to nonlinear speedups by factoring out common computations in recursive calls) will be treated separately. Transformations of the fold/unfold type are of special interest to us: we will consider them in a later chapter on correctness and also give hints concerning their possible implementation in the corresponding chapter.

Finally, we will consider the Bird/Meertens Formalism, which exploits the fact that many functional programs are constructed out of specific building blocks. Mathematical properties of these building blocks allow us to rewrite their combinations to simpler forms, thus eliminating unnecessary computation.

3.1  Partial evaluation

3.1.1  Example

It is probably best to explain this method by example:


λx.
  (rec fy.
    y (case x of
        inl(l).inl(l)
        inr(r).inr(snd((f,r)))))
    (case (inl(λx.x)) of
      inl(l).l
      inr(r).fst(r))

It is fairly difficult to figure out what this function (lambda abstraction) does by just taking a short look at it: it uses higher-order functions, recursive functions, case statements and manipulation of pairs to compute its result.

In fact, the meaning of this program is relatively simple, i.e. can be computed by another function that does not need this many computation steps:

λx.x

It is just the identity function! How can we derive this result? First of all, it is interesting to note that the program is a value: it is a lambda abstraction. Applying the usual evaluation rules3 to it will not change the result. This is not the way in which we can improve it.

Therefore, to be able to optimise, we need rules that are able to “step” into the body of the function4 and enable us to find out, which of its parts could be evaluated statically: another term used for this is binding-time analysis. Having identified the parts that can be evaluated before the program gets its full input, we can apply simplification rules to those statically computable parts and specialise the program.

Indeed, if we take a second look at the program, we see that the second case statement can be evaluated immediately: the value of the argument is ’inl(λx.x)’. Therefore, considering the first case arm, the meaning of the whole second case statement is ’λx.x’.

This function is passed to the recursive function (as a higher-order function) and bound to ’y’ in its body. There it is applied to the meaning of another case statement. This time it is a bit more tricky to find out, what this case statement means, because we do not know ’x’, the value of sum type on which we perform the case switch, at compile time!

The first case arm can obviously not be partially evaluated, but the second can: the pair inside is known at compile time, and we see that the second element is requested. Therefore, the second case arm can be evaluated to ’inr(r)’. This is an interesting point. The partially evaluated case statement looks now as follows:


case x of
  inl(l).inl(l)
  inr(r).inr(r)

Whatever ’x’ is (’inl(l)’ or ’inr(r)’), the same will be returned again. Thus, we can simplify this case statement to just ’x’ alone without even considering its concrete value. Applying the higher-order function to it that we mentioned earlier (it is the identity function), we get ’x’ again — and have fully derived the identity function for the whole function as required. Each step in this derivation was actually fairly easy to follow, though it is somewhat tedious to do this manually.

3.1.2  Properties of partial evaluation

Partial evaluation usually does not lead to dramatic speedups due to various reasons: first of all, hardly any programmer would seriously write code as in the example above. Furthermore, none of the partial evaluation steps presented above eliminates more than a constant overhead. This means that speedups are typically of linear nature.

Still, this method is very important for transformation systems in general: many advanced transformations result in programs which can be improved by partial evaluation, e.g. the instantiation rule that is used by the fold/unfold strategy5.

Since advanced transformations usually involve some kind of search process, interleaving some of their applications with partial evaluation may significantly reduce the size of the search space. This was the main intention behind implementing a general purpose partial evaluator in this project.

A point which we have not touched in the whole discussion of partial evaluation so far is correctness. The reader may wonder, why this is an issue, as there did not seem to be any dangerous transformations involved in our example above: all of them clearly preserved the meaning of the program.

Unfortunately, the semantic properties of our language, it is a strict one, can lead to situations where this does not hold in general. Additionally, most real implementations of functional languages allow side effects like I/O or exceptions, which makes this issue even more complicated. Because of the importance of correctness of transformations, we will need to discuss this topic in more detail by examples in chapter 4.

A computational formalisation of partial evaluation as given in [] and [] seems to provide for a very strong basis to improve partial evaluation techniques in the presence of problems as mentioned above. Our implementation6 takes up this elegant approach and shows that this yields a most declarative view of partial evaluation in our logic implementation language LambdaProlog.

3.2  Fold/Unfold method

In their seminal paper [], Burstall and Darlington developed a general strategy, often referred to as fold/unfold method, for transforming functional programs that are represented as recursive equations. Many researches subsequently took up this approach and refined it in several variants.

Due to space restrictions, we will not give any fully worked example transformations7 of this method or of any of its variants. In later chapters we should, however, consider specific patterns that can arise during transformations, where we show problems concerning correctness or give hints how to implement them using the developed transformation framework.

Here an example of recursive equations that specify how to calculate the sum of elements in a list (’[ ]’ stands for the empty list, ’::’ is the list constructor):


sum [ ] = 0
sum (h::t) = h + sum t

The fold/unfold strategy consists of a set of rules that, when applied wisely, allows a very large class of optimising transformations on such recursive equations. As the word “wisely” indicates, however, the degree of automation is not always as high as one might hope: some examples require the discovery of so-called Eureka8-steps, steps which cannot be derived using the rule set, but rather require insight into the problem that may not be easily automated.

Here is a short overview of the rules (see [] for their detailed properties) - it also mentions the correspondence of the rules between recursive equations and our language representation.

Definition Rule

This rule allows adding new recursive equations to a program. In our language this would correspond to adding new functions. This is necessary, because some optimisations might require treatment of a computation in a separate function.

Unfolding Rule

Unfolding an expression means replacing it with the right-hand side of an equation whose left-hand side matches this expression. Variables in the replaced expression are substituted for values which were captured during matching. In our representation of the language this would correspond to function application: substituting the formal parameter of a function for its argument in the function body. This results in a new equation (a new function term) in which the expression is replaced. The important aspect to be careful about is that unfolding is only sound with respect to our strict semantics when evaluating the parameters of the function does not yield any side effects like non-termination, I/O, etc.9 Not respecting this may, for example, transform non-terminating programs to terminating ones.

Folding Rule

This is the opposite of the unfolding rule: we match an expression against the right-hand side of an equation and replace it with the left-hand side of the same, again substituting values that were bound in the matching process for the variables in the replaced expression. This means with respect to our language: we replace an expression with a call to a function whose body (right-hand side of a recursive equation) matches this same expression. This should in most cases preserve the meaning of the equation (the function term) in which the expression was replaced. Here we have a dual problem to the one of the unfolding rule: as we will see in our chapter on correctness, folding may transform terminating programs to non-terminating ones in certain cases.

Instantiation Rule

Application of the instantiation rule introduces an instance of an already existing equation. A small example: taking our sum-example from above and given the following rather redundant definition:

sum’ l = sum l 

we can apply the instantiation rule to sum', because we know all the instances of l, which is of type list. We can instantiate twice, once using ’[ ]’ and a second time using the cons-operation ’::’.


sum’ [ ] = sum [ ]
sum’ (h::t) = sum (h::t)

It should be noted that the right-hand sides of the new equations have more “information” now: they “know” what kind of argument is passed to the sum function, which could be exploited using a recursive unfolding step and partial evaluation followed by a folding step to simplify the function in such a way that the redundant function call to ’sum’ is completely eliminated. The resulting function would be equivalent to ’sum’ itself.

The instantiation rule essentially works on sum types10. Instantiating other types of values (e.g. pairs) alone does not offer information to the function body. The only way that choice can enter the system (information corresponds to choice) is through case statements that decide, which of two choices to take (depending on a sum type). Of course, pairs may also contain elements that have sum type. In this case, the instantiation rule may try all possible ways to instantiate the pair that contains them (recursively as required with nested types). Here an example, how the instantiation rule11 would work in our language. We assume that f is a function of type (τ12)→τ3 and x is of type τ12:


λxf.f x

We specialise the argument to f within two case arms:


λxf.
  case x of
    inl(l).f (inl(l))
    inr(r).f (inr(r))

This might allow a partial evaluator to proceed by applying the function f to the specialised value.

Where-Abstraction Rule

The where-abstraction rule introduces definitions which are local to a given recursive equation. It is actually not so much different from the definition rule for our purposes, because our language does not have any extra constructs to support this rule, anyway: lambda abstraction and function application can be used to achieve the same effect of binding definitions. However, we can (and will) extend the language with a so-called let-construct, which can come handy in some occasions as we will see in chapter 4. We leave explanation of further details to [].

Algebraic Replacement Rule

This rule allows exploiting equivalences of expressions to rewrite them into each other. For example, if we know that the program handles natural numbers and if we have defined a multiplication function for it, we can make use of the mathematical property of it being commutative: we could swap around parameters to the multiplication function, which might allow other transformation rules to match.

It should be noted that this rule is potentially very computationally intensive, but applying it in full generality it could allow the most powerful transformations, exploiting all kinds of equivalences.

3.2.1  Deforestation

A specific instance of the fold/unfold-strategy, namely deforestation, was developed by Phil Wadler in his often cited paper []. It addresses a problem associated with high level programming in functional languages: computations are “glued” together via intermediate datastructures. Yet, these intermediate datastructures do not necessarily represent parts of the result itself. This can be best shown with an example:

 sum (map sqrt l) 

’map’ takes a function (here: ’sqrt’, the square root function) and applies it to every element in list ’l’, which results in a new list that contains all these square roots. ’sum’ takes a list and sums up all elements. This is a very convenient way of writing code, much clearer than this version:


sum_sqrts [ ] = 0
sum_sqrts (h::t) = sqrt h + sum_sqrts t

However, the last version is more efficient instead: whereas ’map’ produces (allocates) an intermediate list just to remove (deallocate) it immediately after the computation, ’sum_sqrts’ does not. Allocating and deallocating memory is usually a very time consuming task and may also raise the memory requirements of the program. Therefore, we would like to have transformations that eliminate those intermediate lists (and possibly other kinds of intermediate datastructures, e.g. trees).

Wadler gives a set of rewrite rules for the deforestation algorithm and proves that transformations always terminate and that the resulting program is guaranteed to be at least as fast as the input program (= no performance loss). Still, for the deforestation algorithm to work, the program and its terms must fulfill certain requirements: the functions out of which a term is constructed must be in a special (“treeless”) form12, which also requires that they be linear13.

In the general case, Wadler’s method does not scale up to higher-order functions, which is a significant shortcoming in languages whose expressiveness heavily depends on them. Even though allocating datastructures can be computationally expensive, this can be done with only a constant factor of overhead. Thus, the performance gains of deforestation are typically only linear in nature.

Several related techniques have come up over time, which try to improve the method in different aspects: e.g. [], who explains the advantage of focusing on so-called catamorphisms14, a special pattern of recursion, which does not require so much search when looking for a point to apply the folding-rule.

3.2.2  Tupling transformation

The tupling transformation is another specialised technique derived from the fold/unfold-strategy. It allows introduction of accumulators into functional programs, which can have a tremendous impact on performance. Here a typical example15, where it leads to such improvements:


  fib n =
    if n <= 1 then 1
    else fib (n-1) + fib (n-2)

As the name indicates, this function computes the Fibonacci numbers. As declarative as this definition is, as inefficient is its execution as a program. The time complexity of this program is exponential due to the two recursive calls that are done in its body. But taking a closer look at this definition, we see that a significant part of this computation is redundant: ’fib (n-1)’ requires the computation ’fib (n-2)’ already so there is no need to recompute the value of the latter in the second call.

Introducing a tuple which accumulates the last two values (more is not needed), we can save the call to ’fib (n-2)’. This effectively eliminates this call from the program: instead of branching into two calls in each of the recursive steps, which causes the exponential time behaviour, there will be only one, which makes the function run in linear time. This is a significant improvement over the first algorithm:


fib2 n =
  if n <= 1 then 1 else aux (1,2) 2
  where aux (n2,n1) m = if n == m then n1
                        else aux (n1, n2+n1) (m+1)

As was mentioned, there is only one recursive call left. But we also see that this algorithm is much more complicated to understand than the first one. This demonstrates that more efficient algorithms can indeed be difficult to derive.

The general application pattern of the tupling strategy starts out trying to discover computations which have to be computed repeatedly in different recursive calls or at different levels of recursion. It factors those out and remembers them in an accumulator which is passed from each recursive call to the next one. Thus, the expression has to be computed only once, which can lead to considerable performance improvements, especially if this means that one or more recursive calls can be eliminated.

People who think that the efficient implementation of the Fibonacci function was easy to derive for them and is optimal might consider the example in the appendix on page ??. It is truly a monster of a function: if it was not for the name, hardly any programmer would have the faintest clue of what it does without running it. In fact, this function computes the Fibonacci numbers again exponentially faster than the one we have just derived! — It runs in logarithmic time! We will not go into the details of its derivation16, but this example should clearly demonstrate the very promising perspectives of functional program transformation: they even allow super-exponential speedups. Of course, much more research is necessary to make this power applicable to real world programs.

3.3  Bird/Meertens formalism

In contrast to the fold/unfold-strategy, the Bird/Meertens formalism17 has a much higher degree of automation: instead of having to discover important properties in Eureka-steps, this formalism exploits known mathematical equalities that hold for often used constructs, in particular for common higher-order functions18.

A simple example of this kind would be applying the list operation ’map’ twice with two different functions (’∘’ stands for function composition):

 map g ∘ map f 

This can be optimised to:

 map (gf) 

which effectively eliminates an intermediate list. Many more such equations exist, and additional ones can often be defined for specific applications. Using tools for automatic verification or even generation of such rules would allow the system to become more powerful over time while still retaining a high degree of automation.

One minor drawback of this method without any further extensions is that its effectiveness depends on the size of its rule set. This raises questions about how to control search: many rules require a significant effort in pattern matching. To be applicable to real world programs, clever heuristics for rule selection would have to be developed to overcome this problem. Another shortcoming is that it is not so generic by nature: the knowledge about transformations lies within the rule set. If the programmer does not make use of the known rule patterns, this method is unlikely to lead to significant improvements, whereas some members of the fold/unfold-family are often applicable to programs which involve datastructures that have never been seen before. Current research in this field is improving this situation, and there are indeed relations to offsprings of deforestation (e.g. fusion techniques as in []), where higher-order functions related to list operations like ’fold’ are generalised to arbitrary datastructures automatically: this makes known equalities available in many general cases.


1
Readers with knowledge of German might want to take a look at [], who not only provides for a good introduction to functional programming in general, but also gives an overview of the most common transformation techniques with many examples.
2
See [] for a thorough introduction to partial evaluation.
3
See page ?? for the evaluation rules (structural operational semantics).
4
As we will see in chapter 6, the language we use to implement transformations (LambdaProlog) has very convenient builtin functionality to support descending into bindings.
5
See section 3.2.
6
See chapter 6.
7
A very complete discussion of fold/unfold transformations and their variants with many and impressive examples can be found in []. [], too, has a fairly detailed chapter on it.
8
“Eureka!” was the word that the Greek philosopher Archimedes supposedly cried out when he discovered the relation between the important physical concepts of weight and density.
9
We will learn more about this in chapter 4.
10
See 2.1.3 again for their definition.
11
We will call the instantiation rule specialisation rule in our language so as to prevent confusion with instantiating existentially quantified variables when we do transformations on programs in LambdaProlog.
12
The somewhat lengthy exact definition of this term can be found in [].
13
A linear function does not use its parameter more than once, considering different case arms separately. We will see in chapter 6 that our partial evaluator can indeed simplify applications of nonlinear functions without causing performance penalties and does not even have to check the functions for linearity to do this: our approach seems to cope very well with such cases, too.
14
A good introduction to the various uses of such and related constructs is given in [].
15
These examples are given in the pure and lazy functional programming language Haskell, because it resembles the definition of recursive equations most. This allows readers to try them out. For more information on Haskell, see the URL: http://www.haskell.org
16
The impressed reader may learn this trick, which could indeed be automated using tupling techniques, in [].
17
A broad overview with many examples and a very good bibliography can be found in [].
18
E.g. list operations like ’map’, ’fold_left’, ’fold_right’, ’scan_left’, etc.

Copyright   ©  2000 Markus Mottl ⟨markus.mottl@gmail.com⟩
Previous Up Next