Imagine the following scenario: a customer orders a mission-critical and efficiency-dependent system from your software company. You develop and test it with your compiler that makes use of advanced transformations to optimise it: everything works fine. Your customer receives your (untransformed1) sources, compiles the program with a compiler that is proven to be conformant to the specification of the language, launches the mission — and suddenly, a multi-million dollar satellite crashes due to non-terminating2 execution of a critical software component: your system.
How can it happen that termination behaviour changes if our transformations only do what the evaluation rules of the language tell them? Here is an example:
fst (*, f x) |
At first sight it may seem that we can simplify this expression to ’*’
only, as the semantics of ’fst’ and pairs would indicate. However,
what about the application of the unknown function ’f’ to the unknown
parameter ’x’? What happens if it does not terminate? Due to our strict
semantics, which will evaluate all components of the pair before applying
fst, evaluation of the whole expression would not terminate. But
if we simplify, we change this behaviour: the program will terminate then!
As we have mentioned earlier, strict semantics makes it more difficult to apply transformations of this kind without changing the meaning of programs, and computational effects like non-termination are definitely part of it.
First of all, it should be noted that termination issues are among the trickiest problems in computer science: it can be shown that there is no algorithm that can completely and consistently decide for all programs whether they terminate or not.
Because we can only be sure about some cases, whether evaluation of a term terminates or not, we have to assume the worst-case if we are not sure: this implies that there will never be a transformation system that can exploit all possibilities to simplify a program. In uncertain cases the system would have to refrain from applying a transformation, because it cannot prove that this will not change the meaning of the program — even if this were perfectly safe.
However, as we will see, not all is lost: we can change the way in which we improve the code by transforming it into other representations that give more opportunities of simplification without destroying the effect behaviour of the program.
In the following subsections we will present more examples of problems associated with the correctness of various transformations and work out ways to get around them.
In our initial example we have already shown a case where naive application of partial evaluation does not yield the intended result. What can we do to improve the obvious inefficiency? As was mentioned in previous chapters, call-by-value evaluation does not allow transformations as straightforward as call-by-name, but it guarantees another property that we can exploit: variables always stand for reduced terms — that is what makes a language eager (strict). This means that in this example
fst (*, v) |
we can be absolutely sure that ’v’ is bound to a value. This makes it perfectly safe to apply the obvious simplification. The only thing we need to make sure is that the term that possibly causes non-termination is still evaluated just for the purpose of preserving termination behaviour.
There are two ways in which we can achieve this. The first one does not require any change to the language:
(λv.fst (*, v)) (f x) |
We just put the whole term that may contain a possibly non-terminating computation into a lambda abstraction, lift the computation out of the potentially diverging term and introduce it as a parameter of an application to this function. Because of strict evaluation, this parameter is guaranteed to be evaluated before the body of the function is consulted. Now we can make use of our simplification for terms and arrive at:
(λv. *) (f x) |
This prevents us from changing the meaning of the program during transformation. However, we were required to add an unnamed function (a lambda abstraction). If we want to eliminate the penalty of having to evaluate the function term (lambda abstraction) before applying it3, the second choice we have is to introduce a new construct to the language: something similar to function application, but which does not need to evaluate function terms. Indeed, most functional languages feature a let-construct that allows the programmer to evaluate a term and substitute it within an expression. The last example would then look as follows:
let v = f x in * |
We will describe this and related constructs that we use to lift various computations out of terms in more detail in chapter 6. There we will also learn about formal aspects of such transformations.
In the previous chapter we have described the various rules of the fold/unfold strategy and mentioned that two of them (the folding rule and the unfolding rule) may cause problems in certain cases. If we take a closer look at unfolding, this rule is actually just a special case of partial evaluation: if we know the definition of a function (its body)4 and given its argument, we can apply the function — but only in the case when evaluation of the parameter terminates. For example:
(λx. *) ((rec f.x. f x) *) |
We see that the parameter to the right is constructed by a call to a recursive function that can never terminate. Therefore, it is not safe to apply the lambda abstraction to the left of this term, because the lambda abstraction does not use its parameter: it always returns the unit-value. This would again remove the effect of non-termination, which is not the intention of our transformation system. Of course, we can again use the trick mentioned in the previous section on partial evaluation and lift out potential non-termination, which might allow more simplifications. In the example above this would result in:
let v = (rec f.x. f x) * in * |
The following example shows a more interesting case (we assume that ’suspicious_fun’ cannot be proven to terminate):
(λx.
case x of
inl(l).inl(x)
inr(r).inr(*))
(inl (suspicious_fun *))
|
This could be transformed to:
let v = suspicious_fun * in
(λx.
case x of
inl(l).inl(x)
inr(r).inr(*))
(inl v)
|
and then simplified (e.g. using our partial evaluator) to
let v = suspicious_fun * in inl(v) |
which is a significant improvement. For completeness it should be mentioned that we can (in some cases) inline such lifted computations again5. Here is the completely simplified result:
inl(suspicious_fun *) |
Folding, calling a function that evaluates an expression instead of evaluating the expression directly, generally works without problems. There is, however, a specific and important case where this does not hold:
In many variants of the fold/unfold strategy the folding step is used to “tie a recursive knot”. This means that the folding rule is used to introduce a recursive call to the function containing the expression to be replaced! The rationale behind this step is that we sometimes want to “invent” a new recursive function that computes the same meaning as an initial one, but, for example, eliminates intermediate datastructures6.
When applied in a naive way, this can transform terminating programs to non-terminating ones. In the following example we assume that we want to eliminate intermediate datastructures in some function ’f’ by transforming it to ’g’. In an intermediate step we end up with the following definition, which treats each case of its input parameter (a recursive sum type7) explicitly:
rec g.x.
case x of
inl(l). f (inl(l))
inr(r). f (inr(r))
|
As should be obvious, this function ’g’ is semantically equivalent to ’f’: it just already takes one choice (a case statement) before passing on the parameter to ’f’. One could argue now: if it is semantically equivalent, why not call this function recursively now instead of relying on the initial definition of ’f’? This step in which we “tie the recursive knot” would look as follows (the function calls to ’f’ are replaced with calls to ’g’):
rec g.x.
case x of
inl(l). g (inl(l))
inr(r). g (inr(r))
|
It should be clear that we have rather tied a recursive gallows rope rather than a recursive knot: the function does not terminate for any input! Where is the hidden catch in our assumptions?
During this work it turned out that certain preconditions must be met
before we can safely undertake a recursive folding step: the parameter
to the function call must be free from explicit values of sum type
(values constructed with ’inl’ and ’inr’ in our language),
including nested datastructures that contain such values. Only variables
of sum type are acceptable. This is not the case in the upper example: the
argument to ’g’ is ’inl(l)’ and ’inr(r)’ respectively.
If this requirement is not fulfilled, this can indicate that the “original” function definition, which was called before the transformation, still handles information that the new definition cannot yet handle recursively8.
It may be, however, that it is still safe to try folding even if there are explicit values of sum type in the parameter term. This requires us to extend the criterion above to the following: the recursive function application that we obtain by “tieing the recursive knot” and which contains explicit values of sum type must not be able to “reach itself” when called. This means that in the next cycle of the recursion, there must not be any evaluation path that could demand evaluation of the recursive knot again. If this is violated, a function parameter that happens to match this explicit value of sum type cannot be handled any more. Evaluation could stay in a loop that ranges from the outermost expression of the function body to the point where the folding rule was wrongly applied.
We could use the partial evaluator to find out easily whether this is the case: we “tie the recursive knot”, unfold this recursive call (= apply it) and let the partial evaluator try to simplify the resulting expression. If this yields another expression that still contains the “recursive knot”, the expression that was introduced by folding, then the application of the folding rule was not guaranteed to be invalid: loops may occur.
Since the power of the partial evaluator only depends on the ability of our system to infer non-termination (or other side effects — see the next section), we have limited the question of when folding is appropriate to this generally undecidable problem. This means that there will always be safe cases which we will have to reject for the sake of consistently preserving termination behaviour (and other effects).
Unfortunately, we do not (yet) have a formal proof of all the claims above, but the verbose explanation seems reasonable, and future work could try to address this question.
One detail should be noted: it may well happen that loops occur after folding even when the function parameter to the recursive call did not contain an explicit value of sum type. This loop, however, would then necessarily also happen without the transformation, which means that the initial program was not completely defined for all input. In other terms: folding is justified and might optimise a few cases for which the program is defined, and it would not change termination behaviour in any way.
Summing up the correctness issues concerning the folding rule:
As we have mentioned in the introduction to this chapter, transformations that “improve” termination behaviour are not acceptable: first of all, it is difficult to maintain a program when the transformed (and possibly not “human friendly”) code does not behave the same way as the untransformed version does: for example, it does not terminate on the same input values on which the transformed code terminates with a wrong result. Secondly (and more important): non-termination is not the only kind of side effect. Most strict functional languages contain “impurities” like I/O and exceptions (side effects that lead to observable behaviour). If we do transformations, it would be fatal if one of those effects suddenly vanished, were duplicated or if their order changed, thus changing the meaning of the program. For example, let us assume that our language had strings and an I/O-function for printing them, this function returning the unit value after execution. We consider this example:
fst (*, print "Hello World!") |
Although ’print’ always terminates, we cannot just simplify this
expression according to the semantics of ’fst’: if we did so, the
print statement would never be executed, which is clearly against the
meaning of the program. Therefore, if we take a more general solution
to the problem of effects, we must completely disallow transformations
that manipulate the order in which they occur and how often they do.
The approach mentioned earlier, namely lifting computations, works here
equally well: this solution seems to be the most general way to correctly
handle effects of all kinds.
This is also a good time to mention that the semantics of the language we have specified does not enforce any evaluation order: it does not have to, because it is pure: there are e.g. no I/O-statements. Order of evaluation is completely irrelevant in purely functional languages, which is the reason why they make equational reasoning so easy9.
For example, in the following piece of code it obviously makes a big difference whether the first element of the pair is evaluated first or the second:
(print "Hello ", print "World!") |
If we wanted to specify evaluation order in the operational semantics, we would have to make this explicit by passing around a “state of computation”. For the specific case of pairs, this might look as follows10:
|
This would specify a left-to-right evaluation order: the state σ, which is associated to each expression within angle brackets, gets passed from evaluation step to evaluation step, possibly changing as it does so. If somebody wanted to extend our language with impure features, maybe so as to apply the implementation of our partial evaluator to more “realistic” languages, they should bear this (importance of evaluation order) in mind. The formal semantics of the language would need to be changed to capture the notions of state and evaluation order.
So far it might seem that there are only theoretical questions concerning the validity of transformations. Unfortunately, a very important aspect seems to be lacking in the literature: the impact of transformations on the executability of programs on “modern” computers. This is a point which, as we will see, may have consequences as drastic as transformations that do not preserve the “mathematical” meaning of programs, including the formalisation of effect behaviour.
So as not to confuse the reader with too complicated a program in our spartan language, the following example is written in OCaml11, an ML–dialect (ML is strict):
let rec length1_aux accu list =
match list with
[] -> accu
| head::tail -> length1_aux (accu + 1) tail
let length1 list = length1_aux 0 list
let rec length2 list =
match list with
[] -> 0
| head::tail -> 1 + length2 tail
|
This example does not stand for a real case, it is somewhat simplified: we assume that ’length2’ is the result of an optimising transformation from ’length1’. It does in fact not differ in efficiency from a theoretic point of view. In reality problems as we will describe further below might involve more than one function, for example in a fusion transformation.
Function ’length1’ makes use of an auxiliary function which computes the number of elements in a list in a tail-recursive way. This allows good compilers to transform the recursive function call to a loop instead: this fits much better to the execution model of von Neumann architectures. If it cannot do so, it would have to throw the current state of computation for every recursive call on a stack whose space is bounded (indeed, normally bounded quite tightly).
Unfortunately, if this stack space is exhausted, the program will crash: a very severe form of failure. This means that a transformation that takes one or several tail-recursive functions, optimises them and produces an efficient version that is not tail-recursive any more (e.g. ’length2’ in the above example) changes the behaviour of the program as observable on a real machine drastically: running the example with a test list of length 50,000 on a current machine using the current compiler allows both version to terminate correctly. Doubling the size of the list, however, crashes ’length2’ only, while the original version terminates correctly. A list of 100,000 elements can be fairly little for a real-world task. The reader having new transformations in mind should be careful to avoid ones that lead to such behaviour.