I quite enjoyed this paper by Jennifer Hackett and Graham Hutton. It's interesting that it's taken us more than 50 years of lazy programming to come to this simple idea — on reading, I immediately kicked myself and wondered why I hadn't thought of that.
The difference between call-by-need and call-by-value is conceptually simple. In call-by-value (cbv), we evaluate function arguments before we call the function. In call-by-need (cbn, implemented using lazy evaluation) we evaluate them when we discover they're actually required, then we remember the results in case we need them again. What it means for a computation to be "required" can get fiddly, but it's easy to think about a function that's returning a single value. We only compute the things we need to obtain that value. So if we're adding together two numbers to obtain the result, we need to evaluate both numbers. If we are doing some conditional control flow, we need to evaluate the value of the condition. And if we're calling another function, we need to know what function to call and we need to compute the value of that function call in turn.
The key idea here isn't just what is being evaluated, but more importantly what isn't being evaluated. If we discover we don't need something, we don't evaluate it. As a result, cbn programs terminate more often than cbv programs – to see this, imagine calling a function that throws its argument away with an argument that loops forever. In fact, we can prove that cbn represents a normalizing strategy for evaluation — if there's a way to obtain a result, cbn will do so.
Efficiency
This gives us an intuition that cbn is in some sense more efficient than cbv, in that it can take a finite number of computational steps in a situation where cbv requires an infinite number. Indeed if we just count computational steps then cbn takes the minimum possible number of computational steps of any strategy we might pick, and there are terminating computations that will always run asymptotically (big-O) faster under cbn than under cbv.
The dark downside to all this is that an implementation of cbn requires a bunch of extra mechanisms to keep track of unevaluated computations, remember results once evaluation is complete, and check whether something has been evaluated or not. So in practice cbn evaluation imposes a heavy (but constant factor) overhead to all computations. It turns out that there's a similar conceptual overhead when we're doing abstract reasoning about cbn programs as well — we end up having to replicate the record keeping we do during evaluation.
Clairvoyance
The simple insight of the paper is this: every time we bind an expression, we can non-deterministically decide to just throw it away. Otherwise we can evaluate it immediately as we would with cbv. If we never throw anything away we get cbv. If we throw away too much, our program gets stuck and can't find a variable binding it needs. If we throw away exactly the computations we didn't need, then our program does exactly the same work it would have done under cbn.
We can look at this another way — if we have an oracle that tells us whether we're going to need an expression somewhere down the line, we can just throw away expressions the oracle says are unnecessary, and immediately evaluate expressions the oracle thinks we'll want.
Now if we're just trying to count evaluation steps, we can do it easily in this framework – the cbn evaluation steps are the minimum number we need for any evaluation that doesn't get stuck. No need to simulate all the mechanisms of laziness. The authors show, in particular, that it makes it much easier to show that Common Subexpression Elimination is actually an optimization.
Non-strictness, or why I care
One obvious follow-on to all this is to observe that there's a whole set of strategies in the middle, where our oracle gives more or less accurate approximations to whether we'll need the value of an expression. This is where I lived for much of grad school — the Id and pH languages evaluated expressions in parallel, and Eager Haskell used cbv techniques for a while but would occasionally give up and fall back to lazy evaluation. Eager Haskell lived in a world that was semantically equivalent to cbn, but which would do "too much" work (a very large constant factor too much in the worst case).
Interestingly, while I thought this would perfectly capture the non-strict eager semantics of Id and pH as other points on the oracle's continuum, that isn't obviously true – non-strict parallel execution lets us express some cyclic dependencies that we can exploit in a lazy program, but that would deadlock if we used cbv-style execution. In that respect, the ordering constraints in the face of mutual recursion covered later in the paper are relevant. In Id and pH it looks like we might have to interleave partial reduction in those mutual recursions, which is exactly what we'd like to avoid.
Is there a similarly simple semantics that captures the computational complexity of non-strict eagerness without simulating its run-time behavior? I'd love to see one if it exists.
Strictness Analysis
One of the most important optimizations in a compiler for a lazy language like Haskell is strictness analysis. The job of the strictness analyzer is to identify a subset of expressions in the program that are guaranteed to be evaluated. These are then transformed to use cbv rather than cbn. So we can think of the strictness analyzer as an approximation of our oracle. There are a couple of key differences:
- The strictness analyzer operates on the source program. If a function f is called 100 times, and it uses its first argument in only 99 of the calls, we have to conservatively say the argument is not guaranteed to be evaluated. By contrast our oracle gives its answers as we evaluate, so it answers the question correctly all 100 times.
- Even given that constraint, giving an accurate approximation is equivalent to the halting problem (as with most static analysis) – so a strictness analyzer also ends up being conservative because it can't find an accurate answer.
- As a result of being conservative, strictness analysis can never throw out expressions; the compiler has to use cbn evaluation on the expressions it can't figure out.
We can imagine a dual analysis that identifies when an expression is never used. The compiler could replace the computation with any other computation it wanted to (for example, undefined in Haskell). This is closely related to dead-code elimination; as it turns out, just getting rid of every trace of the expression (as dead-code elimination does) is far more effective and there just aren't very many cases left after that, so my understanding is it's generally not thought to be worth it.