blog-static/content/blog/haskell_lazy_evaluation/index.md

17 KiB

title date tags draft
Time Traveling In Haskell: How It Works And How To Use It 2020-05-03T20:05:29-07:00
Haskell
true

I recently got to use a very curious Haskell technique {{< sidenote "right" "production-note" "in production:" >}} As production as research code gets, anyway! {{< /sidenote >}} time traveling. I say this with the utmost seriousness. This technique worked like magic for the problem I was trying to solve (which isn't interesting enough to be presented here in itself), and so I thought I'd share what I learned. In addition to the technique and its workings, I will also explain how time traveling can be misused, yielding computations that never terminate.

Time Traveling

Some time ago, I read this post by Csongor Kiss about time traveling in Haskell. It's really cool, and makes a lot of sense if you have wrapped your head around lazy evaluation. I'm going to present my take on it here, but please check out Csongor's original post if you are interested.

Say that you have a list of integers, like [3,2,6]. Next, suppose that you want to find the maximum value in the list. You can implement such behavior quite simply with pattern matching:

myMax :: [Int] -> Int
myMax [] = error "Being total sucks"
myMax (x:xs) = max x $ myMax xs

You could even get fancy with a fold:

myMax :: [Int] -> Int
myMax = foldr1 max

All is well, and this is rather elementary Haskell. But now let's look at something that Csongor calls the repMax problem:

Imagine you had a list, and you wanted to replace all the elements of the list with the largest element, by only passing the list once.

How can we possibly do this in one pass? First, we need to find the maximum element, and only then can we have something to replace the other numbers with! It turns out, though, that we can just expect to have the future value, and all will be well. Csongor provides the following example:

repMax :: [Int] -> Int -> (Int, [Int])
repMax [] rep = (rep, [])
repMax [x] rep = (x, [rep])
repMax (l : ls) rep = (m', rep : ls')
  where (m, ls') = repMax ls rep
        m' = max m l

In this example, repMax takes the list of integers, each of which it must replace with their maximum element. It also takes as an argument the maximum element, as if it had already been computed. It does, however, still compute the intermediate maximum element, in the form of m'. Otherwise, where would the future value even come from?

Thus far, nothing too magical has happened. It's a little strange to expect the result of the computation to be given to us; however, thus far, it looks like wishful thinking. The real magic happens in Csongor's doRepMax function:

doRepMax :: [Int] -> [Int]
doRepMax xs = xs'
  where (largest, xs') = repMax xs largest

Look, in particular, on the line with the where clause. We see that repMax returns the maximum element of the list, largest, and the resulting list xs' consisting only of largest repeated as many times as xs had elements. But what's curious is the call to repMax itself. It takes as input xs, the list we're supposed to process... and largest, the value that it itself returns.

This works because Haskell's evaluation model is, effectively, lazy graph reduction. That is, you can think of Haskell as manipulating your code as {{< sidenote "right" "tree-note" "a syntax tree," >}} Why is it called graph reduction, you may be wondering, if the runtime is manipulating syntax trees? To save on work, if a program refers to the same value twice, Haskell has both of those references point to the exact same graph. This violates the tree's property of having only one path from the root to any node, and makes our program a graph. Graphs that refer to themselves also violate the properties of a tree. {{< /sidenote >}} performing substitutions and simplifications as necessary until it reaches a final answer. What the lazy part means is that parts of the syntax tree that are not yet needed to compute the final answer can exist, unsimplified, in the tree. Why don't we draw a few graphs to get familiar with the idea?

Visualizing Graphs and Their Reduction

Let's start with something that doesn't have anything fancy. We can take a look at the graph of the expression:

length [1]

Stripping away Haskell's syntax sugar for lists, we can write this expression as:

length (1:[])

Then, recalling that (:), or 'cons', is just a binary function, we rewrite:

length ((:) 1 [])

We're now ready to draw the graph; in this case, it's pretty much identical to the syntax tree of the last form of our expression:

{{< figure src="length_1.png" caption="The initial graph of length [1]." >}}

In this image, the @ nodes represent function application. The root node is an application of the function length to the graph that represents the list [1]. The list itself is represented using two application nodes: (:) takes two arguments, the head and tail of the list, and function applications in Haskell are curried. Eventually, in the process of evaluation, the body of length will be reached, and leave us with the following graph:

{{< figure src="length_2.png" caption="The graph of length [1] after the body of length is expanded." >}}

Conceptually, we only took one reduction step, and thus, we haven't yet gotten to evaluating the recursive call to length. Since (+) is also a binary function, 1+length xs is represented in this new graph as two applications of (+), first to 1, and then to length [].

But what is that box at the root? This box used to be the root of the first graph, which was an application node. However, it is now a an indirection. Conceptually, reducing this indirection amounts to reducing the graph it points to. But why have we {{< sidenote "right" "altered-note" "altered the graph" >}} This is a key aspect of implementing functional languages. The language itself may be pure, while the runtime can be, and usually is, impure and stateful. After all, computers are impure and stateful, too! {{< /sidenote >}} in this manner? Because Haskell is a pure language, of course! If we know that a particular graph reduces to some value, there's no reason to reduce it again. However, as we will soon see, it may be used again, so we want to preserve its value. Thus, when we're done reducing a graph, we replace its root node with an indirection that points to its result.

When can a graph be used more than once? Well, how about this:

let x = square 5 in x + x

Here, the initial graph looks as follows:

{{< figure src="square_1.png" caption="The initial graph of let x = square 5 in x + x." >}}

As you can see, this is a graph, but not a tree! Since both variables x refer to the same expression, square 5, they are represented by the same subgraph. Then, when we evaluate square 5 for the first time, and replace its root node with an indirection, we end up with the following:

{{< figure src="square_2.png" caption="The graph of let x = square 5 in x + x after square 5 is reduced." >}}

There are two 25s in the tree, and no more squares! We only had to evaluate square 5 exactly once, even though (+) will use it twice (once for the left argument, and once for the right).

Our graphs can also include cycles. A simple, perhaps the most simple example of this in practice is Haskell's fix function. It computes a function's fixed point, {{< sidenote "right" "fixpoint-note" "and can be used to write recursive functions." >}} In fact, in the lambda calculus, fix is pretty much the only way to write recursive functions. In the untyped lambda calculus, it can be written as: \lambda f . (\lambda x . f (x \ x)) \ (\lambda x . f (x \ x)) In the simply typed lambda calculus, it cannot be written in any way, and needs to be added as an extension, typically written as \textbf{fix}. {{< /sidenote >}} It's implemented as follows:

fix f = let x = f x in x

See how the definition of x refers to itself? This is what it looks like in graph form:

{{< figure src="fixpoint_1.png" caption="The initial graph of let x = f x in x." >}}

I think it's useful to take a look at how this graph is processed. Let's pick f = (1:). That is, f is a function that takes a list, and prepends 1 to it. Then, after constructing the graph of f x, we end up with the following:

{{< figure src="fixpoint_2.png" caption="The graph of fix (1:) after it's been reduced." >}}

We see the body of f, which is the application of (:) first to the constant 1, and then to f's argument (x, in this case). As before, once we evaluated f x, we replaced the application with an indirection; in the image, this indirection is the top box. But the argument, x, is itself an indirection which points to the root of f x, thereby creating a cycle in our graph.

Almost there! A node can refer to itself, and, when evaluated, it is replaced with its own value. Thus, a node can effectively reference its own value! The last piece of the puzzle is how a node can access parts of its own value: recall that doRepMax calls repMax with only largest, while repMax returns (largest, xs'). I have to admit, I don't know the internals of GHC, but I suspect this is done by translating the code into something like:

doRepMax :: [Int] -> [Int]
doRepMax xs = snd t
  where t = repMax xs (fst t)

Detailed Example: Reducing doRepMax

If the above examples haven't elucidated how doRepMax works, stick around in this section and we will go through it step-by-step. This is a rather long and detailed example, so feel free to skip this section to read more about actually using time traveling.

If you're sticking around, why don't we watch how the graph of doRepMax [1, 2] unfolds. This example will be more complex than the ones we've seen so far; to avoid overwhelming ourselves with notation, let's adopt a different convention of writing functions. Instead of using application nodes @, let's draw an application of a function f to arguments x1 through xn as a subgraph with root f and children xs. The below figure demonstrates what I mean:

{{< figure src="notation.png" caption="The new visual notation used in this section." >}}

Now, let's write the initial graph for doRepMax [1,2]:

{{< figure src="repmax_1.png" caption="The initial graph of doRepMax [1,2]." >}}

Other than our new notation, there's nothing too surprising here. At a high level, all we want is the second element of the tuple returned by repMax, which contains the output list. To get the tuple, we apply repMax to the list [1,2], which itself consists of two uses of the (:) function.

The first step of our hypothetical reduction would replace the application of doRepMax with its body, and create our graph's first cycle:

{{< figure src="repmax_2.png" caption="The first step of reducing doRepMax [1,2]." >}}

Next, we would do the same for the body of repMax. In the following diagram, to avoid drawing a noisy amount of crossing lines, I marked the application of fst with a star, and replaced the two edges to fst with edges to similar looking stars. This is merely a visual trick; an edge leading to a little star is actually an edge leading to fst. Take a look:

{{< figure src="repmax_3.png" caption="The second step of reducing doRepMax [1,2]." class="medium" >}}

Since (,) is a constructor, let's say that it doesn't need to be evaluated, and that its {{< sidenote "right" "normal-note" "graph cannot be reduced further" >}} A graph that can't be reduced further is said to be in normal form, by the way. {{< /sidenote >}} (in practice, other things like packing may occur here, but they are irrelevant to us). If (,) can't be reduced, we can move on to evaluating snd. Given a pair, snd simply returns the second element, which in our case is the subgraph starting at (:). We thus replace the application of snd with an indirection to this subgraph. This leaves us with the following:

{{< figure src="repmax_4.png" caption="The third step of reducing doRepMax [1,2]." class="medium" >}}

Since it's becoming hard to keep track of what part of the graph is actually being evaluated, I marked the former root of doRepMax [1,2] with a blue star. If our original expression occured at the top level, the graph reduction would probably stop here. After all, we're evaluating our graphs using call-by-need, and there doesn't seem to be a need for knowing the what the arguments of (:) are. However, stopping at (:) wouldn't be very interesting, and we wouldn't learn much from doing so. So instead, let's assume that something is trying to read the elements of our list; perhaps we are trying to print this list to the screen in GHCi.

In this case, our mysterious external force starts unpacking and inspecting the arguments to (:). The first argument to (:) is the list's head, which is the subgraph starting with the starred application of fst. We evaluate it in a similar manner to snd. That is, we replace this fst with an indirection to the first element of the argument tuple, which happens to be the subgraph starting with max:

{{< figure src="repmax_5.png" caption="The fourth step of reducing doRepMax [1,2]." class="medium" >}}

Phew! Next, we need to evaluate the body of max. Let's make one more simplification here: rather than substitututing max for its body here, let's just reason about what evaluating max would entail. We would need to evaluate its two arguments, compare them, and return the larger one. The argument 1 can't be reduced any more (it's just a number!), but the second argument, a call to fst, needs to be processed. To do so, we need to evaluate the call to repMax. We thus replace repMax with its body:

{{< figure src="repmax_6.png" caption="The fifth step of reducing doRepMax [1,2]." class="medium" >}}

We've reached one of the base cases here, and there are no more calls to max or repMax. The whole reason we're here is to evaluate the call to fst that's one of the arguments to max. Given this graph, doing so is easy. We can clearly see that 2 is the first element of the tuple returned by repMax [2]. We thus replace fst with an indirection to this node:

{{< figure src="repmax_7.png" caption="The sixth step of reducing doRepMax [1,2]." class="medium" >}}

This concludes our task of evaluating the arguments to max. Comparing them, we see that 2 is greater than 1, and thus, we replace max with an indirection to 2:

{{< figure src="repmax_8.png" caption="The seventh step of reducing doRepMax [1,2]." class="medium" >}}

The node that we starred in our graph is now an indirection (the one that used to be the call to fst) which points to another indirection (formerly the call to max), which points to 2. Thus, any edge pointing to a star now points to the value 2.

By finding the value of the starred node, we have found the first argument of (:), and returned it to our mysterious external force. If we were printing to GHCi, the number 2 would appear on the screen right about now. The force then moves on to the second argument of (:), which is the call to snd. This snd is applied to an instance of (,), which can't be reduced any further. Thus, all we have to do is take the second element of the tuple, and replace snd with an indirection to it:

{{< figure src="repmax_9.png" caption="The eighth step of reducing doRepMax [1,2]." class="medium" >}}

The second element of the tuple was a call to (:), and that's what the mysterious force is processing now. Just like it did before, it starts by looking at the first argument of this list, which is head. This argument is a reference to the starred node, which, as we've established, eventually points to 2. Another 2 pops up on the console.

Finally, the mysterious force reaches the second argument of the (:), which is the empty list. The empty list also cannot be evaluated any further, so that's what the mysterious force receives. Just like that, there's nothing left to print to the console. The mysterious force ceases. After removing the unused nodes, we are left with the following graph:

{{< figure src="repmax_10.png" caption="The result of reducing doRepMax [1,2]." >}}

As we would have expected, two 2s are printed to the console.

Using Time Traveling

{{< todo >}}This whole section {{< /todo >}}

Beware The Strictness

{{< todo >}}This whole section, too. {{< /todo >}}

Leftovers

This is what allows us to write the code above: the graph of repMax xs largest effectively refers to itself. While traversing the list, it places references to itself in place of each of the elements, and thanks to laziness, these references are not evaluated.

Let's try a more complicated example. How about instead of creating a new list, we return a Map containing the number of times each number occured, but only when those numbers were a factor of the maximum numbers. Our expected output will be:

>>> countMaxFactors [1,3,3,9]

fromList [(1, 1), (3, 2), (9, 1)]