blog-static/content/blog/haskell_lazy_evaluation.md

3.9 KiB

title date tags draft
Clairvoyance for Good: Using Lazy Evaluation in Haskell 2020-05-03T20:05:29-07:00
Haskell
true

While tackling a project for work, I ran across a rather unpleasant problem. I don't think it's valuable to go into the specifics here (it's rather large and convoluted); however, the outcome of this experience led me to discover a very interesting technique for lazy functional languages, and I want to share what I learned.

Time Traveling

Some time ago, I read this post by Csongor Kiss about time traveling in Haskell. It's really cool, and makes a lot of sense if you have wrapped your head around lazy evaluation. I'm going to present my take on it here, but please check out Csongor's original post if you are interested.

Say that you have a list of integers, like [3,2,6]. Next, suppose that you want to find the maximum value in the list. You can implement such behavior quite simply with pattern matching:

myMax :: [Int] -> Int
myMax [] = error "Being total sucks"
myMax (x:xs) = max x $ myMax xs

You could even get fancy with a fold:

myMax :: [Int] -> Int
myMax = foldr1 max

All is well, and this is rather elementary Haskell. But now let's look at something that Csongor calls the repMax problem:

Imagine you had a list, and you wanted to replace all the elements of the list with the largest element, by only passing the list once.

How can we possibly do this in one pass? First, we need to find the maximum element, and only then can we have something to replace the other numbers with! It turns out, though, that we can just expect to have the future value, and all will be well. Csongor provides the following example:

1 2 3 4 5 6 7 8 9 10 repMax :: [Int] -> Int -> (Int, [Int]) repMax [] rep = (rep, []) repMax [x] rep = (x, [rep]) repMax (l : ls) rep = (m', rep : ls') where (m, ls') = repMax ls rep m' = max m l doRepMax :: [Int] -> [Int] doRepMax xs = xs' where (largest, xs') = repMax xs largest

In the above snippet, repMax expects to receive the maximum value of its input list. At the same time, it also computes that maximum value, returning it and the newly created list. doRepMax is where the magic happens: the where clauses receives the maximum number from repMax, while at the same time using that maximum number to call repMax!

This works because Haskell's evaluation model is, effectively, lazy graph reduction. That is, you can think of Haskell as manipulating your code as {{< sidenote "right" "tree-note" "a syntax tree," >}} Why is it called graph reduction, you may be wondering, if the runtime is manipulating syntax trees? To save on work, if a program refers to the same value twice, Haskell has both of those references point to the exact same graph. This violates the tree's property of having only one path from the root to any node, and makes our program a graph. Graphs that refer to themselves also violate the properties of a tree. {{< /sidenote >}} performing substitutions and simplifications as necessary until it reaches a final answer. What the lazy part means is that parts of the syntax tree that are not yet needed to compute the final answer can exist, unsimplied, in the tree. This is what allows us to write the code above: the graph of repMax xs largest effectively refers to itself. While traversing the list, it places references to itself in place of each of the elements, and thanks to laziness, these references are not evaluated.

Let's try a more complicated example. How about instead of creating a new list, we return a Map containing the number of times each number occured, but only when those numbers were a factor of the maximum numbers. Our expected output will be:

>>> countMaxFactors [1,3,3,9]

fromList [(1, 1), (3, 2), (9, 1)]