3.9 KiB
title | date | tags | draft | |
---|---|---|---|---|
Clairvoyance for Good: Using Lazy Evaluation in Haskell | 2020-05-03T20:05:29-07:00 |
|
true |
While tackling a project for work, I ran across a rather unpleasant problem. I don't think it's valuable to go into the specifics here (it's rather large and convoluted); however, the outcome of this experience led me to discover a very interesting technique for lazy functional languages, and I want to share what I learned.
Time Traveling
Some time ago, I read this post by Csongor Kiss about time traveling in Haskell. It's really cool, and makes a lot of sense if you have wrapped your head around lazy evaluation. I'm going to present my take on it here, but please check out Csongor's original post if you are interested.
Say that you have a list of integers, like [3,2,6]
. Next, suppose that
you want to find the maximum value in the list. You can implement such
behavior quite simply with pattern matching:
myMax :: [Int] -> Int
myMax [] = error "Being total sucks"
myMax (x:xs) = max x $ myMax xs
You could even get fancy with a fold
:
myMax :: [Int] -> Int
myMax = foldr1 max
All is well, and this is rather elementary Haskell. But now let's look at
something that Csongor calls the repMax
problem:
Imagine you had a list, and you wanted to replace all the elements of the list with the largest element, by only passing the list once.
How can we possibly do this in one pass? First, we need to find the maximum element, and only then can we have something to replace the other numbers with! It turns out, though, that we can just expect to have the future value, and all will be well. Csongor provides the following example:
1
2
3
4
5
6
7
8
9
10
repMax :: [Int] -> Int -> (Int, [Int])
repMax [] rep = (rep, [])
repMax [x] rep = (x, [rep])
repMax (l : ls) rep = (m', rep : ls')
where (m, ls') = repMax ls rep
m' = max m l
doRepMax :: [Int] -> [Int]
doRepMax xs = xs'
where (largest, xs') = repMax xs largest
In the above snippet, repMax
expects to receive the maximum value of
its input list. At the same time, it also computes that maximum value,
returning it and the newly created list. doRepMax
is where the magic happens:
the where
clauses receives the maximum number from repMax
, while at the
same time using that maximum number to call repMax
!
This works because Haskell's evaluation model is, effectively,
lazy graph reduction. That is,
you can think of Haskell as manipulating your code as
{{< sidenote "right" "tree-note" "a syntax tree," >}}
Why is it called graph reduction, you may be wondering, if the runtime is
manipulating syntax trees? To save on work, if a program refers to the
same value twice, Haskell has both of those references point to the
exact same graph. This violates the tree's property of having only one path
from the root to any node, and makes our program a graph. Graphs that
refer to themselves also violate the properties of a tree.
{{< /sidenote >}} performing
substitutions and simplifications as necessary until it reaches a final answer.
What the lazy part means is that parts of the syntax tree that are not yet
needed to compute the final answer can exist, unsimplied, in the tree. This is
what allows us to write the code above: the graph of repMax xs largest
effectively refers to itself. While traversing the list, it places references
to itself in place of each of the elements, and thanks to laziness, these
references are not evaluated.
Let's try a more complicated example. How about instead of creating a new list,
we return a Map
containing the number of times each number occured, but only
when those numbers were a factor of the maximum numbers. Our expected output
will be:
>>> countMaxFactors [1,3,3,9]
fromList [(1, 1), (3, 2), (9, 1)]