diff --git a/content/blog/haskell_lazy_evaluation.md b/content/blog/haskell_lazy_evaluation.md new file mode 100644 index 0000000..fa9e18a --- /dev/null +++ b/content/blog/haskell_lazy_evaluation.md @@ -0,0 +1,95 @@ +--- +title: "Clairvoyance for Good: Using Lazy Evaluation in Haskell" +date: 2020-05-03T20:05:29-07:00 +tags: ["Haskell"] +draft: true +--- + +While tackling a project for work, I ran across a rather unpleasant problem. +I don't think it's valuable to go into the specifics here (it's rather +large and convoluted); however, the outcome of this experience led me to +discover a very interesting technique for lazy functional languages, +and I want to share what I learned. + +### Time Traveling +Some time ago, I read [this post](https://kcsongor.github.io/time-travel-in-haskell-for-dummies/) by Csongor Kiss about time traveling in Haskell. It's +really cool, and makes a lot of sense if you have wrapped your head around +lazy evaluation. I'm going to present my take on it here, but please check out +Csongor's original post if you are interested. + +Say that you have a list of integers, like `[3,2,6]`. Next, suppose that +you want to find the maximum value in the list. You can implement such +behavior quite simply with pattern matching: + +```Haskell +myMax :: [Int] -> Int +myMax [] = error "Being total sucks" +myMax (x:xs) = max x $ myMax xs +``` + +You could even get fancy with a `fold`: + +```Haskell +myMax :: [Int] -> Int +myMax = foldr1 max +``` + +All is well, and this is rather elementary Haskell. But now let's look at +something that Csongor calls the `repMax` problem: + +> Imagine you had a list, and you wanted to replace all the elements of the +> list with the largest element, by only passing the list once. + +How can we possibly do this in one pass? First, we need to find the maximum +element, and only then can we have something to replace the other numbers +with! It turns out, though, that we can just expect to have the future +value, and all will be well. Csongor provides the following example: + +```Haskell {linenos=table} +repMax :: [Int] -> Int -> (Int, [Int]) +repMax [] rep = (rep, []) +repMax [x] rep = (x, [rep]) +repMax (l : ls) rep = (m', rep : ls') + where (m, ls') = repMax ls rep + m' = max m l + +doRepMax :: [Int] -> [Int] +doRepMax xs = xs' + where (largest, xs') = repMax xs largest +``` + +In the above snippet, `repMax` expects to receive the maximum value of +its input list. At the same time, it also computes that maximum value, +returning it and the newly created list. `doRepMax` is where the magic happens: +the `where` clauses receives the maximum number from `repMax`, while at the +same time using that maximum number to call `repMax`! + +This works because Haskell's evaluation model is, effectively, +[lazy graph reduction](https://en.wikipedia.org/wiki/Graph_reduction). That is, +you can think of Haskell as manipulating your code as +{{< sidenote "right" "tree-note" "a syntax tree," >}} +Why is it called graph reduction, you may be wondering, if the runtime is +manipulating syntax trees? To save on work, if a program refers to the +same value twice, Haskell has both of those references point to the +exact same graph. This violates the tree's property of having only one path +from the root to any node, and makes our program a graph. Graphs that +refer to themselves also violate the properties of a tree. +{{< /sidenote >}} performing +substitutions and simplifications as necessary until it reaches a final answer. +What the lazy part means is that parts of the syntax tree that are not yet +needed to compute the final answer can exist, unsimplied, in the tree. This is +what allows us to write the code above: the graph of `repMax xs largest` +effectively refers to itself. While traversing the list, it places references +to itself in place of each of the elements, and thanks to laziness, these +references are not evaluated. + +Let's try a more complicated example. How about instead of creating a new list, +we return a `Map` containing the number of times each number occured, but only +when those numbers were a factor of the maximum numbers. Our expected output +will be: + +``` +>>> countMaxFactors [1,3,3,9] + +fromList [(1, 1), (3, 2), (9, 1)] +```