Update lazy evaluation post with images and more.

2020-07-30 00:49:35 -07:00 · 2020-07-30 00:49:35 -07:00 · 58e6ad9e79
commit 58e6ad9e79
parent 3aa2a6783e
5 changed files with 198 additions and 33 deletions
--- a/content/blog/haskell_lazy_evaluation/fixpoint_1.png
+++ b/content/blog/haskell_lazy_evaluation/fixpoint_1.png
--- a/content/blog/haskell_lazy_evaluation/fixpoint_2.png
+++ b/content/blog/haskell_lazy_evaluation/fixpoint_2.png
--- a/content/blog/haskell_lazy_evaluation/index.md
+++ b/content/blog/haskell_lazy_evaluation/index.md
@ -7,6 +7,7 @@ draft: true

 <style>
 img, figure.small img { max-height: 20rem; }
+figure.tiny img { max-height: 15rem; }
 figure.medium img { max-height: 30rem; }
 </style>

@ -15,8 +16,7 @@ I recently got to use a very curious Haskell technique
 As production as research code gets, anyway!
 {{< /sidenote >}} time traveling. I say this with
 the utmost seriousness. This technique worked like
-magic for the problem I was trying to solve (which isn't
-interesting enough to be presented here in itself), and so
+magic for the problem I was trying to solve, and so
 I thought I'd share what I learned. In addition
 to the technique and its workings, I will also explain how 
 time traveling can be misused, yielding computations that
@ -74,7 +74,7 @@ value even come from?

 Thus far, nothing too magical has happened. It's a little
 strange to expect the result of the computation to be
-given to us; however, thus far, it looks like wishful
+given to us; it just looks like wishful
 thinking. The real magic happens in Csongor's `doRepMax`
 function:

@ -100,8 +100,9 @@ Why is it called graph reduction, you may be wondering, if the runtime is
 manipulating syntax trees? To save on work, if a program refers to the
 same value twice, Haskell has both of those references point to the
 exact same graph. This violates the tree's property of having only one path
-from the root to any node, and makes our program a graph. Graphs that
-refer to themselves also violate the properties of a tree.
+from the root to any node, and makes our program a DAG (at least). Graph nodes that
+refer to themselves (which are also possible in the model) violate the properties of
+a DAG, and thus, in general, we are working with graphs.
 {{< /sidenote >}} performing
 substitutions and simplifications as necessary until it reaches a final answer.
 What the lazy part means is that parts of the syntax tree that are not yet
@ -184,7 +185,7 @@ we end up with the following:

 {{< figure src="square_2.png" caption="The graph of `let x = square 5 in x + x` after `square 5` is reduced." >}}

-There are two `25`s in the tree, and no more `square`s! We only
+There are two `25`s in the graph, and no more `square`s! We only
 had to evaluate `square 5` exactly once, even though `(+)`
 will use it twice (once for the left argument, and once for the right).
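+
+To see this sharing for yourself, here's a minimal sketch. It assumes a traced variant of `square` (the `trace` call and the name `shared` are mine, added purely for illustration), so that each actual evaluation of the body leaves a message on standard error:
+
+```Haskell
+import Debug.Trace (trace)
+
+-- Hypothetical traced variant of `square`: the message is emitted
+-- whenever the body is actually evaluated.
+square :: Int -> Int
+square n = trace "evaluating square" (n * n)
+
+-- Both uses of `x` refer to the same node in the graph, so the
+-- application `square 5` is reduced at most once.
+shared :: Int
+shared = let x = square 5 in x + x
+```
+
+Forcing `shared` in GHCi should yield `50`, with the trace message showing up only once.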

@ -207,7 +208,7 @@ fix f = let x = f x in x
 See how the definition of `x` refers to itself? This is what
 it looks like in graph form:

-{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." >}}
+{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." class="tiny" >}}

 I think it's useful to take a look at how this graph is processed. Let's
 pick `f = (1:)`. That is, `f` is a function that takes a list,
@ -221,7 +222,8 @@ constant `1`, and then to `f`'s argument (`x`, in this case). As
 before, once we evaluated `f x`, we replaced the application with
 an indirection; in the image, this indirection is the top box. But the
 argument, `x`, is itself an indirection which points to the root of `f x`,
-thereby creating a cycle in our graph. 
+thereby creating a cycle in our graph. Traversing this graph looks like
+traversing an infinite list of `1`s.
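+
+You can try this out directly. The sketch below repeats the definition of `fix` from above; the names `ones` and `firstThree` are made up for this example:
+
+```Haskell
+-- Same definition as in the post: `x` is defined in terms of itself.
+fix :: (a -> a) -> a
+fix f = let x = f x in x
+
+-- The cyclic graph: a single cons cell whose tail points back to itself.
+ones :: [Int]
+ones = fix (1:)
+
+-- Only the part of the list we actually ask for is traversed.
+firstThree :: [Int]
+firstThree = take 3 ones
+```
+
+Since `take 3` stops after three cells, `firstThree` is simply `[1,1,1]`, even though `ones` never ends.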

 Almost there! A node can refer to itself, and, when evaluated, it
 is replaced with its own value. Thus, a node can effectively reference
@ -259,18 +261,16 @@ Now, let's write the initial graph for `doRepMax [1,2]`:
 {{< figure src="repmax_1.png" caption="The initial graph of `doRepMax [1,2]`." >}}

 Other than our new notation, there's nothing too surprising here.
-At a high level, all we want is the second element of the tuple
+The first step of our hypothetical reduction would replace the application of `doRepMax` with its
+body, and create our graph's first cycle. At a high level, all we want is the second element of the tuple
 returned by `repMax`, which contains the output list. To get
-the tuple, we apply `repMax` to the list `[1,2]`, which itself
+the tuple, we apply `repMax` to the list `[1,2]` and the first element
+of its result. The list `[1,2]` itself
 consists of two uses of the `(:)` function.

-The first step
-of our hypothetical reduction would replace the application of `doRepMax` with its
-body, and create our graph's first cycle:
-
 {{< figure src="repmax_2.png" caption="The first step of reducing `doRepMax [1,2]`." >}}

-Next, we would do the same for the body of `repMax`. In
+Next, we would also expand the body of `repMax`. In
 the following diagram, to avoid drawing a noisy amount of
 crossing lines, I marked the application of `fst` with
 a star, and replaced the two edges to `fst` with
@ -362,7 +362,7 @@ element of the tuple, and replace `snd` with an indirection to it:

 The second element of the tuple was a call to `(:)`, and that's what the mysterious
 force is processing now. Just like it did before, it starts by looking at the
-first argument of this list, which is head. This argument is a reference to
+first argument of this list, which is the list's head. This argument is a reference to
 the starred node, which, as we've established, eventually points to `2`.
 Another `2` pops up on the console.

@ -374,32 +374,197 @@ After removing the unused nodes, we are left with the following graph:

 {{< figure src="repmax_10.png" caption="The result of reducing `doRepMax [1,2]`." >}}

-As we would have expected, two `2`s are printed to the console.
+As we would have expected, two `2`s were printed to the console, and our
+final graph represents the list `[2,2]`.
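+
+If you'd like to check this result without drawing any graphs, here's a self-contained sketch of the two functions. It's written in the spirit of Csongor's code and matches the shapes described above (a list and a 'future' maximum go in, a tuple of the maximum and the output list comes out), but the exact clauses are my assumption rather than a verbatim copy:
+
+```Haskell
+-- Takes the input list and the (future) value to put in every position.
+-- Returns the maximum seen so far together with the output list.
+repMax :: [Int] -> Int -> (Int, [Int])
+repMax []     rep = (rep, [])
+repMax [x]    rep = (x, [rep])
+repMax (x:xs) rep = (max x m, rep : xs')
+  where (m, xs') = repMax xs rep
+
+-- Ties the knot: the first element of the result is fed back in
+-- as the second argument. The tuple pattern in `where` is lazy,
+-- so `largest` is not forced until an element of `xs'` is needed.
+doRepMax :: [Int] -> [Int]
+doRepMax xs = xs'
+  where (largest, xs') = repMax xs largest
+```
+
+With these definitions, `doRepMax [1,2]` evaluates to `[2,2]`, just like our final graph.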

 ### Using Time Traveling
+Is time traveling even useful? I would argue yes, especially
+in cases where Haskell's purity can make certain things
+difficult.

-{{< todo >}}This whole section {{< /todo >}}
+As a first example, Csongor provides an assembler that works
+in a single pass. The challenge in this case is to resolve
+jumps to code segments occurring _after_ the jump itself;
+in essence, the address of the target code segment needs to be
+known before the segment itself is processed. Csongor's
+code uses the [Tardis monad](https://hackage.haskell.org/package/tardis-0.4.1.0/docs/Control-Monad-Tardis.html),
+which combines regular state, to which you can write and then
+later read from, and future state, from which you can
+read values before you write them. Check out
+[his complete example](https://kcsongor.github.io/time-travel-in-haskell-for-dummies/#a-single-pass-assembler-an-example).
+
+Alternatively, here's an example from my research. I'll be fairly
+vague, since all of this is still in progress. The gist is that
+we have some kind of data structure (say, a list or a tree),
+and we want to associate with each element in this data
+structure a 'score' of how useful it is. There are many possible
+heuristics for picking 'scores'; a very simple one is
+to make it inversely proportional to the number of times
+an element occurs. To be more concrete, suppose
+we have some element type `Element`:
+
+{{< codelines "Haskell" "time-traveling/ValueScore.hs" 5 6 >}}
+
+Suppose also that our data structure is a binary tree:
+
+{{< codelines "Haskell" "time-traveling/ValueScore.hs" 14 16 >}}
+
+We then want to transform an input `ElementTree`, such as:
+
+```Haskell
+Node A (Node A Empty Empty) Empty
+```
+
+Into a scored tree, like:
+
+```Haskell
+Node (A,0.5) (Node (A,0.5) Empty Empty) Empty
+```
+
+Since `A` occurred twice, its score is `1/2 = 0.5`.
+
+Let's define some utility functions before we get to the
+meat of the implementation:
+
+{{< codelines "Haskell" "time-traveling/ValueScore.hs" 8 12 >}}
+
+The `addElement` function simply increments the counter for a particular
+element in the map, adding the number `1` if it doesn't exist. The `getScore`
+function computes the score of a particular element, defaulting to `1.0` if
+it's not found in the map.
+
+Just as before -- noticing that passing around the future values is getting awfully
+bothersome -- we write our scoring function as though we have
+a 'future value'.
+
+{{< codelines "Haskell" "time-traveling/ValueScore.hs" 18 24 >}}
+
+The actual `doAssignScores` function is pretty much identical to
+`doRepMax`:
+
+{{< codelines "Haskell" "time-traveling/ValueScore.hs" 26 28 >}}
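+
+For reference, here's a rough, self-contained sketch of how these pieces could fit together. The constructors of `Element`, the `BinaryTree` type, and the exact function bodies are assumptions made for illustration; they follow the descriptions above, but are not necessarily what's in `ValueScore.hs`:
+
+```Haskell
+import qualified Data.Map as Map
+
+-- Assumed element and tree types (hypothetical constructors).
+data Element = A | B | C | D deriving (Eq, Ord, Show)
+
+data BinaryTree a = Empty | Node a (BinaryTree a) (BinaryTree a)
+    deriving (Eq, Show)
+
+type ElementTree = BinaryTree Element
+type ScoredElementTree = BinaryTree (Element, Float)
+
+-- Increment the counter for an element, starting at 1 if it's absent.
+addElement :: Element -> Map.Map Element Int -> Map.Map Element Int
+addElement e = Map.insertWith (+) e 1
+
+-- Score is the inverse of the count, defaulting to 1.0 if not found.
+getScore :: Element -> Map.Map Element Int -> Float
+getScore e m = maybe 1.0 ((1.0 /) . fromIntegral) (Map.lookup e m)
+
+-- Written as though the final ('future') counts were already known.
+-- The returned map never inspects `future`, which is what keeps
+-- the knot-tying below from looping.
+assignScores :: ElementTree -> Map.Map Element Int
+             -> (Map.Map Element Int, ScoredElementTree)
+assignScores Empty _ = (Map.empty, Empty)
+assignScores (Node e l r) future =
+    (addElement e (Map.unionWith (+) lm rm), Node (e, getScore e future) l' r')
+  where
+    (lm, l') = assignScores l future
+    (rm, r') = assignScores r future
+
+-- Feed the resulting counts back in as the future value.
+doAssignScores :: ElementTree -> ScoredElementTree
+doAssignScores t = t'
+  where (counts, t') = assignScores t counts
+```
+
+Under these assumptions, applying `doAssignScores` to the example tree from earlier gives back the scored tree shown alongside it.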
+
+There's quite a bit of repetition here, especially in the handling
+of future values - all of our functions now accept an extra
+future argument, and return a work-in-progress future value.
+This is what the `Tardis` monad, and its corresponding
+`TardisT` monad transformer, aim to address. Just like the
+`State` monad helps us avoid writing plumbing code for
+forward-traveling values, `Tardis` helps us do the same
+for backward-traveling ones.
+
+#### Cycles in Monadic Bind
+
+We've seen that we're able to write code like the following:
+
+```Haskell
+(a, b) = f a c
+```
+
+That is, we were able to write function calls that referenced
+their own return values. What if we try doing this inside
+a `do` block? Say, for example, we want to sprinkle some time
+traveling into our program, but don't want to add a whole new
+transformer into our monad stack. We could write code as follows:
+
+```Haskell
+do
+    (a, b) <- f a c
+    return b
+```
+
+Unfortunately, this doesn't work: in an ordinary `do` block, `a` is not
+in scope on the right-hand side of its own binding. However, it's entirely
+possible to enable this using the `RecursiveDo` language
+extension:
+
+```Haskell
+{-# LANGUAGE RecursiveDo #-}
+```
+
+Then, we can write the above as follows:
+
+```Haskell
+do
+    rec (a, b) <- f a c
+    return b
+``` 
+
+This power, however, comes at a price. It's not as straightforward
+to build graphs from recursive monadic computations; in fact,
+it's not possible in general. The translation of the above
+code uses `MonadFix`. A monad with a `MonadFix` instance has
+an operation `mfix`, which is the monadic version of the `fix`
+function we saw earlier:
+
+```Haskell
+mfix :: MonadFix m => (a -> m a) -> m a
+-- Regular fix, for comparison
+fix :: (a -> a) -> a
+```
+
+To really understand how the translation works, check out the
+[paper on recursive do notation](http://leventerkok.github.io/papers/recdo.pdf).
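+
+As a small, self-contained illustration (a toy example of my own, not taken from the paper), here is a recursive binding in the `Maybe` monad, next to a hand-written version that calls `mfix` directly. The names `onesRec` and `onesFix` are made up for this sketch:
+
+```Haskell
+{-# LANGUAGE RecursiveDo #-}
+import Control.Monad.Fix (mfix)
+
+-- `xs` is used on the right-hand side of its own monadic binding.
+onesRec :: Maybe [Int]
+onesRec = do
+    rec xs <- Just (1 : xs)
+    return (take 3 xs)
+
+-- Roughly the same computation, with the knot tied by hand via `mfix`.
+onesFix :: Maybe [Int]
+onesFix = do
+    xs <- mfix (\xs' -> Just (1 : xs'))
+    return (take 3 xs)
+```
+
+Both of these evaluate to `Just [1,1,1]`: much like `fix (1:)`, the bound list ends up referring to itself.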

 ### Beware The Strictness
+Though Csongor points out other problems with the
+time traveling approach, I think he doesn't mention
+an important idea: you have to be _very_ careful about introducing
+strictness into your programs when running time-traveling code.
+For example, suppose we wanted to write a function,
+`takeUntilMax`, which would return the input list,
+cut off after the first occurrence of the maximum number.
+Following the same strategy, we come up with:

-{{< todo >}}This whole section, too. {{< /todo >}}
+{{< codelines "Haskell" "time-traveling/TakeMax.hs" 1 12 >}}

-### Leftovers
+In short, if we encounter our maximum number, we just return
+a list of that maximum number, since we do not want to recurse
+further. On the other hand, if we encounter a number that's
+_not_ the maximum, we continue our recursion.

-This is
-what allows us to write the code above: the graph of `repMax xs largest`
-effectively refers to itself. While traversing the list, it places references
-to itself in place of each of the elements, and thanks to laziness, these
-references are not evaluated.
+Unfortunately, this doesn't work; our program never terminates.
+You may be thinking:

-Let's try a more complicated example. How about instead of creating a new list,
-we return a `Map` containing the number of times each number occured, but only
-when those numbers were a factor of the maximum numbers. Our expected output
-will be:
+> Well, obviously this doesn't work! We didn't actually
+compute the maximum number properly, since we stopped
+recursing too early. We need to traverse the whole list,
+and not just the part before the maximum number.

-```
->>> countMaxFactors [1,3,3,9]
+To address this, we can reformulate our `takeUntilMax`
+function as follows:

-fromList [(1, 1), (3, 2), (9, 1)]
-```
+{{< codelines "Haskell" "time-traveling/TakeMax.hs" 14 21 >}}

+Now we definitely compute the maximum correctly! Alas,
+this doesn't work either. The issue lies on lines 5 and 18,
+more specifically in the comparison `x == m`. Here, we 
+are trying to base the decision of what branch to take
+on a future value. This is simply impossible; to compute
+the value, we need to know the value!
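+
+For comparison, one way to sidestep the problem is to give up on time traveling for this particular function, and compute the maximum in a separate pass. To be clear, this is a swapped-in, two-pass technique rather than a repair of the code above, and the name `takeUntilMaxTwoPass` is hypothetical:
+
+```Haskell
+-- First pass: compute the maximum up front with `maximum`.
+-- Second pass: copy elements until the first occurrence of it.
+-- Here `m` is an ordinary value by the time we compare against it,
+-- so branching on `y == m` is perfectly safe.
+takeUntilMaxTwoPass :: [Int] -> [Int]
+takeUntilMaxTwoPass [] = []
+takeUntilMaxTwoPass xs = go xs
+  where
+    m = maximum xs
+    go [] = []
+    go (y:ys)
+        | y == m    = [y]
+        | otherwise = y : go ys
+```
+
+For example, `takeUntilMaxTwoPass [1,3,2,3]` gives `[1,3]`. We pay for an extra traversal, but no decision ever depends on a value that hasn't been computed yet.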
+
+This is no 'silly mistake', either! In complicated programs
+that use time traveling, strictness lurks behind every corner.
+In my research work, I was at one point inserting a data structure into
+a set; however, deep in the structure was a data type containing
+a 'future' value, and using the default `Eq` instance!
+Adding the data structure to a set ended up invoking `(==)` (or perhaps
+some function from the `Ord` typeclass),
+which, in turn, tried to compare the lazily evaluated values.
+My code therefore didn't terminate, much like `takeUntilMax`.
+
+Debugging time traveling code is, in general,
+a pain. This is especially true since future values don't look any different
+from regular values. You can see it in the type signatures
+of `repMax` and `takeUntilMax`: the maximum number is just an `Int`!
+And yet, trying to see what its value is will kill the entire program.
+As always, remember Brian W. Kernighan's wise words:
+
+> Debugging is twice as hard as writing the code in the first place.
+Therefore, if you write the code as cleverly as possible, you are,
+by definition, not smart enough to debug it.
+
+### Conclusion
+This is about it! In a way, time traveling can make code performing
+certain operations more expressive. Furthermore, even if it's not groundbreaking,
+thinking about time traveling is a good exercise to get familiar
+with lazy evaluation in general. I hope you found this useful!
--- a/content/blog/haskell_lazy_evaluation/length_2.png
+++ b/content/blog/haskell_lazy_evaluation/length_2.png
--- a/content/blog/haskell_lazy_evaluation/square_2.png
+++ b/content/blog/haskell_lazy_evaluation/square_2.png