Resume working on the draft of time traveling.

2020-07-20 22:32:14 -07:00 · 2020-07-20 22:32:14 -07:00 · e9f2378b47
commit e9f2378b47
parent 7d2f78d25c
1 changed files with 316 additions and 13 deletions
--- a/content/blog/haskell_lazy_evaluation.md
+++ b/content/blog/haskell_lazy_evaluation.md
@ -1,15 +1,21 @@
 ---
-title: "Clairvoyance for Good: Using Lazy Evaluation in Haskell"
+title: "Time Traveling In Haskell: How It Works And How To Use It"
 date: 2020-05-03T20:05:29-07:00
 tags: ["Haskell"]
 draft: true
 ---

-While tackling a project for work, I ran across a rather unpleasant problem.
-I don't think it's valuable to go into the specifics here (it's rather
-large and convoluted); however, the outcome of this experience led me to
-discover a very interesting technique for lazy functional languages,
-and I want to share what I learned.
+I recently got to use a very curious Haskell technique
+{{< sidenote "right" "production-note" "in production:" >}}
+As production as research code gets, anyway!
+{{< /sidenote >}} time traveling. I say this with
+the utmost seriousness. This technique worked like
+magic for the problem I was trying to solve (which isn't
+interesting enough to be presented here in itself), and so
+I thought I'd share what I learned. In addition
+to the technique and its workings, I will also explain how 
+time traveling can be misused, yielding computations that
+never terminate.

 ### Time Traveling
 Some time ago, I read [this post](https://kcsongor.github.io/time-travel-in-haskell-for-dummies/) by Csongor Kiss about time traveling in Haskell. It's
@ -45,24 +51,41 @@ element, and only then can we have something to replace the other numbers
 with! It turns out, though, that we can just expect to have the future
 value, and all will be well. Csongor provides the following example:

-```Haskell {linenos=table}
+```Haskell
 repMax :: [Int] -> Int -> (Int, [Int])
 repMax [] rep = (rep, [])
 repMax [x] rep = (x, [rep])
 repMax (l : ls) rep = (m', rep : ls')
  where (m, ls') = repMax ls rep
        m' = max m l
+```
+In this example, `repMax` takes the list of integers,
+each of which it must replace with their maximum element.
+It also takes __as an argument__ the maximum element,
+as if it had already been computed. It does, however,
+still compute the intermediate maximum element,
+in the form of `m'`. Otherwise, where would the future
+value even come from?

+Thus far, nothing too magical has happened. It's a little
+strange to expect the result of the computation to be
+given to us; however, thus far, it looks like wishful
+thinking. The real magic happens in Csongor's `doRepMax`
+function:
+
+```Haskell
 doRepMax :: [Int] -> [Int]
 doRepMax xs = xs'
  where (largest, xs') = repMax xs largest
 ```

-In the above snippet, `repMax` expects to receive the maximum value of
-its input list. At the same time, it also computes that maximum value,
-returning it and the newly created list. `doRepMax` is where the magic happens:
-the `where` clauses receives the maximum number from `repMax`, while at the
-same time using that maximum number to call `repMax`!
+Look, in particular, on the line with the `where` clause.
+We see that `repMax` returns the maximum element of the
+list, `largest`, and the resulting list `xs'` consisting
+only of `largest` repeated as many times as `xs` had elements.
+But what's curious is the call to `repMax` itself. It takes
+as input `xs`, the list we're supposed to process... and
+`largest`, the value that _it itself returns_.

 This works because Haskell's evaluation model is, effectively,
 [lazy graph reduction](https://en.wikipedia.org/wiki/Graph_reduction). That is,
@ -77,7 +100,286 @@ refer to themselves also violate the properties of a tree.
 {{< /sidenote >}} performing
 substitutions and simplifications as necessary until it reaches a final answer.
 What the lazy part means is that parts of the syntax tree that are not yet
-needed to compute the final answer can exist, unsimplied, in the tree. This is
+needed to compute the final answer can exist, unsimplified, in the tree.
+Why don't we draw a few graphs to get familiar with the idea?
+
+### Visualizing Graphs and Their Reduction
+Let's start with something that doesn't have anything fancy. We can
+take a look at the graph of the expression:
+
+```Haskell
+length [1]
+```
+
+Stripping away Haskell's syntax sugar for lists, we can write this expression as:
+
+```Haskell
+length (1:[])
+```
+
+Then, recalling that `(:)`, or 'cons', is just a binary function, we rewrite:
+
+```Haskell
+length ((:) 1 [])
+```
+
+We're now ready to draw the graph; in this case, it's pretty much identical
+to the syntax tree of the last form of our expression:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+In this image, the `@` nodes represent function application. The
+root node is an application of the function `length` to the graph that
+represents the list `[1]`. The list itself is represented using two
+application nodes: `(:)` takes two arguments, the head and tail of the
+list, and function applications in Haskell are
+[curried](https://en.wikipedia.org/wiki/Currying). Eventually,
+in the process of evaluation, the body of `length` will be reached,
+and leave us with the following graph:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+Conceptually, we only took one reduction step, and thus, we haven't yet gotten
+to evaluating the recursive call to `length`. Since `(+)`
+is also a binary function, `1+length xs` is represented in this
+new graph as two applications of `(+)`, first to `1`, and then
+to `length []`.
+
+But what is that box at the root? This box _used to be_ the root of the
+first graph, which was an application node. However, it is now a
+an _indirection_. Conceptually, reducing
+this indirection amounts to reducing the graph
+it points to. But why have we {{< sidenote "right" "altered-note" "altered the graph" >}}
+This is a key aspect of implementing functional languages.
+The language itself may be pure, while the runtime
+can be, and usually is, impure and stateful. After all,
+computers are impure and stateful, too!
+{{< /sidenote >}} in this manner? Because Haskell is a pure language,
+of course! If we know that a particular graph reduces to some value,
+there's no reason to reduce it again. However, as we will
+soon see, it may be _used_ again, so we want to preserve its value.
+Thus, when we're done reducing a graph, we replace its root node with
+an indirection that points to its result. 
+
+When can a graph be used more than once? Well, how about this:
+
+```Haskell
+let x = square 5 in x + x
+```
+
+Here, the initial graph looks as follows:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+As you can see, this _is_ a graph, not a tree! Since both
+variables `x` refer to the same expression, `square 5`, they
+are represented by the same subgraph. Then, when we evaluate `square 5`
+for the first time, and replace its root node with an indirection,
+we end up with the following:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+There are two `25`s in the tree, and no more `square`s! We only
+had to evaluate `square 5` exactly once, even though `(+)`
+will use it twice (once for the left argument, and once for the right).
+
+Our graphs can also include cycles.
+A simple, perhaps _the most_ simple example of this in practice is Haskell's
+`fix` function. It computes a function's fixed point,
+{{< sidenote "right" "fixpoint-note" "and can be used to write recursive functions." >}}
+In fact, in the lambda calculus, <code>fix</code> is pretty much <em>the only</em>
+way to write recursive functions. In the untyped lambda calculus, it can
+be written as: $$\lambda f . (\lambda x . f (x \ x)) \  (\lambda x . f (x \ x))$$
+In the simply typed lambda calculus, it cannot be written in any way, and
+needs to be added as an extension, typically written as \(\textbf{fix}\).
+{{< /sidenote >}}
+It's implemented as follows:
+
+```Haskell
+fix f = let x = f x in x
+```
+
+See how the definition of `x` refers to itself? This is what
+it looks like in graph form:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+I think it's useful to take a look at how this graph is processed. Let's
+pick `f = (1:)`. That is, `f` is a function that takes a list,
+and prepends `1` to it. Then, after constructing the graph of `f x`,
+we end up with the following:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+We see the body of `f`, which is the application of `(:)` first to the
+constant `1`, and then to `f`'s argument (`x`, in this case). As
+before, once we evaluated `f x`, we replaced the application with
+an indirection; in the image, this indirection is the top box. But the
+argument, `x`, is itself an indirection which points to the root of `f x`,
+thereby creating a cycle in our graph. 
+
+Almost there! A node can refer to itself, and, when evaluated, it
+is replaced with its own value. Thus, a node can effectively reference
+its own value! The last piece of the puzzle is how a node can access
+_parts_ of its own value: recall that `doRepMax` calls `repMax`
+with only `largest`, while `repMax` returns `(largest, xs')`.
+I have to admit, I don't know the internals of GHC, but I suspect
+this is done by translating the code into something like:
+
+```Haskell
+doRepMax :: [Int] -> [Int]
+doRepMax xs = snd t
+  where t = repMax xs (fst t)
+```
+
+### Detailed Example: Reducing `doRepMax`
+
+If the above examples haven't elucidated how `doRepMax` works,
+stick around in this section and we will go through it step-by-step.
+This is a rather long and detailed example, so feel free to skip
+this section to read more about actually using time traveling.
+
+If you're sticking around, why don't we watch how the graph of `doRepMax [1, 2]` unfolds.
+This example will be more complex than the ones we've seen
+so far; to avoid overwhelming ourselves with notation,
+let's adopt a different convention of writing functions. Instead
+of using application nodes `@`, let's draw an application of a
+function `f` to arguments `x1` through `xn` as a subgraph with root `f`
+and children `x`s. The below figure demonstrates what I mean:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+Now, let's write the initial graph for `doRepMax [1,2]`:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+Other than our new notation, there's nothing too surprising here.
+At a high level, all we want is the second element of the tuple
+returned by `repMax`, which contains the output list. To get
+the tuple, we apply `repMax` to the list `[1,2]`, which itself
+consists of two uses of the `(:)` function.
+
+The first step
+of our hypothetical reduction would replace the application of `doRepMax` with its
+body, and create our graph's first cycle:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+Next, we would do the same for the body of `repMax`. In
+the following diagram, to avoid drawing a noisy amount of
+crossing lines, I marked the application of `fst` with
+a star, and replaced the two edges to `fst` with
+edges to similar looking stars. This is merely
+a visual trick; an edge leading to a little star is
+actually an edge leading to `fst`. Take a look:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+Since `(,)` is a constructor, let's say that it doesn't
+need to be evaluated, and that its
+{{< sidenote "right" "normal-note" "graph cannot be reduced further" >}}
+A graph that can't be reduced further is said to be in <em>normal form</em>,
+by the way.
+{{< /sidenote >}} (in practice, other things like
+packing may occur here, but they are irrelevant to us).
+If `(,)` can't be reduced, we can move on to evaluating `snd`. Given a pair, `snd`
+simply returns the second element, which in our
+case is the subgraph starting at `(:)`. We
+thus replace the application of `snd` with an
+indirection to this subgraph. This leaves us
+with the following:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+If our original `doRepMax [1, 2]` expression occured at the top level,
+the graph reduction would probably stop here.  After all,
+we're evaluating our graphs using call-by-need, and there
+doesn't seem to be a need for knowing the what the arguments of `(:)` are.
+However, stopping at `(:)` wouldn't be very interesting,
+and we wouldn't learn much from doing so. So instead, let's assume
+that _something_ is trying to read the elements of our list;
+perhaps we are trying to print this list to the screen in GHCi.
+
+In this case, our mysterious external force starts unpacking and
+inspecting the arguments to `(:)`. The first argument to `(:)` is
+the list's head, which is the subgraph starting with the starred application
+of `fst`. We evaluate it in a similar manner to `snd`. That is,
+we replace this `fst` with an indirection to the first element
+of the argument tuple, which happens to be the subgraph starting with `max`:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+Phew! Next, we need to evaluate the body of `max`. Let's make one more
+simplification here: rather than substitututing `max` for its body
+here, let's just reason about what evaluating `max` would entail.
+We would need to evaluate its two arguments, compare them,
+and return the larger one. The argument `1` can't be reduced
+any more (it's just a number!), but the second argument,
+a call to `fst`, needs to be processed. To do so, we need to
+evaluate the call to `repMax`. We thus replace `repMax`
+with its body:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+We've reached one of the base cases here, and there
+are no more calls to `max` or `repMax`. The whole reason
+we're here is to evaluate the call to `fst` that's one
+of the arguments to `max`. Given this graph, this is easy.
+We can clearly see that `2` is the first element of the tuple
+returned by `repMax [2]`. We thus replace `fst` with
+an indirection to this node:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+This concludes our task of evaluating the arguments to `max`.
+Comparing them, we see that `2` is greater than `1`, and thus,
+we replace `max` with an indirection to `2`:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+The node that we starred in our graph is now an indirection (the
+one that used to be the call to `fst`) which points to
+another indirection (formerly the call to `max`), which
+points to `2`. Thus, any edge pointing to a star now
+points to the value 2. 
+
+By finding the value of the starred node, we have found the first
+argument of `(:)`, and returned it to our mysterious external force.
+If we were printing to GHCi, the number `2` would appear on the screen
+right about now. The force then moves on to the second argument of `(:)`,
+which is the call to `snd`. This `snd` is applied to an instance of `(,)`, which
+can't be reduced any further. Thus, all we have to do is take the second
+element of the tuple, and replace `snd` with an indirection to it:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+The second element of the tuple was a call to `(:)`, and that's what the mysterious
+force is processing now. Just like it did before, it starts by looking at the
+first argument of this list, which is head. This argument is a reference to
+the starred node, which, as we've established, eventually points to `2`.
+Another `2` pops up on the console.
+
+Finally, the mysterious force reaches the second argument of the `(:)`,
+which is the empty list. The empty list also cannot be evaluated any
+further, so that's what the mysterious force receives. Just like that,
+there's nothing left to print to the console. The mysterious force ceases,
+and we're left with the following graph:
+
+{{< todo >}}Add image!{{< /todo >}}
+
+As we would have expected, two `2`s are printed to the console.
+
+### Using Time Traveling
+
+{{< todo >}}This whole section {{< /todo >}}
+
+### Beware The Strictness
+
+{{< todo >}}This whole section, too. {{< /todo >}}
+
+### Leftovers
+
+This is
 what allows us to write the code above: the graph of `repMax xs largest`
 effectively refers to itself. While traversing the list, it places references
 to itself in place of each of the elements, and thanks to laziness, these
@ -93,3 +395,4 @@ will be:

 fromList [(1, 1), (3, 2), (9, 1)]
 ```
+