Update lazy evaluation post with images and more.

Danila Fedorin 2020-07-30 00:49:35 -07:00
parent 3aa2a6783e
commit 58e6ad9e79
5 changed files with 198 additions and 33 deletions

@ -7,6 +7,7 @@ draft: true
<style>
img, figure.small img { max-height: 20rem; }
figure.tiny img { max-height: 15rem; }
figure.medium img { max-height: 30rem; }
</style>
@ -15,8 +16,7 @@ I recently got to use a very curious Haskell technique
As production as research code gets, anyway!
{{< /sidenote >}} time traveling. I say this with
the utmost seriousness. This technique worked like
-magic for the problem I was trying to solve (which isn't
-interesting enough to be presented here in itself), and so
magic for the problem I was trying to solve, and so
I thought I'd share what I learned. In addition
to the technique and its workings, I will also explain how
time traveling can be misused, yielding computations that
@ -74,7 +74,7 @@ value even come from?
Thus far, nothing too magical has happened. It's a little
strange to expect the result of the computation to be
-given to us; however, thus far, it looks like wishful
given to us; it just looks like wishful
thinking. The real magic happens in Csongor's `doRepMax`
function:
@ -100,8 +100,9 @@ Why is it called graph reduction, you may be wondering, if the runtime is
manipulating syntax trees? To save on work, if a program refers to the
same value twice, Haskell has both of those references point to the
exact same graph. This violates the tree's property of having only one path
-from the root to any node, and makes our program a graph. Graphs that
-refer to themselves also violate the properties of a tree.
from the root to any node, and makes our program a DAG (at least). Graph nodes that
refer to themselves (which are also possible in the model) also violate the properties of a
DAG, and thus, in general, we are working with graphs.
{{< /sidenote >}} performing
substitutions and simplifications as necessary until it reaches a final answer.
What the lazy part means is that parts of the syntax tree that are not yet
@ -184,7 +185,7 @@ we end up with the following:
{{< figure src="square_2.png" caption="The graph of `let x = square 5 in x + x` after `square 5` is reduced." >}}
-There are two `25`s in the tree, and no more `square`s! We only
There are two `25`s in the graph, and no more `square`s! We only
had to evaluate `square 5` exactly once, even though `(+)`
will use it twice (once for the left argument, and once for the right).
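To make the sharing concrete, here is a minimal sketch that instruments `square` with `Debug.Trace.trace`. The body of `square` and the name `sharedSum` are assumptions made for illustration; the post's own definition of `square` is not part of this excerpt.

```Haskell
import Debug.Trace (trace)

-- Hypothetical definition of `square`; the trace message is written
-- to stderr whenever the body is actually evaluated.
square :: Int -> Int
square n = trace "evaluating square" (n * n)

-- Both uses of `x` refer to the same thunk, so `square 5` is
-- reduced once and its result is reused by both arguments of (+).
sharedSum :: Int
sharedSum = let x = square 5 in x + x
```

Evaluating `sharedSum` in GHCi should show the trace message a single time, which corresponds to the two edges pointing at one reduced node in the figure.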
@ -207,7 +208,7 @@ fix f = let x = f x in x
See how the definition of `x` refers to itself? This is what
it looks like in graph form:
-{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." >}}
{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." class="tiny" >}}
I think it's useful to take a look at how this graph is processed. Let's
pick `f = (1:)`. That is, `f` is a function that takes a list,
@ -221,7 +222,8 @@ constant `1`, and then to `f`'s argument (`x`, in this case). As
before, once we evaluated `f x`, we replaced the application with
an indirection; in the image, this indirection is the top box. But the
argument, `x`, is itself an indirection which points to the root of `f x`,
-thereby creating a cycle in our graph.
thereby creating a cycle in our graph. Traversing this graph looks like
traversing an infinite list of `1`s.
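A small sketch of that traversal, using `fix` from `Data.Function`, which is defined the same way as the `fix f = let x = f x in x` shown in the post. The names `ones` and `firstThree` are assumptions for illustration.

```Haskell
import Data.Function (fix)

-- `fix (1:)` ties the knot: the tail of the list is the list itself,
-- which is the cycle described in the graph.
ones :: [Int]
ones = fix (1:)

-- Thanks to laziness, we can walk a finite prefix of the cyclic
-- structure without ever trying to build an infinite list.
firstThree :: [Int]
firstThree = take 3 ones
```

Here `firstThree` is `[1,1,1]`: each step of the traversal follows the cycle back to the same `(:)` node.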
Almost there! A node can refer to itself, and, when evaluated, it
is replaced with its own value. Thus, a node can effectively reference
@ -259,18 +261,16 @@ Now, let's write the initial graph for `doRepMax [1,2]`:
{{< figure src="repmax_1.png" caption="The initial graph of `doRepMax [1,2]`." >}}
Other than our new notation, there's nothing too surprising here.
-At a high level, all we want is the second element of the tuple
The first step of our hypothetical reduction would replace the application of `doRepMax` with its
body, and create our graph's first cycle. At a high level, all we want is the second element of the tuple
returned by `repMax`, which contains the output list. To get
-the tuple, we apply `repMax` to the list `[1,2]`, which itself
the tuple, we apply `repMax` to the list `[1,2]` and the first element
of its result. The list `[1,2]` itself
consists of two uses of the `(:)` function.
-The first step
-of our hypothetical reduction would replace the application of `doRepMax` with its
-body, and create our graph's first cycle:
{{< figure src="repmax_2.png" caption="The first step of reducing `doRepMax [1,2]`." >}}
-Next, we would do the same for the body of `repMax`. In
Next, we would also expand the body of `repMax`. In
the following diagram, to avoid drawing a noisy amount of
crossing lines, I marked the application of `fst` with
a star, and replaced the two edges to `fst` with
@ -362,7 +362,7 @@ element of the tuple, and replace `snd` with an indirection to it:
The second element of the tuple was a call to `(:)`, and that's what the mysterious
force is processing now. Just like it did before, it starts by looking at the
-first argument of this list, which is head. This argument is a reference to
first argument of this list, which is the list's head. This argument is a reference to
the starred node, which, as we've established, eventually points to `2`.
Another `2` pops up on the console.
@ -374,32 +374,197 @@ After removing the unused nodes, we are left with the following graph:
{{< figure src="repmax_10.png" caption="The result of reducing `doRepMax [1,2]`." >}}
-As we would have expected, two `2`s are printed to the console.
As we would have expected, two `2`s were printed to the console, and our
final graph represents the list `[2,2]`.
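For reference, the reduction walked through above corresponds to definitions along the following lines. This is a sketch reconstructed from the names used in the text (`repMax`, `doRepMax`, `largest`), not a copy of the post's listing, so the exact clauses are an assumption.

```Haskell
-- Assumed shape of `repMax`: it takes the list and the replacement
-- value, and returns the maximum of the list together with a list in
-- which every element has been replaced by `rep`.
repMax :: [Int] -> Int -> (Int, [Int])
repMax []       rep = (rep, [])
repMax [x]      rep = (x, [rep])
repMax (x : xs) rep = (max x m, rep : xs')
  where (m, xs') = repMax xs rep

-- The cycle: `largest` is both an argument to `repMax` and the first
-- component of its result. This works because `repMax` never inspects
-- `rep`; it only stores references to it in the output list.
doRepMax :: [Int] -> [Int]
doRepMax xs = xs'
  where (largest, xs') = repMax xs largest
```

With these definitions, `doRepMax [1,2]` evaluates to `[2,2]`, matching the final graph.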
### Using Time Traveling
Is time traveling even useful? I would argue yes, especially
in cases where Haskell's purity can make certain things
difficult.
-{{< todo >}}This whole section {{< /todo >}}
As a first example, Csongor provides an assembler that works
in a single pass. The challenge in this case is to resolve
jumps to code segments occurring _after_ the jump itself;
in essence, the address of the target code segment needs to be
known before the segment itself is processed. Csongor's
code uses the [Tardis monad](https://hackage.haskell.org/package/tardis-0.4.1.0/docs/Control-Monad-Tardis.html),
which combines regular state, to which you can write and then
later read from, and future state, from which you can
read values before you write them. Check out
[his complete example](https://kcsongor.github.io/time-travel-in-haskell-for-dummies/#a-single-pass-assembler-an-example) here.
Alternatively, here's an example from my research. I'll be fairly
vague, since all of this is still in progress. The gist is that
we have some kind of data structure (say, a list or a tree),
and we want to associate with each element in this data
structure a 'score' of how useful it is. There are many possible
heuristics for picking 'scores'; a very simple one is
to make it inversely proportional to the number of times
an element occurs. To be more concrete, suppose
we have some element type `Element`:
{{< codelines "Haskell" "time-traveling/ValueScore.hs" 5 6 >}}
Suppose also that our data structure is a binary tree:
{{< codelines "Haskell" "time-traveling/ValueScore.hs" 14 16 >}}
We then want to transform an input `ElementTree`, such as:
```Haskell
Node A (Node A Empty Empty) Empty
```
Into a scored tree, like:
```Haskell
Node (A,0.5) (Node (A,0.5) Empty Empty) Empty
```
Since `A` occurred twice, its score is `1/2 = 0.5`.
Let's define some utility functions before we get to the
meat of the implementation:
{{< codelines "Haskell" "time-traveling/ValueScore.hs" 8 12 >}}
The `addElement` function simply increments the counter for a particular
element in the map, adding the number `1` if it doesn't exist. The `getScore`
function computes the score of a particular element, defaulting to `1.0` if
it's not found in the map.
Just as before -- noticing that passing around the future values is getting awfully
bothersome -- we write our scoring function as though we have
a 'future value'.
{{< codelines "Haskell" "time-traveling/ValueScore.hs" 18 24 >}}
The actual `doAssignScores` function is pretty much identical to
`doRepMax`:
{{< codelines "Haskell" "time-traveling/ValueScore.hs" 26 28 >}}
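Since the `codelines` shortcodes above only reference `ValueScore.hs`, here is a rough, self-contained sketch of what such a file could look like. Only `Element`, `ElementTree`, `addElement`, `getScore` and `doAssignScores` are names from the text; `BinaryTree`, `ScoredTree`, `assignScores`, the constructors of `Element` beyond `A`, and the use of `Double` are assumptions.

```Haskell
import qualified Data.Map as Map
import Data.Map (Map)

data Element = A | B | C
  deriving (Eq, Ord, Show)

-- Hypothetical tree type; the text only tells us that nodes are
-- built with `Node` and `Empty`.
data BinaryTree a = Empty | Node a (BinaryTree a) (BinaryTree a)
  deriving (Eq, Show)

type ElementTree = BinaryTree Element
type ScoredTree  = BinaryTree (Element, Double)

-- Increment the counter for an element, starting at 1 if it is absent.
addElement :: Element -> Map Element Int -> Map Element Int
addElement e = Map.insertWith (+) e 1

-- Score is the reciprocal of the count, defaulting to 1.0 when the
-- element is not in the map.
getScore :: Element -> Map Element Int -> Double
getScore e counts = maybe 1.0 (\n -> 1 / fromIntegral n) (Map.lookup e counts)

-- `future` is the map of final counts. It is only ever passed to
-- `getScore` inside a lazily built node, never inspected while the
-- counts are being computed.
assignScores :: ElementTree -> Map Element Int -> (Map Element Int, ScoredTree)
assignScores Empty _ = (Map.empty, Empty)
assignScores (Node e l r) future = (counts, Node (e, getScore e future) l' r')
  where
    (countsL, l') = assignScores l future
    (countsR, r') = assignScores r future
    counts = addElement e (Map.unionWith (+) countsL countsR)

-- Tie the knot exactly as `doRepMax` does.
doAssignScores :: ElementTree -> ScoredTree
doAssignScores tree = scored
  where (final, scored) = assignScores tree final
```

Applied to `Node A (Node A Empty Empty) Empty`, this sketch yields `Node (A,0.5) (Node (A,0.5) Empty Empty) Empty`, the scored tree given earlier.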
There's quite a bit of repetition here, especially in the handling
of future values - all of our functions now accept an extra
future argument, and return a work-in-progress future value.
This is what the `Tardis` monad, and its corresponding
`TardisT` monad transformer, aim to address. Just like the
`State` monad helps us avoid writing plumbing code for
forward-traveling values, `Tardis` helps us do the same
for backward-traveling ones.
#### Cycles in Monadic Bind
We've seen that we're able to write code like the following:
```Haskell
(a, b) = f a c
```
That is, we were able to write function calls that referenced
their own return values. What if we try doing this inside
a `do` block? Say, for example, we want to sprinkle some time
traveling into our program, but don't want to add a whole new
transformer into our monad stack. We could write code as follows:
```Haskell
do
(a, b) <- f a c
return b
```
Unfortunately, this doesn't work. However, it's entirely
possible to enable this using the `RecursiveDo` language
extension:
```Haskell
{-# LANGUAGE RecursiveDo #-}
```
Then, we can write the above as follows:
```Haskell
do
rec (a, b) <- f a c
return b
```
This power, however, comes at a price. It's not as straightforward
to build graphs from recursive monadic computations; in fact,
it's not possible in general. The translation of the above
code uses `MonadFix`. A monad that satisfies `MonadFix` has
an operation `mfix`, which is the monadic version of the `fix`
function we saw earlier:
```Haskell
mfix :: MonadFix m => (a -> m a) -> m a
-- Regular fix, for comparison
fix :: (a -> a) -> a
```
To really understand how the translation works, check out the
[paper on recursive do notation](http://leventerkok.github.io/papers/recdo.pdf).
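As a small illustration of the `rec` form, the sketch below reruns the `repMax` trick inside the `Maybe` monad, chosen only because `base` already provides a `MonadFix` instance for it. The names `doRepMaxM` and `doRepMaxFix` are hypothetical, the `repMax` clauses are an assumed reconstruction, and the hand-written `mfix` version is a simplification of what the real translation produces.

```Haskell
{-# LANGUAGE RecursiveDo #-}

import Control.Monad.Fix (mfix)

-- Assumed definition of `repMax`, as discussed earlier in the post.
repMax :: [Int] -> Int -> (Int, [Int])
repMax []       rep = (rep, [])
repMax [x]      rep = (x, [rep])
repMax (x : xs) rep = (max x m, rep : xs')
  where (m, xs') = repMax xs rep

-- With `rec`, the bound variable `largest` may be used on the
-- right-hand side of its own binding.
doRepMaxM :: [Int] -> Maybe [Int]
doRepMaxM xs = do
  rec (largest, xs') <- Just (repMax xs largest)
  return xs'

-- Roughly the same computation written with `mfix` directly. The lazy
-- pattern (~) matters: matching strictly on the tuple would force the
-- result before it exists.
doRepMaxFix :: [Int] -> Maybe [Int]
doRepMaxFix xs = snd <$> mfix (\ ~(largest, _) -> Just (repMax xs largest))
```

Both functions return `Just [2,2]` for the input `[1,2]`. Whether this style works for a given monad depends on how lazy its `mfix` is, which is part of why building graphs from recursive monadic code is not possible in general.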
### Beware The Strictness
Though Csongor points out other problems with the
time traveling approach, I think he doesn't mention
an important idea: you have to be _very_ careful about introducing
strictness into your programs when running time-traveling code.
For example, suppose we wanted to write a function,
`takeUntilMax`, which would return the input list,
cut off after the first occurrence of the maximum number.
Following the same strategy, we come up with:
-{{< todo >}}This whole section, too. {{< /todo >}}
-### Leftovers
-This is
-what allows us to write the code above: the graph of `repMax xs largest`
-effectively refers to itself. While traversing the list, it places references
-to itself in place of each of the elements, and thanks to laziness, these
-references are not evaluated.
-Let's try a more complicated example. How about instead of creating a new list,
-we return a `Map` containing the number of times each number occurred, but only
-when those numbers were a factor of the maximum numbers. Our expected output
-will be:
-```
->>> countMaxFactors [1,3,3,9]
-fromList [(1, 1), (3, 2), (9, 1)]
-```
{{< codelines "Haskell" "time-traveling/TakeMax.hs" 1 12 >}}
In short, if we encounter our maximum number, we just return
a list of that maximum number, since we do not want to recurse
further. On the other hand, if we encounter a number that's
_not_ the maximum, we continue our recursion.
Unfortunately, this doesn't work; our program never terminates.
You may be thinking:
> Well, obviously this doesn't work! We didn't actually
compute the maximum number properly, since we stopped
recursing too early. We need to traverse the whole list,
and not just the part before the maximum number.
To address this, we can reformulate our `takeUntilMax`
function as follows:
{{< codelines "Haskell" "time-traveling/TakeMax.hs" 14 21 >}}
Now we definitely compute the maximum correctly! Alas,
this doesn't work either. The issue lies on lines 5 and 18,
more specifically in the comparison `x == m`. Here, we
are trying to base the decision of what branch to take
on a future value. This is simply impossible; to compute
the value, we need to know the value!
This is no 'silly mistake', either! In complicated programs
that use time traveling, strictness lurks behind every corner.
In my research work, I was at one point inserting a data structure into
a set; however, deep in the structure was a data type containing
a 'future' value, and using the default `Eq` instance!
Adding the data structure to a set ended up invoking `(==)` (or perhaps
some function from the `Ord` typeclass),
which, in turn, tried to compare the lazily evaluated values.
My code therefore didn't terminate, much like `takeUntilMax`.
Debugging time traveling code is, in general,
a pain. This is especially true since future values don't look any different
from regular values. You can see it in the type signatures
of `repMax` and `takeUntilMax`: the maximum number is just an `Int`!
And yet, trying to see what its value is will kill the entire program.
As always, remember Brian W. Kernighan's wise words:
> Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
### Conclusion
This is about it! In a way, time traveling can make code performing
certain operations more expressive. Furthermore, even if it's not groundbreaking,
thinking about time traveling is a good exercise to get familiar
with lazy evaluation in general. I hope you found this useful!
