diff --git a/content/blog/haskell_lazy_evaluation/fixpoint_1.png b/content/blog/haskell_lazy_evaluation/fixpoint_1.png index 6963e81..bb1eb72 100644 Binary files a/content/blog/haskell_lazy_evaluation/fixpoint_1.png and b/content/blog/haskell_lazy_evaluation/fixpoint_1.png differ diff --git a/content/blog/haskell_lazy_evaluation/fixpoint_2.png b/content/blog/haskell_lazy_evaluation/fixpoint_2.png index 1868f2b..3adcc39 100644 Binary files a/content/blog/haskell_lazy_evaluation/fixpoint_2.png and b/content/blog/haskell_lazy_evaluation/fixpoint_2.png differ diff --git a/content/blog/haskell_lazy_evaluation/index.md b/content/blog/haskell_lazy_evaluation/index.md index e760a53..0365361 100644 --- a/content/blog/haskell_lazy_evaluation/index.md +++ b/content/blog/haskell_lazy_evaluation/index.md @@ -7,6 +7,7 @@ draft: true @@ -15,8 +16,7 @@ I recently got to use a very curious Haskell technique As production as research code gets, anyway! {{< /sidenote >}} time traveling. I say this with the utmost seriousness. This technique worked like -magic for the problem I was trying to solve (which isn't -interesting enough to be presented here in itself), and so +magic for the problem I was trying to solve, and so I thought I'd share what I learned. In addition to the technique and its workings, I will also explain how time traveling can be misused, yielding computations that @@ -74,7 +74,7 @@ value even come from? Thus far, nothing too magical has happened. It's a little strange to expect the result of the computation to be -given to us; however, thus far, it looks like wishful +given to us; it just looks like wishful thinking. The real magic happens in Csongor's `doRepMax` function: @@ -100,8 +100,9 @@ Why is it called graph reduction, you may be wondering, if the runtime is manipulating syntax trees? To save on work, if a program refers to the same value twice, Haskell has both of those references point to the exact same graph. 
This violates the tree's property of having only one path -from the root to any node, and makes our program a graph. Graphs that -refer to themselves also violate the properties of a tree. +from the root to any node, and makes our program a DAG (at least). Graph nodes that +refer to themselves (which are possible in the model) also violate the properties of a +DAG, and thus, in general, we are working with graphs. {{< /sidenote >}} performing substitutions and simplifications as necessary until it reaches a final answer. What the lazy part means is that parts of the syntax tree that are not yet @@ -184,7 +185,7 @@ we end up with the following: {{< figure src="square_2.png" caption="The graph of `let x = square 5 in x + x` after `square 5` is reduced." >}} -There are two `25`s in the tree, and no more `square`s! We only +There are two `25`s in the graph, and no more `square`s! We only had to evaluate `square 5` exactly once, even though `(+)` will use it twice (once for the left argument, and once for the right). @@ -207,7 +208,7 @@ fix f = let x = f x in x See how the definition of `x` refers to itself? This is what it looks like in graph form: -{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." >}} +{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." class="tiny" >}} I think it's useful to take a look at how this graph is processed. Let's pick `f = (1:)`. That is, `f` is a function that takes a list, @@ -221,7 +222,8 @@ constant `1`, and then to `f`'s argument (`x`, in this case). As before, once we evaluated `f x`, we replaced the application with an indirection; in the image, this indirection is the top box. But the argument, `x`, is itself an indirection which points to the root of `f x`, -thereby creating a cycle in our graph. +thereby creating a cycle in our graph. Traversing this graph looks like +traversing an infinite list of `1`s. Almost there! 
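
As a minimal sketch of the cycle just described, the snippet below builds the self-referential list with `fix (1:)` and then demands only a finite prefix of it. The names `ones` and `firstFive` are my own, introduced purely for illustration; `fix` is the definition from this post.

```Haskell
-- `fix` as defined in the post: the result `x` is defined in terms of itself.
fix :: (a -> a) -> a
fix f = let x = f x in x

-- Hypothetical name: the cyclic list produced by `fix (1:)`.
-- Its tail points back at the list itself.
ones :: [Int]
ones = fix (1:)

-- Hypothetical name: laziness means only five cells are ever demanded,
-- even though traversing `ones` in full would never finish.
firstFive :: [Int]
firstFive = take 5 ones
```

Evaluating `firstFive` should give `[1,1,1,1,1]`, while printing `ones` itself would go on forever.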
A node can refer to itself, and, when evaluated, it is replaced with its own value. Thus, a node can effectively reference @@ -259,18 +261,16 @@ Now, let's write the initial graph for `doRepMax [1,2]`: {{< figure src="repmax_1.png" caption="The initial graph of `doRepMax [1,2]`." >}} Other than our new notation, there's nothing too surprising here. -At a high level, all we want is the second element of the tuple +The first step of our hypothetical reduction would replace the application of `doRepMax` with its +body, and create our graph's first cycle. At a high level, all we want is the second element of the tuple returned by `repMax`, which contains the output list. To get -the tuple, we apply `repMax` to the list `[1,2]`, which itself +the tuple, we apply `repMax` to the list `[1,2]` and the first element +of its result. The list `[1,2]` itself consists of two uses of the `(:)` function. -The first step -of our hypothetical reduction would replace the application of `doRepMax` with its -body, and create our graph's first cycle: - {{< figure src="repmax_2.png" caption="The first step of reducing `doRepMax [1,2]`." >}} -Next, we would do the same for the body of `repMax`. In +Next, we would also expand the body of `repMax`. In the following diagram, to avoid drawing a noisy amount of crossing lines, I marked the application of `fst` with a star, and replaced the two edges to `fst` with @@ -362,7 +362,7 @@ element of the tuple, and replace `snd` with an indirection to it: The second element of the tuple was a call to `(:)`, and that's what the mysterious force is processing now. Just like it did before, it starts by looking at the -first argument of this list, which is head. This argument is a reference to +first argument of this list, which is the list's head. This argument is a reference to the starred node, which, as we've established, eventually points to `2`. Another `2` pops up on the console. 
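
For reference, the reduction traced above can be reproduced with a sketch of `repMax` and `doRepMax` along the lines of Csongor's definitions. I am writing these from the description in this post, so the exact clauses are an assumption and may differ in detail from the original listing.

```Haskell
-- Sketch (assumed shape): replace every element with `rep`, and
-- return the maximum of the input list alongside the new list.
repMax :: [Int] -> Int -> (Int, [Int])
repMax [] rep = (rep, [])
repMax [x] rep = (x, [rep])
repMax (x : xs) rep = (max x m, rep : xs')
  where
    (m, xs') = repMax xs rep

-- The "time traveling" part: `largest` is the first component of the
-- very call that receives it as an argument. This only works because
-- the pattern binding in `where` is lazy.
doRepMax :: [Int] -> [Int]
doRepMax xs = xs'
  where
    (largest, xs') = repMax xs largest
```

With these definitions, `doRepMax [1,2]` evaluates to `[2,2]`, matching the two `2`s seen in the walkthrough.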
@@ -374,32 +374,197 @@ After removing the unused nodes, we are left with the following graph: {{< figure src="repmax_10.png" caption="The result of reducing `doRepMax [1,2]`." >}} -As we would have expected, two `2`s are printed to the console. +As we would have expected, two `2`s were printed to the console, and our +final graph represents the list `[2,2]`. ### Using Time Traveling +Is time traveling even useful? I would argue yes, especially +in cases where Haskell's purity can make certain things +difficult. -{{< todo >}}This whole section {{< /todo >}} +As a first example, Csongor provides an assembler that works +in a single pass. The challenge in this case is to resolve +jumps to code segments occurring _after_ the jump itself; +in essence, the address of the target code segment needs to be +known before the segment itself is processed. Csongor's +code uses the [Tardis monad](https://hackage.haskell.org/package/tardis-0.4.1.0/docs/Control-Monad-Tardis.html), +which combines regular state, to which you can write and then +later read from, and future state, from which you can +read values before you write them. Check out +[his complete example](https://kcsongor.github.io/time-travel-in-haskell-for-dummies/#a-single-pass-assembler-an-example). + +Alternatively, here's an example from my research. I'll be fairly +vague, since all of this is still in progress. The gist is that +we have some kind of data structure (say, a list or a tree), +and we want to associate with each element in this data +structure a 'score' of how useful it is. There are many possible +heuristics for picking 'scores'; a very simple one is +to make it inversely proportional to the number of times +an element occurs. 
To be more concrete, suppose +we have some element type `Element`: + +{{< codelines "Haskell" "time-traveling/ValueScore.hs" 5 6 >}} + +Suppose also that our data structure is a binary tree: + +{{< codelines "Haskell" "time-traveling/ValueScore.hs" 14 16 >}} + +We then want to transform an input `ElementTree`, such as: + +```Haskell +Node A (Node A Empty Empty) Empty +``` + +Into a scored tree, like: + +```Haskell +Node (A,0.5) (Node (A,0.5) Empty Empty) Empty +``` + +Since `A` occurred twice, its score is `1/2 = 0.5`. + +Let's define some utility functions before we get to the +meat of the implementation: + +{{< codelines "Haskell" "time-traveling/ValueScore.hs" 8 12 >}} + +The `addElement` function simply increments the counter for a particular +element in the map, adding the number `1` if it doesn't exist. The `getScore` +function computes the score of a particular element, defaulting to `1.0` if +it's not found in the map. + +Just as before -- noticing that passing around the future values is getting awfully +bothersome -- we write our scoring function as though we have +a 'future value'. + +{{< codelines "Haskell" "time-traveling/ValueScore.hs" 18 24 >}} + +The actual `doAssignScores` function is pretty much identical to +`doRepMax`: + +{{< codelines "Haskell" "time-traveling/ValueScore.hs" 26 28 >}} + +There's quite a bit of repetition here, especially in the handling +of future values - all of our functions now accept an extra +future argument, and return a work-in-progress future value. +This is what the `Tardis` monad, and its corresponding +`TardisT` monad transformer, aim to address. Just like the +`State` monad helps us avoid writing plumbing code for +forward-traveling values, `Tardis` helps us do the same +for backward-traveling ones. + +#### Cycles in Monadic Bind + +We've seen that we're able to write code like the following: + +```Haskell +(a, b) = f a c +``` + +That is, we were able to write function calls that referenced +their own return values. 
What if we try doing this inside +a `do` block? Say, for example, we want to sprinkle some time +traveling into our program, but don't want to add a whole new +transformer into our monad stack. We could write code as follows: + +```Haskell +do + (a, b) <- f a c + return b +``` + +Unfortunately, this doesn't work: in an ordinary `do` block, `a` is not in scope on the right-hand side of the `<-` that binds it. However, it's entirely +possible to enable this using the `RecursiveDo` language +extension: + +```Haskell +{-# LANGUAGE RecursiveDo #-} +``` + +Then, we can write the above as follows: + +```Haskell +do + rec (a, b) <- f a c + return b +``` + +This power, however, comes at a price. It's not as straightforward +to build graphs from recursive monadic computations; in fact, +it's not possible in general. The translation of the above +code uses `MonadFix`. A monad with a `MonadFix` instance has +an operation `mfix`, which is the monadic version of the `fix` +function we saw earlier: + +```Haskell +mfix :: MonadFix m => (a -> m a) -> m a +-- Regular fix, for comparison +fix :: (a -> a) -> a +``` + +To really understand how the translation works, check out the +[paper on recursive do notation](http://leventerkok.github.io/papers/recdo.pdf). ### Beware The Strictness +Though Csongor points out other problems with the +time traveling approach, I think he doesn't mention +an important idea: you have to be _very_ careful about introducing +strictness into your programs when running time-traveling code. +For example, suppose we wanted to write a function, +`takeUntilMax`, which would return the input list, +cut off after the first occurrence of the maximum number. +Following the same strategy, we come up with: -{{< todo >}}This whole section, too. {{< /todo >}} +{{< codelines "Haskell" "time-traveling/TakeMax.hs" 1 12 >}} -### Leftovers +In short, if we encounter our maximum number, we just return +a list of that maximum number, since we do not want to recurse +further. On the other hand, if we encounter a number that's +_not_ the maximum, we continue our recursion. 
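
Since `TakeMax.hs` itself is not shown here, the following is only a hypothetical reconstruction of the recursion described above, not the actual file. To keep the sketch runnable, the maximum `m` is passed in by hand instead of being fed back from the function's own result; the base cases (including the `0` for an empty list) are my assumptions.

```Haskell
-- Hypothetical reconstruction: walk the list, tracking the maximum seen,
-- and cut the output off right after the first element equal to `m`.
takeUntilMax :: [Int] -> Int -> (Int, [Int])
takeUntilMax [] _ = (0, [])            -- assumed default for an empty list
takeUntilMax [x] _ = (x, [x])
takeUntilMax (x : xs) m
  | x == m = (x, [x])                  -- found the maximum: stop recursing
  | otherwise = (max x rest, x : xs')  -- not the maximum: keep going
  where
    (rest, xs') = takeUntilMax xs m
```

Called with the true maximum supplied explicitly, as in `takeUntilMax [1,5,3] 5`, this yields `(5, [1,5])`. The knot-tying wrapper, which would feed the first component of the result back in as `m` in the style of `doRepMax`, is deliberately left out of this sketch.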
-This is -what allows us to write the code above: the graph of `repMax xs largest` -effectively refers to itself. While traversing the list, it places references -to itself in place of each of the elements, and thanks to laziness, these -references are not evaluated. +Unfortunately, this doesn't work; our program never terminates. +You may be thinking: -Let's try a more complicated example. How about instead of creating a new list, -we return a `Map` containing the number of times each number occured, but only -when those numbers were a factor of the maximum numbers. Our expected output -will be: +> Well, obviously this doesn't work! We didn't actually +compute the maximum number properly, since we stopped +recursing too early. We need to traverse the whole list, +and not just the part before the maximum number. -``` ->>> countMaxFactors [1,3,3,9] +To address this, we can reformulate our `takeUntilMax` +function as follows: -fromList [(1, 1), (3, 2), (9, 1)] -``` +{{< codelines "Haskell" "time-traveling/TakeMax.hs" 14 21 >}} +Now we definitely compute the maximum correctly! Alas, +this doesn't work either. The issue lies on lines 5 and 18, +more specifically in the comparison `x == m`. Here, we +are trying to base the decision of what branch to take +on a future value. This is simply impossible; to compute +the value, we need to know the value! + +This is no 'silly mistake', either! In complicated programs +that use time traveling, strictness lurks behind every corner. +In my research work, I was at one point inserting a data structure into +a set; however, deep in the structure was a data type containing +a 'future' value, and using the default `Eq` instance! +Adding the data structure to a set ended up invoking `(==)` (or perhaps +some function from the `Ord` typeclass), +which, in turn, tried to compare the lazily evaluated values. +My code therefore didn't terminate, much like `takeUntilMax`. + +Debugging time traveling code is, in general, +a pain. 
This is especially true since future values don't look any different +from regular values. You can see it in the type signatures +of `repMax` and `takeUntilMax`: the maximum number is just an `Int`! +And yet, trying to see what its value is will kill the entire program. +As always, remember Brian W. Kernighan's wise words: + +> Debugging is twice as hard as writing the code in the first place. +Therefore, if you write the code as cleverly as possible, you are, +by definition, not smart enough to debug it. + +### Conclusion +This is about it! In a way, time traveling can make code performing +certain operations more expressive. Furthermore, even if it's not groundbreaking, +thinking about time traveling is a good exercise to get familiar +with lazy evaluation in general. I hope you found this useful! diff --git a/content/blog/haskell_lazy_evaluation/length_2.png b/content/blog/haskell_lazy_evaluation/length_2.png index b269141..6875db4 100644 Binary files a/content/blog/haskell_lazy_evaluation/length_2.png and b/content/blog/haskell_lazy_evaluation/length_2.png differ diff --git a/content/blog/haskell_lazy_evaluation/square_2.png b/content/blog/haskell_lazy_evaluation/square_2.png index d64274a..1f7e135 100644 Binary files a/content/blog/haskell_lazy_evaluation/square_2.png and b/content/blog/haskell_lazy_evaluation/square_2.png differ