











<style> 




img, figure.small img { max-height: 20rem; }




figure.tiny img { max-height: 15rem; }




figure.medium img { max-height: 30rem; }




</style> 








I recently got to use a very curious Haskell technique
{{< sidenote "right" "production-note" "in production:" >}}
As production as research code gets, anyway!




{{< /sidenote >}} time traveling. I say this with 




the utmost seriousness. This technique worked like 








magic for the problem I was trying to solve, and so 




I thought I'd share what I learned. In addition 




to the technique and its workings, I will also explain how 




time traveling can be misused, yielding computations that never terminate.












Thus far, nothing too magical has happened. It's a little 




strange to expect the result of the computation to be 








given to us; it just looks like wishful 




thinking. The real magic happens in Csongor's `doRepMax` 




function: 
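```Haskell
doRepMax :: [Int] -> [Int]
doRepMax xs = xs'
  where (xs', largest) = repMax xs largest
```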








{{< sidenote "right" "graph-reduction-note" "performing" >}}
Why is it called graph reduction, you may be wondering, if the runtime is




manipulating syntax trees? To save on work, if a program refers to the 




same value twice, Haskell has both of those references point to the 




exact same graph. This violates the tree's property of having only one path 








from the root to any node, and makes our program a DAG (at least). Graph nodes that 




refer to themselves (which are also possible in the model) also violate the properties of a 




DAG, and thus, in general, we are working with graphs.




{{< /sidenote >}}




substitutions and simplifications as necessary until it reaches a final answer. 




What the lazy part means is that parts of the syntax tree that are not yet
needed are not evaluated. Consider, for example, `let x = square 5 in x + x`:
both references to `x` point at a single `square 5` node. When we reduce `square 5`,



we end up with the following:









{{< figure src="square_2.png" caption="The graph of `let x = square 5 in x + x` after `square 5` is reduced." >}} 













There are two `25`s in the graph, and no more `square`s! We only 




had to evaluate `square 5` exactly once, even though `(+)` 




will use it twice (once for the left argument, and once for the right). 
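This sharing is observable from the outside. As a quick experiment (mine, not from the original post), `Debug.Trace` shows that the shared node is reduced only once:

```Haskell
import Debug.Trace (trace)

-- A square function that announces every time it is reduced.
square :: Int -> Int
square n = trace "reducing square" (n * n)

main :: IO ()
main = print (let x = square 5 in x + x)
-- "reducing square" appears once, followed by 50.
```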








Consider the `fix` function, which computes a function's fixed point:

```Haskell
fix :: (a -> a) -> a
fix f = let x = f x in x
```




See how the definition of `x` refers to itself? This is what 




it looks like in graph form: 









{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." >}} 




{{< figure src="fixpoint_1.png" caption="The initial graph of `let x = f x in x`." class="tiny" >}} 









I think it's useful to take a look at how this graph is processed. Let's 




pick `f = (1:)`. That is, `f` is a function that takes a list, 



constant `1`, and then to `f`'s argument (`x`, in this case). As




before, once we evaluated `f x`, we replaced the application with 




an indirection; in the image, this indirection is the top box. But the 




argument, `x`, is itself an indirection which points to the root of `f x`, 








thereby creating a cycle in our graph. Traversing this graph looks like 




traversing an infinite list of `1`s. 
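We can try this out directly. Here's a small check (my own, using `fix` from `Data.Function`):

```Haskell
import Data.Function (fix)

-- A cyclic list: the graph of `ones` points back to itself.
ones :: [Int]
ones = fix (1:)

main :: IO ()
main = print (take 5 ones)  -- prints [1,1,1,1,1]
```

Thanks to the cycle, `ones` is a single cons cell whose tail points back at itself, rather than a list that keeps growing as we traverse it.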









Almost there! A node can refer to itself, and, when evaluated, it 




is replaced with its own value. Thus, a node can effectively reference
its own value. This is what allows us to write the code above: the graph of `repMax xs largest`
effectively refers to itself. While traversing the list, it places references
to itself in place of each of the elements, and thanks to laziness, these
references are not evaluated.



Now, let's write the initial graph for `doRepMax [1,2]`:




{{< figure src="repmax_1.png" caption="The initial graph of `doRepMax [1,2]`." >}} 









Other than our new notation, there's nothing too surprising here. 








The first step of our hypothetical reduction would replace the application of `doRepMax` with its 




body, and create our graph's first cycle. At a high level, all we want is the second element of the tuple 




returned by `repMax`, which contains the output list. To get 








the tuple, we apply `repMax` to the list `[1,2]` and the first element 




of its result. The list `[1,2]` itself 




consists of two uses of the `(:)` function. 
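In code, that list is just sugar for two applications of `(:)`:

```Haskell
-- [1,2] desugared:
1 : (2 : [])
```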


















{{< figure src="repmax_2.png" caption="The first step of reducing `doRepMax [1,2]`." >}} 













Next, we would also expand the body of `repMax`. In 




the following diagram, to avoid drawing a noisy amount of 




crossing lines, I marked the application of `fst` with 




a star, and replaced the two edges to `fst` with 












The second element of the tuple was a call to `(:)`, and that's what the mysterious 




`force` is processing now. Just like it did before, it starts by looking at the








first argument of this list, which is the list's head. This argument is a reference to 




the starred node, which, as we've established, eventually points to `2`. 




Another `2` pops up on the console. 








After removing the unused nodes, we are left with the following graph:









{{< figure src="repmax_10.png" caption="The result of reducing `doRepMax [1,2]`." >}} 













As we would have expected, two `2`s were printed to the console, and our 




final graph represents the list `[2,2]`. 
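Translating back from graphs to code, we can check this result for ourselves (using the `doRepMax` definition from earlier):

```Haskell
main :: IO ()
main = print (doRepMax [1, 2])  -- prints [2,2]
```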









### Using Time Traveling 




Is time traveling even useful? I would argue yes, especially




in cases where Haskell's purity can make certain things 




difficult. 









As a first example, Csongor provides an assembler that works 




in a single pass. The challenge in this case is to resolve 




jumps to code segments occurring _after_ the jump itself;




in essence, the address of the target code segment needs to be 




known before the segment itself is processed. Csongor's 




code uses the [Tardis monad](https://hackage.haskell.org/package/tardis-0.4.1.0/docs/Control-Monad-Tardis.html),




which combines regular state, which you can write to and then




later read from, and future state, from which you can 




read values before you write them. Check out




[his complete example](https://kcsongor.github.io/time-travel-in-haskell-for-dummies/#a-single-pass-assembler-an-example).









Alternatively, here's an example from my research. I'll be fairly 




vague, since all of this is still in progress. The gist is that 




we have some kind of data structure (say, a list or a tree), 




and we want to associate with each element in this data 




structure a 'score' of how useful it is. There are many possible 




heuristics for picking 'scores'; a very simple one is




to make it inversely proportional to the number of times




an element occurs. To be more concrete, suppose 




we have some element type `Element`: 









{{< codelines "Haskell" "timetraveling/ValueScore.hs" 5 6 >}} 









Suppose also that our data structure is a binary tree: 









{{< codelines "Haskell" "timetraveling/ValueScore.hs" 14 16 >}} 









We then want to transform an input `ElementTree`, such as: 













```Haskell
Node A (Node A Empty Empty) Empty
```













Into a scored tree, like: 









```Haskell
Node (A,0.5) (Node (A,0.5) Empty Empty) Empty
```









Since `A` occurred twice, its score is `1/2 = 0.5`.













Let's define some utility functions before we get to the 




meat of the implementation: 













{{< codelines "Haskell" "timetraveling/ValueScore.hs" 8 12 >}} 













The `addElement` function simply increments the counter for a particular 




element in the map, inserting a count of `1` if the element isn't yet present. The `getScore`




function computes the score of a particular element, defaulting to `1.0` if 




it's not found in the map. 













Just as before, noticing that passing around the future values is getting awfully




bothersome, we write our scoring function as though we have




a 'future value'. 









{{< codelines "Haskell" "timetraveling/ValueScore.hs" 18 24 >}} 









The actual `doAssignScores` function is pretty much identical to 




`doRepMax`: 









{{< codelines "Haskell" "timetraveling/ValueScore.hs" 26 28 >}} 









There's quite a bit of repetition here, especially in the handling 




of future values: all of our functions now accept an extra




future argument, and return a work-in-progress future value.




This is what the `Tardis` monad, and its corresponding 




`TardisT` monad transformer, aim to address. Just like the 




`State` monad helps us avoid writing plumbing code for 




forward-traveling values, `Tardis` helps us do the same




for backward-traveling ones.
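To give a flavor of the API, here is `repMax` one more time, now with `Tardis` doing the plumbing. This is my own sketch against the package's documented `getPast`/`getFuture`/`sendPast` operations, not code from Csongor's post:

```Haskell
import Control.Monad.Tardis
  (Tardis, evalTardis, getFuture, getPast, modifyForwards, sendPast)

-- The running maximum travels forwards; the final maximum travels
-- backwards in time, playing the role of `largest` in `repMax`.
repMaxTardis :: [Int] -> [Int]
repMaxTardis xs = evalTardis go (undefined, minBound)
  where
    go = do
      ys <- mapM step xs -- rebuild the list out of future maxima
      m  <- getPast      -- the maximum of the entire list
      sendPast m         -- send it backwards to every `getFuture`
      return ys
    step x = do
      modifyForwards (max x) -- fold the current element into the maximum
      getFuture              -- replace it with the future result

main :: IO ()
main = print (repMaxTardis [1, 2]) -- prints [2,2]
```

The initial backwards state passed to `evalTardis` is never read here (every `getFuture` sees the value from `sendPast`), so `undefined` is safe in that position.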









#### Cycles in Monadic Bind 









We've seen that we're able to write code like the following: 









```Haskell
(a, b) = f a c
```








That is, we were able to write function calls that referenced 




their own return values. What if we try doing this inside 




a `do` block? Say, for example, we want to sprinkle some time 




traveling into our program, but don't want to add a whole new 




transformer into our monad stack. We could write code as follows: 









```Haskell
do
  (a, b) <- f a c
  return b
```









Unfortunately, this doesn't work. However, it's entirely 




possible to enable this using the `RecursiveDo` language 




extension: 









```Haskell
{-# LANGUAGE RecursiveDo #-}
```









Then, we can write the above as follows: 









```Haskell
do
  rec (a, b) <- f a c
  return b
```









This power, however, comes at a price. It's not as straightforward 




to build graphs from recursive monadic computations; in fact, 




it's not possible in general. The translation of the above 




code uses `MonadFix`. A monad that satisfies `MonadFix` has 




an operation `mfix`, which is the monadic version of the `fix` 




function we saw earlier: 









```Haskell
mfix :: MonadFix m => (a -> m a) -> m a

-- Regular fix, for comparison
fix :: (a -> a) -> a
```
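As a tiny illustration (my own, not from the paper), `mfix` lets us tie the knot through a monad, here `Maybe`:

```Haskell
import Control.Monad.Fix (mfix)

-- The result refers to itself: Just a cyclic list of 1s.
onesM :: Maybe [Int]
onesM = mfix (\xs -> Just (1 : xs))

main :: IO ()
main = print (fmap (take 5) onesM)  -- prints Just [1,1,1,1,1]
```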









To really understand how the translation works, check out the 




[paper on recursive do notation](http://leventerkok.github.io/papers/recdo.pdf). 









### Beware The Strictness 




Though Csongor points out other problems with the 




time traveling approach, I think he doesn't mention 




an important idea: you have to be _very_ careful about introducing 




strictness into your programs when running time-traveling code.




For example, suppose we wanted to write a function, 




`takeUntilMax`, which would return the input list, 




cut off after the first occurrence of the maximum number.




Following the same strategy, we come up with: 









{{< codelines "Haskell" "timetraveling/TakeMax.hs" 1 12 >}} 









In short, if we encounter our maximum number, we just return 




a list of that maximum number, since we do not want to recurse 




further. On the other hand, if we encounter a number that's 




_not_ the maximum, we continue our recursion. 









Unfortunately, this doesn't work; our program never terminates. 




You may be thinking: 









> Well, obviously this doesn't work! We didn't actually
> compute the maximum number properly, since we stopped
> recursing too early. We need to traverse the whole list,
> and not just the part before the maximum number.









To address this, we can reformulate our `takeUntilMax` 




function as follows: 









{{< codelines "Haskell" "timetraveling/TakeMax.hs" 14 21 >}} 









Now we definitely compute the maximum correctly! Alas, 




this doesn't work either. The issue lies on lines 5 and 18, 




more specifically in the comparison `x == m`. Here, we 




are trying to base the decision of what branch to take 




on a future value. This is simply impossible; to compute 




the value, we need to know the value! 
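A rule of thumb falls out of this (my phrasing, and the helpers below are hypothetical, not from Csongor's post): a future value may be _stored_ in a lazy position, but it must never _drive a decision_. Compare:

```Haskell
-- Safe: the future value m is stored lazily and never inspected.
pairWithMax :: Int -> [Int] -> [(Int, Int)]
pairWithMax m = map (\x -> (x, m))

-- Unsafe: the guard forces m, so a time-traveling m makes
-- this loop forever, just like takeUntilMax.
stopAtMax :: Int -> [Int] -> [Int]
stopAtMax _ [] = []
stopAtMax m (x : xs)
  | x == m    = [x]
  | otherwise = x : stopAtMax m xs
```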









This is no 'silly mistake', either! In complicated programs 




that use time traveling, strictness lurks behind every corner. 




In my research work, I was at one point inserting a data structure into 




a set; however, deep in the structure was a data type containing 




a 'future' value, and using the default `Eq` instance! 




Adding the data structure to a set ended up invoking `(==)` (or perhaps 




some function from the `Ord` typeclass), 




which, in turn, tried to compare the lazily evaluated values. 




My code therefore didn't terminate, much like `takeUntilMax`. 









Debugging time traveling code is, in general, 




a pain. This is especially true since future values don't look any different 




from regular values. You can see it in the type signatures 




of `repMax` and `takeUntilMax`: the maximum number is just an `Int`! 




And yet, trying to see what its value is will kill the entire program. 




As always, remember Brian W. Kernighan's wise words: 









> Debugging is twice as hard as writing the code in the first place.
> Therefore, if you write the code as cleverly as possible, you are,
> by definition, not smart enough to debug it.









### Conclusion 




That's about it! In a way, time traveling can make code that performs




certain operations more expressive. Furthermore, even if it's not groundbreaking, 




thinking about time traveling is a good exercise to get familiar 




with lazy evaluation in general. I hope you found this useful! 



