Add first draft of Homework 3 (CS325)

2020-01-03 21:09:15 -08:00
parent d9544398b9
commit a026e67a3b
1 changed files with 295 additions and 0 deletions
--- a/content/blog/02_cs325_languages_hw3.md
+++ b/content/blog/02_cs325_languages_hw3.md
@@ -0,0 +1,295 @@
+---
+title: A Language for an Assignment - Homework 3
+date: 2020-01-02T22:17:43-08:00
+tags: ["Haskell", "Python", "Algorithms"]
+draft: true
+---
+
+It rained in Sunriver on New Year's Eve, and it continued to rain
+for the next couple of days. So, instead of going skiing as planned,
+to the dismay of my family and friends, I spent the majority of
+those days working on the third language for homework 3. It
+was quite the language, too - the homework has three problems, each of
+which has a solution independent of the others. I invite you
+to join me in my descent into madness as we construct another language.
+
+### Homework 3
+Let's take a look at the three homework problems. The first two are
+related, but are solved using different techniques:
+
+{{< codelines "text" "cs325-langs/hws/hw3.txt" 18 30 >}}
+
+This problem requires us to find the `k` numbers closest to some
+query (which I will call `n`) from a list `xs`. The list isn't sorted, and our
+solution must run in linear time. Sorting the list would require
+the standard
+{{< sidenote "right" "n-note" "\(O(n\log n)\) time." >}}
+The \(n\) in this expression is not the same as the query <code>n</code>,
+but rather the length of the list. In fact, I have not yet assigned
+the length of the input <code>xs</code> to any variable. If we say that
+\(m\) is a number that denotes that length, the proper expression
+for the complexity is \(O(m \log m)\).
+{{< /sidenote >}} Thus, we have to take another route, which should
+already be familiar: quickselect. Using quickselect, we can find the `k`th
+closest number, and then collect all the numbers that are closer than the `k`th
+closest number. So, we need a language that:
+
+* Supports quickselect (and thus, list partitioning and recursion).
+* Supports iteration, {{< sidenote "left" "iteration-note" "multiple times." >}}
+Why would we need to iterate multiple times? Note that we could have a list
+of numbers that are all the same, <code>[1,1,1,1,1]</code>. Then, we'll need
+to know how many of the numbers that are <em>equally close</em> as the <code>k</code>th
+element we need to include, which will require another pass through the list.
+{{< /sidenote >}}
+
+That's a good start. Let's take a look at the second problem:
+
+{{< codelines "text" "cs325-langs/hws/hw3.txt" 33 47 >}}
+
+This problem really is easier. We have to find the position of _the_ closest
+element, and then try to expand towards either the left or right, depending on
+which end is better. This expansion will take several steps, and will
+likely require a way to "look" at a given part of the list. So let's add two more
+rules. We need a language that also:
+
+* Supports looping control flow, such as `while`.
+* {{< sidenote "right" "view-note" "Allows for a \"view\" into the list" >}}
+We could, of course, simply use list indexing. But then, we'd just be making
+a simple imperative language, and that's boring. So let's play around
+with our design a little, and experimentally add such a "list view" component.
+{{< /sidenote >}}
+(like an abstraction over indexing).
+
+This is shaping up to be a fun language. Let's take a look at the last problem:
+{{< codelines "text" "cs325-langs/hws/hw3.txt" 50 64 >}}
+
+This problem requires more iterations of a list. We have several
+{{< sidenote "right" "cursor-note" "\"cursors\"" >}}
+I always make the language before I write the post, since a lot of
+design decisions change mid-implementation. I realize now that
+"cursors" would've been a better name for this language feature,
+but alas, it is too late.
+{{< /sidenote >}} looking into the list, and depending on whether the values
+at each of the cursors add up, we do or do not add a new tuple to a list. So,
+two more requirements:
+
+* The "cursors" must be able to interact.
+* The language can represent {{< sidenote "left" "tuple-note" "tuples." >}}
+We could, of course, hack some other way to return a list of tuples, but
+it turns out tuples are pretty simple to implement, and help make for nicer
+programming in our language.
+{{< /sidenote >}}
+
+I think we've gathered what we want from the homework. Let's move on to the
+language!
+
+### A Language
+As is now usual, let's envision a solution to the problems in our language. There
+are actually quite a lot of functions to look at, so let's see them one by one.
+First, let's look at `qselect`.
+
+{{< codelines "text" "cs325-langs/sols/hw3.lang" 1 19 >}}
+
+After the early return, the first interesting part of the language is the
+use of what I have decided to call a __list traverser__. The list
+traverser is a __generalization of a list index__. Whenever we use a list
+index variable, we generally use the following operations:
+
+* __Initialize__: we set the list index to some initial value, such as 0.
+* __Step__: If we're walking the list from left to right, we increment the index.
+If we're walking the list from right to left, we decrement the index.
+* __Validity Check__: We check if the index is still valid (that is, we haven't
+gone past the edge of the list).
+* __Access__: We get the element the index is pointing to.
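
The four operations above can be sketched in plain Python. This is only an illustration of the idea, not the language from this post: the `Traverser` class, its method names, and the `collect` helper are all hypothetical.

```python
class Traverser:
    """A cursor over a list: a list index bundled with its direction."""

    def __init__(self, xs, start, step=1):
        # Initialize: remember the list, the starting index and the direction.
        self.xs = xs
        self.index = start
        self.step_size = step

    def valid(self):
        # Validity check: the index has not gone past either edge of the list.
        return 0 <= self.index < len(self.xs)

    def step(self):
        # Step: +1 walks left to right, -1 walks right to left.
        self.index += self.step_size

    def at(self):
        # Access: the element the cursor is pointing to.
        return self.xs[self.index]


def collect(traverser):
    # Walk the list using only the four operations, never raw index arithmetic.
    seen = []
    while traverser.valid():
        seen.append(traverser.at())
        traverser.step()
    return seen


forward = collect(Traverser([10, 20, 30], start=0))
backward = collect(Traverser([10, 20, 30], start=2, step=-1))
```

The caller of `collect` never adds, subtracts, or compares indices; the direction lives entirely in the traverser.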
+
+A {{< sidenote "right" "cpp-note" "traverser declaration" >}}
+A fun fact is that we've just rediscovered C++
+<a href="http://www.cplusplus.com/reference/iterator/">iterators</a>. C++
+containers and their iterators provide us with the operations I described:
+
+We can initialize an iterator like <code>auto it = list.begin()</code>. We
+can step the iterator using <code>it++</code>. We can check its validity
+using <code>it != list.end()</code>, and access what it's pointing to using
+<code>*it</code>. While C++ uses templates and inheritance for this,
+we define a language feature specifically for lists.
+
+{{< /sidenote >}} describes these operations. The declaration for the `bisector`
+traverser creates a "cursor" over the list `xs`, that goes between the 0th
+and last elements of `xs`. The declaration for the `pivot` traverser creates
+a "cursor" over the list `xs` that jumps around random locations in the list.
+
+The next interesting part of the language is a __traverser macro__. This thing,
+that looks like a function call (but isn't), performs an operation on the
+cursor. For instance, `pop!` removes the element at the cursor from the list,
+whereas `bisect!` categorizes the remaining elements in the cursor's list
+into two lists, using a boolean-returning lambda (written in Java syntax).
+
+Note that this implementation of `qselect` takes a function `c`, which it
+uses to judge the actual value of the number. This is because our `qselect`
+won't be finding _the_ smallest number, but the number with the smallest difference
+with `n`. `n` will be factored in via the function.
+
+Next up, let's take a look at the function that uses `qselect`, `closestUnsorted`:
+
+{{< codelines "text" "cs325-langs/sols/hw3.lang" 21 46 >}}
+
+Like we discussed, it finds the `k`th closest element (calling it `min`),
+and counts how many elements that are __equal__ need to be included,
+by setting the number to `k` at first, and subtracting 1 for every number
+it encounters that's closer than `min`. Notice that we use the `valid!` and
+`step!` macros, which implement the operations we described above. Notice
+that the user doesn't deal with adding and subtracting numbers, and doing
+comparisons. All they have to do is ask "am I still good to iterate?"
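
For comparison, here is a rough Python sketch of the same approach: quickselect with a key function `c`, followed by the tie-counting pass. It is my own approximation rather than the output of the translator; the names `closest_unsorted`, `distance`, and `ties` are assumptions, and it assumes `1 <= k <= len(xs)`.

```python
import random


def qselect(xs, k, c):
    """Return an element of xs whose key c(x) is the k-th smallest (k is 1-based)."""
    xs = list(xs)
    while True:
        # Pick a random pivot and remove it from the list.
        pivot = xs.pop(random.randrange(len(xs)))
        # Split the remaining elements by comparing keys, not raw values.
        left = [x for x in xs if c(x) <= c(pivot)]
        right = [x for x in xs if c(x) > c(pivot)]
        if k <= len(left):
            xs = left
        elif k == len(left) + 1:
            return pivot
        else:
            k -= len(left) + 1
            xs = right


def closest_unsorted(xs, k, n):
    if k <= 0:
        return []

    def distance(x):
        return abs(x - n)

    # Distance of the k-th closest element (called `min` in the post).
    kth = distance(qselect(xs, k, distance))
    # First pass: start at k, subtract 1 for every strictly closer element.
    ties = k
    for x in xs:
        if distance(x) < kth:
            ties -= 1
    # Second pass: keep closer elements, plus only `ties` equally close ones.
    result = []
    for x in xs:
        if distance(x) < kth:
            result.append(x)
        elif distance(x) == kth and ties > 0:
            result.append(x)
            ties -= 1
    return result
```

With `xs = [4, 1, 3, 2, 7, 4]`, `k = 3` and `n = 5`, both `3` and `7` are at distance 2, but only one of them fits, which is exactly the case the counter handles.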
+
+Next, let's take a look at `closestSorted`, which will require more
+traverser macros.
+
+{{< codelines "text" "cs325-langs/sols/hw3.lang" 48 70 >}}
+
+The first new macro is `canstep!`. This macro just verifies that
+the traverser can make another step. We need this for the "reverse" iterator,
+which indicates the lower bound of the range of numbers we want to return,
+because `subset!` (which itself is just Python's slice, like `xs[a:b]`) uses an inclusive bottom
+index, and thus, we can't afford to step it before knowing that we can, and that
+it's a better choice after the step.
+
+Similarly, we have the `at!(t, i)` macro, which looks at the
+traverser `t`, with offset `i`.
+
+We have two loops. The first loop runs as long as we can expand the range in both
+directions, and picks the better direction at each iteration. The second loop
+runs as long as we still want more numbers, but have already hit the edge
+of the list on the left or on the right.
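
The two-loop structure can be sketched in Python as follows. This is a hedged approximation, not the translated program: locating the closest element with `bisect_left` and preferring the left side on ties are both my assumptions, and `lo` and `hi` stand in for the two traversers.

```python
from bisect import bisect_left


def closest_sorted(xs, k, n):
    if k <= 0 or not xs:
        return []
    # Find the position of the closest element (assumption: binary search).
    i = bisect_left(xs, n)
    if i == len(xs) or (i > 0 and n - xs[i - 1] <= xs[i] - n):
        i -= 1
    # xs[lo:hi] is the current answer; lo is inclusive, like subset!'s bottom index.
    lo, hi = i, i + 1
    # Loop 1: both directions are open, so pick the better one each time.
    while hi - lo < k and lo > 0 and hi < len(xs):
        if n - xs[lo - 1] <= xs[hi] - n:
            lo -= 1
        else:
            hi += 1
    # Loop 2: one edge has been hit, so keep expanding the other way.
    while hi - lo < k and (lo > 0 or hi < len(xs)):
        if lo > 0:
            lo -= 1
        else:
            hi += 1
    return xs[lo:hi]
```

Note that `lo` is only decremented after checking `lo > 0`, which mirrors the role `canstep!` plays for the reverse traverser.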
+
+Finally, let's look at the solution to `xyz`:
+
+{{< codelines "text" "cs325-langs/sols/hw3.lang" 72 95 >}}
+
+I won't go in depth, but notice that the expression in the `span` part
+of the `traverser` declaration can access another traverser. We treat it
+as a feature that this expression isn't immediately evaluated at the place
+of the traverser declaration. Rather, every time that a comparison for a traverser
+operation is performed, this expression is re-evaluated. This allows us to put
+dynamic bounds on traversers `y` and `z`, one of which must not exceed the other.
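
To illustrate what re-evaluated bounds buy us, here is a small Python sketch in which each cursor's bound is a zero-argument function that is called on every validity check. It is not the `xyz` solution; `BoundedCursor` and `pairs_with_sum` are hypothetical names, and the pair-finding task (on a sorted list) is just a convenient demonstration.

```python
class BoundedCursor:
    def __init__(self, xs, start, step, bound):
        self.xs = xs
        self.index = start
        self.step_size = step
        # `bound` is a callable, not a number, so it is not evaluated here.
        self.bound = bound

    def valid(self):
        # The bound expression is re-evaluated on every comparison.
        limit = self.bound()
        return self.index < limit if self.step_size > 0 else self.index > limit

    def step(self):
        self.index += self.step_size

    def at(self):
        return self.xs[self.index]


def pairs_with_sum(xs, target):
    """Collect pairs from an ascending list whose values add up to target."""
    pairs = []
    # Each cursor is bounded by the *current* position of the other one.
    lo = BoundedCursor(xs, 0, 1, bound=lambda: hi.index)
    hi = BoundedCursor(xs, len(xs) - 1, -1, bound=lambda: lo.index)
    while lo.valid():
        total = lo.at() + hi.at()
        if total == target:
            pairs.append((lo.at(), hi.at()))
            lo.step()
        elif total < target:
            lo.step()
        else:
            hi.step()
    return pairs
```

Had the bound been evaluated once at declaration time, `lo` would be limited by where `hi` started rather than by where `hi` currently is.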
+
+This is more than enough to work with. Let's move on to the implementation.
+
+#### Implementation
+Again, let's not go too far into the details of implementing the language from scratch.
+Instead, let's take a look into specific parts of the language that deserve attention.
+
+##### Revenge of the State Monad
+Our previous language was, indeed, a respite from complexity. Translation was
+straightforward, and the resulting expressions and statements were plugged straight
+into a handwritten AST. We cannot get away with this here; the language is powerful
+enough to solve three list-based problems, which comes at the cost of increased
+complexity.
+
+We need, once again, to generate temporary variables. We also need to keep track of
+which variables are traversers, and the properties of these traversers, throughout
+each function of the language. We thus fall back to using `Control.Monad.State`:
+
+{{< todo >}}Code for Translator Monad{{< /todo >}}
+
+There's one part of the state tuple that we haven't yet explained: the list of
+statements.
+
+##### Generating Statements
+Recall that our translation function for expressions in the first homework had the type:
+
+```Haskell
+translateExpr :: Expr -> Translator ([Py.PyStmt], Py.PyExpr)
+```
+
+We then had to use `do`-notation, and explicitly concatenate lists
+of emitted statements. In this language, I took an alternative route: I made
+the statements part of the state. They are thus implicitly generated and
+stored in the monad, and expression generators don't have to worry about
+concatenating them. When the program is ready to use the generated statements
+(say, when an `if`-statement needs to use the statements emitted by the condition
+expression), we retrieve them from the monad:
+
+{{< todo >}}Code for getting statements{{< /todo >}}
+
+##### Validating Traverser Declarations
+We declare two separate types that hold traverser data. The first is a kind of "draft"
+type, `TraverserData`. This record holds all possible configurations of a traverser
+that occur as the program is iterating through the various `key: value` pairs in
+the declaration. For instance, at the very beginning of processing a traverser declaration,
+our program will use a "default" `TraverserData`, with all fields set to `Nothing` or
+their default value. This value will then be modified by the first key/value pair,
+changing, for instance, the list that the traverser operates on. This new modified
+`TraverserData` will then be modified by the next key/value pair, and so on. This
+is, effectively, a fold operation.
+
+{{< todo >}}Code for TraverserData{{< /todo >}}
+{{< todo >}}Maybe sidenote about fold?{{< /todo >}}
+
+The data may not have all the required fields until the very end, and its type
+reflects that: `Maybe String` here, `Maybe TraverserBounds` there. We don't
+want to deal with unwrapping the `Maybe a` values every time we use the traverser,
+especially if we've done so before. So, we define a `ValidTraverserData` record
+that does not have `Maybe` fields, and thus has all the required data. At the
+end of a traverser declaration, we attempt to translate a `TraverserData` into
+a `ValidTraverserData`, invoking `fail` if we can't, and storing the `ValidTraverserData`
+into the state otherwise. Then, every time we retrieve a traverser from the state,
+it's guaranteed to be valid, and we have to spend no extra work unpacking it. We
+define a lookup monadic operation like this:
+
+{{< todo >}}Code for getting ValidTraverserData{{< /todo >}}
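
The draft-then-validate pattern from this section can be mimicked in Python. The sketch below is an analogue only and is not the Haskell code of the translator: the field names, the `list` and `step` keys, and the `declare` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Optional, Tuple


@dataclass(frozen=True)
class TraverserData:
    # "Draft" record: required fields may still be missing (None).
    list_name: Optional[str] = None
    bounds: Optional[Tuple[int, int]] = None
    step: int = 1


@dataclass(frozen=True)
class ValidTraverserData:
    # Validated record: nothing is optional any more.
    list_name: str
    bounds: Tuple[int, int]
    step: int


def apply_pair(draft, pair):
    # One step of the fold: a key/value pair updates the draft.
    key, value = pair
    if key == "list":
        return replace(draft, list_name=value)
    if key == "span":
        return replace(draft, bounds=value)
    if key == "step":
        return replace(draft, step=value)
    raise ValueError(f"unknown traverser key: {key}")


def validate(draft):
    # Fails once, at the end of the declaration, instead of at every use.
    if draft.list_name is None or draft.bounds is None:
        raise ValueError("incomplete traverser declaration")
    return ValidTraverserData(draft.list_name, draft.bounds, draft.step)


def declare(pairs):
    # Start from the default draft and fold every key/value pair into it.
    return validate(reduce(apply_pair, pairs, TraverserData()))
```

Anything that receives a `ValidTraverserData` can use its fields directly, without re-checking for missing values.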
+
+##### Compiling Macros
+I didn't call them macros for no reason. Clearly, we don't want to generate
+code that
+{{< sidenote "right" "increment-note" "calls functions only to increment an index." >}}
+In fact, there's no easy way to do this at all. Python's integers (if we choose to
+represent our traversers using integers) are immutable. Furthermore, unlike C++,
+where passing by reference allows a function to change its parameters "outside"
+the call, Python offers no way to reassign a different value to a variable given
+to a function.
+<br><br>
+For an example use of C++'s pass-by-reference mechanic, consider <code>std::swap</code>:
+it's a function, but it modifies the two variables given to it. There's no
+way to generically implement such a function in Python.
+{{< /sidenote >}} We also can't allow arbitrary expressions to serve as traversers:
+our translator keeps some context about which variables are traversers, what their
+bounds are, and how they behave. Thus, __calls to traverser macros are very much macros__:
+they operate on AST nodes, and __require__ that their first argument is a variable,
+named like the traverser. We use the `requireTraverser` monadic operation
+to get the traverser associated with the given variable name, and then perform
+the operation as intended. The `at!(t)` operation is straightforward:
+
+{{< todo >}}Code for at!{{< /todo >}}
+
+The `at!(t,i)` operation is less so, since it deals with the intricacies of accessing
+the list at either a positive or negative offset, depending on the direction
+of the traverser. We implement a function to properly generate an expression for the offset:
+
+{{< todo >}}Code for traverserIncrement{{< /todo >}}
+
+We then implement `at!(t,i)` as follows:
+
+{{< todo >}}Code for at!{{< /todo >}}
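
As a rough picture of the behavior we are after, here is a Python sketch of a direction-aware offset. It reflects my reading of the description above (an offset of `i` means "`i` elements further along the traverser's own direction"); the function name and parameters are hypothetical, and the real translator emits an expression rather than calling a helper.

```python
def at(xs, index, direction, offset=0):
    # A forward traverser looks to the right, a backward one to the left.
    signed = offset if direction > 0 else -offset
    target = index + signed
    # Guard explicitly: a negative Python index would silently wrap around.
    if not 0 <= target < len(xs):
        raise IndexError("offset leaves the list")
    return xs[target]


xs = [10, 20, 30, 40]
ahead_of_forward = at(xs, 1, direction=1, offset=1)
ahead_of_backward = at(xs, 2, direction=-1, offset=1)
```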
+
+The most complicated macro is `bisect!`. It must be able to step the traverser,
+and also return a tuple of two lists that the bisection yields. We would also
+prefer that it not pollute the environment with extra variables. To
+achieve this, we want `bisect!` to be a function call. We want this
+function to implement the iteration and list construction.
+
+`bisect!`, by definition, takes a lambda. This lambda, in our language, is declared
+in the lexical scope in which `bisect!` is called. Thus, to guarantee correct translation,
+we must do one of two things:
+
+1. Translate 1-to-1, and create a lambda, passing it to a fixed `bisect` function declared
+elsewhere.
+2. Translate to a nested function declaration, inlining the lambda.
+
+{{< todo >}}Maybe sidenote about inline?{{< /todo >}}
+
+Since I quite like the idea of inlining a lambda, let's settle for that. To do this,
+we pull a fresh temporary variable and declare a function, into which we place
+the traverser iteration code, as well as the body of the lambda, with the variable
+substituted for the list access expression. Here's the code:
+
+{{< todo >}}Code for bisect!{{< /todo >}}
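
To make the target of this translation concrete, here is a hand-written guess at the shape of the Python we might generate for a `bisect!` call. It is not actual translator output: `qselect_step`, `temp1`, `pivot_value`, and the lambda `(x) -> x <= pivot_value` are all made up for the example.

```python
def qselect_step(xs, pivot_value):
    # The traverser is represented as a plain integer index.
    bisector = 0

    # Fresh temporary name for the nested function.
    def temp1():
        # The nested function steps the traverser that lives outside of it.
        nonlocal bisector
        left, right = [], []
        while bisector < len(xs):
            # Inlined lambda body: `x` replaced by the list access xs[bisector].
            if xs[bisector] <= pivot_value:
                left.append(xs[bisector])
            else:
                right.append(xs[bisector])
            bisector += 1
        return (left, right)

    # The macro call itself becomes an ordinary function call.
    left, right = temp1()
    return left, right, bisector
```

The two accumulator lists stay local to `temp1`, so the enclosing function only gains one new name, and the lambda never exists as a separate Python object.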