From a026e67a3be5665a8f1d5b95bcc9aa81c517063a Mon Sep 17 00:00:00 2001 From: Danila Fedorin Date: Fri, 3 Jan 2020 21:09:15 -0800 Subject: [PATCH] Add first draft of Homework 3 (CS325) --- content/blog/02_cs325_languages_hw3.md | 295 +++++++++++++++++++++++++ 1 file changed, 295 insertions(+) create mode 100644 content/blog/02_cs325_languages_hw3.md diff --git a/content/blog/02_cs325_languages_hw3.md b/content/blog/02_cs325_languages_hw3.md new file mode 100644 index 0000000..47df306 --- /dev/null +++ b/content/blog/02_cs325_languages_hw3.md @@ -0,0 +1,295 @@ +--- +title: A Language for an Assignment - Homework 3 +date: 2020-01-02T22:17:43-08:00 +tags: ["Haskell", "Python", "Algorithms"] +draft: true +--- + +It rained in Sunriver on New Year's Eve, and it continued to rain +for the next couple of days. So, instead of going skiing as planned, +to the dismay of my family and friends, I spent the majority of +those days working on the third language for homework 3. It +was quite the language, too - the homework has three problems, each of +which has a solution independent of the others. I invite you +to join me in my descent into madness as we construct another language. + +### Homework 3 +Let's take a look at the three homework problems. The first two are +related, but are solved using a different technique: + +{{< codelines "text" "cs325-langs/hws/hw3.txt" 18 30 >}} + +This problem requires us to find the `k` numbers closest to some +query (which I will call `n`) from a list `xs`. The list isn't sorted, and the +problem must run in linear time. Sorting the list would require +the standard +{{< sidenote "right" "n-note" "\(O(n\log n)\) time." >}} +The \(n\) in this expression is not the same as the query n, +but rather the length of the list. In fact, I have not yet assigned +the length of the input xs to any variable. If we say that +\(m\) is a number that denotes that length, the proper expression +for the complexity is \(O(m \log m)\). 
+{{< /sidenote >}} Thus, we have to take another route, which should +already be familiar: quickselect. Using quickselect, we can find the `k`th +closest number, and then collect all the numbers that are closer than the `k`th +closest number. So, we need a language that: + +* Supports quickselect (and thus, list partitioning and recursion). +* Supports iteration, {{< sidenote "left" "iteration-note" "multiple times." >}} +Why would we need to iterate multiple times? Note that we could have a list +of numbers that are all the same, [1,1,1,1,1]. Then, we'll need +to know how many of the numbers that are as close as the kth +element we need to include, which will require another pass through the list. +{{< /sidenote >}} + +That's a good start. Let's take a look at the second problem: + +{{< codelines "text" "cs325-langs/hws/hw3.txt" 33 47 >}} + +This problem really is easier. We have to find the position of _the_ closest +element, and then try to expand towards either the left or right, depending on +which end is better. This expansion will take several steps, and will +likely require a way to "look" at a given part of the list. So let's add two more +rules. We need a language that also: + +* Supports looping control flow, such as `while`. +* {{< sidenote "right" "view-note" "Allows for a \"view\" into the list" >}} +We could, of course, simply use list indexing. But then, we'd just be making +a simple imperative language, and that's boring. So let's play around +with our design a little, and experimentally add such a "list view" component. +{{< /sidenote >}} +(like an abstraction over indexing). + +This is shaping up to be a fun language. Let's take a look at the last problem: +{{< codelines "text" "cs325-langs/hws/hw3.txt" 50 64 >}} + +This problem requires more iterations of a list. We have several +{{< sidenote "right" "cursor-note" "\"cursors\"" >}} +I always make the language before I write the post, since a lot of +design decisions change mid-implementation. 
I realize now that +"cursors" would've been a better name for this language feature, +but alas, it is too late. +{{< /sidenote >}} looking into the list, and depending on whether the values +at each of the cursors add up, we do or do not add a new tuple to a list. So, +two more requirements: + +* The "cursors" must be able to interact. +* The language can represent {{< sidenote "left" "tuple-note" "tuples." >}} +We could, of course, hack some other way to return a list of tuples, but +it turns out tuples are pretty simple to implement, and help make for nicer +programming in our language. +{{< /sidenote >}} + +I think we've gathered what we want from the homework. Let's move on to the +language! + +### A Language +As is now usual, let's envision a solution to the problems in our language. There +are actually quite a lot of functions to look at, so let's see them one by one. +First, let's look at `qselect`. + +{{< codelines "text" "cs325-langs/sols/hw3.lang" 1 19 >}} + +After the early return, the first interesting part of the language is the +use of what I have decided to call a __list traverser__. The list +traverser is a __generalization of a list index__. Whenever we use a list +index variable, we generally use the following operations: + +* __Initialize__: we set the list index to some initial value, such as 0. +* __Step__: If we're walking the list from left to right, we increment the index. +If we're walking the list from right to left, we decrement the index. +* __Validity Check__: We check if the index is still valid (that is, we haven't +gone past the edge of the list). +* __Access__: Get the element the cursor is pointing to. + +A {{< sidenote "right" "cpp-note" "traverser declaration" >}} +A fun fact is that we've just rediscovered C++ +iterators. C++ +containers and their iterators provide us with the operations I described: + +We can initialize an iterator like auto it = list.begin(). We +can step the iterator using it++. 
We can check its validity +using it != list.end(), and access what it's pointing to using +*it. While C++ uses templates and inheritance for this, +we define a language feature specifically for lists. + +{{< /sidenote >}} describes these operations. The declaration for the `bisector` +traverser creates a "cursor" over the list `xs`, that goes between the 0th +and last elements of `xs`. The declaration for the `pivot` traverser creates +a "cursor" over the list `xs` that jumps around random locations in the list. + +The next interesting part of the language is a __traverser macro__. This thing, +that looks like a function call (but isn't), performs an operation on the +cursor. For instance, `pop!` removes the element at the cursor from the list, +whereas `bisect!` categorizes the remaining elements in the cursor's list +into two lists, using a boolean-returning lambda (written in Java syntax). + +Note that this implementation of `qselect` takes a function `c`, which it +uses to judge the actual value of the number. This is because our `qselect` +won't be finding _the_ smallest number, but the number with the smallest difference +with `n`. `n` will be factored in via the function. + +Next up, let's take a look at the function that uses `qselect`, `closestUnsorted`: + +{{< codelines "text" "cs325-langs/sols/hw3.lang" 21 46 >}} + +Like we discussed, it finds the `k`th closest element (calling it `min`), +and counts how many elements that are __equal__ need to be included, +by setting the number to `k` at first, and subtracting 1 for every number +it encounters that's closer than `min`. Notice that we use the `valid!` and +`step!` macros, which implement the operations we described above. Notice +that the user doesn't deal with adding and subtracting numbers, and doing +comparisons. All they have to do is ask "am I still good to iterate?" + +Next, let's take a look at `closestSorted`, which will require more +traverser macros. 
+ +{{< codelines "text" "cs325-langs/sols/hw3.lang" 48 70 >}} + +The first new macro is `canstep!`. This macro just verifies that +the traverser can make another step. We need this for the "reverse" iterator, +which indicates the lower bound of the range of numbers we want to return, +because `subset!` (which itself is just Python's slice, like `xs[a:b]`), uses an inclusive bottom +index, and thus, we can't afford to step it before knowing that we can, and that +it's a better choice after the step. + +Similarly, we have the `at!(t, i)` macro, which looks at the +traverser `t`, with offset `i`. + +We have two loops. The first loop runs as long as we can expand the range in both +directions, and picks the better direction at each iteration. The second loop +runs as long as we still want more numbers, but have already hit the edge +of the list on the left or on the right. + +Finally, let's look at the solution to `xyz`: + +{{< codelines "text" "cs325-langs/sols/hw3.lang" 72 95 >}} + +I won't go in depth, but notice that the expression in the `span` part +of the `traverser` declaration can access another traverser. We treat +as a feature the fact that this expression isn't immediately evaluated at the place +of the traverser declaration. Rather, every time that a comparison for a traverser +operation is performed, this expression is re-evaluated. This allows us to put +dynamic bounds on traversers `y` and `z`, one of which must not exceed the other. + +This is more than enough to work with. Let's move on to the implementation. + +#### Implementation +Again, let's not go too far into the details of implementing the language from scratch. +Instead, let's take a look into specific parts of the language that deserve attention. + +##### Revenge of the State Monad +Our previous language was, indeed, a respite from complexity. Translation was +straightforward, and the resulting expressions and statements were plugged straight +into a handwritten AST. 
We cannot get away with this here; the language is powerful +enough to implement three list-based problems, which comes at the cost of increased +complexity. + +We need, once again, to generate temporary variables. We also need to keep track of +which variables are traversers, and the properties of these traversers, throughout +each function of the language. We thus fall back to using `Control.Monad.State`: + +{{< todo >}}Code for Translator Monad{{< /todo >}} + +There's one part of the state tuple that we haven't yet explained: the list of +statements. + +##### Generating Statements +Recall that our translation function for expressions in the first homework had the type: + +```Haskell +translateExpr :: Expr -> Translator ([Py.PyStmt], Py.PyExpr) +``` + +We then had to use `do`-notation, and explicitly concatenate lists +of emitted statements. In this language, I took an alternative route: I made +the statements part of the state. They are thus implicitly generated and +stored in the monad, and expression generators don't have to worry about +concatenating them. When the program is ready to use the generated statements +(say, when an `if`-statement needs to use the statements emitted by the condition +expression), we retrieve them from the monad: + +{{< todo >}}Code for getting statements{{< /todo >}} + +##### Validating Traverser Declarations +We declare two separate types that hold traverser data. The first is a kind of "draft" +type, `TraverserData`. This record holds all possible configurations of a traverser +that occur as the program is iterating through the various `key: value` pairs in +the declaration. For instance, at the very beginning of processing a traverser declaration, +our program will use a "default" `TraverserData`, with all fields set to `Nothing` or +their default value. This value will then be modified by the first key/value pair, +changing, for instance, the list that the traverser operates on. 
This new modified +`TraverserData` will then be modified by the next key/value pair, and so on. This +is, effectively, a fold operation. + +{{< todo >}}Code for TraverserData{{< /todo >}} +{{< todo >}}Maybe sidenote about fold?{{< /todo >}} + +The data may not have all the required fields until the very end, and its type +reflects that: `Maybe String` here, `Maybe TraverserBounds` there. We don't +want to deal with unwrapping the `Maybe a` values every time we use the traverser, +especially if we've done so before. So, we define a `ValidTraverserData` record, +that does not have `Maybe` arguments, and thus, has all the required data. At the +end of a traverser declaration, we attempt to translate a `TraverserData` into +a `ValidTraverserData`, invoking `fail` if we can't, and storing the `ValidTraverserData` +into the state otherwise. Then, every time we retrieve a traverser from the state, +it's guaranteed to be valid, and we have to spend no extra work unpacking it. We +define a lookup monadic operation like this: + +{{< todo >}}Code for getting ValidTraverserData{{< /todo >}} + +##### Compiling Macros +I didn't call them macros for no reason. Clearly, we don't want to generate +code that +{{< sidenote "right" "increment-note" "calls functions only to increment an index." >}} +In fact, there's no easy way to do this at all. Python's integers (if we choose to +represent our traversers using integers), are immutable. Furthermore, unlike C++, +where passing by reference allows a function to change its parameters "outside" +the call, Python offers no way to reassign a different value to a variable given +to a function. +

+For an example use of C++'s pass-by-reference mechanic, consider std::swap: +it's a function, but it modifies the two variables given to it. There's no +way to generically implement such a function in Python. +{{< /sidenote >}} We also can't allow arbitrary expressions to serve as traversers: +our translator keeps some context about which variables are traversers, what their +bounds are, and how they behave. Thus, __calls to traverser macros are very much macros__: +they operate on AST nodes, and __require__ that their first argument is a variable, +named like the traverser. We use the `requireTraverser` monadic operation +to get the traverser associated with the given variable name, and then perform +the operation as intended. The `at!(t)` operation is straightforward: + +{{< todo >}}Code for at!{{< /todo >}} + +The `at!(t,i)` is less so, since it deals with the intricacies of accessing +the list at either a positive or negative offset, depending on the direction +of the traverser. We implement a function to properly generate an expression for the offset: + +{{< todo >}}Code for traverserIncrement{{< /todo >}} + +We then implement `at!(t,i)` as follows: + +{{< todo >}}Code for at!{{< /todo >}} + +The most complicated macro is `bisect!`. It must be able to step the traverser, +and also return a tuple of two lists that the bisection yields. We would also +prefer that it not pollute the environment with extra variables. To +achieve this, we want `bisect!` to be a function call. We want this +function to implement the iteration and list construction. + +`bisect!`, by definition, takes a lambda. This lambda, in our language, is declared +in the lexical scope in which `bisect!` is called. Thus, to guarantee correct translation, +we must do one of two things: + +1. Translate 1-to-1, and create a lambda, passing it to a fixed `bisect` function declared +elsewhere. +2. Translate to a nested function declaration, inlining the lambda. 
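
To make the first option concrete, here is a rough sketch of what the target Python could look like if `bisect!` were translated 1-to-1. This is only an illustration, not the translator's actual output: the helper name `bisect`, its `start` parameter, and the choice to represent a traverser as a plain integer index are all assumptions made for the example.

```python
def bisect(xs, start, pred):
    """Hypothetical fixed helper: split xs[start:] into two lists by a predicate."""
    matching, rest = [], []
    i = start
    while i < len(xs):  # validity check
        if pred(xs[i]):  # the lambda from the source program, called as-is
            matching.append(xs[i])
        else:
            rest.append(xs[i])
        i += 1  # step
    return matching, rest


# What a 1-to-1 translation of a bisect! call might emit (names are illustrative):
pivot = 5
smaller, larger = bisect([9, 2, 7, 1, 5, 3], 0, lambda x: x < pivot)
```

Note that in this form the helper steps its own local index, so the caller's traverser variable is left untouched unless the new position is also returned.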
+ +{{< todo >}}Maybe sidenote about inline?{{< /todo >}} + +Since I quite like the idea of inlining a lambda, let's settle for that. To do this, +we pull a fresh temporary variable and declare a function, into which we place +the traverser iteration code, as well as the body of the lambda, with the variable +substituted for the list access expression. Here's the code: + +{{< todo >}}Code for bisect!{{< /todo >}}
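
Until the real code is filled in, here is a hedged sketch of the kind of Python the inlining approach described above might produce. Everything here is hypothetical: `temp1` stands in for the fresh temporary name, the traverser `bisector` is assumed to be a plain integer index, and the lambda `x -> x < pivot` is an invented example whose parameter has been replaced by the list access `xs[bisector]`.

```python
def partition_around(xs, pivot):
    # Assumed representation: the traverser is just an index into xs.
    bisector = 0

    # Nested function emitted for the bisect! call; "temp1" plays the role
    # of the fresh temporary variable.
    def temp1():
        nonlocal bisector  # lets the nested function step the outer traverser
        left, right = [], []
        while bisector < len(xs):  # traverser validity check
            if xs[bisector] < pivot:  # inlined lambda body
                left.append(xs[bisector])
            else:
                right.append(xs[bisector])
            bisector += 1  # traverser step
        return left, right

    left, right = temp1()
    return left, right, bisector
```

Because the loop lives inside the nested function, its working lists stay local, while `nonlocal` still lets it advance the traverser declared in the enclosing function.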