Add first draft of Homework 3 (CS325)
parent d9544398b9
commit a026e67a3b
content/blog/02_cs325_languages_hw3.md (normal file, 295 lines added)
@@ -0,0 +1,295 @@
---
title: A Language for an Assignment - Homework 3
date: 2020-01-02T22:17:43-08:00
tags: ["Haskell", "Python", "Algorithms"]
draft: true
---

It rained in Sunriver on New Year's Eve, and it continued to rain
for the next couple of days. So, instead of going skiing as planned,
to the dismay of my family and friends, I spent the majority of
those days working on the third language for homework 3. It
was quite the language, too - the homework has three problems, each of
which has a solution independent of the others. I invite you
to join me in my descent into madness as we construct another language.

### Homework 3
Let's take a look at the three homework problems. The first two are
related, but are solved using different techniques:

{{< codelines "text" "cs325-langs/hws/hw3.txt" 18 30 >}}

This problem requires us to find the `k` numbers closest to some
query (which I will call `n`) from a list `xs`. The list isn't sorted, and the
solution must run in linear time. Sorting the list would require
the standard
{{< sidenote "right" "n-note" "\(O(n\log n)\) time." >}}
The \(n\) in this expression is not the same as the query <code>n</code>,
but rather the length of the list. In fact, I have not yet assigned
the length of the input <code>xs</code> to any variable. If we say that
\(m\) is a number that denotes that length, the proper expression
for the complexity is \(O(m \log m)\).
{{< /sidenote >}} Thus, we have to take another route, which should
already be familiar: quickselect. Using quickselect, we can find the `k`th
closest number, and then collect all the numbers that are closer than the `k`th
closest number. So, we need a language that:

* Supports quickselect (and thus, list partitioning and recursion).
* Supports iteration, {{< sidenote "left" "iteration-note" "multiple times." >}}
Why would we need to iterate multiple times? Note that we could have a list
of numbers that are all the same, <code>[1,1,1,1,1]</code>. Then, we'll need
to know how many of the numbers that are <em>equally as close</em> as the <code>k</code>th
element we need to include, which will require another pass through the list.
{{< /sidenote >}}

That's a good start. Let's take a look at the second problem:

{{< codelines "text" "cs325-langs/hws/hw3.txt" 33 47 >}}

This problem really is easier. We have to find the position of _the_ closest
element, and then try to expand towards either the left or right, depending on
which end is better. This expansion will take several steps, and will
likely require a way to "look" at a given part of the list. So let's add two more
rules. We need a language that also:

* Supports looping control flow, such as `while`.
* {{< sidenote "right" "view-note" "Allows for a \"view\" into the list" >}}
We could, of course, simply use list indexing. But then, we'd just be making
a simple imperative language, and that's boring. So let's play around
with our design a little, and experimentally add such a "list view" component.
{{< /sidenote >}}
(like an abstraction over indexing).

This is shaping up to be a fun language. Let's take a look at the last problem:

{{< codelines "text" "cs325-langs/hws/hw3.txt" 50 64 >}}

This problem requires more iterations over a list. We have several
{{< sidenote "right" "cursor-note" "\"cursors\"" >}}
I always make the language before I write the post, since a lot of
design decisions change mid-implementation. I realize now that
"cursors" would've been a better name for this language feature,
but alas, it is too late.
{{< /sidenote >}} looking into the list, and depending on whether the values
at each of the cursors add up, we do or do not add a new tuple to a list. So,
two more requirements:

* The "cursors" must be able to interact.
* The language can represent {{< sidenote "left" "tuple-note" "tuples." >}}
We could, of course, hack some other way to return a list of tuples, but
it turns out tuples are pretty simple to implement, and help make for nicer
programming in our language.
{{< /sidenote >}}

I think we've gathered what we want from the homework. Let's move on to the
language!

### A Language
As is now usual, let's envision a solution to the problems in our language. There
are actually quite a lot of functions to look at, so let's see them one by one.
First, let's look at `qselect`.

{{< codelines "text" "cs325-langs/sols/hw3.lang" 1 19 >}}

After the early return, the first interesting part of the language is the
use of what I have decided to call a __list traverser__. The list
traverser is a __generalization of a list index__. Whenever we use a list
index variable, we generally use the following operations:

* __Initialize__: we set the list index to some initial value, such as 0.
* __Step__: If we're walking the list from left to right, we increment the index.
If we're walking the list from right to left, we decrement the index.
* __Validity Check__: We check if the index is still valid (that is, we haven't
gone past the edge of the list).
* __Access__: Get the element the cursor is pointing to.
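
To make the four operations concrete, here's a rough Python analogue. The class name `IndexCursor`, its methods, and the `collect` helper are all made up for illustration; they are not part of the language or its implementation.

```python
class IndexCursor:
    """A list index bundled with its direction and bound (hypothetical helper)."""

    def __init__(self, xs, start, stop, step=1):
        # Initialize: remember the list, where to begin, and where to stop.
        self.xs = xs
        self.index = start
        self.stop = stop      # exclusive bound in the direction of travel
        self.step_by = step   # +1 walks left to right, -1 walks right to left

    def valid(self):
        # Validity check: have we walked past the bound yet?
        if self.step_by > 0:
            return self.index < self.stop
        return self.index > self.stop

    def step(self):
        # Step: move one element in the direction of travel.
        self.index += self.step_by

    def at(self):
        # Access: the element the cursor currently points to.
        return self.xs[self.index]


def collect(cursor):
    """Walk a cursor to the end, gathering every element it visits."""
    out = []
    while cursor.valid():
        out.append(cursor.at())
        cursor.step()
    return out


xs = [10, 20, 30]
forward = collect(IndexCursor(xs, 0, len(xs)))
backward = collect(IndexCursor(xs, len(xs) - 1, -1, step=-1))
```

The loop in `collect` never adds, subtracts, or compares indices itself; the direction and the bound live entirely inside the cursor.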

A {{< sidenote "right" "cpp-note" "traverser declaration" >}}
A fun fact is that we've just rediscovered C++
<a href="http://www.cplusplus.com/reference/iterator/">iterators</a>. C++
containers and their iterators provide us with the operations I described:

We can initialize an iterator like <code>auto it = list.begin()</code>. We
can step the iterator using <code>it++</code>. We can check its validity
using <code>it != list.end()</code>, and access what it's pointing to using
<code>*it</code>. While C++ uses templates and inheritance for this,
we define a language feature specifically for lists.

{{< /sidenote >}} describes these operations. The declaration for the `bisector`
traverser creates a "cursor" over the list `xs` that goes between the 0th
and last elements of `xs`. The declaration for the `pivot` traverser creates
a "cursor" over the list `xs` that jumps around random locations in the list.

The next interesting part of the language is a __traverser macro__. This thing,
which looks like a function call (but isn't), performs an operation on the
cursor. For instance, `pop!` removes the element at the cursor from the list,
whereas `bisect!` categorizes the remaining elements in the cursor's list
into two lists, using a boolean-returning lambda (written in Java syntax).

Note that this implementation of `qselect` takes a function `c`, which it
uses to judge the actual value of the number. This is because our `qselect`
won't be finding the `k`th smallest number, but the number with the `k`th smallest
difference with `n`. `n` will be factored in via the function.
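
For comparison, here's a minimal Python sketch of quickselect with such a judging function. It is not a translation of the code above: the list comprehensions stand in for `bisect!`, `rest.pop(...)` stands in for `pop!` on a random pivot, and `k` is assumed to be 1-based.

```python
import random


def qselect(xs, k, c):
    """Return the element of xs whose key c(x) is the k-th smallest (1-based)."""
    if not xs:
        return None  # early return: nothing to select from
    rest = list(xs)
    # Pick a random pivot and remove it from the list.
    pivot = rest.pop(random.randrange(len(rest)))
    # Split the remaining elements by comparing keys, not raw values.
    left = [x for x in rest if c(x) <= c(pivot)]
    right = [x for x in rest if c(x) > c(pivot)]
    if k <= len(left):
        return qselect(left, k, c)
    if k == len(left) + 1:
        return pivot
    return qselect(right, k - len(left) - 1, c)


# The query n is "factored in" through the key function.
n = 4
second_closest = qselect([7, 1, 5, 2, 9], 2, lambda x: abs(x - n))
```

With a random pivot this runs in linear time on average, which is what the first problem asks for.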

Next up, let's take a look at the function that uses `qselect`, `closestUnsorted`:

{{< codelines "text" "cs325-langs/sols/hw3.lang" 21 46 >}}

Like we discussed, it finds the `k`th closest element (calling it `min`),
and counts how many elements that are __equally close__ need to be included,
by setting the number to `k` at first, and subtracting 1 for every number
it encounters that's closer than `min`. Notice that we use the `valid!` and
`step!` macros, which implement the operations we described above. Also notice
that the user doesn't deal with adding and subtracting numbers, and doing
comparisons. All they have to do is ask "am I still good to iterate?"
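
The counting trick is easier to see in plain Python. The sketch below is my own approximation of the idea, not the language's output. To keep it short, `heapq.nsmallest` stands in for `qselect`, so this version is not linear-time; the two passes after it are the part that matters.

```python
import heapq


def closest_unsorted(xs, k, n):
    """Return the k elements of xs closest to n, in their original order."""
    if k <= 0 or not xs:
        return []
    k = min(k, len(xs))
    # Stand-in for qselect: the k-th smallest distance to n.
    min_dist = heapq.nsmallest(k, (abs(x - n) for x in xs))[-1]
    # First pass: start at k and subtract 1 for every strictly closer element.
    # What remains is how many equally close elements we still need.
    ties_needed = k
    for x in xs:
        if abs(x - n) < min_dist:
            ties_needed -= 1
    # Second pass: keep every closer element, plus just enough ties.
    result = []
    for x in xs:
        dist = abs(x - n)
        if dist < min_dist:
            result.append(x)
        elif dist == min_dist and ties_needed > 0:
            result.append(x)
            ties_needed -= 1
    return result
```

On the all-equal list `[1, 1, 1, 1, 1]` with `k = 3`, nothing is strictly closer than the `k`th distance, so all three results come from the tie counter.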

Next, let's take a look at `closestSorted`, which will require more
traverser macros.

{{< codelines "text" "cs325-langs/sols/hw3.lang" 48 70 >}}

The first new macro is `canstep!`. This macro just verifies that
the traverser can make another step. We need this for the "reverse" traverser,
which indicates the lower bound of the range of numbers we want to return.
Because `subset!` (which itself is just Python's slice, like `xs[a:b]`) uses an inclusive bottom
index, we can't afford to step this traverser before knowing that we can, and that
it's a better choice after the step.

Similarly, we have the `at!(t, i)` macro, which looks at the element
at offset `i` from the traverser `t`.

We have two loops. The first loop runs as long as we can expand the range in both
directions, and picks the better direction at each iteration. The second loop
runs as long as we still want more numbers, but have already hit the edge
of the list on the left or on the right.
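
Here's how the same two-loop structure might look in plain Python, with ordinary indices in place of traversers. This is a sketch under a few assumptions of mine: the answer is the half-open slice `xs[lo:hi]`, ties go to the left, and `bisect_left` is how the closest element is found.

```python
from bisect import bisect_left


def closest_sorted(xs, k, n):
    """Return the k elements of the sorted list xs closest to n, as a slice."""
    if k <= 0 or not xs:
        return []
    # Find the position of the single closest element.
    i = bisect_left(xs, n)
    if i == len(xs) or (i > 0 and n - xs[i - 1] <= xs[i] - n):
        i -= 1
    lo, hi = i, i + 1  # xs[lo:hi] is the answer so far
    # Loop 1: we can still grow in both directions, so pick the better one.
    while hi - lo < k and lo > 0 and hi < len(xs):
        if n - xs[lo - 1] <= xs[hi] - n:
            lo -= 1
        else:
            hi += 1
    # Loop 2: one edge has been hit, so grow in whichever direction remains.
    while hi - lo < k and (lo > 0 or hi < len(xs)):
        if lo > 0:
            lo -= 1
        else:
            hi += 1
    return xs[lo:hi]
```

The `lo > 0` test before `lo -= 1` plays the role of `canstep!`: since the slice's bottom index is inclusive, we have to look at `xs[lo - 1]` before committing to the step.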

Finally, let's look at the solution to `xyz`:

{{< codelines "text" "cs325-langs/sols/hw3.lang" 72 95 >}}

I won't go in depth, but notice that the expression in the `span` part
of the `traverser` declaration can access another traverser. We treat it
as a feature that this expression isn't evaluated immediately at the place
of the traverser declaration. Rather, every time that a comparison for a traverser
operation is performed, this expression is re-evaluated. This allows us to put
dynamic bounds on traversers `y` and `z`, one of which must not exceed the other.
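
To show the three interacting cursors outside of our language, here's a rough Python sketch. It assumes the input holds distinct positive integers and that we want every `(x, y, z)` with `x + y == z` and `x < y`; both the assumption and the function body are mine, for illustration only.

```python
def xyz(xs):
    """All (x, y, z) from xs with x < y and x + y == z.

    Assumes distinct positive integers; with zero or negative numbers
    the cursor invariant below would not hold.
    """
    xs = sorted(xs)
    found = []
    for i, x in enumerate(xs):
        j = i + 1  # cursor for y
        k = i + 2  # cursor for z
        # The bound is checked anew on every iteration. Because x > 0,
        # j can never move past k: when j == k, the sum is too large.
        while k < len(xs):
            total = x + xs[j]
            if total == xs[k]:
                found.append((x, xs[j], xs[k]))
                j += 1
                k += 1
            elif total < xs[k]:
                j += 1  # sum too small: move y forward
            else:
                k += 1  # sum too large: move z forward
    return found
```

Each value of `x` costs one linear sweep of `y` and `z`, so the whole thing is quadratic after the initial sort.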

This is more than enough to work with. Let's move on to the implementation.

#### Implementation
Again, let's not go too far into the details of implementing the language from scratch.
Instead, let's take a look into specific parts of the language that deserve attention.

##### Revenge of the State Monad
Our previous language was, indeed, a respite from complexity. Translation was
straightforward, and the resulting expressions and statements were plugged straight
into a handwritten AST. We cannot get away with this here; the language is powerful
enough to implement three list-based problems, which comes at the cost of increased
complexity.

We need, once again, to generate temporary variables. We also need to keep track of
which variables are traversers, and the properties of these traversers, throughout
each function of the language. We thus fall back to using `Control.Monad.State`:

{{< todo >}}Code for Translator Monad{{< /todo >}}

There's one part of the state tuple that we haven't yet explained: the list of
statements.

##### Generating Statements
Recall that our translation function for expressions in the first homework had the type:

```Haskell
translateExpr :: Expr -> Translator ([Py.PyStmt], Py.PyExpr)
```

We then had to use `do`-notation, and explicitly concatenate lists
of emitted statements. In this language, I took an alternative route: I made
the statements part of the state. They are thus implicitly generated and
stored in the monad, and expression generators don't have to worry about
concatenating them. When the program is ready to use the generated statements
(say, when an `if`-statement needs to use the statements emitted by the condition
expression), we retrieve them from the monad:

{{< todo >}}Code for getting statements{{< /todo >}}
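
If the monadic version feels abstract, here's the same idea as a loose Python analogue, with a mutable object in place of the state monad. Everything in it (`Translator`, `fresh`, `emit`, `take_statements`, the tuple-based expression format) is hypothetical and only meant to show the shape of the approach, not the actual Haskell code.

```python
class Translator:
    """Mutable stand-in for the translator state (hypothetical)."""

    def __init__(self):
        self.counter = 0      # source of fresh temporary names
        self.statements = []  # statements emitted so far

    def fresh(self):
        name = f"temp{self.counter}"
        self.counter += 1
        return name

    def emit(self, stmt):
        self.statements.append(stmt)

    def take_statements(self):
        # Hand over the pending statements and reset the buffer.
        taken, self.statements = self.statements, []
        return taken


def translate_expr(tr, expr):
    """Translate a tiny expression tree, returning only the Python expression."""
    kind = expr[0]
    if kind == "lit":
        return str(expr[1])
    if kind == "add":
        left = translate_expr(tr, expr[1])
        right = translate_expr(tr, expr[2])
        return f"({left} + {right})"
    if kind == "pop":
        # This expression needs a statement to run first, so it is emitted
        # into the state instead of being returned to the caller.
        temp = tr.fresh()
        tr.emit(f"{temp} = {expr[1]}.pop(0)")
        return temp
    raise ValueError(f"unknown expression: {kind}")


def translate_if(tr, cond, body_lines):
    cond_expr = translate_expr(tr, cond)
    setup = tr.take_statements()  # whatever the condition emitted
    return setup + [f"if {cond_expr}:"] + ["    " + line for line in body_lines]
```

Note that the `"add"` case never touches a statement list; only the code that actually needs the statements (`translate_if` here) asks for them.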

##### Validating Traverser Declarations
We declare two separate types that hold traverser data. The first is a kind of "draft"
type, `TraverserData`. This record holds all possible configurations of a traverser
that occur as the program is iterating through the various `key: value` pairs in
the declaration. For instance, at the very beginning of processing a traverser declaration,
our program will use a "default" `TraverserData`, with all fields set to `Nothing` or
their default value. This value will then be modified by the first key/value pair,
changing, for instance, the list that the traverser operates on. This new modified
`TraverserData` will then be modified by the next key/value pair, and so on. This
is, effectively, a fold operation.

{{< todo >}}Code for TraverserData{{< /todo >}}
{{< todo >}}Maybe sidenote about fold?{{< /todo >}}

The data may not have all the required fields until the very end, and its type
reflects that: `Maybe String` here, `Maybe TraverserBounds` there. We don't
want to deal with unwrapping the `Maybe a` values every time we use the traverser,
especially if we've done so before. So, we define a `ValidTraverserData` record
that does not have `Maybe` fields, and thus has all the required data. At the
end of a traverser declaration, we attempt to translate a `TraverserData` into
a `ValidTraverserData`, invoking `fail` if we can't, and storing the `ValidTraverserData`
into the state otherwise. Then, every time we retrieve a traverser from the state,
it's guaranteed to be valid, and we don't have to spend extra work unpacking it. We
define a lookup monadic operation like this:

{{< todo >}}Code for getting ValidTraverserData{{< /todo >}}
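
The draft-then-validate pattern isn't specific to Haskell. Here's a small Python sketch of it, with `Optional` fields playing the part of `Maybe` and `functools.reduce` playing the part of the fold. The record names, their fields, and the `list` and `step` keys are stand-ins I picked for the example; only `span` is a key mentioned above.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Optional, Tuple


@dataclass(frozen=True)
class DraftTraverser:
    """Draft record: anything not yet seen is None (or a default)."""
    list_name: Optional[str] = None
    bounds: Optional[Tuple[str, str]] = None
    step: int = 1


@dataclass(frozen=True)
class ValidTraverser:
    """Validated record: no optional fields left to unwrap."""
    list_name: str
    bounds: Tuple[str, str]
    step: int


def apply_pair(draft, pair):
    """One step of the fold: update the draft with a single key/value pair."""
    key, value = pair
    if key == "list":
        return replace(draft, list_name=value)
    if key == "span":
        return replace(draft, bounds=value)
    if key == "step":
        return replace(draft, step=value)
    raise ValueError(f"unknown traverser key: {key}")


def validate(draft):
    """Convert a draft into a valid record, failing if anything is missing."""
    if draft.list_name is None or draft.bounds is None:
        raise ValueError("incomplete traverser declaration")
    return ValidTraverser(draft.list_name, draft.bounds, draft.step)


def declare(pairs):
    # Fold over the pairs starting from the default draft, then validate once.
    return validate(reduce(apply_pair, pairs, DraftTraverser()))


bisector = declare([("list", "xs"), ("span", ("0", "len(xs)"))])
```

After `declare` returns, code that uses `bisector` never has to check for `None`; the one place where a declaration can fail is the end of the declaration itself.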

##### Compiling Macros
I didn't call them macros for no reason. Clearly, we don't want to generate
code that
{{< sidenote "right" "increment-note" "calls functions only to increment an index." >}}
In fact, there's no easy way to do this at all. Python's integers (if we choose to
represent our traversers using integers) are immutable. Furthermore, unlike C++,
where passing by reference allows a function to change its parameters "outside"
the call, Python offers no way to reassign a different value to a variable given
to a function.
<br><br>
For an example use of C++'s pass-by-reference mechanic, consider <code>std::swap</code>:
it's a function, but it modifies the two variables given to it. There's no
way to generically implement such a function in Python.
{{< /sidenote >}} We also can't allow arbitrary expressions to serve as traversers:
our translator keeps some context about which variables are traversers, what their
bounds are, and how they behave. Thus, __calls to traverser macros are very much macros__:
they operate on AST nodes, and __require__ that their first argument is a variable
that names a traverser. We use the `requireTraverser` monadic operation
to get the traverser associated with the given variable name, and then perform
the operation as intended. The `at!(t)` operation is straightforward:

{{< todo >}}Code for at!{{< /todo >}}

The `at!(t,i)` operation is less so, since it deals with the intricacies of accessing
the list at either a positive or negative offset, depending on the direction
of the traverser. We implement a function to properly generate an expression for the offset:

{{< todo >}}Code for traverserIncrement{{< /todo >}}
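
As a tiny illustration of the direction problem, here's a Python sketch that builds the source text of the offset expression. The function names and the string-based representation are assumptions made for this example; the real translator works on an AST rather than strings.

```python
def offset_expr(index_var, offset, reverse=False):
    """Source text for the index `offset` steps ahead of a traverser."""
    if offset == 0:
        return index_var
    # "Ahead" means a larger index going forward, a smaller one in reverse.
    op = "-" if reverse else "+"
    # Parenthesize anything that isn't a plain integer literal.
    term = str(offset) if isinstance(offset, int) else f"({offset})"
    return f"{index_var} {op} {term}"


def at_expr(list_var, index_var, offset=0, reverse=False):
    """Source text for the list access performed by at!(t) or at!(t, i)."""
    return f"{list_var}[{offset_expr(index_var, offset, reverse)}]"
```

The caller writes the same offset for either kind of traverser; whether that turns into `+` or `-` is decided once, from the traverser's direction.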

We then implement `at!(t,i)` as follows:

{{< todo >}}Code for at!{{< /todo >}}

The most complicated macro is `bisect!`. It must be able to step the traverser,
and also return a tuple of two lists that the bisection yields. We'd also
prefer that it not pollute the environment with extra variables. To
achieve this, we want `bisect!` to be a function call. We want this
function to implement the iteration and list construction.

`bisect!`, by definition, takes a lambda. This lambda, in our language, is declared
in the lexical scope in which `bisect!` is called. Thus, to guarantee correct translation,
we must do one of two things:

1. Translate 1-to-1, and create a lambda, passing it to a fixed `bisect` function declared
elsewhere.
2. Translate to a nested function declaration, inlining the lambda.

{{< todo >}}Maybe sidenote about inline?{{< /todo >}}

Since I quite like the idea of inlining a lambda, let's settle for that. To do this,
we pull a fresh temporary variable and declare a function, into which we place
the traverser iteration code, as well as the body of the lambda, with the lambda's
parameter replaced by the list access expression. Here's the code:

{{< todo >}}Code for bisect!{{< /todo >}}
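
To give a feel for the target of this translation, here's a hand-written guess at the Python that one `bisect!` call might turn into. The names `partition_example`, `index`, and `temp0` are invented, and I'm assuming the lambda tests `c(x) <= c(p)`; the real generated code may look quite different.

```python
def partition_example(xs, c, p):
    # Traverser state the translator would track for a cursor over xs.
    # (The pivot p is assumed to have been removed from xs already.)
    index = 0

    # Fresh temporary function generated for this one bisect! call.
    def temp0():
        nonlocal index
        left, right = [], []
        while index < len(xs):  # validity check
            # Lambda body inlined, with its parameter replaced by xs[index].
            # c and p resolve through ordinary lexical scoping.
            if c(xs[index]) <= c(p):
                left.append(xs[index])
            else:
                right.append(xs[index])
            index += 1  # step
        return (left, right)

    return temp0()
```

Because `temp0` is nested inside the function that called `bisect!`, the inlined lambda body sees the same variables it would have seen as a real lambda, and the two result lists stay local to `temp0`.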