Add first draft of Homework 3 (CS325)
parent
d9544398b9
commit
a026e67a3b
295
content/blog/02_cs325_languages_hw3.md
Normal file
|
@ -0,0 +1,295 @@
|
||||||
|
---
|
||||||
|
title: A Language for an Assignment - Homework 3
|
||||||
|
date: 2020-01-02T22:17:43-08:00
|
||||||
|
tags: ["Haskell", "Python", "Algorithms"]
|
||||||
|
draft: true
|
||||||
|
---
|
||||||
|
|
||||||
|
It rained in Sunriver on New Year's Eve, and it continued to rain
|
||||||
|
for the next couple of days. So, instead of going skiing as planned,
|
||||||
|
to the dismay of my family and friends, I spent the majority of
|
||||||
|
those days working on the third language for homework 3. It
|
||||||
|
was quite the language, too - the homework has three problems, each of
|
||||||
|
which has a solution independent of the others. I invite you
|
||||||
|
to join me in my descent into madness as we construct another language.
|
||||||
|
|
||||||
|
### Homework 3
|
||||||
|
Let's take a look at the three homework problems. The first two are
|
||||||
|
related, but are solved using different techniques:
|
||||||
|
|
||||||
|
{{< codelines "text" "cs325-langs/hws/hw3.txt" 18 30 >}}
|
||||||
|
|
||||||
|
This problem requires us to find the `k` numbers closest to some
|
||||||
|
query (which I will call `n`) from a list `xs`. The list isn't sorted, and the
|
||||||
|
solution must run in linear time. Sorting the list would require
|
||||||
|
the standard
|
||||||
|
{{< sidenote "right" "n-note" "\(O(n\log n)\) time." >}}
|
||||||
|
The \(n\) in this expression is not the same as the query <code>n</code>,
|
||||||
|
but rather the length of the list. In fact, I have not yet assigned
|
||||||
|
the length of the input <code>xs</code> to any variable. If we say that
|
||||||
|
\(m\) is a number that denotes that length, the proper expression
|
||||||
|
for the complexity is \(O(m \log m)\).
|
||||||
|
{{< /sidenote >}} Thus, we have to take another route, which should
|
||||||
|
already be familiar: quickselect. Using quickselect, we can find the `k`th
|
||||||
|
closest number, and then collect all the numbers that are closer than the `k`th
|
||||||
|
closest number. So, we need a language that:
|
||||||
|
|
||||||
|
* Supports quickselect (and thus, list partitioning and recursion).
|
||||||
|
* Supports iteration, {{< sidenote "left" "iteration-note" "multiple times." >}}
|
||||||
|
Why would we need to iterate multiple times? Note that we could have a list
|
||||||
|
of numbers that are all the same, <code>[1,1,1,1,1]</code>. Then, we'll need
|
||||||
|
to know how many of the numbers <em>equally close</em> as the <code>k</code>th
|
||||||
|
element we need to include, which will require another pass through the list.
|
||||||
|
{{< /sidenote >}}
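
To make the plan concrete before designing a language around it, here is a minimal Python sketch of the two-pass idea. The names `qselect` and `closest_unsorted`, the argument order, and the tie-breaking rule (earlier elements win) are my assumptions for illustration, not the homework's required interface. The running time is linear on average, as is usual for quickselect with a random pivot.

```python
import random


def qselect(k, xs, key=lambda x: x):
    """Return the element of xs with the k-th smallest key (k is 1-based)."""
    if not 1 <= k <= len(xs):
        raise ValueError("k out of range")
    pivot_key = key(xs[random.randrange(len(xs))])
    smaller = [x for x in xs if key(x) < pivot_key]
    equal = [x for x in xs if key(x) == pivot_key]
    if k <= len(smaller):
        return qselect(k, smaller, key)
    if k <= len(smaller) + len(equal):
        return equal[0]
    larger = [x for x in xs if key(x) > pivot_key]
    return qselect(k - len(smaller) - len(equal), larger, key)


def closest_unsorted(xs, k, n):
    """Return the k elements of xs closest to n, in their original order."""
    if k <= 0 or not xs:
        return []
    if k >= len(xs):
        return list(xs)

    def dist(x):
        return abs(x - n)

    cutoff = dist(qselect(k, xs, key=dist))
    # Second pass: how many elements exactly at the cutoff may we keep?
    ties_left = k - sum(1 for x in xs if dist(x) < cutoff)
    result = []
    # Third pass: keep everything strictly closer, plus the allowed ties.
    for x in xs:
        if dist(x) < cutoff:
            result.append(x)
        elif dist(x) == cutoff and ties_left > 0:
            result.append(x)
            ties_left -= 1
    return result
```

The `key` parameter is what lets the same `qselect` rank numbers by their distance to `n` rather than by their value.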
|
||||||
|
|
||||||
|
That's a good start. Let's take a look at the second problem:
|
||||||
|
|
||||||
|
{{< codelines "text" "cs325-langs/hws/hw3.txt" 33 47 >}}
|
||||||
|
|
||||||
|
This problem really is easier. We have to find the position of _the_ closest
|
||||||
|
element, and then try to expand towards either the left or right, depending on
|
||||||
|
which end is better. This expansion will take several steps, and will
|
||||||
|
likely require a way to "look" at a given part of the list. So let's add two more
|
||||||
|
rules. We need a language that also:
|
||||||
|
|
||||||
|
* Supports looping control flow, such as `while`.
|
||||||
|
* {{< sidenote "right" "view-note" "Allows for a \"view\" into the list" >}}
|
||||||
|
We could, of course, simply use list indexing. But then, we'd just be making
|
||||||
|
a simple imperative language, and that's boring. So let's play around
|
||||||
|
with our design a little, and experimentally add such a "list view" component.
|
||||||
|
{{< /sidenote >}}
|
||||||
|
(like an abstraction over indexing).
|
||||||
|
|
||||||
|
This is shaping up to be a fun language. Let's take a look at the last problem:
|
||||||
|
{{< codelines "text" "cs325-langs/hws/hw3.txt" 50 64 >}}
|
||||||
|
|
||||||
|
This problem requires more iterations of a list. We have several
|
||||||
|
{{< sidenote "right" "cursor-note" "\"cursors\"" >}}
|
||||||
|
I always make the language before I write the post, since a lot of
|
||||||
|
design decisions change mid-implementation. I realize now that
|
||||||
|
"cursors" would've been a better name for this language feature,
|
||||||
|
but alas, it is too late.
|
||||||
|
{{< /sidenote >}} looking into the list, and depending on whether the values
|
||||||
|
at each of the cursors add up, we do or do not add a new tuple to a list. So,
|
||||||
|
two more requirements:
|
||||||
|
|
||||||
|
* The "cursors" must be able to interact.
|
||||||
|
* The language can represent {{< sidenote "left" "tuple-note" "tuples." >}}
|
||||||
|
We could, of course, hack some other way to return a list of tuples, but
|
||||||
|
it turns out tuples are pretty simple to implement, and help make for nicer
|
||||||
|
programming in our language.
|
||||||
|
{{< /sidenote >}}
|
||||||
|
|
||||||
|
I think we've gathered what we want from the homework. Let's move on to the
|
||||||
|
language!
|
||||||
|
|
||||||
|
### A Language
|
||||||
|
As is now usual, let's envision a solution to the problems in our language. There
|
||||||
|
are actually quite a lot of functions to look at, so let's see them one by one.
|
||||||
|
First, let's look at `qselect`.
|
||||||
|
|
||||||
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 1 19 >}}
|
||||||
|
|
||||||
|
After the early return, the first interesting part of the language is the
|
||||||
|
use of what I have decided to call a __list traverser__. The list
|
||||||
|
traverser is a __generalization of a list index__. Whenever we use a list
|
||||||
|
index variable, we generally use the following operations:
|
||||||
|
|
||||||
|
* __Initialize__: we set the list index to some initial value, such as 0.
|
||||||
|
* __Step__: If we're walking the list from left to right, we increment the index.
|
||||||
|
If we're walking the list from right to left, we decrement the index.
|
||||||
|
* __Validity Check__: We check if the index is still valid (that is, we haven't
|
||||||
|
gone past the edge of the list).
|
||||||
|
* __Access__: Get the element the cursor is pointing to.
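
As a rough model of these four operations, here is a small Python class. It is only an illustration: the class name `Traverser`, its methods, and the `reverse` flag are hypothetical, and the language itself exposes these operations as macros rather than as method calls.

```python
class Traverser:
    """A list index bundled with its direction and validity check."""

    def __init__(self, xs, reverse=False):
        self.xs = xs
        self.direction = -1 if reverse else 1
        # Initialize: start at the first element in the walking direction.
        self.index = len(xs) - 1 if reverse else 0

    def valid(self):
        # Validity check: have we walked past either edge of the list?
        return 0 <= self.index < len(self.xs)

    def step(self):
        # Step: increment or decrement, depending on the direction.
        self.index += self.direction

    def at(self, offset=0):
        # Access: the element under the cursor, or `offset` steps ahead of it.
        return self.xs[self.index + offset * self.direction]


def walk(xs, reverse=False):
    """Collect every element the traverser visits, in visiting order."""
    t = Traverser(xs, reverse)
    seen = []
    while t.valid():
        seen.append(t.at())
        t.step()
    return seen
```

The caller of `walk` never writes `+ 1`, `- 1`, or a bounds comparison; that bookkeeping lives in one place.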
|
||||||
|
|
||||||
|
A {{< sidenote "right" "cpp-note" "traverser declaration" >}}
|
||||||
|
A fun fact is that we've just rediscovered C++
|
||||||
|
<a href="http://www.cplusplus.com/reference/iterator/">iterators</a>. C++
|
||||||
|
containers and their iterators provide us with the operations I described:
|
||||||
|
|
||||||
|
We can initialize an iterator like <code>auto it = list.begin()</code>. We
|
||||||
|
can step the iterator using <code>it++</code>. We can check its validity
|
||||||
|
using <code>it != list.end()</code>, and access what it's pointing to using
|
||||||
|
<code>*it</code>. While C++ uses templates and inheritance for this,
|
||||||
|
we define a language feature specifically for lists.
|
||||||
|
|
||||||
|
{{< /sidenote >}} describes these operations. The declaration for the `bisector`
|
||||||
|
traverser creates a "cursor" over the list `xs`, that goes between the 0th
|
||||||
|
and last elements of `xs`. The declaration for the `pivot` traverser creates
|
||||||
|
a "cursor" over the list `xs` that jumps around random locations in the list.
|
||||||
|
|
||||||
|
The next interesting part of the language is a __traverser macro__. This thing,
|
||||||
|
that looks like a function call (but isn't), performs an operation on the
|
||||||
|
cursor. For instance, `pop!` removes the element at the cursor from the list,
|
||||||
|
whereas `bisect!` categorizes the remaining elements in the cursor's list
|
||||||
|
into two lists, using a boolean-returning lambda (written in Java syntax).
|
||||||
|
|
||||||
|
Note that this implementation of `qselect` takes a function `c`, which it
|
||||||
|
uses to judge the actual value of the number. This is because our `qselect`
|
||||||
|
won't be finding _the_ smallest number, but the number with the smallest difference
|
||||||
|
with `n`. `n` will be factored in via the function.
|
||||||
|
|
||||||
|
Next up, let's take a look at the function that uses `qselect`, `closestUnsorted`:
|
||||||
|
|
||||||
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 21 46 >}}
|
||||||
|
|
||||||
|
Like we discussed, it finds the `k`th closest element (calling it `min`),
|
||||||
|
and counts how many of the elements that are __equally close__ need to be included,
|
||||||
|
by setting the number to `k` at first, and subtracting 1 for every number
|
||||||
|
it encounters that's closer than `min`. Notice that we use the `valid!` and
|
||||||
|
`step!` macros, which implement the operations we described above. Notice
|
||||||
|
that the user doesn't deal with adding and subtracting numbers, and doing
|
||||||
|
comparisons. All they have to do is ask "am I still good to iterate?"
|
||||||
|
|
||||||
|
Next, let's take a look at `closestSorted`, which will require more
|
||||||
|
traverser macros.
|
||||||
|
|
||||||
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 48 70 >}}
|
||||||
|
|
||||||
|
The first new macro is `canstep!`. This macro just verifies that
|
||||||
|
the traverser can make another step. We need this for the "reverse" iterator,
|
||||||
|
which indicates the lower bound of the range of numbers we want to return,
|
||||||
|
because `subset!` (which itself is just Python's slice, like `xs[a:b]`) uses an inclusive bottom
|
||||||
|
index, and thus, we can't afford to step it before knowing that we can, and that
|
||||||
|
it's a better choice after the step.
|
||||||
|
|
||||||
|
Similarly, we have the `at!(t, i)` macro, which looks at the
|
||||||
|
traverser `t`, with offset `i`.
|
||||||
|
|
||||||
|
We have two loops. The first loop runs as long as we can expand the range in both
|
||||||
|
directions, and picks the better direction at each iteration. The second loop
|
||||||
|
runs as long as we still want more numbers, but have already hit the edge
|
||||||
|
of the list on the left or on the right.
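
For comparison, here is how the same two-loop shape might look in plain Python with ordinary indices. This is a sketch under my own assumptions: the name `closest_sorted`, the argument order, and the rule that ties go to the left are not taken from the homework text. The `lo` index plays the role of the "reverse" traverser, and, like `subset!`, the final slice treats it as an inclusive lower bound.

```python
from bisect import bisect_left


def closest_sorted(xs, k, n):
    """Return the k elements of sorted xs closest to n, as one slice."""
    if not xs or k <= 0:
        return []
    # Find the position of the single closest element.
    i = bisect_left(xs, n)
    if i == len(xs) or (i > 0 and n - xs[i - 1] <= xs[i] - n):
        i -= 1
    lo, hi = i, i + 1  # the answer so far is xs[lo:hi]
    # First loop: both directions are open, so pick the better one.
    while hi - lo < k and lo > 0 and hi < len(xs):
        if n - xs[lo - 1] <= xs[hi] - n:
            lo -= 1
        else:
            hi += 1
    # Second loop: one edge was hit, so grow in whichever direction remains.
    while hi - lo < k and (lo > 0 or hi < len(xs)):
        if lo > 0:
            lo -= 1
        else:
            hi += 1
    return xs[lo:hi]
```

Note that `lo` is only decremented after checking both that it can move and that moving it is the better choice, which is exactly the job of `canstep!`.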
|
||||||
|
|
||||||
|
Finally, let's look at the solution to `xyz`:
|
||||||
|
|
||||||
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 72 95 >}}
|
||||||
|
|
||||||
|
I won't go in depth, but notice that the expression in the `span` part
|
||||||
|
of the `traverser` declaration can access another traverser. We treat
|
||||||
|
as a feature the fact that this expression isn't immediately evaluated at the place
|
||||||
|
of the traverser declaration. Rather, every time that a comparison for a traverser
|
||||||
|
operation is performed, this expression is re-evaluated. This allows us to put
|
||||||
|
dynamic bounds on traversers `y` and `z`, one of which must not exceed the other.
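
To show what "dynamic bounds" buys us, here is a Python sketch of a three-cursor solution. I am assuming that the task is to list all triples with `x + y == z` taken from a list of distinct positive numbers; the function name `xyz` comes from the post, but the exact problem statement and output order are my assumptions. The lower bound of the `z` cursor depends on the current position of the `y` cursor, so it has to be recomputed on every iteration rather than fixed once at declaration time.

```python
def xyz(xs):
    """List all (x, y, z) with x + y == z, assuming distinct positive numbers."""
    s = sorted(xs)
    triples = []
    for i in range(len(s)):            # cursor for x
        k = i + 2                      # cursor for z, shared across all y
        for j in range(i + 1, len(s)):  # cursor for y
            # Dynamic bound: z must always stay to the right of y.
            k = max(k, j + 1)
            # As y grows, the target x + y grows, so z only moves right.
            while k < len(s) and s[k] < s[i] + s[j]:
                k += 1
            if k < len(s) and s[k] == s[i] + s[j]:
                triples.append((s[i], s[j], s[k]))
    return triples
```

Because the `z` cursor never moves backwards for a fixed `x`, the whole thing runs in quadratic time after sorting.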
|
||||||
|
|
||||||
|
This is more than enough to work with. Let's move on to the implementation.
|
||||||
|
|
||||||
|
#### Implementation
|
||||||
|
Again, let's not go too far into the details of implementing the language from scratch.
|
||||||
|
Instead, let's take a look into specific parts of the language that deserve attention.
|
||||||
|
|
||||||
|
##### Revenge of the State Monad
|
||||||
|
Our previous language was, indeed, a respite from complexity. Translation was
|
||||||
|
straightforward, and the resulting expressions and statements were plugged straight
|
||||||
|
into a handwritten AST. We cannot get away with this here; the language is powerful
|
||||||
|
enough to implement three list-based problems, which comes at the cost of increased
|
||||||
|
complexity.
|
||||||
|
|
||||||
|
We need, once again, to generate temporary variables. We also need to keep track of
|
||||||
|
which variables are traversers, and the properties of these traversers, throughout
|
||||||
|
each function of the language. We thus fall back to using `Control.Monad.State`:
|
||||||
|
|
||||||
|
{{< todo >}}Code for Translator Monad{{< /todo >}}
|
||||||
|
|
||||||
|
There's one part of the state tuple that we haven't yet explained: the list of
|
||||||
|
statements.
|
||||||
|
|
||||||
|
##### Generating Statements
|
||||||
|
Recall that our translation function for expressions in the first homework had the type:
|
||||||
|
|
||||||
|
```Haskell
|
||||||
|
translateExpr :: Expr -> Translator ([Py.PyStmt], Py.PyExpr)
|
||||||
|
```
|
||||||
|
|
||||||
|
We then had to use `do`-notation, and explicitly concatenate lists
|
||||||
|
of emitted statements. In this language, I took an alternative route: I made
|
||||||
|
the statements part of the state. They are thus implicitly generated and
|
||||||
|
stored in the monad, and expression generators don't have to worry about
|
||||||
|
concatenating them. When the program is ready to use the generated statements
|
||||||
|
(say, when an `if`-statement needs to use the statements emitted by the condition
|
||||||
|
expression), we retrieve them from the monad:
|
||||||
|
|
||||||
|
{{< todo >}}Code for getting statements{{< /todo >}}
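
The idea does not depend on Haskell, so here is a rough Python model of it. This is an analogy, not the translator's actual code: the class `Emitter` and the names `fresh`, `emit`, `collect`, and `translate_call` are all hypothetical. Statements are appended to shared state as a side effect, and a caller that needs "just the statements produced by this sub-expression" temporarily swaps in an empty list.

```python
class Emitter:
    """Shared translation state: emitted statements and a temp counter."""

    def __init__(self):
        self.statements = []
        self.counter = 0

    def fresh(self):
        # Generate a new temporary variable name.
        self.counter += 1
        return f"temp{self.counter}"

    def emit(self, stmt):
        # Expression translators call this instead of returning lists.
        self.statements.append(stmt)

    def collect(self, action):
        # Run `action`, returning only the statements it emitted.
        saved = self.statements
        self.statements = []
        try:
            result = action()
            return self.statements, result
        finally:
            self.statements = saved


def translate_call(em, fn, arg):
    """Translate a call into a temp assignment; return the temp's name."""
    tmp = em.fresh()
    em.emit(f"{tmp} = {fn}({arg})")
    return tmp
```

An `if`-statement translator would wrap the translation of its condition in `collect`, place the returned statements before the `if`, and use the returned expression as the condition.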
|
||||||
|
|
||||||
|
##### Validating Traverser Declarations
|
||||||
|
We declare two separate types that hold traverser data. The first is a kind of "draft"
|
||||||
|
type, `TraverserData`. This record holds all possible configurations of a traverser
|
||||||
|
that occur as the program is iterating through the various `key: value` pairs in
|
||||||
|
the declaration. For instance, at the very beginning of processing a traverser declaration,
|
||||||
|
our program will use a "default" `TraverserData`, with all fields set to `Nothing` or
|
||||||
|
their default value. This value will then be modified by the first key/value pair,
|
||||||
|
changing, for instance, the list that the traverser operates on. This new modified
|
||||||
|
`TraverserData` will then be modified by the next key/value pair, and so on. This
|
||||||
|
is, effectively, a fold operation.
|
||||||
|
|
||||||
|
{{< todo >}}Code for TraverserData{{< /todo >}}
|
||||||
|
{{< todo >}}Maybe sidenote about fold?{{< /todo >}}
|
||||||
|
|
||||||
|
The data may not have all the required fields until the very end, and its type
|
||||||
|
reflects that: `Maybe String` here, `Maybe TraverserBounds` there. We don't
|
||||||
|
want to deal with unwrapping the `Maybe a` values every time we use the traverser,
|
||||||
|
especially if we've done so before. So, we define a `ValidTraverserData` record,
|
||||||
|
that does not have `Maybe` arguments, and thus, has all the required data. At the
|
||||||
|
end of a traverser declaration, we attempt to translate a `TraverserData` into
|
||||||
|
a `ValidTraverserData`, invoking `fail` if we can't, and storing the `ValidTraverserData`
|
||||||
|
into the state otherwise. Then, every time we retrieve a traverser from the state,
|
||||||
|
it's guaranteed to be valid, and we have to spend no extra work unpacking it. We
|
||||||
|
define a lookup monadic operation like this:
|
||||||
|
|
||||||
|
{{< todo >}}Code for getting ValidTraverserData{{< /todo >}}
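
The draft-then-validate pattern can be modeled in a few lines of Python. Again, this is an analogy with hypothetical names: `TraverserDraft` stands in for `TraverserData`, `ValidTraverser` for `ValidTraverserData`, and the keys `list`, `span`, and `reverse` are guesses made for the sake of the example. The fold over key/value pairs is `functools.reduce`, and a `ValueError` plays the part of `fail`.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Optional, Tuple


@dataclass(frozen=True)
class TraverserDraft:
    """Possibly incomplete traverser; required fields start out as None."""
    list_name: Optional[str] = None
    bounds: Optional[Tuple[str, str]] = None
    reverse: bool = False


@dataclass(frozen=True)
class ValidTraverser:
    """Fully specified traverser; no optional fields remain."""
    list_name: str
    bounds: Tuple[str, str]
    reverse: bool


def apply_pair(draft, pair):
    """One fold step: update the draft with a single key/value pair."""
    key, value = pair
    if key == "list":
        return replace(draft, list_name=value)
    if key == "span":
        return replace(draft, bounds=value)
    if key == "reverse":
        return replace(draft, reverse=value)
    raise ValueError(f"unknown traverser key: {key}")


def validate(draft):
    """Convert a draft into a valid record, or fail once, up front."""
    if draft.list_name is None or draft.bounds is None:
        raise ValueError("traverser declaration is missing a required field")
    return ValidTraverser(draft.list_name, draft.bounds, draft.reverse)


def declare(pairs):
    """Fold all pairs into a default draft, then validate the result."""
    return validate(reduce(apply_pair, pairs, TraverserDraft()))
```

Everything downstream of `declare` receives a `ValidTraverser` and never has to check for `None` again.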
|
||||||
|
|
||||||
|
##### Compiling Macros
|
||||||
|
I didn't call them macros for no reason. Clearly, we don't want to generate
|
||||||
|
code that
|
||||||
|
{{< sidenote "right" "increment-note" "calls functions only to increment an index." >}}
|
||||||
|
In fact, there's no easy way to do this at all. Python's integers (if we choose to
|
||||||
|
represent our traversers using integers) are immutable. Furthermore, unlike C++,
|
||||||
|
where passing by reference allows a function to change its parameters "outside"
|
||||||
|
the call, Python offers no way to reassign a different value to a variable given
|
||||||
|
to a function.
|
||||||
|
<br><br>
|
||||||
|
For an example use of C++'s pass-by-reference mechanic, consider <code>std::swap</code>:
|
||||||
|
it's a function, but it modifies the two variables given to it. There's no
|
||||||
|
way to generically implement such a function in Python.
|
||||||
|
{{< /sidenote >}} We also can't allow arbitrary expressions to serve as traversers:
|
||||||
|
our translator keeps some context about which variables are traversers, what their
|
||||||
|
bounds are, and how they behave. Thus, __calls to traverser macros are very much macros__:
|
||||||
|
they operate on AST nodes, and __require__ that their first argument is a variable,
|
||||||
|
named like the traverser. We use the `requireTraverser` monadic operation
|
||||||
|
to get the traverser associated with the given variable name, and then perform
|
||||||
|
the operation as intended. The `at!(t)` operation is straightforward:
|
||||||
|
|
||||||
|
{{< todo >}}Code for at!{{< /todo >}}
|
||||||
|
|
||||||
|
The `at!(t,i)` operation is less so, since it deals with the intricacies of accessing
|
||||||
|
the list at either a positive or negative offset, depending on the direction
|
||||||
|
of the traverser. We implement a function to properly generate an expression for the offset:
|
||||||
|
|
||||||
|
{{< todo >}}Code for traverserIncrement{{< /todo >}}
|
||||||
|
|
||||||
|
We then implement `at!(t,i)` as follows:
|
||||||
|
|
||||||
|
{{< todo >}}Code for at!{{< /todo >}}
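
The reason these have to be macros is easy to check in Python itself. The snippet below is a small demonstration, not translator output: `step_by_function` and `offset_index` are hypothetical names, and the sign convention in `offset_index` (a reversed traverser looks "ahead" at smaller indices) is my assumption about how the offset should behave.

```python
def step_by_function(index):
    # Rebinds only the local name; the caller's variable is untouched.
    index += 1


def offset_index(index, offset, reverse):
    # Which list position does "offset steps ahead" mean for this direction?
    return index - offset if reverse else index + offset


i = 0
step_by_function(i)
after_call = i    # the function call changed nothing

i += 1            # what an inline expansion of a step would do instead
after_inline = i  # the index really moved
```

Since the call leaves `i` alone, the only way to step an integer index is to emit the assignment directly at the use site.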
|
||||||
|
|
||||||
|
The most complicated macro is `bisect!`. It must be able to step the traverser,
|
||||||
|
and also return a tuple of two lists that the bisection yields. We also
|
||||||
|
would prefer that it not pollute the environment with extra variables. To
|
||||||
|
achieve this, we want `bisect!` to be a function call. We want this
|
||||||
|
function to implement the iteration and list construction.
|
||||||
|
|
||||||
|
`bisect!`, by definition, takes a lambda. This lambda, in our language, is declared
|
||||||
|
in the lexical scope in which `bisect!` is called. Thus, to guarantee correct translation,
|
||||||
|
we must do one of two things:
|
||||||
|
|
||||||
|
1. Translate 1-to-1, and create a lambda, passing it to a fixed `bisect` function declared
|
||||||
|
elsewhere.
|
||||||
|
2. Translate to a nested function declaration, inlining the lambda.
|
||||||
|
|
||||||
|
{{< todo >}}Maybe sidenote about inline?{{< /todo >}}
|
||||||
|
|
||||||
|
Since I quite like the idea of inlining a lambda, let's settle for that. To do this,
|
||||||
|
we pull a fresh temporary variable and declare a function, into which we place
|
||||||
|
the traverser iteration code, as well as the body of the lambda, with the variable
|
||||||
|
substituted for the list access expression. Here's the code:
|
||||||
|
|
||||||
|
{{< todo >}}Code for bisect!{{< /todo >}}
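
To give a feel for the target, here is a hand-written Python sketch of the kind of code such a translation could produce. It is not actual translator output: the names `partition_around` and `temp1`, the choice of which list receives the elements for which the lambda is true, and the lambda body itself (`x < pivot`) are assumptions for illustration.

```python
def partition_around(xs, pivot):
    """Split xs into (elements below pivot, all other elements)."""

    # Stand-in for the freshly named nested function.
    def temp1():
        left, right = [], []
        bisector = 0                    # traverser initialization
        while bisector < len(xs):       # traverser validity check
            # Inlined lambda body: its parameter became xs[bisector].
            if xs[bisector] < pivot:
                left.append(xs[bisector])
            else:
                right.append(xs[bisector])
            bisector += 1               # traverser step
        return left, right

    return temp1()
```

The nested function sees `pivot` through ordinary lexical scoping, which is why inlining the lambda body is safe, and its local lists never leak into the enclosing scope.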
|