431 lines
20 KiB
Markdown
431 lines
20 KiB
Markdown
---
|
|
title: A Language for an Assignment - Homework 3
|
|
date: 2020-01-02T22:17:43-08:00
|
|
tags: ["Haskell", "Python", "Algorithms", "Programming Languages"]
|
|
series: "A Language for an Assignment"
|
|
---
|
|
|
|
It rained in Sunriver on New Year's Eve, and it continued to rain
|
|
for the next couple of days. So, instead of going skiing as planned,
|
|
to the dismay of my family and friends, I spent the majority of
|
|
those days working on the third language for homework 3. It
|
|
was quite the language, too - the homework has three problems, each of
|
|
which has a solution independent of the others. I invite you
|
|
to join me in my descent into madness as we construct another language.
|
|
|
|
### Homework 3
|
|
Let's take a look at the three homework problems. The first two are
|
|
related, but are solved using a different technique:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw3.txt" 18 30 >}}
|
|
|
|
This problem requires us to find the `k` numbers closest to some
|
|
query (which I will call `n`) from a list `xs`. The list isn't sorted, and the
|
|
problem must run in linear time. Sorting the list would require
|
|
the standard
|
|
{{< sidenote "right" "n-note" "\(O(n\log n)\) time." >}}
|
|
The \(n\) in this expression is not the same as the query <code>n</code>,
|
|
but rather the length of the list. In fact, I have not yet assigned
|
|
the length of the input <code>xs</code> to any variable. If we say that
|
|
\(m\) is a number that denotes that length, the proper expression
|
|
for the complexity is \(O(m \log m)\).
|
|
{{< /sidenote >}} Thus, we have to take another route, which should
|
|
already be familiar: quickselect. Using quickselect, we can find the `k`th
|
|
closest number, and then collect all the numbers that are closer than the `kth`
|
|
closest number. So, we need a language that:
|
|
|
|
* Supports quickselect (and thus, list partitioning and recursion).
|
|
* Supports iteration, {{< sidenote "left" "iteration-note" "multiple times." >}}
|
|
Why would we need to iterate multiple times? Note that we could have a list
|
|
of numbers that are all the same, <code>[1,1,1,1,1]</code>. Then, we'll need
|
|
to know how many of the numbers <em>equally close</em> as the <code>k</code>th
|
|
element we need to include, which will require another pass through the list.
|
|
{{< /sidenote >}}
|
|
|
|
That's a good start. Let's take a look at the second problem:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw3.txt" 33 47 >}}
|
|
|
|
This problem really is easier. We have to find the position of _the_ closest
|
|
element, and then try expand towards either the left or right, depending on
|
|
which end is better. This expansion will take several steps, and will
|
|
likely require a way to "look" at a given part of the list. So let's add two more
|
|
rules. We need a language that also:
|
|
|
|
* Supports looping control flow, such as `while`.
|
|
* {{< sidenote "right" "view-note" "Allows for a \"view\" into the list" >}}
|
|
We could, of course, simply use list indexing. But then, we'd just be making
|
|
a simple imperative language, and that's boring. So let's play around
|
|
with our design a little, and experimentally add such a "list view" component.
|
|
{{< /sidenote >}}
|
|
(like an abstraction over indexing).
|
|
|
|
This is shaping up to be a fun language. Let's take a look at the last problem:
|
|
{{< codelines "text" "cs325-langs/hws/hw3.txt" 50 64 >}}
|
|
|
|
This problem requires more iterations of a list. We have several
|
|
{{< sidenote "right" "cursor-note" "\"cursors\"" >}}
|
|
I always make the language before I write the post, since a lot of
|
|
design decisions change mid-implementation. I realize now that
|
|
"cursors" would've been a better name for this language feature,
|
|
but alas, it is too late.
|
|
{{< /sidenote >}} looking into the list, and depending if the values
|
|
at each of the cursors add up, we do or do not add a new tuple to a list. So,
|
|
two more requirements:
|
|
|
|
* The "cursors" must be able to interact.
|
|
* The language can represent {{< sidenote "left" "tuple-note" "tuples." >}}
|
|
We could, of course, hack some other way to return a list of tuples, but
|
|
it turns out tuples are pretty simple to implement, and help make for nicer
|
|
programming in our language.
|
|
{{< /sidenote >}}
|
|
|
|
I think we've gathered what we want from the homework. Let's move on to the
|
|
language!
|
|
|
|
### A Language
|
|
As is now usual, let's envision a solution to the problems in our language. There
|
|
are actually quite a lot of functions to look at, so let's see them one by one.
|
|
First, let's look at `qselect`.
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 1 19 >}}
|
|
|
|
After the early return, the first interesting part of the language is the
|
|
use of what I have decided to call a __list traverser__. The list
|
|
traverser is a __generalization of a list index__. Whenever we use a list
|
|
index variable, we generally use the following operations:
|
|
|
|
* __Initialize__: we set the list index to some initial value, such as 0.
|
|
* __Step__: If we're walking the list from left to right, we increment the index.
|
|
If we're walking the list from right to left, we decrement the index.
|
|
* __Validity Check__: We check if the index is still valid (that is, we haven't
|
|
gone past the edge of the list).
|
|
* __Access__: Get the element the cursor is pointing to.
|
|
|
|
A {{< sidenote "right" "cpp-note" "traverser declaration" >}}
|
|
A fun fact is that we've just rediscovered C++
|
|
<a href="http://www.cplusplus.com/reference/iterator/">iterators</a>. C++
|
|
containers and their iterators provide us with the operations I described:
|
|
|
|
We can initialize an iterator like <code>auto it = list.begin()</code>. We
|
|
can step the iterator using <code>it++</code>. We can check its validity
|
|
using <code>it != list.end()</code>, and access what it's pointing to using
|
|
<code>*it</code>. While C++ uses templates and inheritance for this,
|
|
we define a language feature specifically for lists.
|
|
|
|
{{< /sidenote >}} describes these operations. The declartion for the `bisector`
|
|
traverser creates a "cursor" over the list `xs`, that goes between the 0th
|
|
and last elements of `xs`. The declaration for the `pivot` traverser creates
|
|
a "cursor" over the list `xs` that jumps around random locations in the list.
|
|
|
|
The next interesting part of the language is a __traverser macro__. This thing,
|
|
that looks like a function call (but isn't), performs an operation on the
|
|
cursor. For instance, `pop!` removes the element at the cursor from the list,
|
|
whereas `bisect!` categorizes the remaining elements in the cursor's list
|
|
into two lists, using a boolean-returning lambda (written in Java syntax).
|
|
|
|
Note that this implementation of `qselect` takes a function `c`, which it
|
|
uses to judge the actual value of the number. This is because our `qselect`
|
|
won't be finding _the_ smallest number, but the number with the smallest difference
|
|
with `n`. `n` will be factored in via the function.
|
|
|
|
Next up, let's take a look at the function that uses `qselect`, `closestUnsorted`:
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 21 46 >}}
|
|
|
|
Like we discussed, it finds the `k`th closest element (calling it `min`),
|
|
and counts how many elements that are __equal__ need to be included,
|
|
by setting the number to `k` at first, and subtracting 1 for every number
|
|
it encounters that's closer than `min`. Notice that we use the `valid!` and
|
|
`step!` macros, which implement the operations we described above. Notice
|
|
that the user doesn't deal with adding and subtracting numbers, and doing
|
|
comparisons. All they have to do is ask "am I still good to iterate?"
|
|
|
|
Next, let's take a look at `closestSorted`, which will require more
|
|
traverser macros.
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 48 70 >}}
|
|
|
|
The first new macro is `canstep!`. This macro just verifies that
|
|
the traverser can make another step. We need this for the "reverse" iterator,
|
|
which indicates the lower bound of the range of numbers we want to return,
|
|
because `subset!` (which itself is just Python's slice, like `xs[a:b]`), uses an inclusive bottom
|
|
index, and thus, we can't afford to step it before knowing that we can, and that
|
|
it's a better choice after the step.
|
|
|
|
Similarly, we have the `at!(t, i)` macro, which looks at the
|
|
traverser `t`, with offset `i`.
|
|
|
|
We have two loops. The first loop runs as long as we can expand the range in both
|
|
directions, and picks the better direction at each iteration. The second loop
|
|
runs as long as we still want more numbers, but have already hit the edge
|
|
of the list on the left or on the right.
|
|
|
|
Finally, let's look at the solution to `xyz`:
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw3.lang" 72 95 >}}
|
|
|
|
I won't go in depth, but notice that the expression in the `span` part
|
|
of the `traverser` declaration can access another traverser. We treat
|
|
as a feature the fact that this expression isn't immediately evaluated at the place
|
|
of the traverser declaration. Rather, every time that a comparison for a traverser
|
|
operation is performed, this expression is re-evaluated. This allows us to put
|
|
dynamic bounds on traversers `y` and `z`, one of which must not exceed the other.
|
|
|
|
Note also a new keyword that was just used: `sorted`. This is a harmless little
|
|
language feature that automatically calls `.sort()` on the first argument of
|
|
the function.
|
|
|
|
This is more than enough to work with. Let's move on to the implementation.
|
|
|
|
#### Implementation
|
|
Again, let's not go too far into the details of implementing the language from scratch.
|
|
Instead, let's take a look into specific parts of the language that deserve attention.
|
|
|
|
##### Revenge of the State Monad
|
|
Our previous language was, indeed, a respite from complexity. Translation was
|
|
straightforward, and the resulting expressions and statements were plugged straight
|
|
into a handwritten AST. We cannot get away with this here; the language is powerful
|
|
enough to implement three list-based problems, which comes at the cost of increased
|
|
complexity.
|
|
|
|
We need, once again, to generate temporary variables. We also need to keep track of
|
|
which variables are traversers, and the properties of these traversers, throughout
|
|
each function of the language. We thus fall back to using `Control.Monad.State`:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 198 198 >}}
|
|
|
|
There's one part of the state tuple that we haven't yet explained: the list of
|
|
statements.
|
|
|
|
##### Generating Statements
|
|
Recall that our translation function for expressions in the first homework had the type:
|
|
|
|
```Haskell
|
|
translateExpr :: Expr -> Translator ([Py.PyStmt], Py.PyExpr)
|
|
```
|
|
|
|
We then had to use `do`-notation, and explicitly concatenate lists
|
|
of emitted statements. In this language, I took an alternative route: I made
|
|
the statements part of the state. They are thus implicitly generated and
|
|
stored in the monad, and expression generators don't have to worry about
|
|
concatenating them. When the program is ready to use the generated statements
|
|
(say, when an `if`-statement needs to use the statements emitted by the condition
|
|
expression), we retrieve them from the monad:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 228 234 >}}
|
|
|
|
I should note, for transparency, that there's a bug in my use of this function.
|
|
When I compile `if`-statements, I accidentally place statements generated by
|
|
the condition into the body of the `if`. This bug doesn't manifest
|
|
in the solutions to the homework problems, and so I decided not to spend any more
|
|
time on fixing it.
|
|
|
|
##### Validating Traverser Declarations
|
|
We declare two separate types that hold traverser data. The first is a kind of "draft"
|
|
type, `TraverserData`:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 184 190 >}}
|
|
|
|
This record holds all possible configurations of a traverser
|
|
that occur as the program is iterating through the various `key: value` pairs in
|
|
the declaration. For instance, at the very beginning of processing a traverser declaration,
|
|
our program will use a "default" `TraverserData`, with all fields set to `Nothing` or
|
|
their default value. This value will then be modified by the first key/value pair,
|
|
changing, for instance, the list that the traverser operates on. This new modified
|
|
`TraverserData` will then be modified by the next key/value pair, and so on. Doing
|
|
this with every key/value pair (called an option in the below snippet)
|
|
is effectively a foldl operation.
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 378 387 >}}
|
|
|
|
The data may not have all the required fields until the very end, and its type
|
|
reflects that: `Maybe String` here, `Maybe TraverserBounds` there. We don't
|
|
want to deal with unwrapping the `Maybe a` values every time we use the traverser,
|
|
especially if we've done so before. So, we define a `ValidTraverserData` record
|
|
that does not have `Maybe` arguments, and thus, has all the required data. At the
|
|
end of a traverser declaration, we attempt to translate a `TraverserData` into
|
|
a `ValidTraverserData`, invoking `fail` if we can't, and storing the `ValidTraverserData`
|
|
into the state otherwise:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 408 420 >}}
|
|
|
|
Then, every time we retrieve a traverser from the state,
|
|
define a lookup monadic operation like this:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 240 244 >}}
|
|
|
|
##### Compiling Macros
|
|
I didn't call them macros for no reason. Clearly, we don't want to generate
|
|
code that
|
|
{{< sidenote "right" "increment-note" "calls functions only to increment an index." >}}
|
|
In fact, there's no easy way to do this at all. Python's integers (if we choose to
|
|
represent our traversers using integers), are immutable. Furthermore, unlike C++,
|
|
where passing by reference allows a function to change its parameters "outside"
|
|
the call, Python offers no way to reassign a different value to a variable given
|
|
to a function.
|
|
<br><br>
|
|
For an example use of C++'s pass-by-reference mechanic, consider <code>std::swap</code>:
|
|
it's a function, but it modifies the two variables given to it. There's no
|
|
way to generically implement such a function in Python.
|
|
{{< /sidenote >}} We also can't allow arbitrary expressions to serve as traversers:
|
|
our translator keeps some context about which variables are traversers, what their
|
|
bounds are, and how they behave. Thus, __calls to traverser macros are very much macros__:
|
|
they operate on AST nodes, and __require__ that their first argument is a variable,
|
|
named like the traverser. We use the `requireTraverser` monadic operation
|
|
to get the traverser associated with the given variable name, and then perform
|
|
the operation as intended. The `at!(t)` operation is straightforward:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 317 319 >}}
|
|
|
|
The `at!(t,i)` is less so, since it deals with the intricacies of accessing
|
|
the list at either a positive of negative offset, depending on the direction
|
|
of the traverser. We implement a function to properly generate an expression for the offset:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 246 249 >}}
|
|
|
|
We then implement `at!(t,i)` as follows:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 320 323 >}}
|
|
|
|
The most complicated macro is `bisect!`. It must be able to step the traverser,
|
|
and also return a tuple of two lists that the bisection yields. We also
|
|
prefer that it didn't pollute the environment with extra variables. To
|
|
achieve this, we want `bisect!` to be a function call. We want this
|
|
function to implement the iteration and list construction.
|
|
|
|
`bisect!`, by definition, takes a lambda. This lambda, in our language, is declared
|
|
in the lexical scope in which `bisect!` is called. Thus, to guarantee correct translation,
|
|
we must do one of two things:
|
|
|
|
1. Translate 1-to-1, and create a lambda, passing it to a fixed `bisect` function declared
|
|
elsewhere.
|
|
2. Translate to a nested function declaration,
|
|
{{< sidenote "right" "inline-note" "inlining the lambda." >}}
|
|
Inlining, in this case, means replacing a call to a function with the function's body.
|
|
We do this to prevent the overhead of calling a function, which typically involves pushing
|
|
on a stack and other extraneous work. If our function is simple, like a simple
|
|
comparison, it doesn't make sense to spend the effort calling it.
|
|
{{< /sidenote >}}
|
|
|
|
Since I quite like the idea of inlining a lambda, let's settle for that. To do this,
|
|
we pull a fresh temporary variable and declare a function, into which we place
|
|
the traverser iteration code, as well as the body of the lambda, with the variable
|
|
substituted for the list access expression.
|
|
{{< sidenote "left" "nonlocal-note" "Here's the code:" >}}
|
|
Reading the lexical scope is one thing, but modifying it is another. To prevent
|
|
accidental changes to the variables outside a nested function, Python assumes
|
|
that variables assigned inside the function body are local to the function. Thus, to make
|
|
sure changing our variable (the traverser index) has an effect outside the function
|
|
(as it should) we must include the <code>nonlocal</code> keyword, telling
|
|
Python that we're not declaring a new, local variable, but mutating the old one.
|
|
{{< /sidenote >}}
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageThree.hs" 342 363 >}}
|
|
|
|
### The Output
|
|
Let's see what the compiler spits out:
|
|
|
|
```Python
|
|
from bisect import bisect
|
|
import random
|
|
def qselect(xs,k,c):
|
|
if xs==[]:
|
|
return 0
|
|
bisector = 0
|
|
pivot = random.randrange(len(xs))
|
|
pivotE = xs.pop(pivot)
|
|
def temp1():
|
|
nonlocal bisector
|
|
l = []
|
|
r = []
|
|
while bisector<len(xs):
|
|
if c(xs[bisector])<c(pivotE):
|
|
l.append(xs[bisector])
|
|
else:
|
|
r.append(xs[bisector])
|
|
bisector = bisector+1
|
|
return (l, r)
|
|
(leftList,rightList) = temp1()
|
|
if k>len(leftList)+1:
|
|
return qselect(rightList, k-len(leftList)-1, c)
|
|
elif k==len(leftList)+1:
|
|
return pivotE
|
|
else:
|
|
return qselect(leftList, k, c)
|
|
def closestUnsorted(xs,k,n):
|
|
min = qselect(list(xs), k, (lambda x: abs(x-n)))
|
|
out = []
|
|
countEqual = k
|
|
iter = 0
|
|
while iter<len(xs):
|
|
if abs(xs[iter]-n)<abs(min-n):
|
|
countEqual = countEqual-1
|
|
iter = iter+1
|
|
0
|
|
iter = 0
|
|
while iter<len(xs):
|
|
if abs(xs[iter]-n)==abs(min-n) and countEqual>0:
|
|
countEqual = countEqual-1
|
|
out = out+[xs[iter]]
|
|
elif abs(xs[iter]-n)<abs(min-n):
|
|
out = out+[xs[iter]]
|
|
iter = iter+1
|
|
0
|
|
return out
|
|
def closestSorted(xs,k,n):
|
|
start = bisect(xs, n)
|
|
counter = 0
|
|
left = start
|
|
right = start
|
|
while counter!=k and left-1*1>=0 and right<len(xs):
|
|
if abs(xs[left-1*1]-n)<abs(xs[right]-n):
|
|
left = left-1
|
|
0
|
|
else:
|
|
right = right+1
|
|
0
|
|
counter = counter+1
|
|
while counter!=k and (left-1*1>=0 or right<len(xs)):
|
|
if left-1*1>=0:
|
|
left = left-1
|
|
0
|
|
else:
|
|
right = right+1
|
|
0
|
|
counter = counter+1
|
|
return xs[(left):(right)]
|
|
def xyz(xs,k):
|
|
xs.sort()
|
|
x = 0
|
|
dest = []
|
|
while x<len(xs):
|
|
z = x+2
|
|
y = x+1
|
|
while y<z and z<len(xs):
|
|
if xs[x]+xs[y]==xs[z]:
|
|
dest = dest+[(xs[x], xs[y], xs[z])]
|
|
z = z+1
|
|
0
|
|
elif xs[x]+xs[y]>xs[z]:
|
|
z = z+1
|
|
0
|
|
else:
|
|
y = y+1
|
|
0
|
|
x = x+1
|
|
0
|
|
return dest
|
|
```
|
|
|
|
Observe that the generated code just uses indices, `+`, `-`, and various comparison operators.
|
|
Our traverser is an example of a __zero cost abstraction__, a feature that, conceptually,
|
|
operates at a higher level, making us no longer worry about adding, subtracting, and
|
|
comparing numbers, while, in the final output, not damaging the performance of safety
|
|
of the code. Also observe the various `0` standalone statements. This is an issue
|
|
with the translator: traverser macros may not always yield an expression, but
|
|
the type of `translateExpr` and `translateStmt` effectively requires one. Thus,
|
|
when a macro doesn't generate anything useful, we give it the placeholder expression `0`.
|
|
|
|
That concludes this third post in the series. I hope to see you in the next one!
|