220 lines
8.8 KiB
Markdown
220 lines
8.8 KiB
Markdown
---
|
|
title: A Language for an Assignment - Homework 2
|
|
date: 2019-12-30T20:05:10-08:00
|
|
tags: ["Haskell", "Python", "Algorithms", "Programming Languages"]
|
|
series: "A Language for an Assignment"
|
|
---
|
|
|
|
After the madness of the
|
|
[language for homework 1]({{< relref "00_cs325_languages_hw1.md" >}}),
|
|
the solution to the second homework offers a moment of respite.
|
|
Let's get right into the problems, shall we?
|
|
|
|
### Homework 2
|
|
Besides some free-response questions, the homework contains
|
|
two problems. The first:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw2.txt" 29 34 >}}
|
|
|
|
And the second:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw2.txt" 36 44 >}}
|
|
|
|
At first glance, it's not obvious why these problems are good for
|
|
us. However, there's one key observation: __`num_inversions` can be implemented
|
|
using a slightly-modified `mergesort`__. The trick is to maintain a counter
|
|
of inversions in every recursive call to `mergesort`, updating
|
|
it every time we take an element from the
|
|
{{< sidenote "right" "right-note" "right list" >}}
|
|
If this nomenclature is not clear to you, recall that
|
|
mergesort divides a list into two smaller lists. The
|
|
"right list" refers to the second of the two, because
|
|
if you visualize the original list as a rectangle, and cut
|
|
it in half (vertically, down the middle), then the second list
|
|
(from the left) is on the right.
|
|
{{< /sidenote >}} while there are still elements in the
|
|
{{< sidenote "left" "left-note" "left list" >}}
|
|
Why this is the case is left as an exercise to the reader.
|
|
{{< /sidenote >}}.
|
|
When we return from the call,
|
|
we add up the number of inversions from running `num_inversions`
|
|
on the smaller lists, and the number of inversions that we counted
|
|
as I described. We then return both the total number
|
|
of inversions and the sorted list.
|
|
|
|
So, we either perform the standard mergesort, or we perform mergesort
|
|
with additional steps added on. The additional steps can be divided into
|
|
three general categories:
|
|
|
|
1. __Initialization__: We create / set some initial state. This state
|
|
doesn't depend on the lists or anything else.
|
|
2. __Effect__: Each time that an element is moved from one of the two smaller
|
|
lists into the output list, we may change the state in some way (create
|
|
an effect).
|
|
3. __Combination__: The final state, and the results of the two
|
|
sub-problem states, are combined into the output of the function.
|
|
|
|
This is all very abstract. In the concrete case of inversions,
|
|
these steps are as follows:
|
|
|
|
1. __Initializtion__: The initial state, which is just the counter, is set to 0.
|
|
2. __Effect__: Each time an element is moved, if it comes from the right list,
|
|
the number of inversions is updated.
|
|
3. __Combination__: We update the state, simply adding the left and right
|
|
inversion counts.
|
|
|
|
We can make a language out of this!
|
|
|
|
### A Language
|
|
Again, let's start by visualizing what the solution will look like. How about this:
|
|
|
|
{{< rawblock "cs325-langs/sols/hw2.lang" >}}
|
|
|
|
We divide the code into the same three steps that we described above. The first
|
|
section is the initial state. Since it doesn't depend on anything, we expect
|
|
it to be some kind of literal, like an integer. Next, we have the effect section,
|
|
which has access to the variables below:
|
|
|
|
* `STATE`, to manipulate or check the current state.
|
|
* `LEFT` and `RIGHT`, to access the two lists being merged.
|
|
* `L` and `R`, constants that are used to compare against the `SOURCE` variable.
|
|
* `SOURCE`, to denote which list a number came from.
|
|
* `LSTATE` and `RSTATE`, to denote the final states from the two subproblems.
|
|
|
|
We use an `if`-statement to check if the element that was popped came
|
|
from the right list (by checking `SOURCE == R`). If it did, we increment the counter
|
|
(state) by the proper amount. In the combine step, which has access to the
|
|
same variables, we simply increment the state by the counters from the left
|
|
and right solutions, stored in `LSTATE` and `RSTATE`. That's it!
|
|
|
|
#### Implementation
|
|
The implementation is not tricky at all. We don't need to use monads like we did last
|
|
time, and nor do we have to perform any fancy Python nested function declarations.
|
|
|
|
To keep with the Python convention of lowercase variables, we'll translate the
|
|
uppercase "global" variables to lowercase. We'll do it like so:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageTwo.hs" 167 176 >}}
|
|
|
|
Note that we translated `L` and `R` to integer literals. We'll indicate the source of
|
|
each element with an integer, since there's no real point to representing it with
|
|
a string or a variable. We'll need to be aware of this when we implement the actual, generic
|
|
mergesort code. Let's do that now:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageTwo.hs" 101 161 >}}
|
|
|
|
This is probably the ugliest part of this assignment: we handwrote a Python
|
|
AST in Haskell that implements mergesort with our augmentations. Note that
|
|
this is a function, which takes a `Py.PyExpr` (the initial state expression),
|
|
and two lists of `Py.PyStmt`, which are the "effect" and "combination" code,
|
|
respectively. We simply splice them into our regular mergesort function.
|
|
The translation is otherwise pretty trivial, so there's no real reason
|
|
to show it here.
|
|
|
|
### The Output
|
|
What's the output of our solution to `num_inversions`? Take a look for yourself:
|
|
|
|
```Python
|
|
def prog(xs):
|
|
if len(xs)<2:
|
|
return (0, xs)
|
|
leng = len(xs)//2
|
|
left = xs[:(leng)]
|
|
right = xs[(leng):]
|
|
(ls,left) = prog(left)
|
|
(rs,right) = prog(right)
|
|
left.reverse()
|
|
right.reverse()
|
|
state = 0
|
|
source = 0
|
|
total = []
|
|
while (left!=[])and(right!=[]):
|
|
if left[-1]<=right[-1]:
|
|
total.append(left.pop())
|
|
source = 1
|
|
else:
|
|
total.append(right.pop())
|
|
source = 2
|
|
if source==2:
|
|
state = state+len(left)
|
|
state = state+ls+rs
|
|
left.reverse()
|
|
right.reverse()
|
|
return (state, total+left+right)
|
|
```
|
|
|
|
Honestly, that's pretty clean. As clean as `left.reverse()` to allow for \(O(1)\) pop is.
|
|
What's really clean, however, is the implementation of mergesort in our language.
|
|
It goes as follows:
|
|
|
|
```
|
|
state 0;
|
|
effect {}
|
|
combine {}
|
|
```
|
|
|
|
To implement mergesort in our language, which describes mergesort variants, all
|
|
we have to do is not specify any additional behavior. Cool, huh?
|
|
|
|
That's the end of this post. If you liked this one (and the previous one!),
|
|
keep an eye out for more!
|
|
|
|
### Appendix (Missing Homework Question)
|
|
I should not view homework assignments on a small-screen device. There __was__ a third problem
|
|
on homework 2:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw2.txt" 46 65 >}}
|
|
|
|
This is not a mergesort variant, and adding support for it into our second language
|
|
will prevent us from making it the neat specialized
|
|
{{< sidenote "right" "dsl-note" "DSL" >}}
|
|
DSL is a shortened form of "domain specific language", which was briefly
|
|
described in another sidenote while solving homework 1.
|
|
{{< /sidenote >}} that was just saw. We'll do something else, instead:
|
|
we'll use the language we defined in homework 1 to solve this
|
|
problem:
|
|
|
|
```
|
|
empty() = [0, 0];
|
|
longest(xs) =
|
|
if |xs| != 0
|
|
then _longest(longest(xs[0]), longest(xs[2]))
|
|
else empty();
|
|
_longest(l, r) = [max(l[0], r[0]) + 1, max(l[0]+r[0], max(l[1], r[1]))];
|
|
```
|
|
|
|
{{< sidenote "right" "terrible-note" "This is quite terrible." >}}
|
|
This is probably true with any program written in our first
|
|
language.
|
|
{{< /sidenote >}} In these 6 lines of code, there are two hacks
|
|
to work around the peculiarities of the language.
|
|
|
|
At each recursive call, we want to keep track of both the depth
|
|
of the tree and the existing longest path. This is because
|
|
the longest path could be found either somewhere down
|
|
a subtree, or from combining the largest depths of
|
|
two subtrees. To return two values from a function in Python,
|
|
we'd use a tuple. Here, we use a list.
|
|
|
|
Alarm bells should be going off here. There's no reason why we should
|
|
ever return an empty list from the recursive call: at the very least, we
|
|
want to return `[0,0]`. But placing such a list literal in a function
|
|
will trigger the special case insertion. So, we have to hide this literal
|
|
from the compiler. Fortunately, that's not too hard to do - the compiler
|
|
is pretty halfhearted in its inference of types. Simply putting
|
|
the literal behind a constant function (`empty`) does the trick.
|
|
|
|
The program uses the subproblem depths multiple times in the
|
|
final computation. We thus probably want to assign these values
|
|
to names so we don't have to perform any repeated work. Since
|
|
the only two mechanisms for
|
|
{{< sidenote "right" "binding-note" "binding variables" >}}
|
|
To bind a variable means to assign a value to it.
|
|
{{< /sidenote >}} in this language are function calls
|
|
and list selectors, we use a helper function `_longest`,
|
|
which takes two subproblem solutions an combines them
|
|
into a new solution. It's pretty obvious that `_longest`
|
|
returns a list, so the compiler will try insert a base
|
|
case. Fortunately, subproblem solutions are always
|
|
lists of two numbers, so this doesn't affect us too much.
|