Add first draft of Language 1 for CS325
parent
75664e90bb
commit
a406fb0846
281
content/blog/00_cs325_languages_hw1.md
Normal file
@ -0,0 +1,281 @@
|
|||
---
|
||||
title: A Language for an Assignment - Homework 1
|
||||
date: 2019-12-27T23:27:09-08:00
|
||||
draft: true
|
||||
tags: ["Haskell", "Python", "Algorithms"]
|
||||
---
|
||||
|
||||
On a rainy Oregon day, I was walking between classes with a group of friends.
|
||||
We were discussing the various ways to obfuscate solutions to the weekly
|
||||
homework assignments in our Algorithms course: replace every `if` with
|
||||
a ternary expression, use single-letter variable names, put everything on one line.
|
||||
I said:
|
||||
|
||||
> The
|
||||
{{< sidenote "right" "chad-note" "chad" >}}
|
||||
This is in reference to a meme, <a href="https://knowyourmeme.com/memes/virgin-vs-chad">Virgin vs Chad</a>.
|
||||
A "chad" characteristic is masculine or "alpha" to the point of absurdity.
|
||||
{{< /sidenote >}} move would be to make your own, different language for every homework assignment.
|
||||
|
||||
It was required of us to use
|
||||
{{< sidenote "left" "python-note" "Python" >}}
|
||||
A friend suggested making a Haskell program
|
||||
that generates Python-based interpreters for languages. While that would be truly
|
||||
absurd, I'll leave <em>this</em> challenge for another day.
|
||||
{{< /sidenote >}} for our solutions, so that was the first limitation on this challenge.
|
||||
Someone suggested writing the languages in Haskell, since that's what we used
|
||||
in our Programming Languages class. So the final goal ended up being:
|
||||
|
||||
* For each of the 10 homework assignments in CS325 - Analysis of Algorithms,
|
||||
* Create a Haskell program that translates a language into,
|
||||
* A valid Python program that works (nearly) out of the box and passes all the test cases.
|
||||
|
||||
It may not be worth it to create a whole
|
||||
{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
|
||||
A general-purpose language is one that's designed to be used in various
|
||||
domains. For instance, C++ is a general-purpose language because it can
|
||||
be used for embedded systems, GUI programs, and pretty much anything else.
|
||||
This is in contrast to a domain-specific language, such as Game Maker Language,
|
||||
which is aimed at a much narrower set of uses.
|
||||
{{< /sidenote >}} language for each problem,
|
||||
but nowhere in the challenge did we say that it had to be general-purpose. In
|
||||
fact, some interesting design thinking can go into designing a domain-specific
|
||||
language for a particular assignment. So let's jump right into it, and make
|
||||
a language for the first homework assignment.
|
||||
|
||||
### Homework 1
|
||||
There are two problems in Homework 1. Here they are, verbatim:
|
||||
|
||||
{{< codelines "text" "cs325-langs/hws/hw1.txt" 32 38 >}}
|
||||
|
||||
And the second:
|
||||
|
||||
{{< codelines "text" "cs325-langs/hws/hw1.txt" 47 68 >}}
|
||||
|
||||
We want to make a language __specifically__ for these two tasks (one of which
|
||||
is split into many tasks). What common things can we isolate? I see two:
|
||||
|
||||
First, __all the problems deal with lists__. This may seem like a trivial observation,
|
||||
but these two problems are the __only__ thing we use our language for. We have
|
||||
list access,
|
||||
{{< sidenote "right" "filterting-note" "list filtering" >}}
|
||||
Quickselect is a variation on quicksort, which itself
|
||||
finds all the "lesser" and "greater" elements in the input array.
|
||||
{{< /sidenote >}} and list creation. That should serve as a good base!
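To make those three list operations concrete, here's a rough Python sketch of quickselect. The name `qselect_sketch`, the 1-based meaning of `k`, and the assumption that `k` is in range are mine; the assignment text is the authority on the real signature.

```Python
import random

def qselect_sketch(xs, k):
    """Return the k-th smallest element of xs (1-based; assumes 1 <= k <= len(xs))."""
    rest = list(xs)                                  # list creation (a copy)
    pivot = rest.pop(random.randrange(len(rest)))    # list access (plus removal)
    left = [x for x in rest if x <= pivot]           # list filtering
    right = [x for x in rest if x > pivot]           # list filtering
    if k <= len(left):
        return qselect_sketch(left, k)
    if k == len(left) + 1:
        return pivot
    return qselect_sketch(right, k - len(left) - 1)
```

Nothing here needs more than indexing, filtering and building new lists, which is why those make a reasonable base for the language.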
|
||||
|
||||
Second, if you squint a little bit, __all the problems are recursive with the same base case__.
|
||||
Consider the first few lines of `search`, implemented naively:
|
||||
|
||||
```Python
def search(xs, k):
    if xs == []:
        return False
```
|
||||
|
||||
How about `sorted`? Take a look:
|
||||
|
||||
```Python
def sorted(xs):
    if xs == []:
        return []
```
|
||||
|
||||
I'm sure you see the picture. But it will take some real mental gymnastics to twist the
|
||||
rest of the problems into this shape. What about `qselect`, for instance? There are two
|
||||
cases for what it may return:
|
||||
|
||||
* `None` or equivalent if the index is out of bounds (we give it `4` and a list `[1, 2]`).
|
||||
* A number if `qselect` worked.
|
||||
|
||||
The test cases never provide a concrete example of what should be returned from
|
||||
`qselect` in the first case, so we'll interpret it like
|
||||
{{< sidenote "right" "undefined-note" "undefined behavior" >}}
|
||||
For a quick sidenote about undefined behavior, check out how
|
||||
C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture function</a>.
|
||||
Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
|
||||
function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
|
||||
but functions that don't terminate are undefined behavior. There's only one other way the function
|
||||
returns, and that's with "1". Thus, Clang optimizes the entire function to a single "return 1" call.
|
||||
{{< /sidenote >}} in C++:
|
||||
we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
|
||||
This makes this base case valid:
|
||||
|
||||
```Python
def qselect(xs, k):
    if xs == []:
        return []
```
|
||||
|
||||
"Oh yeah, now it's all coming together." With one more observation (which will come
|
||||
from a piece I haven't yet shown you!), we'll be able to generalize this base case.
|
||||
|
||||
The observation is this section in the assignment:
|
||||
|
||||
{{< codelines "text" "cs325-langs/hws/hw1.txt" 83 98 >}}
|
||||
|
||||
The real key is the part about "returning the `[]` where x should be inserted". It so
|
||||
happens that when the list given to the function is empty, the number should be inserted
|
||||
precisely into that list. Thus:
|
||||
|
||||
```Python
def _search(xs, k):
    if xs == []:
        return xs
```
|
||||
|
||||
The same works for `qselect`:
|
||||
|
||||
```Python
def qselect(xs, k):
    if xs == []:
        return xs
```
|
||||
|
||||
And for `sorted`, too:
|
||||
|
||||
```Python
def sorted(xs):
    if xs == []:
        return xs
```
|
||||
|
||||
There are some functions that are exceptions, though:
|
||||
|
||||
```Python
def insert(xs, k):
    # We can't return early here!
    # If we do, we'll never insert anything.
```
|
||||
|
||||
Also:
|
||||
|
||||
```Python
def search(xs, k):
    # We have to return True or False, never
    # an empty list.
```
|
||||
|
||||
So, whenever we __don't__ return a list, we don't want to add a special case.
|
||||
We arrive at the following common base case: __whenever a function returns a list, if its first argument
|
||||
is the empty list, the first argument is immediately returned__.
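One way to picture this rule in plain Python is a decorator that adds the base case to any list-returning function. This is only an illustration: `returns_list` and `sorted_sketch` are hypothetical names, and the translator described in this post could just as well emit the `if xs == []` check directly in the generated code.

```Python
import functools

def returns_list(func):
    """Hypothetical helper: add the shared base case to a list-returning function."""
    @functools.wraps(func)
    def wrapper(xs, *args):
        if xs == []:
            return xs              # empty first argument: hand it straight back
        return func(xs, *args)
    return wrapper

@returns_list
def sorted_sketch(xs):
    # No explicit base case here; the decorator supplies it.
    pivot, rest = xs[0], xs[1:]
    return (sorted_sketch([x for x in rest if x <= pivot])
            + [pivot]
            + sorted_sketch([x for x in rest if x > pivot]))
```

Functions like `insert` or `search`, which don't return a list or can't return early, would simply not get the decorator.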
|
||||
|
||||
We've largely exhausted the conclusions we can draw from these problems. Let's get to designing a language.
|
||||
|
||||
### A Silly Language
|
||||
Let's start by visualizing our goals. Without base cases, the solution to `_search`
|
||||
would be something like this:
|
||||
|
||||
{{< codelines "text" "cs325-langs/sols/hw1.lang" 11 14 >}}
|
||||
|
||||
Here we have an __`if`-expression__. It has to have an `else`, and evaluates to the value
|
||||
of the chosen branch. That is, `if true then 0 else 1` evaluates to `0`, while
|
||||
`if false then 0 else 1` evaluates to `1`. Otherwise, we follow the binary tree search
|
||||
algorithm faithfully.
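Since the target is Python, it's worth noting that Python already has an expression of this shape: `a if c else b`. The snippet below is just a sketch of that correspondence (the `sign` function is a made-up example, not part of the assignment).

```Python
# `if c then a else b` in our language lines up with `a if c else b` in Python.
first = 0 if True else 1     # the `then` branch is chosen
second = 0 if False else 1   # the `else` branch is chosen

def sign(n):
    # Because it is an expression, it can be returned or nested directly.
    return -1 if n < 0 else (0 if n == 0 else 1)
```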
|
||||
|
||||
Using this definition of `_search`, we can define `search` pretty easily:
|
||||
|
||||
{{< codelines "text" "cs325-langs/sols/hw1.lang" 17 17 >}}
|
||||
|
||||
Let's use Haskell's `(++)` operator for concatenation. This will help us understand
|
||||
when the user is operating on lists, and when they're not. With this, `sorted` becomes:
|
||||
|
||||
{{< codelines "text" "cs325-langs/sols/hw1.lang" 16 16 >}}
|
||||
|
||||
Let's go for `qselect` now. We'll introduce a very silly language feature for this
|
||||
problem:
|
||||
{{< sidenote "right" "selector-note" "list selectors" >}}
|
||||
You've probably never heard of list selectors, and for a good reason:
|
||||
this is a <em>terrible</em> language feature. I'll go into more detail
|
||||
later, but I wanted to make this clear right away.
|
||||
{{< /sidenote >}}. We observe that `qselect` aims to partition the list into
|
||||
other lists. We thus add the following pieces of syntax:
|
||||
|
||||
```
~xs -> {
    pivot <- xs[rand]!
    left <- xs[#0 <= pivot]
    ...
} -> ...
```
|
||||
|
||||
There are three new things here.
|
||||
|
||||
1. The actual "list selector": `~xs -> { .. } -> ...`. Between the curly braces
|
||||
are branches which select parts of the list and assign them to new variables.
|
||||
Thus, `pivot <- xs[rand]!` assigns the element at a random index to the variable `pivot`.
|
||||
The `!` at the end means "after taking this out of `xs`, delete it from `xs`". The
|
||||
syntax {{< sidenote "right" "curly-note" "starts with \"~\"" >}}
|
||||
An observant reader will note that there's no need for the "xs" after the "~".
|
||||
The idea was to add a special case syntax to reference the "selected list", but
|
||||
I ended up not bothering. So in fact, this part of the syntax is useless.
|
||||
{{< /sidenote >}} to make it easier to parse.
|
||||
2. The `rand` list access syntax. `xs[rand]` is a special case that picks a random
|
||||
element from `xs`.
|
||||
3. The `xs[#0 <= pivot]` syntax. This is another special case that selects all elements
|
||||
from `xs` that match the given predicate (where `#0` is replaced with each element in `xs`).
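To get a feel for what these three pieces mean, here is roughly what the two branches above might look like if written by hand in Python. This is a sketch, not the translator's actual output: `select_sketch` is a hypothetical name, and copying `xs` first is my assumption, since nothing so far says whether the caller's list is modified.

```Python
import random

def select_sketch(xs):
    xs = list(xs)                                # assumption: work on a copy
    pivot = xs.pop(random.randrange(len(xs)))    # pivot <- xs[rand]!  (take and delete)
    left = [x for x in xs if x <= pivot]         # left <- xs[#0 <= pivot]
    return pivot, left, xs                       # xs no longer contains the pivot
```

Note how `#0` turns into the loop variable of a list comprehension, and how `!` turns an index lookup into a `pop`.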
|
||||
|
||||
The key part of `qselect` is not evaluating `right` unless you have to. So, we shouldn't
|
||||
eagerly evaluate the list selector. We also don't want something like `right[|right|-1]` to evaluate
|
||||
`right` twice. So we settle on
|
||||
{{< sidenote "right" "lazy-note" "lazy evaluation" >}}
|
||||
Lazy evaluation means only evaluating an expression when we need to. Thus,
|
||||
although we might encounter the expression for <code>right</code>, we
|
||||
only evaluate it when the time comes. Lazy evaluation, at least
|
||||
the way that Haskell has it, is more specific: an expression is evaluated only
|
||||
once, or not at all.
|
||||
{{< /sidenote >}}.
|
||||
Ah, but the `!` marker introduces
|
||||
{{< sidenote "left" "side-effect-note" "side effects" >}}
|
||||
A side effect is a term frequently used when talking about functional programming.
|
||||
Evaluating the expression <code>xs[rand]!</code> doesn't just get a random element,
|
||||
it also changes <em>something else</em>. In this case, that something else is
|
||||
the <code>xs</code> list.
|
||||
{{< /sidenote >}}. So we can't just evaluate these things all willy-nilly.
|
||||
So, let's make it so that each expression in the selector list requires the ones above it. Thus,
|
||||
`left` will require `pivot`, and `right` will require `left` and `pivot`. So,
|
||||
lazily evaluated, ordered expressions. The whole `qselect` becomes:
|
||||
|
||||
{{< codelines "text" "cs325-langs/sols/hw1.lang" 1 9 >}}
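The "lazily evaluated, ordered" idea can be modelled in Python with a small memoized thunk. This is a minimal sketch under my own assumptions: the `Lazy` class and the `take_*` helpers are hypothetical, and the pivot is taken from the front of the list instead of at random so the behavior is predictable.

```Python
class Lazy:
    """Hypothetical memoized thunk: runs `compute` at most once, after its prerequisite."""
    def __init__(self, compute, requires=None):
        self._compute = compute
        self._requires = requires
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            if self._requires is not None:
                self._requires.force()   # earlier branches run first, so side effects stay ordered
            self._value = self._compute()
            self._done = True
        return self._value

log = []
xs = [3, 1, 2]

def take_pivot():
    log.append("pivot")
    return xs.pop(0)                     # side effect: removes the element, like `!`

def take_left():
    log.append("left")
    return [x for x in xs if x <= pivot.force()]

def take_right():
    log.append("right")
    return [x for x in xs if x > pivot.force()]

pivot = Lazy(take_pivot)
left = Lazy(take_left, requires=pivot)
right = Lazy(take_right, requires=left)

left.force()
left.force()   # the second call reuses the stored value; nothing is recomputed
```

After this runs, `log` only mentions `pivot` and `left`: `right` is never computed unless something forces it.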
|
||||
|
||||
We've now figured out all the language constructs. Let's start working on
|
||||
some implementation!
|
||||
|
||||
#### Data Definitions
|
||||
Let's start with defining the AST and other data types for our language:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}}
|
||||
|
||||
The `PossibleType` class will be used when we figure out if a function returns
|
||||
a list or not, for our base case insertion rule. The `Selector` type
|
||||
will hold a single line in the list selector we defined earlier, and
|
||||
the `SelectorMarker` will indicate if the user added the `!` "remove from list"
|
||||
marker at the end. To represent the various operators in our language, we create
|
||||
the `Op` data type. Note that unlike Python, `++` (list concatenation) and
|
||||
`+` (addition) are different operators in our language.
|
||||
|
||||
We then define valid expressions. Obviously, a variable (like `xs`), an
|
||||
integer literal (like `1`) and a list literal (like `[]`) are allowed.
|
||||
We also put in our selector, which consists of the expression on the
|
||||
left, the list of selector branches (`[Selector]`) and the expression
|
||||
of "what to actually do with the new variables". We also
|
||||
add `if`-expressions (like we discussed), and function calls. Lastly,
|
||||
we add binary operators (like `x+y`), the length operator (`|xs|`),
|
||||
and the list access operator (`xs[0]`). We also make `#0` a part
|
||||
of the expression syntax, even though it's only allowed inside
|
||||
a list access.
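For readers more at home in Python than Haskell, here is a rough analogue of some of these expression forms using dataclasses, with a tiny renderer to show where this is heading. Every name here is hypothetical and it does not mirror the Haskell definitions above; selectors, function calls and `#0` are left out to keep it short.

```Python
from dataclasses import dataclass
from typing import List

@dataclass
class Var:            # a variable, like xs
    name: str

@dataclass
class IntLit:         # an integer literal, like 1
    value: int

@dataclass
class ListLit:        # a list literal, like []
    items: List["Expr"]

@dataclass
class BinOp:          # a binary operator, like x+y or xs ++ ys
    op: str
    left: "Expr"
    right: "Expr"

@dataclass
class Length:         # the length operator, |xs|
    of: "Expr"

@dataclass
class Access:         # list access, xs[0]
    target: "Expr"
    index: "Expr"

def to_python(e):
    """Render a (partial) expression tree as Python source text."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, IntLit):
        return str(e.value)
    if isinstance(e, ListLit):
        return "[" + ", ".join(to_python(i) for i in e.items) + "]"
    if isinstance(e, BinOp):
        op = "+" if e.op == "++" else e.op    # Python uses + for both
        return f"({to_python(e.left)} {op} {to_python(e.right)})"
    if isinstance(e, Length):
        return f"len({to_python(e.of)})"
    if isinstance(e, Access):
        return f"{to_python(e.target)}[{to_python(e.index)}]"
    raise TypeError(f"unsupported expression: {e!r}")
```

For example, `to_python(Access(Var("right"), BinOp("-", Length(Var("right")), IntLit(1))))` gives `right[(len(right) - 1)]`.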
|
||||
|
||||
Of course, we wouldn't want to write programs in our language by building
Haskell data structures by hand. We want to actually write a text file, like `hw1.lang`,
|
||||
and then have our program translate that to Python. The first
|
||||
step to that is __parsing__: we need to turn our language text
|
||||
into the `Expr` structure we have.
|
||||
|
||||
#### Parsing
|
||||
We'll be using `Parsec` for parsing. `Parsec` is a parsing library
|
||||
based on
|
||||
{{< sidenote "right" "monad-note" "monadic" >}}
|
||||
Haskell is a language with more monad tutorials than
|
||||
programmers. For this reason, I will resist the temptation
|
||||
to explain what monads are. If you <em>don't</em> know
|
||||
what they are, don't worry, there are plenty of other resources.
|
||||
{{< /sidenote >}} parser combinators.
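If parser combinators are new to you, the core idea fits in a few lines of Python: a parser is a function from the input and a position to a result and a new position (or `None` on failure), and combinators build bigger parsers out of smaller ones. The sketch below is not `Parsec`'s API and makes no attempt at the monadic interface; all the names are made up for illustration.

```Python
def satisfy(pred):
    """Parse one character for which pred holds."""
    def parse(s, i):
        return (s[i], i + 1) if i < len(s) and pred(s[i]) else None
    return parse

def char(c):
    """Parse exactly the character c."""
    return satisfy(lambda ch: ch == c)

def many1(p):
    """Apply p one or more times, collecting the results in a list."""
    def parse(s, i):
        out = []
        r = p(s, i)
        while r is not None:
            value, i = r
            out.append(value)
            r = p(s, i)
        return (out, i) if out else None
    return parse

def fmap(f, p):
    """Transform the result of p with f."""
    def parse(s, i):
        r = p(s, i)
        return None if r is None else (f(r[0]), r[1])
    return parse

def seq(*parsers):
    """Run the parsers one after another; fail if any of them fails."""
    def parse(s, i):
        values = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            value, i = r
            values.append(value)
        return values, i
    return parse

integer = fmap(lambda digits: int("".join(digits)), many1(satisfy(str.isdigit)))
plus_expr = seq(integer, char("+"), integer)
```

Here `integer` and `plus_expr` are themselves parsers, built without writing any new parsing loop by hand.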
|