513 lines
21 KiB
Markdown
513 lines
21 KiB
Markdown
---
|
|
title: A Language for an Assignment - Homework 1
|
|
date: 2019-12-27T23:27:09-08:00
|
|
tags: ["Haskell", "Python", "Algorithms", "Programming Languages"]
|
|
series: "A Language for an Assignment"
|
|
---
|
|
|
|
On a rainy Oregon day, I was walking between classes with a group of friends.
|
|
We were discussing the various ways to obfuscate solutions to the weekly
|
|
homework assignments in our Algorithms course: replace every `if` with
|
|
a ternary expression, use single variable names, put everything on one line.
|
|
I said:
|
|
|
|
> The
|
|
{{< sidenote "right" "chad-note" "chad" >}}
|
|
This is in reference to a meme, <a href="https://knowyourmeme.com/memes/virgin-vs-chad">Virgin vs Chad</a>.
|
|
A "chad" characteristic is masculine or "alpha" to the point of absurdity.
|
|
{{< /sidenote >}} move would be to make your own, different language for every homework assignment.
|
|
|
|
It was required of us to use
|
|
{{< sidenote "left" "python-note" "Python" >}}
|
|
A friend suggested making a Haskell program
|
|
that generates Python-based interpreters for languages. While that would be truly
|
|
absurd, I'll leave <em>this</em> challenge for another day.
|
|
{{< /sidenote >}} for our solutions, so that was the first limitation on this challenge.
|
|
Someone suggested to write the languages in Haskell, since that's what we used
|
|
in our Programming Languages class. So the final goal ended up:
|
|
|
|
* For each of the 10 homework assignments in CS325 - Analysis of Algorithms,
|
|
* Create a Haskell program that translates a language into,
|
|
* A valid Python program that works (nearly) out of the box and passes all the test cases.
|
|
|
|
It may not be worth it to create a whole
|
|
{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
|
|
A general purpose language is one that's designed to be used in various
|
|
domains. For instance, C++ is a general-purpose language because it can
|
|
be used for embedded systems, GUI programs, and pretty much anything else.
|
|
This is in contrast to a domain-specific language, such as Game Maker Language,
|
|
which is aimed at a much narrower set of uses.
|
|
{{< /sidenote >}} language for each problem,
|
|
but nowhere in the challenge did we say that it had to be general-purpose. In
|
|
fact, some interesting design thinking can go into designing a domain-specific
|
|
language for a particular assignment. So let's jump right into it, and make
|
|
a language for the first homework assignment.
|
|
|
|
### Homework 1
|
|
There are two problems in Homework 1. Here they are, verbatim:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw1.txt" 32 38 >}}
|
|
|
|
And the second:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw1.txt" 47 68 >}}
|
|
|
|
We want to make a language __specifically__ for these two tasks (one of which
|
|
is split into many tasks). What common things can we isolate? I see two:
|
|
|
|
First, __all the problems deal with lists__. This may seem like a trivial observation,
|
|
but these two problems are the __only__ thing we use our language for. We have
|
|
list access,
|
|
{{< sidenote "right" "filterting-note" "list filtering" >}}
|
|
Quickselect is a variation on quicksort, which itself
|
|
finds all the "lesser" and "greater" elements in the input array.
|
|
{{< /sidenote >}} and list creation. That should serve as a good base!
|
|
|
|
If you squint a little bit, __all the problems are recursive with the same base case__.
|
|
Consider the first few lines of `search`, implemented naively:
|
|
|
|
```Python
|
|
def search(xs, k):
|
|
if xs == []:
|
|
return false
|
|
```
|
|
|
|
How about `sorted`? Take a look:
|
|
|
|
```Python
|
|
def sorted(xs):
|
|
if xs == []:
|
|
return []
|
|
```
|
|
|
|
I'm sure you see the picture. But it will take some real mental gymnastics to twist the
|
|
rest of the problems into this shape. What about `qselect`, for instance? There's two
|
|
cases for what it may return:
|
|
|
|
* `None` or equivalent if the index is out of bounds (we give it `4` an a list `[1, 2]`).
|
|
* A number if `qselect` worked.
|
|
|
|
The test cases never provide a concrete example of what should be returned from
|
|
`qselect` in the first case, so we'll interpret it like
|
|
{{< sidenote "right" "undefined-note" "undefined behavior" >}}
|
|
For a quick sidenote about undefined behavior, check out how
|
|
C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture function</a>.
|
|
Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
|
|
function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
|
|
but functions that don't terminate are undefined behavior. There's only one other way the function
|
|
returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call.
|
|
{{< /sidenote >}} in C++:
|
|
we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
|
|
This makes this base case valid:
|
|
|
|
```Python
|
|
def qselect(xs, k):
|
|
if xs == []:
|
|
return []
|
|
```
|
|
|
|
"Oh yeah, now it's all coming together." With one more observation (which will come
|
|
from a piece I haven't yet shown you!), we'll be able to generalize this base case.
|
|
|
|
The observation is this section in the assignment:
|
|
|
|
{{< codelines "text" "cs325-langs/hws/hw1.txt" 83 98 >}}
|
|
|
|
The real key is the part about "returning the `[]` where x should be inserted". It so
|
|
happens that when the list given to the function is empty, the number should be inserted
|
|
precisely into that list. Thus:
|
|
|
|
```Python
|
|
def _search(xs, k):
|
|
if xs == []:
|
|
return xs
|
|
```
|
|
|
|
The same works for `qselect`:
|
|
|
|
```Python
|
|
def qselect(xs, k):
|
|
if xs == []:
|
|
return xs
|
|
```
|
|
|
|
And for sorted, too:
|
|
|
|
```Python
|
|
def sorted(xs):
|
|
if xs == []:
|
|
return xs
|
|
```
|
|
|
|
There are some functions that are exceptions, though:
|
|
|
|
```Python
|
|
def insert(xs, k):
|
|
# We can't return early here!
|
|
# If we do, we'll never insert anything.
|
|
```
|
|
|
|
Also:
|
|
|
|
```Python
|
|
def search(xs, k):
|
|
# We have to return true or false, never
|
|
# an empty list.
|
|
```
|
|
|
|
So, whenever we __don't__ return a list, we don't want to add a special case.
|
|
We arrive at the following common base case: __whenever a function returns a list, if its first argument
|
|
is the empty list, the first argument is immediately returned__.
|
|
|
|
We've largely exhasuted the conclusiosn we can draw from these problems. Let's get to designing a language.
|
|
|
|
### A Silly Language
|
|
Let's start by visualizing our goals. Without base cases, the solution to `_search`
|
|
would be something like this:
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw1.lang" 11 14 >}}
|
|
|
|
Here we have an __`if`-expression__. It has to have an `else`, and evaluates to the value
|
|
of the chosen branch. That is, `if true then 0 else 1` evaluates to `0`, while
|
|
`if false then 0 else 1` evaluates to `1`. Otherwise, we follow the binary tree search
|
|
algorithm faithfully.
|
|
|
|
Using this definition of `_search`, we can define `search` pretty easily:
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw1.lang" 17 17 >}}
|
|
|
|
Let's use Haskell's `(++)` operator for concatentation. This will help us understand
|
|
when the user is operating on lists, and when they're not. With this, `sorted` becomes:
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw1.lang" 16 16 >}}
|
|
|
|
Let's go for `qselect` now. We'll introduce a very silly language feature for this
|
|
problem:
|
|
{{< sidenote "right" "selector-note" "list selectors" >}}
|
|
You've probably never heard of list selectors, and for a good reason:
|
|
this is a <em>terrible</em> language feature. I'll go in more detail
|
|
later, but I wanted to make this clear right away.
|
|
{{< /sidenote >}}. We observe that `qselect` aims to partition the list into
|
|
other lists. We thus add the following pieces of syntax:
|
|
|
|
```
|
|
~xs -> {
|
|
pivot <- xs[rand]!
|
|
left <- xs[#0 <= pivot]
|
|
...
|
|
} -> ...
|
|
```
|
|
|
|
There are three new things here.
|
|
|
|
1. The actual "list selector": `~xs -> { .. } -> ...`. Between the curly braces
|
|
are branches which select parts of the list and assign them to new variables.
|
|
Thus, `pivot <- xs[rand]!` assigns the element at a random index to the variable `pivot`.
|
|
the `!` at the end means "after taking this out of `xs`, delete it from `xs`". The
|
|
syntax {{< sidenote "right" "curly-note" "starts with \"~\"" >}}
|
|
An observant reader will note that there's no need for the "xs" after the "~".
|
|
The idea was to add a special case syntax to reference the "selected list", but
|
|
I ended up not bothering. So in fact, this part of the syntax is useless.
|
|
{{< /sidenote >}} to make it easier to parse.
|
|
2. The `rand` list access syntax. `xs[rand]` is a special case that picks a random
|
|
element from `xs`.
|
|
3. The `xs[#0 <= pivot]` syntax. This is another special case that selects all elements
|
|
from `xs` that match the given predicate (where `#0` is replaced with each element in `xs`).
|
|
|
|
The big part of qselect is to not evaluate `right` unless you have to. So, we shouldn't
|
|
eagerly evaluate the list selector. We also don't want something like `right[|right|-1]` to evaluate
|
|
`right` twice. So we settle on
|
|
{{< sidenote "right" "lazy-note" "lazy evaluation" >}}
|
|
Lazy evaluation means only evaluating an expression when we need to. Thus,
|
|
although we might encounter the expression for <code>right</code>, we
|
|
only evaluate it when the time comes. Lazy evaluation, at least
|
|
the way that Haskell has it, is more specific: an expression is evaluated only
|
|
once, or not at all.
|
|
{{</ sidenote >}}.
|
|
Ah, but the `!` marker introduces
|
|
{{< sidenote "left" "side-effect-note" "side effects" >}}
|
|
A side effect is a term frequently used when talking about functional programming.
|
|
Evaluating the expression <code>xs[rand]!</code> doesn't just get a random element,
|
|
it also changes <em>something else</em>. In this case, that something else is
|
|
the <code>xs</code> list.
|
|
{{< /sidenote >}}. So we can't just evaluate these things all willy-nilly.
|
|
So, let's make it so that each expression in the selector list requires the ones above it. Thus,
|
|
`left` will require `pivot`, and `right` will require `left` and `pivot`. So,
|
|
lazily evaluated, ordered expressions. The whole `qselect` becomes:
|
|
|
|
{{< codelines "text" "cs325-langs/sols/hw1.lang" 1 9 >}}
|
|
|
|
We've now figured out all the language constructs. Let's start working on
|
|
some implementation!
|
|
|
|
#### Implementation
|
|
It would be silly of me to explain every detail of creating a language in Haskell
|
|
in this post; this is neither the purpose of the post, nor is it plausible
|
|
to do this without covering monads, parser combinators, grammars, abstract syntax
|
|
trees, and more. So, instead, I'll discuss the _interesting_ parts of the
|
|
implementation.
|
|
|
|
##### Temporary Variables
|
|
Our language is expression-based, yes. A function is a single,
|
|
arbitrarily complex expression (involving `if/else`, list
|
|
selectors, and more). So it would make sense to translate
|
|
a function to a single, arbitrarily complex Python expression.
|
|
However, the way we've designed our language makes it
|
|
not-so-suitable for converting to a single expression! For
|
|
instance, consider `xs[rand]`. We need to compute the list,
|
|
get its length, generate a random number, and then access
|
|
the corresponding element in the list. We use the list
|
|
here twice, and simply repeating the expression would not
|
|
be very smart: we'd be evaluating twice. So instead,
|
|
we'll use a variable, assign the list to that variable,
|
|
and then access that variable multiple times.
|
|
|
|
To be extra safe, let's use a fresh temporary variable
|
|
every time we need to store something. The simplest
|
|
way is to simply maintain a counter of how many temporary
|
|
variables we've already used, and generate a new variable
|
|
by prepending the word "temp" to that number. We start
|
|
with `temp0`, then `temp1`, and so on. To keep a counter,
|
|
we can use a state monad:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 230 230 >}}
|
|
|
|
Don't worry about the `Map.Map String [String]`, we'll get to that in a bit.
|
|
For now, all we have to worry about is the second element of the tuple,
|
|
the integer counting how many temporary variables we've used. We can
|
|
get the current temporary variable as follows:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 232 235 >}}
|
|
|
|
We can also get a fresh temporary variable like this:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 237 240 >}}
|
|
|
|
Now, the
|
|
{{< sidenote "left" "code-note" "code" >}}
|
|
Since we are translating an expression, we must have the result of
|
|
the translation yield an Python expression we can use in generating
|
|
larger Python expressions. However, as we've seen, we occasionally
|
|
have to use statements. Thus, the <code>translateExpr</code> function
|
|
returns a <code>Translator ([Py.PyStmt], Py.PyExpr)</code>.
|
|
{{< /sidenote >}}for generating a random list access looks like
|
|
{{< sidenote "right" "ast-note" "this:" >}}
|
|
The <code>Py.*</code> constructors are a part of a Python AST module I quickly
|
|
threw together. I won't showcase it here, but you can always look at the
|
|
source code for the blog (which includes this project)
|
|
<a href="https://dev.danilafe.com/Web-Projects/blog-static">here</a>.
|
|
{{< /sidenote >}}
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 325 330 >}}
|
|
|
|
##### Implementing "lazy evaluation"
|
|
Lazy evaluation in functional programs usually arises from
|
|
{{< sidenote "right" "graph-note" "graph reduction" >}}
|
|
Graph reduction, more specifically the <em>Spineless,
|
|
Tagless G-machine</em> is at the core of the Glasgow Haskell
|
|
Compiler (GHC). Simon Peyton Jones' earlier book,
|
|
<em>Implementing Functional Languages: a tutorial</em>
|
|
details an earlier version of the G-machine.
|
|
{{< /sidenote >}}. However, Python is neither
|
|
functional nor graph-based, and we only lazily
|
|
evaluate list selectors. Thus, we'll have to do
|
|
some work to get our lazy evaluation to work as we desire.
|
|
Here's what I came up with:
|
|
|
|
1. It's difficult to insert Python statements where they are
|
|
needed: we'd have to figure out in which scope each variable
|
|
has already been declared, and in which scope it's yet
|
|
to be assigned.
|
|
2. Instead, we can use a Python dictionary, called `cache`,
|
|
and store computed versions of each variable in the cache.
|
|
3. It's pretty difficult to check if a variable
|
|
is in the cache, compute it if not, and then return the
|
|
result of the computation, in one expression. This is
|
|
true, unless that single expression is a function call, and we have a dedicated
|
|
function that takes no arguments, computes the expression if needed,
|
|
and uses the cache otherwise. We choose this route.
|
|
4. We have already promised that we'd evaluate all the selected
|
|
variables above a given variable before evaluating the variable
|
|
itself. So, each function will first call (and therefore
|
|
{{< sidenote "right" "force-note" "force" >}}
|
|
Forcing, in this case, comes from the context of lazy evaluation. To
|
|
force a variable or an expression is to tell the program to compute its
|
|
value, even though it may have been putting it off.
|
|
{{< /sidenote >}}) the functions
|
|
generated for variables declared above the function's own variable.
|
|
5. To keep track of all of this, we use the already-existing state monad
|
|
as a reader monad (that is, we clear the changes we make to the monad
|
|
after we're done translating the list selector). This is where the `Map.Map String [String]`
|
|
comes from.
|
|
|
|
The `Map.Map String [String]` keeps track of variables that will be lazily computed,
|
|
and also of the dependencies of each variable (the variables that need
|
|
to be access before the variable itself). We compute such a map for
|
|
each selector as follows:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 298 298 >}}
|
|
|
|
We update the existing map using `Map.union`:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 299 >}}
|
|
|
|
And, after we're done generating expressions in the body of this selector,
|
|
we clear it to its previous value `vs`:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 302 302 >}}
|
|
|
|
We generate a single selector as follows:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 268 281 >}}
|
|
|
|
This generates a function definition statement, which we will examine in
|
|
generated Python code later on.
|
|
|
|
Solving the problem this way also introduces another gotcha: sometimes,
|
|
a variable is produced by a function call, and other times the variable
|
|
is just a Python variable. We write this as follows:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 283 288 >}}
|
|
|
|
##### Special Case Insertion
|
|
This is a silly language for a single homework assignment. I'm not
|
|
planning to implement Hindley-Milner type inference, or anything
|
|
of that sort. For the purpose of this language, things will be
|
|
either a list, or not a list. And as long as a function __can__ return
|
|
a list, it can also return the list from its base case. Thus,
|
|
that's all we will try to figure out. The checking code is so
|
|
short that we can include the whole snippet at once:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 219 227 >}}
|
|
|
|
`mergePossibleType`
|
|
{{< sidenote "right" "bool-identity-note" "figures out" >}}
|
|
An observant reader will note that this is just a logical
|
|
OR function. It's not, however, good practice to use
|
|
booleans for types that have two constructors with no arguments.
|
|
Check out this <a href="https://programming-elm.com/blog/2019-05-20-solving-the-boolean-identity-crisis-part-1/">
|
|
Elm-based article</a> about this, which the author calls the
|
|
Boolean Identity Crisis.
|
|
{{< /sidenote >}}, given two possible types for an
|
|
expression, the final type for the expression.
|
|
|
|
There's only one real trick to this. Sometimes, like in
|
|
`_search`, the only time we return something _known_ to be a list, that
|
|
something is `xs`. Since we're making a list manipulation language,
|
|
let's __assume the first argument to the function is a list__, and
|
|
__use this information to determine expression types__. We guess
|
|
types in a very basic manner otherwise: If you use the concatenation
|
|
operator, or a list literal, then obviously we're working on a list.
|
|
If you're returning the first argument of the function, that's also
|
|
a list. Otherwise, it could be anything.
|
|
|
|
My Haskell linter actually suggested a pretty clever way of writing
|
|
the whole "add a base case if this function returns a list" code.
|
|
Check it out:
|
|
|
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 260 266 >}}
|
|
|
|
Specifically, look at the line with `let fastReturn = ...`. It
|
|
uses a list comprehension: we take a parameter `p` from the list of
|
|
parameter `ps`, but only produce the statements for the base case
|
|
if the possible type computed using `p` is `List`.
|
|
|
|
### The Output
|
|
What kind of beast have we created? Take a look for yourself:
|
|
```Python
|
|
def qselect(xs,k):
|
|
if xs==[]:
|
|
return xs
|
|
cache = {}
|
|
def pivot():
|
|
if ("pivot") not in (cache):
|
|
cache["pivot"] = xs.pop(0)
|
|
return cache["pivot"]
|
|
def left():
|
|
def temp2(arg):
|
|
out = []
|
|
for arg0 in arg:
|
|
if arg0<=pivot():
|
|
out.append(arg0)
|
|
return out
|
|
pivot()
|
|
if ("left") not in (cache):
|
|
cache["left"] = temp2(xs)
|
|
return cache["left"]
|
|
def right():
|
|
def temp3(arg):
|
|
out = []
|
|
for arg0 in arg:
|
|
if arg0>pivot():
|
|
out.append(arg0)
|
|
return out
|
|
left()
|
|
pivot()
|
|
if ("right") not in (cache):
|
|
cache["right"] = temp3(xs)
|
|
return cache["right"]
|
|
if k>(len(left())+1):
|
|
temp4 = qselect(right(), k-len(left())-1)
|
|
else:
|
|
if k==(len(left())+1):
|
|
temp5 = [pivot()]
|
|
else:
|
|
temp5 = qselect(left(), k)
|
|
temp4 = temp5
|
|
return temp4
|
|
def _search(xs,k):
|
|
if xs==[]:
|
|
return xs
|
|
if xs[1]==k:
|
|
temp6 = xs
|
|
else:
|
|
if xs[1]>k:
|
|
temp8 = _search(xs[0], k)
|
|
else:
|
|
temp8 = _search(xs[2], k)
|
|
temp6 = temp8
|
|
return temp6
|
|
def sorted(xs):
|
|
if xs==[]:
|
|
return xs
|
|
return sorted(xs[0])+[xs[1]]+sorted(xs[2])
|
|
def search(xs,k):
|
|
return len(_search(xs, k))!=0
|
|
def insert(xs,k):
|
|
return _insert(k, _search(xs, k))
|
|
def _insert(k,xs):
|
|
if k==[]:
|
|
return k
|
|
if len(xs)==0:
|
|
temp16 = xs
|
|
temp16.append([])
|
|
temp17 = temp16
|
|
temp17.append(k)
|
|
temp18 = temp17
|
|
temp18.append([])
|
|
temp15 = temp18
|
|
else:
|
|
temp15 = xs
|
|
return temp15
|
|
```
|
|
It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write.
|
|
Even to get this code, I had to come up with hacks __in a language I created__.
|
|
The first is the hack is to make the `qselect` function use the `xs == []` base
|
|
case. This doesn't happen by default, because `qselect` doesn't return a list!
|
|
To "fix" this, I made `qselect` return the number it found, wrapped in a
|
|
list literal. This is not up to spec, and would require another function
|
|
to unwrap this list.
|
|
|
|
While `qselect` was struggling with not having the base case, `insert` had
|
|
a base case it didn't need: `insert` shouldn't return the list itself
|
|
when it's empty, it should insert into it! However, when we use the `<<`
|
|
list insertion operator, the language infers `insert` to be a list-returning
|
|
function itself, inserting into an empty list will always fail. So, we
|
|
make a function `_insert`, which __takes the arguments in reverse__.
|
|
The base case will still be generated, but the first argument (against
|
|
which the base case is checked) will be a number, so the `k == []` check
|
|
will always fail.
|
|
|
|
That concludes this post. I'll be working on more solutions to homework
|
|
assignments in self-made languages, so keep an eye out!
|