Add first draft of Language 1 for CS325

2019-12-28 23:12:15 -08:00 · 2019-12-28 23:12:15 -08:00 · a406fb0846
commit a406fb0846
parent 75664e90bb
1 changed files with 281 additions and 0 deletions
--- a/content/blog/00_cs325_languages_hw1.md
+++ b/content/blog/00_cs325_languages_hw1.md
@ -0,0 +1,281 @@
+---
+title: A Language for an Assignment - Homework 1
+date: 2019-12-27T23:27:09-08:00
+draft: true
+tags: ["Haskell", "Python", "Algorithms"]
+---
+
+On a rainy Oregon day, I was walking between classes with a group of friends.
+We were discussing the various ways to obfuscate solutions to the weekly
+homework assignments in our Algorithms course: replace every `if` with
+a ternary expression, use single-letter variable names, put everything on one line.
+I said:
+
+> The
+{{< sidenote "right" "chad-note" "chad" >}}
+This is in reference to a meme, <a href="https://knowyourmeme.com/memes/virgin-vs-chad">Virgin vs Chad</a>.
+A "chad" characteristic is masculine or "alpha" to the point of absurdity.
+{{< /sidenote >}} move would be to make your own, different language for every homework assignment.
+
+We were required to use
+{{< sidenote "left" "python-note" "Python" >}}
+A friend suggested making a Haskell program
+that generates Python-based interpreters for languages. While that would be truly
+absurd, I'll leave <em>this</em> challenge for another day.
+{{< /sidenote >}} for our solutions, so that was the first limitation on this challenge.
+Someone suggested writing the languages in Haskell, since that's what we used
+in our Programming Languages class. So the final goal ended up being:
+
+* For each of the 10 homework assignments in CS325 - Analysis of Algorithms,
+* Create a Haskell program that translates a language into,
+* A valid Python program that works (nearly) out of the box and passes all the test cases.
+
+It may not be worth it to create a whole
+{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
+A general-purpose language is one that's designed to be used in various
+domains. For instance, C++ is a general-purpose language because it can
+be used for embedded systems, GUI programs, and pretty much anything else.
+This is in contrast to a domain-specific language, such as Game Maker Language,
+which is aimed at a much narrower set of uses.
+{{< /sidenote >}} language for each problem,
+but nowhere in the challenge did we say that it had to be general-purpose. In
+fact, some interesting design thinking can go into designing a domain-specific
+language for a particular assignment. So let's jump right into it, and make
+a language for the first homework assignment.
+
+### Homework 1
+There are two problems in Homework 1. Here they are, verbatim:
+
+{{< codelines "text" "cs325-langs/hws/hw1.txt" 32 38 >}}
+
+And the second:
+
+{{< codelines "text" "cs325-langs/hws/hw1.txt" 47 68 >}}
+
+We want to make a language __specifically__ for these two tasks (one of which
+is split into many tasks). What common things can we isolate? I see two:
+
+First, __all the problems deal with lists__. This may seem like a trivial observation,
+but these two problems are the __only__ thing we use our language for. We have
+list access,
+{{< sidenote "right" "filterting-note" "list filtering" >}}
+Quickselect is a variation on quicksort, which itself
+finds all the "lesser" and "greater" elements in the input array.
+{{< /sidenote >}} and list creation. That should serve as a good base!
+
+Second, if you squint a little bit, __all the problems are recursive with the same base case__.
+Consider the first few lines of `search`, implemented naively:
+
+```Python
+def search(xs, k):
+    if xs == []:
+        return False
+```
+
+How about `sorted`? Take a look:
+
+```Python
+def sorted(xs):
+    if xs == []:
+        return []
+```
+
+I'm sure you see the picture. But it will take some real mental gymnastics to twist the
+rest of the problems into this shape. What about `qselect`, for instance? There are two
+cases for what it may return:
+
+* `None` or equivalent if the index is out of bounds (we give it `4` and a list `[1, 2]`).
+* A number if `qselect` worked.
+
+The test cases never provide a concrete example of what should be returned from
+`qselect` in the first case, so we'll interpret it like
+{{< sidenote "right" "undefined-note" "undefined behavior" >}}
+For a quick sidenote about undefined behavior, check out how
+C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture function</a>.
+Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
+function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
+but functions that don't terminate are undefined behavior. There's only one other way the function
+returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call.
+{{< /sidenote >}} in C++:
+we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
+This makes this base case valid:
+
+```Python
+def qselect(xs, k):
+    if xs == []:
+        return []
+```
+
+"Oh yeah, now it's all coming together." With one more observation (which will come
+from a piece I haven't yet shown you!), we'll be able to generalize this base case.
+
+The observation is this section in the assignment:
+
+{{< codelines "text" "cs325-langs/hws/hw1.txt" 83 98 >}}
+
+The real key is the part about "returning the `[]` where x should be inserted". It so
+happens that when the list given to the function is empty, the number should be inserted
+precisely into that list. Thus:
+
+```Python
+def _search(xs, k):
+    if xs == []:
+        return xs
+```
+
+The same works for `qselect`:
+
+```Python
+def qselect(xs, k):
+    if xs == []:
+        return xs
+```
+
+And for `sorted`, too:
+
+```Python
+def sorted(xs):
+    if xs == []:
+        return xs
+```
+
+There are some functions that are exceptions, though:
+
+```Python
+def insert(xs, k):
+    # We can't return early here!
+    # If we do, we'll never insert anything.
+    ...  # (rest of the function omitted)
+```
+
+Also:
+
+```Python
+def search(xs, k):
+    # We have to return True or False, never
+    # an empty list.
+    ...  # (rest of the function omitted)
+```
+
+So, whenever we __don't__ return a list, we don't want to add a special case.
+We arrive at the following common base case: __whenever a function returns a list, if its first argument
+is the empty list, the first argument is immediately returned__.
+
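As a rough illustration of this rule, here is a minimal Python sketch of how a translator might emit the shared base case. The helper `emit_function` and its parameters are hypothetical names made up for this example; the translator in this post is written in Haskell and may well be structured differently.

```python
def emit_function(name, params, body_lines, returns_list):
    """Build the Python source for one translated function.

    Hypothetical helper: if the function returns a list, the shared
    base case (empty first argument -> return it) is emitted first.
    """
    lines = [f"def {name}({', '.join(params)}):"]
    if returns_list and params:
        first = params[0]
        lines.append(f"    if {first} == []:")
        lines.append(f"        return {first}")
    lines.extend("    " + line for line in body_lines)
    return "\n".join(lines)


# `sorted` returns a list, so it gets the base case...
sorted_src = emit_function("sorted", ["xs"], ["return xs  # placeholder body"], True)
# ...while `search` returns a boolean, so it does not.
search_src = emit_function("search", ["xs", "k"], ["return False  # placeholder body"], False)
print(sorted_src)
print(search_src)
```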
+We've largely exhausted the conclusions we can draw from these problems. Let's get to designing a language.
+
+### A Silly Language
+Let's start by visualizing our goals. Without base cases, the solution to `_search`
+would be something like this:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 11 14 >}}
+
+Here we have an __`if`-expression__. It has to have an `else`, and evaluates to the value
+of the chosen branch. That is, `if true then 0 else 1` evaluates to `0`, while
+`if false then 0 else 1` evaluates to `1`. Otherwise, we follow the binary tree search
+algorithm faithfully.
+
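Python happens to have a close analogue in its conditional expression, which also requires an `else` and evaluates only the chosen branch. The snippet below is just a sketch of how the two examples above could look on the Python side; the helper `noisy` is made up for the demonstration, and nothing here is meant to describe what the translator actually emits.

```python
evaluated = []


def noisy(value):
    """Record that a branch was evaluated, then return its value."""
    evaluated.append(value)
    return value


# `if true then 0 else 1` and `if false then 0 else 1`
first = noisy(0) if True else noisy(1)
second = noisy(0) if False else noisy(1)
print(first, second)
print(evaluated)  # only the chosen branch of each expression shows up here
```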
+Using this definition of `_search`, we can define `search` pretty easily:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 17 17 >}}
+
+Let's use Haskell's `(++)` operator for concatenation. This will help us understand
+when the user is operating on lists, and when they're not. With this, `sorted` becomes:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 16 16 >}}
+
+Let's go for `qselect` now. We'll introduce a very silly language feature for this
+problem:
+{{< sidenote "right" "selector-note" "list selectors" >}}
+You've probably never heard of list selectors, and for a good reason:
+this is a <em>terrible</em> language feature. I'll go into more detail
+later, but I wanted to make this clear right away.
+{{< /sidenote >}}. We observe that `qselect` aims to partition the list into
+other lists. We thus add the following pieces of syntax:
+
+```
+~xs -> {
+    pivot <- xs[rand]!
+    left <- xs[#0 <= pivot]
+    ...
+} -> ...
+```
+
+There are three new things here.
+
+1. The actual "list selector": `~xs -> { .. } -> ...`. Between the curly braces
+are branches which select parts of the list and assign them to new variables.
+Thus, `pivot <- xs[rand]!` assigns the element at a random index to the variable `pivot`.
+The `!` at the end means "after taking this out of `xs`, delete it from `xs`". The
+syntax {{< sidenote "right" "curly-note" "starts with \"~\"" >}}
+An observant reader will note that there's no need for the "xs" after the "~".
+The idea was to add a special case syntax to reference the "selected list", but
+I ended up not bothering. So in fact, this part of the syntax is useless.
+{{< /sidenote >}} to make it easier to parse.
+2. The `rand` list access syntax. `xs[rand]` is a special case that picks a random
+element from `xs`.
+3. The `xs[#0 <= pivot]` syntax. This is another special case that selects all elements
+from `xs` that match the given predicate (where `#0` is replaced with each element in `xs`).
+
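To make the semantics of these three pieces concrete, here is a small Python sketch. The function names `take_random` and `select` are hypothetical and only mirror the behavior described above; they are not what the translator generates.

```python
import random


def take_random(xs):
    """Model `xs[rand]!`: pick a random element and delete it from xs."""
    index = random.randrange(len(xs))
    return xs.pop(index)


def select(xs, predicate):
    """Model `xs[#0 <= pivot]`: keep every element matching the predicate.

    The lambda's argument plays the role of `#0`.
    """
    return [x for x in xs if predicate(x)]


xs = [3, 1, 4, 1, 5, 9]
pivot = take_random(xs)                   # pivot <- xs[rand]!
left = select(xs, lambda x: x <= pivot)   # left <- xs[#0 <= pivot]
print(pivot, left, xs)
```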
+The key part of `qselect` is to not evaluate `right` unless you have to. So, we shouldn't
+eagerly evaluate the list selector. We also don't want something like `right[|right|-1]` to evaluate
+`right` twice. So we settle on
+{{< sidenote "right" "lazy-note" "lazy evaluation" >}}
+Lazy evaluation means only evaluating an expression when we need to. Thus,
+although we might encounter the expression for <code>right</code>, we
+only evaluate it when the time comes. Lazy evaluation, at least
+the way that Haskell has it, is more specific: an expression is evaluated only
+once, or not at all.
+{{</ sidenote >}}.
+Ah, but the `!` marker introduces
+{{< sidenote "left" "side-effect-note" "side effects" >}}
+A side effect is a term frequently used when talking about functional programming.
+Evaluating the expression <code>xs[rand]!</code> doesn't just get a random element,
+it also changes <em>something else</em>. In this case, that something else is 
+the <code>xs</code> list.
+{{< /sidenote >}}. So we can't just evaluate these things all willy-nilly.
+So, let's make it so that each expression in the selector list requires the ones above it. Thus,
+`left` will require `pivot`, and `right` will require `left` and `pivot`. So,
+lazily evaluated, ordered expressions. The whole `qselect` becomes:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 1 9 >}}
+
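The "lazily evaluated, ordered" rule can be modeled with memoized thunks. The sketch below is an assumption about one way to express it in Python, using a hypothetical `Lazy` class; a fixed pivot (`xs.pop(0)`) stands in for `xs[rand]!` so that the behavior is predictable.

```python
class Lazy:
    """A value computed at most once, after everything it requires."""

    def __init__(self, compute, requires=()):
        self._compute = compute
        self._requires = requires
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            # Force earlier branches first so side effects keep their order.
            for earlier in self._requires:
                earlier.force()
            self._value = self._compute()
            self._done = True
        return self._value


xs = [3, 1, 4, 1, 5]
log = []


def branch(name, compute):
    """Wrap a computation so that each evaluation is recorded in `log`."""
    def run():
        log.append(name)
        return compute()
    return run


pivot = Lazy(branch("pivot", lambda: xs.pop(0)))  # stands in for xs[rand]!
left = Lazy(branch("left", lambda: [x for x in xs if x <= pivot.force()]), [pivot])
right = Lazy(branch("right", lambda: [x for x in xs if x > pivot.force()]), [pivot, left])

# Forcing `right` twice evaluates each branch once, in declaration order.
print(right.force(), right.force())
print(log)
```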
+We've now figured out all the language constructs. Let's start working on
+some implementation!
+
+#### Data Definitions
+Let's start with defining the AST and other data types for our language:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}}
+
+The `PossibleType` class will be used when we figure out if a function returns
+a list or not, for our base case insertion rule. The `Selector` type
+will hold a single line in the list selector we defined earlier, and
+the `SelectorMarker` will indicate if the user added the `!` "remove from list"
+marker at the end. To represent the various operators in our language, we create
+the `Op` data type. Note that unlike Python, `++` (list concatenation) and
+`+` (addition) are different operators in our language.
+
+We then define valid expressions. Obviously, a variable (like `xs`), an
+integer literal (like `1`) and a list literal (like `[]`) are allowed.
+We also put in our selector, which consists of the expression on the
+left, the list of selector branches (`[Selector]`) and the expression
+of "what to actually do with the new variables". We also
+add `if`-expressions (like we discussed), and function calls. Lastly,
+we add binary operators (like `x+y`), the length operator (`|xs|`),
+and the list access operator (`xs[0]`). We also make `#0` a part
+of the expression syntax, even though it's only allowed inside
+a list access.
+
+Of course, we don't want to write programs in our language as
+Haskell data structures. We want to actually write a text file, like `hw1.lang`,
+and then have our program translate that to Python. The first
+step to that is __parsing__: we need to turn our language text
+into the `Expr` structure we have.
+
+#### Parsing
+We'll be using `Parsec` for parsing. `Parsec` is a parsing library
+based on
+{{< sidenote "right" "monad-note" "monadic" >}}
+Haskell is a language with more monad tutorials than
+programmers. For this reason, I will resist the temptation
+to explain what monads are. If you <em>don't</em> know
+what they are, don't worry, there are plenty of other resources.
+{{< /sidenote >}} parser combinators.