diff --git a/content/blog/00_cs325_languages_hw1.md b/content/blog/00_cs325_languages_hw1.md
new file mode 100644
index 0000000..870571a
--- /dev/null
+++ b/content/blog/00_cs325_languages_hw1.md
@@ -0,0 +1,281 @@
+---
+title: A Language for an Assignment - Homework 1
+date: 2019-12-27T23:27:09-08:00
+draft: true
+tags: ["Haskell", "Python", "Algorithms"]
+---
+
+On a rainy Oregon day, I was walking between classes with a group of friends.
+We were discussing the various ways to obfuscate solutions to the weekly
+homework assignments in our Algorithms course: replace every `if` with
+a ternary expression, use single-letter variable names, put everything on one line.
+I said:
+
+> The
+{{< sidenote "right" "chad-note" "chad" >}}
+This is in reference to a meme, Virgin vs Chad.
+A "chad" characteristic is masculine or "alpha" to the point of absurdity.
+{{< /sidenote >}} move would be to make your own, different language for every homework assignment.
+
+We were required to use
+{{< sidenote "left" "python-note" "Python" >}}
+A friend suggested making a Haskell program
+that generates Python-based interpreters for languages. While that would be truly
+absurd, I'll leave this challenge for another day.
+{{< /sidenote >}} for our solutions, so that was the first limitation on this challenge.
+Someone suggested writing the languages in Haskell, since that's what we used
+in our Programming Languages class. So the final goal ended up being:
+
+* For each of the 10 homework assignments in CS325 - Analysis of Algorithms,
+* create a Haskell program that translates a language into
+* a valid Python program that works (nearly) out of the box and passes all the test cases.
+
+It may not be worth it to create a whole
+{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
+A general-purpose language is one that's designed to be used in various
+domains.
+For instance, C++ is a general-purpose language because it can
+be used for embedded systems, GUI programs, and pretty much anything else.
+This is in contrast to a domain-specific language, such as Game Maker Language,
+which is aimed at a much narrower set of uses.
+{{< /sidenote >}} language for each problem,
+but nowhere in the challenge did we say that it had to be general-purpose. In
+fact, designing a domain-specific language for a particular assignment calls for
+some interesting design decisions. So let's jump right into it, and make
+a language for the first homework assignment.
+
+### Homework 1
+There are two problems in Homework 1. Here they are, verbatim:
+
+{{< codelines "text" "cs325-langs/hws/hw1.txt" 32 38 >}}
+
+And the second:
+
+{{< codelines "text" "cs325-langs/hws/hw1.txt" 47 68 >}}
+
+We want to make a language __specifically__ for these two tasks (one of which
+is itself split into several subtasks). What common things can we isolate? I see two.
+
+First, __all the problems deal with lists__. This may seem like a trivial observation,
+but these two problems are the __only__ thing we use our language for. We have
+list access,
+{{< sidenote "right" "filterting-note" "list filtering" >}}
+Quickselect is a variation on quicksort, which itself
+finds all the "lesser" and "greater" elements in the input array.
+{{< /sidenote >}} and list creation. That should serve as a good base!
+
+Second, if you squint a little bit, __all the problems are recursive with the same base case__.
+Consider the first few lines of `search`, implemented naively:
+
+```Python
+def search(xs, k):
+    if xs == []:
+        return False
+```
+
+How about `sorted`? Take a look:
+
+```Python
+def sorted(xs):
+    if xs == []:
+        return []
+```
+
+I'm sure you see the picture. But it will take some real mental gymnastics to twist the
+rest of the problems into this shape. What about `qselect`, for instance?
+There are two cases for what it may return:
+
+* `None` or equivalent if the index is out of bounds (we give it `4` and a list `[1, 2]`).
+* A number if `qselect` worked.
+
+The test cases never provide a concrete example of what should be returned from
+`qselect` in the first case, so we'll interpret it like
+{{< sidenote "right" "undefined-note" "undefined behavior" >}}
+For a quick sidenote about undefined behavior, check out how
+Clang optimizes a C++ implementation of the Collatz Conjecture function.
+Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
+function terminates is an unsolved problem),
+but in C++, a loop that never terminates and has no side effects is undefined behavior. There's only one other way the function
+returns, and that's with "1". Thus, Clang optimizes the entire function to a single "return 1" statement.
+{{< /sidenote >}} in C++:
+we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
+This makes this base case valid:
+
+```Python
+def qselect(xs, k):
+    if xs == []:
+        return []
+```
+
+"Oh yeah, now it's all coming together." With one more observation (which will come
+from a piece I haven't yet shown you!), we'll be able to generalize this base case.
+
+The observation comes from this section of the assignment:
+
+{{< codelines "text" "cs325-langs/hws/hw1.txt" 83 98 >}}
+
+The real key is the part about "returning the `[]` where x should be inserted". It so
+happens that when the list given to the function is empty, the number should be inserted
+precisely into that list. Thus:
+
+```Python
+def _search(xs, k):
+    if xs == []:
+        return xs
+```
+
+The same works for `qselect`:
+
+```Python
+def qselect(xs, k):
+    if xs == []:
+        return xs
+```
+
+And for `sorted`, too:
+
+```Python
+def sorted(xs):
+    if xs == []:
+        return xs
+```
+
+There are some functions that are exceptions, though:
+
+```Python
+def insert(xs, k):
+    # We can't return early here!
+    # If we do, we'll never insert anything.
+    ...
+```
+
+Also:
+
+```Python
+def search(xs, k):
+    # We have to return True or False, never
+    # an empty list.
+    ...
+```
+
+So, whenever we __don't__ return a list, we don't want to add a special case.
+We arrive at the following common base case: __whenever a function returns a list, if its first argument
+is the empty list, the first argument is immediately returned__.
+
+We've largely exhausted the conclusions we can draw from these problems. Let's get to designing a language.
+
+### A Silly Language
+Let's start by visualizing our goals. Without base cases, the solution to `_search`
+would be something like this:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 11 14 >}}
+
+Here we have an __`if`-expression__. It has to have an `else`, and evaluates to the value
+of the chosen branch. That is, `if true then 0 else 1` evaluates to `0`, while
+`if false then 0 else 1` evaluates to `1`. Beyond that, we follow the usual binary search tree
+lookup algorithm faithfully.
+
+Using this definition of `_search`, we can define `search` pretty easily:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 17 17 >}}
+
+Let's borrow Haskell's `(++)` operator for concatenation. This lets us tell
+when the user is operating on lists, and when they're not. With this, `sorted` becomes:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 16 16 >}}
+
+Let's go for `qselect` now. We'll introduce a very silly language feature for this
+problem:
+{{< sidenote "right" "selector-note" "list selectors" >}}
+You've probably never heard of list selectors, and for a good reason:
+this is a terrible language feature. I'll go into more detail
+later, but I wanted to make this clear right away.
+{{< /sidenote >}}. We observe that `qselect` aims to partition the list into
+other lists. We thus add the following pieces of syntax:
+
+```
+~xs -> {
+    pivot <- xs[rand]!
+    left <- xs[#0 <= pivot]
+    ...
+} -> ...
+```
+
+There are three new things here.
+
+1. The actual "list selector": `~xs -> { .. } -> ...`.
+Between the curly braces
+are branches which select parts of the list and assign them to new variables.
+For example, `pivot <- xs[rand]!` assigns the element at a random index to the variable `pivot`.
+The `!` at the end means "after taking this out of `xs`, delete it from `xs`". The
+syntax {{< sidenote "right" "curly-note" "starts with \"~\"" >}}
+An observant reader will note that there's no need for the "xs" after the "~".
+The idea was to add a special case syntax to reference the "selected list", but
+I ended up not bothering. So in fact, this part of the syntax is useless.
+{{< /sidenote >}} to make it easier to parse.
+2. The `rand` list access syntax. `xs[rand]` is a special case that picks a random
+element from `xs`.
+3. The `xs[#0 <= pivot]` syntax. This is another special case that selects all elements
+from `xs` that match the given predicate (where `#0` is replaced with each element in `xs`).
+
+The key to `qselect` is not evaluating `right` unless we have to. So, we shouldn't
+eagerly evaluate the list selector. We also don't want something like `right[|right|-1]` to evaluate
+`right` twice. So we settle on
+{{< sidenote "right" "lazy-note" "lazy evaluation" >}}
+Lazy evaluation means only evaluating an expression when we need to. Thus,
+although we might encounter the expression for right, we
+only evaluate it when the time comes. Lazy evaluation, at least
+the way that Haskell has it, is more specific: an expression is evaluated only
+once, or not at all.
+{{< /sidenote >}}.
+Ah, but the `!` marker introduces
+{{< sidenote "left" "side-effect-note" "side effects" >}}
+A side effect is a term frequently used when talking about functional programming.
+Evaluating the expression xs[rand]! doesn't just get a random element,
+it also changes something else. In this case, that something else is
+the xs list.
+{{< /sidenote >}}. So we can't just evaluate these things all willy-nilly.
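+
+To see why, here's a minimal Python sketch of a selector whose branches are lazy and
+memoized but __unordered__. It is not code from this post's translator. Two things in it are
+made up for illustration: a `pick` parameter that stands in for `rand` so the result is
+reproducible, and a branch `first <- xs[0]` that never mentions `pivot`. Each branch runs
+at most once, via `functools.cache` (Python 3.9+), and the answers depend on which branch
+happens to be forced first:
+
+```Python
+import functools
+
+
+def make_selector(xs, pick):
+    """Hypothetical lazy selector with two branches and NO ordering rule:
+
+        pivot <- xs[pick]!   take an element, then delete it from xs
+        first <- xs[0]       never mentions pivot
+    """
+    xs = list(xs)  # private copy that the `!` branch mutates
+
+    @functools.cache
+    def pivot():
+        return xs.pop(pick(len(xs)))
+
+    @functools.cache
+    def first():
+        return xs[0]
+
+    return pivot, first
+
+
+# Forcing `first` before `pivot`: it still sees the element pivot removes.
+pivot, first = make_selector([3, 1, 2], pick=lambda n: 0)
+print(first(), pivot())  # 3 3
+
+# Forcing `pivot` first: the deletion has already happened.
+pivot, first = make_selector([3, 1, 2], pick=lambda n: 0)
+print(pivot(), first())  # 3 1
+```
+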
+So, let's make it so that each expression in the selector list requires the ones above it. Thus,
+`left` will require `pivot`, and `right` will require `left` and `pivot`. In short, the
+branches are lazily evaluated, but in order. The whole `qselect` becomes:
+
+{{< codelines "text" "cs325-langs/sols/hw1.lang" 1 9 >}}
+
+We've now figured out all the language constructs. Let's start working on
+some implementation!
+
+#### Data Definitions
+Let's start by defining the AST and other data types for our language:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}}
+
+We'll use `PossibleType` when we figure out whether a function returns
+a list or not, which our base case insertion rule depends on. The `Selector` type
+will hold a single line in the list selector we defined earlier, and
+the `SelectorMarker` will indicate if the user added the `!` "remove from list"
+marker at the end. To represent the various operators in our language, we create
+the `Op` data type. Note that, unlike in Python, `++` (list concatenation) and
+`+` (addition) are different operators in our language.
+
+We then define valid expressions. Obviously, a variable (like `xs`), an
+integer literal (like `1`) and a list literal (like `[]`) are allowed.
+We also put in our selector, which consists of the expression on the
+left, the list of selector branches (`[Selector]`) and the expression
+of "what to actually do with the new variables". We also
+add `if`-expressions (as discussed above), and function calls. Lastly,
+we add binary operators (like `x+y`), the length operator (`|xs|`),
+and the list access operator (`xs[0]`). We also make `#0` a part
+of the expression syntax, even though it's only allowed inside
+a list access.
+
+Of course, we wouldn't want to write programs in our language as
+Haskell values. We want to actually write a text file, like `hw1.lang`,
+and then have our program translate that to Python.
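+
+To make "translate that to Python" concrete, here's a hand-written sketch of Python a
+translator could plausibly emit for `qselect`. It is not this post's actual output, and
+several details are assumptions. `k` is 1-based ("the k-th smallest"), `!` mutates a
+private copy of `xs`, and each selector branch becomes a memoized zero-argument function
+(via `functools.cache`) that forces the branches above it before running:
+
+```Python
+import functools
+import random
+
+
+def qselect(xs, k):
+    """Sketch: return the k-th smallest element of xs (k is 1-based here),
+    or [] when k is out of range, following the base case rule above."""
+    if xs == []:
+        return xs  # inserted base case: return the (empty) first argument
+    xs = list(xs)  # `!` below mutates this copy, not the caller's list
+
+    @functools.cache
+    def pivot():  # pivot <- xs[rand]!
+        return xs.pop(random.randrange(len(xs)))
+
+    @functools.cache
+    def left():  # left <- xs[#0 <= pivot]
+        p = pivot()  # requires the branch above it
+        return [x for x in xs if x <= p]
+
+    @functools.cache
+    def right():  # right <- xs[#0 > pivot]
+        p = pivot()
+        left()  # ordering rule: force every branch above, even if unused
+        return [x for x in xs if x > p]
+
+    n = len(left())
+    if k <= n:
+        return qselect(left(), k)
+    if k == n + 1:
+        return pivot()
+    return qselect(right(), k - n - 1)  # the only place `right` is built
+
+
+print(qselect([5, 1, 4, 2, 3], 2))  # 2
+print(qselect([1, 2], 4))           # []
+```
+
+Nothing in the selector runs until `len(left())` asks for it. When the answer is the
+pivot or lies in `left`, the filter for `right` never runs at all.
+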
+The first
+step in that translation is __parsing__: we need to turn the text of our language
+into the `Expr` structure we have.
+
+#### Parsing
+We'll be using `Parsec` for parsing. `Parsec` is a parsing library
+based on
+{{< sidenote "right" "monad-note" "monadic" >}}
+Haskell is a language with more monad tutorials than
+programmers. For this reason, I will resist the temptation
+to explain what monads are. If you don't know
+what they are, don't worry, there are plenty of other resources.
+{{< /sidenote >}} parser combinators.
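+
+`Parsec` is what this post uses, but the core idea of parser combinators is small enough
+to sketch in any language. The following is a rough illustration only. The helpers are
+hypothetical Python, not Parsec's API and not this post's parser. A parser is a function
+from `(text, position)` to `(value, new_position)`, or `None` on failure, and combinators
+such as `seq` and `choice` build bigger parsers out of smaller ones. The sketch covers a
+made-up slice of the grammar: variables, integer literals, and the length operator `|e|`,
+with tuples standing in for `Expr` constructors:
+
+```Python
+def char(c):
+    """Match exactly the character c."""
+    def parse(s, i):
+        return (c, i + 1) if i < len(s) and s[i] == c else None
+    return parse
+
+
+def many1(pred):
+    """Match one or more characters satisfying pred, returned as a string."""
+    def parse(s, i):
+        j = i
+        while j < len(s) and pred(s[j]):
+            j += 1
+        return (s[i:j], j) if j > i else None
+    return parse
+
+
+def fmap(f, p):
+    """Run p, then transform its result with f."""
+    def parse(s, i):
+        r = p(s, i)
+        return None if r is None else (f(r[0]), r[1])
+    return parse
+
+
+def seq(*ps):
+    """Run parsers one after another, collecting their results in a list."""
+    def parse(s, i):
+        values = []
+        for p in ps:
+            r = p(s, i)
+            if r is None:
+                return None
+            value, i = r
+            values.append(value)
+        return (values, i)
+    return parse
+
+
+def choice(*ps):
+    """Try each parser at the same position; return the first success."""
+    def parse(s, i):
+        for p in ps:
+            r = p(s, i)
+            if r is not None:
+                return r
+        return None
+    return parse
+
+
+var = fmap(lambda name: ("Var", name), many1(str.isalpha))
+num = fmap(lambda digits: ("Int", int(digits)), many1(lambda ch: "0" <= ch <= "9"))
+
+
+def expr(s, i):
+    # A function rather than a value, so it can refer to `length` defined below.
+    return choice(length, var, num)(s, i)
+
+
+length = fmap(lambda parts: ("Length", parts[1]), seq(char("|"), expr, char("|")))
+
+print(expr("|xs|", 0))  # (('Length', ('Var', 'xs')), 4)
+```
+
+One difference worth knowing: this `choice` always retries from the same position, while
+Parsec's `<|>` only tries its second alternative if the first one failed without consuming
+input. Wrapping the first alternative in `try` restores backtracking.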