Add the first post in CS325 series

2019-12-29 22:47:36 -08:00
parent a406fb0846
commit 19aa126025
3 changed files with 269 additions and 43 deletions
--- a/code/cs325-langs/sols/hw1.lang
+++ b/code/cs325-langs/sols/hw1.lang
@@ -17,4 +17,3 @@ sorted(xs) = sorted(xs[0]) ++ [xs[1]] ++ sorted(xs[2]);
 search(xs, k) = |_search(xs, k)| != 0;
 insert(xs, k) = _insert(k, _search(xs, k));
 _insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs
-
--- a/code/cs325-langs/src/LanguageOne.hs
+++ b/code/cs325-langs/src/LanguageOne.hs
@@ -270,14 +270,13 @@ type Translator = Control.Monad.State.State (Map.Map String [String], Int)

 currentTemp :: Translator String
 currentTemp = do
-    (_, t) <- get
+    t <- gets snd
    return $ "temp" ++ show t

 incrementTemp :: Translator String
 incrementTemp = do
-    (vs, t) <- get
-    put (vs, t+1)
-    return $ "temp" ++ show t
+    modify (second (+1))
+    currentTemp

 hasLambda :: Expr -> Bool
 hasLambda (ListLiteral es) = any hasLambda es
--- a/content/blog/00_cs325_languages_hw1.md
+++ b/content/blog/00_cs325_languages_hw1.md
@@ -1,7 +1,6 @@
 ---
 title: A Language for an Assignment - Homework 1
 date: 2019-12-27T23:27:09-08:00
-draft: true
 tags: ["Haskell", "Python", "Algorithms"]
 ---

@@ -32,7 +31,7 @@ in our Programming Languages class. So the final goal ended up:

 It may not be worth it to create a whole
 {{< sidenote "right" "general-purpose-note" "general-purpose" >}}
-A general purpose language is one that's designed to be used in vairous
+A general purpose language is one that's designed to be used in various
 domains. For instance, C++ is a general-purpose language because it can
 be used for embedded systems, GUI programs, and pretty much anything else.
 This is in contrast to a domain-specific language, such as Game Maker Language,
@@ -41,7 +40,7 @@ which is aimed at a much narrower set of uses.
 but nowhere in the challenge did we say that it had to be general-purpose. In
 fact, some interesting design thinking can go into designing a domain-specific
 language for a particular assignment. So let's jump right into it, and make
-a language for the the first homework assignment.
+a language for the first homework assignment.

 ### Homework 1
 There are two problems in Homework 1. Here they are, verbatim:
@@ -95,7 +94,7 @@ C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture func
 Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
 function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
 but functions that don't terminate are undefined behavior. There's only one other way the function
-returns, and that's with "1". Thus, clang optimzes the entire function to a single "return 1" call.
+returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call.
 {{< /sidenote >}} in C++:
 we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
 This makes this base case valid:
@@ -240,42 +239,271 @@ lazily evaluated, ordered expressions. The whole `qselect` becomes:
 We've now figured out all the language constructs. Let's start working on
 some implementation!

-#### Data Definitions
-Let's start with defining the AST and other data types for our language:
+#### Implementation
+It would be silly of me to explain every detail of creating a language in Haskell
+in this post; this is neither the purpose of the post, nor is it feasible
+to do this without covering monads, parser combinators, grammars, abstract syntax
+trees, and more. So, instead, I'll discuss the _interesting_ parts of the
+implementation. 

-{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}}
+##### Temporary Variables
+Our language is expression-based, yes. A function is a single,
+arbitrarily complex expression (involving `if/else`, list
+selectors, and more). So it would make sense to translate
+a function to a single, arbitrarily complex Python expression.
+However, the way we've designed our language makes it
+not-so-suitable for converting to a single expression! For
+instance, consider `xs[rand]`. We need to compute the list,
+get its length, generate a random number, and then access
+the corresponding element in the list. We use the list
+here twice, and simply repeating the expression would not
+be very smart: we'd be evaluating it twice. So instead,
+we'll assign the list to a variable, and then access
+that variable as many times as we need.
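A small Python sketch makes the difference concrete. `CountingList`, `random_access_repeated`, and `random_access_with_temp` are hypothetical names made up for this illustration, not part of the translator; the counter simply records how many times the "list expression" gets evaluated.

```python
import random

class CountingList:
    """Hypothetical stand-in for a list expression that counts its evaluations."""

    def __init__(self, items):
        self.items = items
        self.evaluations = 0

    def evaluate(self):
        self.evaluations += 1
        return list(self.items)

def random_access_repeated(expr):
    # Naive translation of xs[rand]: the expression is written out twice,
    # so it is evaluated twice.
    return expr.evaluate()[random.randrange(len(expr.evaluate()))]

def random_access_with_temp(expr):
    # Translation with a temporary variable: evaluate once, use twice.
    temp0 = expr.evaluate()
    return temp0[random.randrange(len(temp0))]

naive = CountingList([3, 1, 4, 1, 5])
random_access_repeated(naive)
print(naive.evaluations)  # 2

cached = CountingList([3, 1, 4, 1, 5])
random_access_with_temp(cached)
print(cached.evaluations)  # 1
```

If the list expression had side effects (say, it popped an element), the repeated version would not just be slower; it would compute a different answer.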

-The `PossibleType` class will be used when we figure out if a function returns
-a list or not, for our base case insertion rule. The `Selector` type
-will hold a single line in the list selector we defined earlier, and
-the `SelectorMarker` will indicate if the user added the `!` "remove from list"
-marker at the end. To represent the various operators in our language, we create
-the `Op` data type. Note that unlike Python, `++` (list concatenation) and
-`+` (addition) are different operators in our language.
+To be extra safe, let's use a fresh temporary variable
+every time we need to store something. The simplest
+way to do this is to maintain a counter of how many temporary
+variables we've already used, and to generate a new variable
+name by prepending the word "temp" to that number. We start
+with `temp0`, then `temp1`, and so on. To keep a counter,
+we can use a state monad:

-We then define valid expressions. Obviously, a variable (like `xs`), an
-integer literal (like `1`) and a list literal (like `[]`) are allowed.
-We also put in our selector, which consists of the expression on the
-left, the list of selector branches (`[Selector]`) and the expression
-of "what to actually do with the new variables". We also
-add `if`-expressions (like we discussed), and function calls. Lastly,
-we add binary operators like (`x+y`), the length operator (`|xs|`),
-and the list access operator (`xs[0]`). We also make `#0` a part
-of the expression syntax, even though it's only allowed inside
-a list access.
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 269 269 >}}

-Of course, we wouldn't want to write our language using
-Haskell. We want to actually write a text file, like `hw1.lang`,
-and then have our program translate that to Python. The first
-step to that is __parsing__: we need to turn our language text
-into the `Expr` structure we have.
+Don't worry about the `Map.Map String [String]`; we'll get to that in a bit.
+For now, all we have to worry about is the second element of the tuple,
+the integer counting how many temporary variables we've used. We can
+get the current temporary variable as follows:

-#### Parsing
-We'll be using `Parsec` for parsing. `Parsec` is a parsing library
-based on
-{{< sidenote "right" "monad-note" "monadic" >}}
-Haskell is a language with more monad tutorials than
-programmers. For this reason, I will resist the temptation
-to explain what monads are. If you <em>don't</em> know
-what they are, don't worry, there are plenty of other resources.
-{{< /sidenote >}} parser combinators.
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 271 274 >}}
+
+We can also get a fresh temporary variable like this:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 276 279 >}}
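For readers less used to the state monad, here is a rough Python analogue of these two functions, with the counter stored in an object instead of threaded through `State`. `TempSupply` is a hypothetical name, and the sketch assumes the behavior of the updated `incrementTemp` in this commit, which bumps the counter and then returns the name built from the new value.

```python
class TempSupply:
    """Hypothetical analogue of the counter half of the Translator state."""

    def __init__(self):
        self.counter = 0

    def current_temp(self):
        # Like currentTemp: read the counter without changing it.
        return "temp" + str(self.counter)

    def increment_temp(self):
        # Like the updated incrementTemp: bump the counter, then
        # return the name built from the new value.
        self.counter += 1
        return self.current_temp()

supply = TempSupply()
print(supply.current_temp())    # temp0
print(supply.increment_temp())  # temp1
print(supply.increment_temp())  # temp2
```

Because the counter only ever grows, no two calls to `increment_temp` can hand out the same name.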
+
+Now, the
+{{< sidenote "left" "" "code" >}}
+Since we are translating an expression, we must have the result of
+the translation yield a Python expression we can use in generating
+larger Python expressions. However, as we've seen, we occasionally
+have to use statements. Thus, the <code>translateExpr</code> function
+returns a <code>Translator ([Py.PyStmt], Py.PyExpr)</code>.
+{{< /sidenote >}} for generating a random list access looks like
+{{< sidenote "right" "ast-note" "this:" >}}
+The <code>Py.*</code> constructors are a part of a Python AST module I quickly
+threw together. I won't showcase it here, but you can always look at the
+source code for the blog (which includes this project)
+<a href="https://dev.danilafe.com/Web-Projects/blog-static">here</a>.
+{{< /sidenote >}}
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 364 369 >}}
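To make the `([Py.PyStmt], Py.PyExpr)` idea more concrete, here is a Python sketch that works with source strings instead of a real Python AST. The tuple encoding of expressions (`("var", name)` and `("rand", expr)`) and the `Translator` class are assumptions for illustration only; they are not the `Py.*` module from the post.

```python
import random

class Translator:
    """Hypothetical sketch: translate tiny expressions into a pair of
    (Python statements to run first, Python expression), as source strings."""

    def __init__(self):
        self.counter = 0

    def fresh_temp(self):
        self.counter += 1
        return "temp" + str(self.counter)

    def translate(self, expr):
        kind = expr[0]
        if kind == "var":
            # A plain variable needs no statements.
            return [], expr[1]
        if kind == "rand":
            # xs[rand]: compute the list once, store it in a fresh
            # temporary, then index that temporary at a random position.
            stmts, list_expr = self.translate(expr[1])
            temp = self.fresh_temp()
            stmts = stmts + [temp + " = " + list_expr]
            return stmts, f"{temp}[random.randrange(len({temp}))]"
        raise ValueError("unknown expression kind: " + kind)

stmts, expr = Translator().translate(("rand", ("var", "xs")))
print(stmts)  # ['temp1 = xs']
print(expr)   # temp1[random.randrange(len(temp1))]

env = {"random": random, "xs": [42]}
exec("\n".join(stmts), env)
print(eval(expr, env))  # 42
```

Each step returns the statements that must run first alongside the expression that uses their results, so nested accesses simply accumulate statements in order.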
+
+##### Implementing "lazy evaluation"
+Lazy evaluation in functional programs is usually implemented using
+{{< sidenote "right" "graph-note" "graph reduction" >}}
+Graph reduction, more specifically the <em>Spineless,
+Tagless G-machine</em>, is at the core of the Glasgow Haskell
+Compiler (GHC). Simon Peyton Jones and David Lester's book,
+<em>Implementing Functional Languages: a tutorial</em>,
+details an earlier version of the G-machine.
+{{< /sidenote >}}. However, Python is neither
+functional nor graph-based, and we only lazily
+evaluate list selectors. Thus, we'll have to put in
+some effort to make lazy evaluation behave the way we want.
+Here's what I came up with:
+
+1. It's difficult to insert Python statements where they are
+needed: we'd have to figure out in which scope each variable
+has already been declared, and in which scope it's yet
+to be assigned. 
+2. Instead, we can use a Python dictionary, called `cache`,
+and store the computed value of each variable in it.
+3. It's pretty difficult to check if a variable
+is in the cache, compute it if it isn't, and return the
+result, all in one expression. That is, unless
+that single expression is a call to a dedicated
+function that takes no arguments, computes the value if needed,
+and uses the cache otherwise. We choose this route.
+4. We have already promised that we'd evaluate all the selected
+variables above a given variable before evaluating the variable
+itself. So, each function will first call (and therefore
+{{< sidenote "right" "force-note" "force" >}}
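+To <em>force</em> a lazily evaluated value is to make it actually
+be computed. Here, calling one of the generated functions forces
+its variable: the value is computed and stored in the cache,
+unless it's already there.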
+{{< /sidenote >}}) the functions
+generated for variables declared above the function's own variable.
+5. To keep track of all of this, we use the already-existing state monad
+as a reader monad (that is, we undo the changes we make to the state
+after we're done translating the list selector). This is where the `Map.Map String [String]`
+comes from.
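Putting these points together, a rough Python sketch of the generated pattern might look like the following. `make_lazy` is a hypothetical helper (the actual translator emits a separate `def` for each variable instead), and the example mirrors the `pivot`/`left`/`right` selector from `qselect`, assuming `pivot` pops the first element of `xs`.

```python
def make_lazy(cache, name, deps, compute):
    """Hypothetical helper: build a zero-argument function that forces
    `deps` first, then computes `name` at most once, storing it in `cache`."""
    def thunk():
        for dep in deps:
            dep()  # force every variable declared above this one
        if name not in cache:
            cache[name] = compute()
        return cache[name]
    return thunk

xs = [3, 1, 4, 1, 5]
cache = {}
pivot = make_lazy(cache, "pivot", [], lambda: xs.pop(0))
left = make_lazy(cache, "left", [pivot],
                 lambda: [x for x in xs if x <= pivot()])
right = make_lazy(cache, "right", [pivot, left],
                  lambda: [x for x in xs if x > pivot()])

print(right())  # [4, 5]
print(left())   # [1, 1]
print(pivot())  # 3
```

Because `pivot` is cached, `xs.pop(0)` runs exactly once, no matter how many times, or in what order, the other variables are forced.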
+
+The `Map.Map String [String]` keeps track of variables that will be lazily computed,
+and also of the dependencies of each variable (the variables that need
+to be accessed before the variable itself). We compute such a map for
+each selector as follows:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 337 337 >}}
+
+We update the existing map using `Map.union`:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 338 338 >}}
+
+And, after we're done generating expressions in the body of this selector,
+we reset it to its previous value, `vs`:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 341 341 >}}
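The same bookkeeping can be sketched in Python. This illustration rests on two assumptions: that each variable depends on every variable declared above it in the selector, and that entries from the new selector take precedence in the union. `selector_dependencies` and `lazy_scope` are hypothetical names.

```python
from contextlib import contextmanager

def selector_dependencies(names):
    """Map each selector variable to the variables declared above it."""
    return {name: names[:i] for i, name in enumerate(names)}

@contextmanager
def lazy_scope(state, new_vars):
    """Temporarily extend state["vars"], restoring the old map on exit,
    much like using the state monad as a reader monad."""
    saved = state["vars"]
    # Union of the two maps; in this sketch, new entries win on conflict.
    state["vars"] = {**saved, **new_vars}
    try:
        yield state
    finally:
        state["vars"] = saved  # undo only the map changes

state = {"vars": {}, "counter": 0}
deps = selector_dependencies(["pivot", "left", "right"])
print(deps)  # {'pivot': [], 'left': ['pivot'], 'right': ['pivot', 'left']}

with lazy_scope(state, deps):
    print(sorted(state["vars"]))  # ['left', 'pivot', 'right']
    state["counter"] += 1  # e.g. a temporary allocated in the body

print(state["vars"], state["counter"])  # {} 1
```

Only the map is restored on exit; the temporary-variable counter keeps growing, so names generated inside the selector stay unique afterwards.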
+
+We generate a single selector as follows:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 307 320 >}}
+
+This generates a function definition statement, which we will examine in
+generated Python code later on.
+
+Solving the problem this way also introduces another gotcha: sometimes,
+a variable is produced by a function call, and other times the variable
+is just a Python variable. We handle both cases as follows:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 322 327 >}}
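In Python terms, the decision boils down to something like the hypothetical `reference` helper below, which assumes the lazily computed variables are exactly the keys of the map described above.

```python
def reference(name, lazy_vars):
    """Emit Python source for a reference to variable `name`.

    Hypothetical sketch: variables bound by a list selector are computed
    by generated zero-argument functions, so they must be called;
    anything else is a plain Python variable.
    """
    if name in lazy_vars:
        return name + "()"
    return name

lazy_vars = {"pivot": [], "left": ["pivot"], "right": ["pivot", "left"]}
print(reference("pivot", lazy_vars))             # pivot()
print(reference("xs", lazy_vars))                # xs
print(f"len({reference('left', lazy_vars)})+1")  # len(left())+1
```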
+
+##### Special Case Insertion
+This is a silly language for a single homework assignment. I'm not
+planning to implement Hindley-Milner type inference, or anything
+of that sort. For the purpose of this language, things will be
+either a list, or not a list. And as long as a function __can__ return
+a list, it can also return the list from its base case. Thus,
+that's all we will try to figure out. The checking code is so
+short that we can include the whole snippet at once:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 258 266 >}}
+
+`mergePossibleType`
+{{< sidenote "right" "bool-identity-note" "figures out" >}}
+An observant reader will note that this is just a logical
+OR function. It's not, however, good practice to use
+booleans in place of a type with two argument-less constructors.
+Check out this <a href="https://programming-elm.com/blog/2019-05-20-solving-the-boolean-identity-crisis-part-1/">
+Elm-based article</a> about this, which the author calls the
+Boolean Identity Crisis.
+{{< /sidenote >}}, given two possible types for an
+expression, the final type for the expression.
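Here is a Python sketch of the same function. Only the `List` constructor name comes from the post; `ANY` is a stand-in name for the other case.

```python
from enum import Enum

class PossibleType(Enum):
    LIST = "List"  # the expression can be a list
    ANY = "Any"    # hypothetical name: nothing is known

def merge_possible_type(a, b):
    # Equivalent to a logical OR, with LIST playing the role of True.
    if a is PossibleType.LIST or b is PossibleType.LIST:
        return PossibleType.LIST
    return PossibleType.ANY

for a in PossibleType:
    for b in PossibleType:
        print(a.name, b.name, "->", merge_possible_type(a, b).name)
```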
+
+There's only one real trick to this. Sometimes, as in
+`_search`, the only expression we return that is _known_ to be a list
+is `xs`. Since we're making a list manipulation language,
+let's __assume the first argument to the function is a list__, and
+__use this information to determine expression types__. Beyond that, we guess
+types in a very basic manner: if you use the concatenation
+operator or a list literal, then obviously we're working on a list.
+If you're returning the first argument of the function, that's also
+a list. Otherwise, it could be anything.
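These rules can be sketched in Python over a made-up tuple encoding of expressions. The encoding, `guess_type`, and the `ANY` name are assumptions for illustration. `if`-expressions are handled by merging the guesses for both branches, presumably in the spirit of `mergePossibleType`.

```python
from enum import Enum

class PossibleType(Enum):
    LIST = "List"  # known (or assumed) to be a list
    ANY = "Any"    # hypothetical name: could be anything

def merge(a, b):
    # Logical OR: if either side can be a list, so can the result.
    return PossibleType.LIST if PossibleType.LIST in (a, b) else PossibleType.ANY

def guess_type(expr, first_param):
    """Guess whether `expr` can be a list, following the rules above."""
    kind = expr[0]
    if kind in ("concat", "list"):
        # `++` or a list literal: obviously a list.
        return PossibleType.LIST
    if kind == "var" and expr[1] == first_param:
        # The first argument is assumed to be a list.
        return PossibleType.LIST
    if kind == "if":
        # ("if", condition, then_branch, else_branch): either branch may be returned.
        return merge(guess_type(expr[2], first_param),
                     guess_type(expr[3], first_param))
    # Anything else could be anything.
    return PossibleType.ANY

# Roughly the shape of `_search`: return `xs`, or recurse.
search_body = ("if", ("op", "==", ("index", ("var", "xs"), 1), ("var", "k")),
               ("var", "xs"),
               ("if", ("op", ">", ("index", ("var", "xs"), 1), ("var", "k")),
                ("call", "_search"),
                ("call", "_search")))
print(guess_type(search_body, "xs").name)      # LIST
print(guess_type(("call", "len"), "xs").name)  # ANY
```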
+
+My Haskell linter actually suggested a pretty clever way of writing
+the whole "add a base case if this function returns a list" code.
+Check it out:
+
+{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 305 >}}
+
+Specifically, look at the line with `let fastReturn = ...`. It
+uses a list comprehension: we take a parameter `p` from the list of
+parameters `ps`, but only produce the statements for the base case
+if the possible type computed using `p` is `List`.
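The same trick carries over to Python list comprehensions. In this hypothetical sketch, `returns_list_given(p)` stands in for computing the possible type of the function body when `p` is assumed to be a list, and the slice `params[:1]` assumes only the first parameter is considered, as described earlier. Iterating over at most one element means the comprehension yields either no base case or exactly one.

```python
def fast_return(params, returns_list_given):
    """Build the base-case statements for a function, if any.

    `returns_list_given(p)` is a hypothetical stand-in for computing the
    possible type of the function body when parameter `p` is assumed
    to be a list.
    """
    return [
        f"if {p}==[]:\n    return {p}"
        for p in params[:1]  # at most one parameter: the first
        if returns_list_given(p)
    ]

print(fast_return(["xs", "k"], lambda p: True))   # ['if xs==[]:\n    return xs']
print(fast_return(["xs", "k"], lambda p: False))  # []
print(fast_return([], lambda p: True))            # []
```

The slice also handles functions with no parameters gracefully: there is simply nothing to iterate over.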
+
+### The Output
+What kind of beast have we created? Take a look for yourself:
+```Python
+def qselect(xs,k):
+    if xs==[]:
+        return xs
+    cache = {}
+    def pivot():
+        if ("pivot") not in (cache):
+            cache["pivot"] = xs.pop(0)
+        return cache["pivot"]
+    def left():
+        def temp2(arg):
+            out = []
+            for arg0 in arg:
+                if arg0<=pivot():
+                    out.append(arg0)
+            return out
+        pivot()
+        if ("left") not in (cache):
+            cache["left"] = temp2(xs)
+        return cache["left"]
+    def right():
+        def temp3(arg):
+            out = []
+            for arg0 in arg:
+                if arg0>pivot():
+                    out.append(arg0)
+            return out
+        left()
+        pivot()
+        if ("right") not in (cache):
+            cache["right"] = temp3(xs)
+        return cache["right"]
+    if k>(len(left())+1):
+        temp4 = qselect(right(), k-len(left())-1)
+    else:
+        if k==(len(left())+1):
+            temp5 = [pivot()]
+        else:
+            temp5 = qselect(left(), k)
+        temp4 = temp5
+    return temp4
+def _search(xs,k):
+    if xs==[]:
+        return xs
+    if xs[1]==k:
+        temp6 = xs
+    else:
+        if xs[1]>k:
+            temp8 = _search(xs[0], k)
+        else:
+            temp8 = _search(xs[2], k)
+        temp6 = temp8
+    return temp6
+def sorted(xs):
+    if xs==[]:
+        return xs
+    return sorted(xs[0])+[xs[1]]+sorted(xs[2])
+def search(xs,k):
+    return len(_search(xs, k))!=0
+def insert(xs,k):
+    return _insert(k, _search(xs, k))
+def _insert(k,xs):
+    if k==[]:
+        return k
+    if len(xs)==0:
+        temp16 = xs
+        temp16.append([])
+        temp17 = temp16
+        temp17.append(k)
+        temp18 = temp17
+        temp18.append([])
+        temp15 = temp18
+    else:
+        temp15 = xs
+    return temp15
+```
+It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write.
+Even to get this code, I had to come up with hacks __in a language I created__.
+The first hack is to make the `qselect` function use the `xs == []` base
+case. This doesn't happen by default, because `qselect` doesn't return a list!
+To "fix" this, I made `qselect` return the number it found, wrapped in a
+list literal. This is not up to spec, and would require another function
+to unwrap this list.
+
+While `qselect` was struggling with not having the base case, `insert` had
+a base case it didn't need: `insert` shouldn't return the list itself
+when it's empty; it should insert into it! However, when we use the `<<`
+list insertion operator, the language infers `insert` to be a list-returning
+function, and so generates a base case that returns an empty list unchanged:
+inserting into an empty list would always fail. So, we
+make a function `_insert`, which __takes the arguments in reverse__.
+The base case will still be generated, but the first argument (against
+which the base case is checked) will be a number, so the `k == []` check
+will always fail.
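A condensed Python sketch of both variants shows the difference. The function names are hypothetical, and the bodies are simplified from the generated `_insert` above.

```python
def insert_list_first(xs, k):
    # Hypothetical translation with the list as the first argument:
    # the generated base case fires on an empty list and returns it as-is.
    if xs == []:
        return xs
    if len(xs) == 0:  # never reached for an empty list
        xs.append([])
        xs.append(k)
        xs.append([])
    return xs

def insert_number_first(k, xs):
    # Arguments reversed, like `_insert`: the base case now compares a
    # number against [], which is never equal, so it never fires.
    if k == []:
        return k
    if len(xs) == 0:
        xs.append([])
        xs.append(k)
        xs.append([])
    return xs

print(insert_list_first([], 5))    # []
print(insert_number_first(5, []))  # [[], 5, []]
```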
+
+That concludes this post. I'll be working on more solutions to homework
+assignments in self-made languages, so keep an eye out!