diff --git a/code/cs325-langs/sols/hw1.lang b/code/cs325-langs/sols/hw1.lang index 1fd834e..221a7d1 100644 --- a/code/cs325-langs/sols/hw1.lang +++ b/code/cs325-langs/sols/hw1.lang @@ -17,4 +17,3 @@ sorted(xs) = sorted(xs[0]) ++ [xs[1]] ++ sorted(xs[2]); search(xs, k) = |_search(xs, k)| != 0; insert(xs, k) = _insert(k, _search(xs, k)); _insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs - diff --git a/code/cs325-langs/src/LanguageOne.hs b/code/cs325-langs/src/LanguageOne.hs index f6d80a9..3cff31f 100644 --- a/code/cs325-langs/src/LanguageOne.hs +++ b/code/cs325-langs/src/LanguageOne.hs @@ -270,14 +270,13 @@ type Translator = Control.Monad.State.State (Map.Map String [String], Int) currentTemp :: Translator String currentTemp = do - (_, t) <- get + t <- gets snd return $ "temp" ++ show t incrementTemp :: Translator String incrementTemp = do - (vs, t) <- get - put (vs, t+1) - return $ "temp" ++ show t + modify (second (+1)) + currentTemp hasLambda :: Expr -> Bool hasLambda (ListLiteral es) = any hasLambda es diff --git a/content/blog/00_cs325_languages_hw1.md b/content/blog/00_cs325_languages_hw1.md index 870571a..06703fa 100644 --- a/content/blog/00_cs325_languages_hw1.md +++ b/content/blog/00_cs325_languages_hw1.md @@ -1,7 +1,6 @@ --- title: A Language for an Assignment - Homework 1 date: 2019-12-27T23:27:09-08:00 -draft: true tags: ["Haskell", "Python", "Algorithms"] --- @@ -32,7 +31,7 @@ in our Programming Languages class. So the final goal ended up: It may not be worth it to create a whole {{< sidenote "right" "general-purpose-note" "general-purpose" >}} -A general purpose language is one that's designed to be used in vairous +A general purpose language is one that's designed to be used in various domains. For instance, C++ is a general-purpose language because it can be used for embedded systems, GUI programs, and pretty much anything else. 
This is in contrast to a domain-specific language, such as Game Maker Language, @@ -41,7 +40,7 @@ which is aimed at a much narrower set of uses. but nowhere in the challenge did we say that it had to be general-purpose. In fact, some interesting design thinking can go into designing a domain-specific language for a particular assignment. So let's jump right into it, and make -a language for the the first homework assignment. +a language for the first homework assignment. ### Homework 1 There are two problems in Homework 1. Here they are, verbatim: @@ -95,7 +94,7 @@ C++ optimizes the Collatz Conjecture func Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture function terminates is an unsolved problem), but functions that don't terminate are undefined behavior. There's only one other way the function -returns, and that's with "1". Thus, clang optimzes the entire function to a single "return 1" call. +returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call. {{< /sidenote >}} in C++: we can do whatever we want. So, let's allow it to return `[]` in the `None` case. This makes this base case valid: @@ -240,42 +239,271 @@ lazily evaluated, ordered expressions. The whole `qselect` becomes: We've now figured out all the language constructs. Let's start working on some implementation! -#### Data Definitions -Let's start with defining the AST and other data types for our language: +#### Implementation +It would be silly of me to explain every detail of creating a language in Haskell +in this post; this is neither the purpose of the post, nor is it plausible +to do this without covering monads, parser combinators, grammars, abstract syntax +trees, and more. So, instead, I'll discuss the _interesting_ parts of the +implementation. -{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}} +##### Temporary Variables +Our language is expression-based, yes. 
A function is a single, +arbitrarily complex expression (involving `if/else`, list +selectors, and more). So it would make sense to translate +a function to a single, arbitrarily complex Python expression. +However, the way we've designed our language makes it +not-so-suitable for converting to a single expression! For +instance, consider `xs[rand]`. We need to compute the list, +get its length, generate a random number, and then access +the corresponding element in the list. We use the list +here twice, and simply repeating the expression would not +be very smart: we'd be evaluating twice. So instead, +we'll use a variable, assign the list to that variable, +and then access that variable multiple times. -The `PossibleType` class will be used when we figure out if a function returns -a list or not, for our base case insertion rule. The `Selector` type -will hold a single line in the list selector we defined earlier, and -the `SelectorMarker` will indicate if the user added the `!` "remove from list" -marker at the end. To represent the various operators in our language, we create -the `Op` data type. Note that unlike Python, `++` (list concatenation) and -`+` (addition) are different operators in our language. +To be extra safe, let's use a fresh temporary variable +every time we need to store something. The simplest +way is to simply maintain a counter of how many temporary +variables we've already used, and generate a new variable +by prepending the word "temp" to that number. We start +with `temp0`, then `temp1`, and so on. To keep a counter, +we can use a state monad: -We then define valid expressions. Obviously, a variable (like `xs`), an -integer literal (like `1`) and a list literal (like `[]`) are allowed. -We also put in our selector, which consists of the expression on the -left, the list of selector branches (`[Selector]`) and the expression -of "what to actually do with the new variables". 
We also -add `if`-expressions (like we discussed), and function calls. Lastly, -we add binary operators like (`x+y`), the length operator (`|xs|`), -and the list access operator (`xs[0]`). We also make `#0` a part -of the expression syntax, even though it's only allowed inside -a list access. +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 269 269 >}} -Of course, we wouldn't want to write our language using -Haskell. We want to actually write a text file, like `hw1.lang`, -and then have our program translate that to Python. The first -step to that is __parsing__: we need to turn our language text -into the `Expr` structure we have. +Don't worry about the `Map.Map String [String]`, we'll get to that in a bit. +For now, all we have to worry about is the second element of the tuple, +the integer counting how many temporary variables we've used. We can +get the current temporary variable as follows: -#### Parsing -We'll be using `Parsec` for parsing. `Parsec` is a parsing library -based on -{{< sidenote "right" "monad-note" "monadic" >}} -Haskell is a language with more monad tutorials than -programmers. For this reason, I will resist the temptation -to explain what monads are. If you don't know -what they are, don't worry, there are plenty of other resources. -{{< /sidenote >}} parser combinators. +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 271 274 >}} + +We can also get a fresh temporary variable like this: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 276 279 >}} + +Now, the +{{< sidenote "left" "" "code" >}} +Since we are translating an expression, we must have the result of +the translation yield a Python expression we can use in generating +larger Python expressions. However, as we've seen, we occasionally +have to use statements. Thus, the translateExpr function +returns a Translator ([Py.PyStmt], Py.PyExpr). 
+{{< /sidenote >}}for generating a random list access looks like +{{< sidenote "right" "ast-note" "this:" >}} +The Py.* constructors are a part of a Python AST module I quickly +threw together. I won't showcase it here, but you can always look at the +source code for the blog (which includes this project) +here. +{{< /sidenote >}} + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 364 369 >}} + +##### Implementing "lazy evaluation" +Lazy evaluation in functional programs usually arises from +{{< sidenote "right" "graph-note" "graph reduction" >}} +Graph reduction, more specifically the Spineless, +Tagless G-machine is at the core of the Glasgow Haskell +Compiler (GHC). Simon Peyton Jones' earlier book, +Implementing Functional Languages: a tutorial +details an earlier version of the G-machine. +{{< /sidenote >}}. However, Python is neither +functional nor graph-based, and we only lazily +evaluate list selectors. Thus, we'll have to do +some work to get our lazy evaluation to work as we desire. +Here's what I came up with: + +1. It's difficult to insert Python statements where they are +needed: we'd have to figure out in which scope each variable +has already been declared, and in which scope it's yet +to be assigned. +2. Instead, we can use a Python dictionary, called `cache`, +and store computed versions of each variable in the cache. +3. It's pretty difficult to check if a variable +is in the cache, compute it if not, and then return the +result of the computation, in one expression. This is +true, unless that single expression is a function call, and we have a dedicated +function that takes no arguments, computes the expression if needed, +and uses the cache otherwise. We choose this route. +4. We have already promised that we'd evaluate all the selected +variables above a given variable before evaluating the variable +itself. 
So, each function will first call (and therefore +{{< sidenote "right" "force-note" "force" >}} +{{< todo >}}Explain forcing{{< /todo >}} +{{< /sidenote >}}) the functions +generated for variables declared above the function's own variable. +5. To keep track of all of this, we use the already-existing state monad +as a reader monad (that is, we clear the changes we make to the monad +after we're done translating the list selector). This is where the `Map.Map String [String]` +comes from. + +The `Map.Map String [String]` keeps track of variables that will be lazily computed, +and also of the dependencies of each variable (the variables that need +to be accessed before the variable itself). We compute such a map for +each selector as follows: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 337 337 >}} + +We update the existing map using `Map.union`: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 338 338 >}} + +And, after we're done generating expressions in the body of this selector, +we clear it to its previous value `vs`: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 341 341 >}} + +We generate a single selector as follows: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 307 320 >}} + +This generates a function definition statement, which we will examine in +generated Python code later on. + +Solving the problem this way also introduces another gotcha: sometimes, +a variable is produced by a function call, and other times the variable +is just a Python variable. We write this as follows: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 322 327 >}} + +##### Special Case Insertion +This is a silly language for a single homework assignment. I'm not +planning to implement Hindley-Milner type inference, or anything +of that sort. For the purpose of this language, things will be +either a list, or not a list. And as long as a function __can__ return +a list, it can also return the list from its base case. 
Thus, +that's all we will try to figure out. The checking code is so +short that we can include the whole snippet at once: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 258 266 >}} + +`mergePossibleType` +{{< sidenote "right" "bool-identity-note" "figures out" >}} +An observant reader will note that this is just a logical +OR function. It's not, however, good practice to use +booleans for types that have two constructors with no arguments. +Check out this +Elm-based article about this, which the author calls the +Boolean Identity Crisis. +{{< /sidenote >}}, given two possible types for an +expression, the final type for the expression. + +There's only one real trick to this. Sometimes, like in +`_search`, the only time we return something _known_ to be a list, that +something is `xs`. Since we're making a list manipulation language, +let's __assume the first argument to the function is a list__, and +__use this information to determine expression types__. We guess +types in a very basic manner otherwise: If you use the concatenation +operator, or a list literal, then obviously we're working on a list. +If you're returning the first argument of the function, that's also +a list. Otherwise, it could be anything. + +My Haskell linter actually suggested a pretty clever way of writing +the whole "add a base case if this function returns a list" code. +Check it out: + +{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 305 >}} + +Specifically, look at the line with `let fastReturn = ...`. It +uses a list comprehension: we take a parameter `p` from the list of +parameters `ps`, but only produce the statements for the base case +if the possible type computed using `p` is `List`. + +### The Output +What kind of beast have we created? 
Take a look for yourself: +```Python +def qselect(xs,k): + if xs==[]: + return xs + cache = {} + def pivot(): + if ("pivot") not in (cache): + cache["pivot"] = xs.pop(0) + return cache["pivot"] + def left(): + def temp2(arg): + out = [] + for arg0 in arg: + if arg0<=pivot(): + out.append(arg0) + return out + pivot() + if ("left") not in (cache): + cache["left"] = temp2(xs) + return cache["left"] + def right(): + def temp3(arg): + out = [] + for arg0 in arg: + if arg0>pivot(): + out.append(arg0) + return out + left() + pivot() + if ("right") not in (cache): + cache["right"] = temp3(xs) + return cache["right"] + if k>(len(left())+1): + temp4 = qselect(right(), k-len(left())-1) + else: + if k==(len(left())+1): + temp5 = [pivot()] + else: + temp5 = qselect(left(), k) + temp4 = temp5 + return temp4 +def _search(xs,k): + if xs==[]: + return xs + if xs[1]==k: + temp6 = xs + else: + if xs[1]>k: + temp8 = _search(xs[0], k) + else: + temp8 = _search(xs[2], k) + temp6 = temp8 + return temp6 +def sorted(xs): + if xs==[]: + return xs + return sorted(xs[0])+[xs[1]]+sorted(xs[2]) +def search(xs,k): + return len(_search(xs, k))!=0 +def insert(xs,k): + return _insert(k, _search(xs, k)) +def _insert(k,xs): + if k==[]: + return k + if len(xs)==0: + temp16 = xs + temp16.append([]) + temp17 = temp16 + temp17.append(k) + temp18 = temp17 + temp18.append([]) + temp15 = temp18 + else: + temp15 = xs + return temp15 +``` +It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write. +Even to get this code, I had to come up with hacks __in a language I created__. +The first hack is to make the `qselect` function use the `xs == []` base +case. This doesn't happen by default, because `qselect` doesn't return a list! +To "fix" this, I made `qselect` return the number it found, wrapped in a +list literal. This is not up to spec, and would require another function +to unwrap this list. 
+ +While `qselect` was struggling with not having the base case, `insert` had +a base case it didn't need: `insert` shouldn't return the list itself +when it's empty, it should insert into it! However, when we use the `<<` +list insertion operator, the language infers `insert` to be a list-returning +function itself, so inserting into an empty list will always fail. So, we +make a function `_insert`, which __takes the arguments in reverse__. +The base case will still be generated, but the first argument (against +which the base case is checked) will be a number, so the `k == []` check +will always fail. + +That concludes this post. I'll be working on more solutions to homework +assignments in self-made languages, so keep an eye out!
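
The lazy-evaluation scheme the post describes (a `cache` dictionary plus one zero-argument function per selected variable, each of which first calls the functions for the variables declared above it) can be sketched in isolation. This is a hand-written illustration, not compiler output: the name `select_demo` and the list comprehension in `left` are assumptions made for brevity, while `cache`, `pivot`, and `left` follow the generated code shown in the post.

```python
def select_demo(xs):
    """Hand-written sketch of the cache-plus-thunk pattern (hypothetical helper)."""
    cache = {}

    def pivot():
        # Compute on first use only, so that xs.pop(0) runs exactly once.
        if "pivot" not in cache:
            cache["pivot"] = xs.pop(0)
        return cache["pivot"]

    def left():
        # Call the function for the variable declared above this one first,
        # so that xs has already lost its pivot when we filter it.
        pivot()
        if "left" not in cache:
            cache["left"] = [x for x in xs if x <= pivot()]
        return cache["left"]

    return pivot, left


pivot, left = select_demo([3, 1, 4, 1, 5])
smaller = left()   # triggers pivot() first, which removes 3 from the list
again = pivot()    # answered from the cache, with no second pop
```

Calling `left()` before `pivot()` still evaluates the pivot first, which appears to be the ordering guarantee the post's fourth point relies on.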