Add the first post in CS325 series
parent a406fb0846
commit 19aa126025
				| @ -17,4 +17,3 @@ sorted(xs) = sorted(xs[0]) ++ [xs[1]] ++ sorted(xs[2]); | ||||
| search(xs, k) = |_search(xs, k)| != 0; | ||||
| insert(xs, k) = _insert(k, _search(xs, k)); | ||||
| _insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs | ||||
| 
 | ||||
|  | ||||
| @ -270,14 +270,13 @@ type Translator = Control.Monad.State.State (Map.Map String [String], Int) | ||||
| 
 | ||||
| currentTemp :: Translator String | ||||
| currentTemp = do | ||||
|     (_, t) <- get | ||||
|     t <- gets snd | ||||
|     return $ "temp" ++ show t | ||||
| 
 | ||||
| incrementTemp :: Translator String | ||||
| incrementTemp = do | ||||
|     (vs, t) <- get | ||||
|     put (vs, t+1) | ||||
|     return $ "temp" ++ show t | ||||
|     modify (second (+1)) | ||||
|     currentTemp | ||||
| 
 | ||||
| hasLambda :: Expr -> Bool | ||||
| hasLambda (ListLiteral es) = any hasLambda es | ||||
|  | ||||
| @ -1,7 +1,6 @@ | ||||
| --- | ||||
| title: A Language for an Assignment - Homework 1 | ||||
| date: 2019-12-27T23:27:09-08:00 | ||||
| draft: true | ||||
| tags: ["Haskell", "Python", "Algorithms"] | ||||
| --- | ||||
| 
 | ||||
| @ -32,7 +31,7 @@ in our Programming Languages class. So the final goal ended up: | ||||
| 
 | ||||
| It may not be worth it to create a whole | ||||
| {{< sidenote "right" "general-purpose-note" "general-purpose" >}} | ||||
| A general purpose language is one that's designed to be used in vairous | ||||
| A general purpose language is one that's designed to be used in various | ||||
| domains. For instance, C++ is a general-purpose language because it can | ||||
| be used for embedded systems, GUI programs, and pretty much anything else. | ||||
| This is in contrast to a domain-specific language, such as Game Maker Language, | ||||
| @ -41,7 +40,7 @@ which is aimed at a much narrower set of uses. | ||||
| but nowhere in the challenge did we say that it had to be general-purpose. In | ||||
| fact, some interesting design thinking can go into designing a domain-specific | ||||
| language for a particular assignment. So let's jump right into it, and make | ||||
| a language for the the first homework assignment. | ||||
| a language for the first homework assignment. | ||||
| 
 | ||||
| ### Homework 1 | ||||
| There are two problems in Homework 1. Here they are, verbatim: | ||||
| @ -95,7 +94,7 @@ C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture func | ||||
| Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture | ||||
| function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>), | ||||
| but functions that don't terminate are undefined behavior. There's only one other way the function | ||||
| returns, and that's with "1". Thus, clang optimzes the entire function to a single "return 1" call. | ||||
| returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call. | ||||
| {{< /sidenote >}} in C++: | ||||
| we can do whatever we want. So, let's allow it to return `[]` in the `None` case. | ||||
| This makes this base case valid: | ||||
| @ -240,42 +239,271 @@ lazily evaluated, ordered expressions. The whole `qselect` becomes: | ||||
| We've now figured out all the language constructs. Let's start working on | ||||
| some implementation! | ||||
| 
 | ||||
| #### Data Definitions | ||||
| Let's start with defining the AST and other data types for our language: | ||||
| #### Implementation | ||||
| It would be silly of me to explain every detail of creating a language in Haskell | ||||
| in this post; this is neither the purpose of the post, nor is it plausible | ||||
| to do this without covering monads, parser combinators, grammars, abstract syntax | ||||
| trees, and more. So, instead, I'll discuss the _interesting_ parts of the | ||||
| implementation.  | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}} | ||||
| ##### Temporary Variables | ||||
| Our language is expression-based, yes. A function is a single, | ||||
| arbitrarily complex expression (involving `if/else`, list | ||||
| selectors, and more). So it would make sense to translate | ||||
| a function to a single, arbitrarily complex Python expression. | ||||
| However, the way we've designed our language makes it | ||||
| not-so-suitable for converting to a single expression! For | ||||
| instance, consider `xs[rand]`. We need to compute the list, | ||||
| get its length, generate a random number, and then access | ||||
| the corresponding element in the list. We use the list | ||||
| here twice, and simply repeating the expression would not | ||||
| be very smart: we'd be evaluating the list twice. So instead, | ||||
| we'll use a variable, assign the list to that variable, | ||||
| and then access that variable multiple times. | ||||
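As a rough illustration of the problem (this is not the translator's actual output), here is what repeating the list expression costs compared to storing it in a temporary. The names `expensive_list`, `temp0` and `temp1` are placeholders made up for this sketch.

```python
import random

calls = 0

def expensive_list():
    """Stand-in for an arbitrary list-producing expression."""
    global calls
    calls += 1
    return [10, 20, 30]

# Naive translation: the list expression is repeated, so it runs twice.
naive = expensive_list()[random.randrange(len(expensive_list()))]
naive_calls = calls

# With a temporary: the list expression runs once and is reused.
calls = 0
temp0 = expensive_list()
temp1 = random.randrange(len(temp0))
picked = temp0[temp1]
temp_calls = calls
```

The second version needs statements (the assignments) before the final expression, which is why a single Python expression is not enough.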
| 
 | ||||
| The `PossibleType` class will be used when we figure out if a function returns | ||||
| a list or not, for our base case insertion rule. The `Selector` type | ||||
| will hold a single line in the list selector we defined earlier, and | ||||
| the `SelectorMarker` will indicate if the user added the `!` "remove from list" | ||||
| marker at the end. To represent the various operators in our language, we create | ||||
| the `Op` data type. Note that unlike Python, `++` (list concatenation) and | ||||
| `+` (addition) are different operators in our language. | ||||
| To be extra safe, let's use a fresh temporary variable | ||||
| every time we need to store something. The simplest | ||||
| way is to maintain a counter of how many temporary | ||||
| variables we've already used, and generate a new variable | ||||
| by prepending the word "temp" to that number. We start | ||||
| with `temp0`, then `temp1`, and so on. To keep a counter, | ||||
| we can use a state monad: | ||||
| 
 | ||||
| We then define valid expressions. Obviously, a variable (like `xs`), an | ||||
| integer literal (like `1`) and a list literal (like `[]`) are allowed. | ||||
| We also put in our selector, which consists of the expression on the | ||||
| left, the list of selector branches (`[Selector]`) and the expression | ||||
| of "what to actually do with the new variables". We also | ||||
| add `if`-expressions (like we discussed), and function calls. Lastly, | ||||
| we add binary operators like (`x+y`), the length operator (`|xs|`), | ||||
| and the list access operator (`xs[0]`). We also make `#0` a part | ||||
| of the expression syntax, even though it's only allowed inside | ||||
| a list access. | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 269 269 >}} | ||||
| 
 | ||||
| Of course, we wouldn't want to write our language using | ||||
| Haskell. We want to actually write a text file, like `hw1.lang`, | ||||
| and then have our program translate that to Python. The first | ||||
| step to that is __parsing__: we need to turn our language text | ||||
| into the `Expr` structure we have. | ||||
| Don't worry about the `Map.Map String [String]`; we'll get to that in a bit. | ||||
| For now, all we have to worry about is the second element of the tuple, | ||||
| the integer counting how many temporary variables we've used. We can | ||||
| get the current temporary variable as follows: | ||||
| 
 | ||||
| #### Parsing | ||||
| We'll be using `Parsec` for parsing. `Parsec` is a parsing library | ||||
| based on | ||||
| {{< sidenote "right" "monad-note" "monadic" >}} | ||||
| Haskell is a language with more monad tutorials than | ||||
| programmers. For this reason, I will resist the temptation | ||||
| to explain what monads are. If you <em>don't</em> know | ||||
| what they are, don't worry, there are plenty of other resources. | ||||
| {{< /sidenote >}} parser combinators. | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 271 274 >}} | ||||
| 
 | ||||
| We can also get a fresh temporary variable like this: | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 276 279 >}} | ||||
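For readers more comfortable with Python than with the state monad, the counter can be approximated like this. `TempNames` is a hypothetical class invented for this sketch; it only mirrors the integer half of the `Translator` state, with `fresh` following the "increment, then read the current name" shape of `incrementTemp`.

```python
class TempNames:
    """Hypothetical Python analogue of the counter in the Translator state."""

    def __init__(self):
        self.counter = 0

    def current(self):
        # Like currentTemp: read the counter without changing it.
        return f"temp{self.counter}"

    def fresh(self):
        # Like incrementTemp: bump the counter, then report the new name.
        self.counter += 1
        return self.current()

names = TempNames()
first = names.current()
second = names.fresh()
third = names.fresh()
```

The state monad gives the Haskell code the same effect without a mutable object: the counter is threaded through every translation step.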
| 
 | ||||
| Now, the | ||||
| {{< sidenote "left" "" "code" >}} | ||||
| Since we are translating an expression, we must have the result of | ||||
| the translation yield a Python expression we can use in generating | ||||
| larger Python expressions. However, as we've seen, we occasionally | ||||
| have to use statements. Thus, the <code>translateExpr</code> function | ||||
| returns a <code>Translator ([Py.PyStmt], Py.PyExpr)</code>. | ||||
| {{< /sidenote >}}for generating a random list access looks like | ||||
| {{< sidenote "right" "ast-note" "this:" >}} | ||||
| The <code>Py.*</code> constructors are a part of a Python AST module I quickly | ||||
| threw together. I won't showcase it here, but you can always look at the | ||||
| source code for the blog (which includes this project) | ||||
| <a href="https://dev.danilafe.com/Web-Projects/blog-static">here</a>. | ||||
| {{< /sidenote >}} | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 364 369 >}} | ||||
| 
 | ||||
| ##### Implementing "lazy evaluation" | ||||
| Lazy evaluation in functional programs usually arises from | ||||
| {{< sidenote "right" "graph-note" "graph reduction" >}} | ||||
| Graph reduction, more specifically the <em>Spineless, | ||||
| Tagless G-machine</em>, is at the core of the Glasgow Haskell | ||||
| Compiler (GHC). Simon Peyton Jones' earlier book, | ||||
| <em>Implementing Functional Languages: a tutorial</em>, | ||||
| details an earlier version of the G-machine. | ||||
| {{< /sidenote >}}. However, Python is neither | ||||
| functional nor graph-based, and we only lazily | ||||
| evaluate list selectors. Thus, we'll have to do | ||||
| some work to get our lazy evaluation to work as we desire. | ||||
| Here's what I came up with: | ||||
| 
 | ||||
| 1. It's difficult to insert Python statements where they are | ||||
| needed: we'd have to figure out in which scope each variable | ||||
| has already been declared, and in which scope it's yet | ||||
| to be assigned.  | ||||
| 2. Instead, we can use a Python dictionary, called `cache`, | ||||
| and store computed versions of each variable in the cache. | ||||
| 3. It's pretty difficult to check if a variable | ||||
| is in the cache, compute it if not, and then return the | ||||
| result of the computation, in one expression. This is | ||||
| true, unless that single expression is a function call, and we have a dedicated | ||||
| function that takes no arguments, computes the expression if needed, | ||||
| and uses the cache otherwise. We choose this route. | ||||
| 4. We have already promised that we'd evaluate all the selected | ||||
| variables above a given variable before evaluating the variable | ||||
| itself. So, each function will first call (and therefore | ||||
| {{< sidenote "right" "force-note" "force" >}} | ||||
| {{< todo >}}Explain forcing{{< /todo >}} | ||||
| {{< /sidenote >}}) the functions | ||||
| generated for variables declared above the function's own variable. | ||||
| 5. To keep track of all of this, we use the already-existing state monad | ||||
| as a reader monad (that is, we clear the changes we make to the monad | ||||
| after we're done translating the list selector). This is where the `Map.Map String [String]` | ||||
| comes from. | ||||
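The scheme in the list above can be sketched in plain Python. This is a hand-written miniature, not generated code: the `select` wrapper, the `rest` variable and the `log` list are assumptions added so that the evaluation order is visible.

```python
def select(xs):
    cache = {}
    log = []  # records which computations actually ran, in order

    def pivot():
        # Compute on first use, then serve the cached value.
        if "pivot" not in cache:
            log.append("pivot")
            cache["pivot"] = xs.pop(0)
        return cache["pivot"]

    def rest():
        pivot()  # force the variable declared above this one first
        if "rest" not in cache:
            log.append("rest")
            cache["rest"] = list(xs)
        return cache["rest"]

    return pivot, rest, log

pivot, rest, log = select([3, 1, 2])
nothing_ran_yet = list(log)   # defining the functions evaluates nothing
tail = rest()                 # forces pivot, then computes rest
head = pivot()                # served from the cache
head_again = pivot()          # still cached; xs.pop(0) is not repeated
```

Because each variable is a zero-argument function, using it inside a larger expression is just a call, and the cache makes sure the `pop` side effect happens at most once.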
| 
 | ||||
| The `Map.Map String [String]` keeps track of variables that will be lazily computed, | ||||
| and also of the dependencies of each variable (the variables that need | ||||
| to be accessed before the variable itself). We compute such a map for | ||||
| each selector as follows:  | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 337 337 >}} | ||||
| 
 | ||||
| We update the existing map using `Map.union`: | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 338 338 >}} | ||||
| 
 | ||||
| And, after we're done generating expressions in the body of this selector, | ||||
| we clear it to its previous value `vs`: | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 341 341 >}} | ||||
| 
 | ||||
| We generate a single selector as follows: | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 307 320 >}} | ||||
| 
 | ||||
| This generates a function definition statement, which we will examine in | ||||
| generated Python code later on. | ||||
| 
 | ||||
| Solving the problem this way also introduces another gotcha: sometimes, | ||||
| a variable is produced by a function call, and other times the variable | ||||
| is just a Python variable. We write this as follows: | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 322 327 >}} | ||||
| 
 | ||||
| ##### Special Case Insertion | ||||
| This is a silly language for a single homework assignment. I'm not | ||||
| planning to implement Hindley-Milner type inference, or anything | ||||
| of that sort. For the purpose of this language, things will be | ||||
| either a list, or not a list. And as long as a function __can__ return | ||||
| a list, it can also return the list from its base case. Thus, | ||||
| that's all we will try to figure out. The checking code is so | ||||
| short that we can include the whole snippet at once: | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 258 266 >}} | ||||
| 
 | ||||
| `mergePossibleType` | ||||
| {{< sidenote "right" "bool-identity-note" "figures out" >}} | ||||
| An observant reader will note that this is just a logical | ||||
| OR function. It's not, however, good practice to use | ||||
| booleans for types that have two constructors with no arguments. | ||||
| Check out this <a href="https://programming-elm.com/blog/2019-05-20-solving-the-boolean-identity-crisis-part-1/"> | ||||
| Elm-based article</a> about this, which the author calls the | ||||
| Boolean Identity Crisis. | ||||
| {{< /sidenote >}}, given two possible types for an | ||||
| expression, the final type for the expression. | ||||
| 
 | ||||
| There's only one real trick to this. Sometimes, like in | ||||
| `_search`, the only time we return something _known_ to be a list, that | ||||
| something is `xs`. Since we're making a list manipulation language, | ||||
| let's __assume the first argument to the function is a list__, and | ||||
| __use this information to determine expression types__. We guess | ||||
| types in a very basic manner otherwise: If you use the concatenation | ||||
| operator, or a list literal, then obviously we're working on a list. | ||||
| If you're returning the first argument of the function, that's also | ||||
| a list. Otherwise, it could be anything. | ||||
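The guessing rules in this paragraph are small enough to sketch in Python. The tuple-based expression encoding and the function names below are hypothetical, chosen only for this example; the real checker is the Haskell snippet referenced above.

```python
def merge_possible_type(a, b):
    # "List" wins over "Any", which makes this a logical OR.
    return "List" if "List" in (a, b) else "Any"

def possible_type(first_param, expr):
    kind = expr[0]
    if kind == "list":      # list literal, e.g. []
        return "List"
    if kind == "concat":    # xs ++ ys
        return "List"
    if kind == "var":       # a list only if it is the first parameter
        return "List" if expr[1] == first_param else "Any"
    if kind == "if":        # ("if", cond, then_branch, else_branch)
        return merge_possible_type(
            possible_type(first_param, expr[2]),
            possible_type(first_param, expr[3]),
        )
    return "Any"            # anything else could be anything

# Shaped like _search: one branch returns xs, the other is a call.
search_like = ("if", ("int", 1), ("var", "xs"), ("call", "_search"))
search_type = possible_type("xs", search_like)
other_var_type = possible_type("xs", ("var", "k"))
```

A function whose body comes out as `"List"` is the kind that gets the `xs == []` base case inserted.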
| 
 | ||||
| My Haskell linter actually suggested a pretty clever way of writing | ||||
| the whole "add a base case if this function returns a list" code. | ||||
| Check it out: | ||||
| 
 | ||||
| {{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 305 >}} | ||||
| 
 | ||||
| Specifically, look at the line with `let fastReturn = ...`. It | ||||
| uses a list comprehension: we take a parameter `p` from the list of | ||||
| parameters `ps`, but only produce the statements for the base case | ||||
| if the possible type computed using `p` is `List`. | ||||
| 
 | ||||
| ### The Output | ||||
| What kind of beast have we created? Take a look for yourself: | ||||
| ```Python | ||||
| def qselect(xs,k): | ||||
|     if xs==[]: | ||||
|         return xs | ||||
|     cache = {} | ||||
|     def pivot(): | ||||
|         if ("pivot") not in (cache): | ||||
|             cache["pivot"] = xs.pop(0) | ||||
|         return cache["pivot"] | ||||
|     def left(): | ||||
|         def temp2(arg): | ||||
|             out = [] | ||||
|             for arg0 in arg: | ||||
|                 if arg0<=pivot(): | ||||
|                     out.append(arg0) | ||||
|             return out | ||||
|         pivot() | ||||
|         if ("left") not in (cache): | ||||
|             cache["left"] = temp2(xs) | ||||
|         return cache["left"] | ||||
|     def right(): | ||||
|         def temp3(arg): | ||||
|             out = [] | ||||
|             for arg0 in arg: | ||||
|                 if arg0>pivot(): | ||||
|                     out.append(arg0) | ||||
|             return out | ||||
|         left() | ||||
|         pivot() | ||||
|         if ("right") not in (cache): | ||||
|             cache["right"] = temp3(xs) | ||||
|         return cache["right"] | ||||
|     if k>(len(left())+1): | ||||
|         temp4 = qselect(right(), k-len(left())-1) | ||||
|     else: | ||||
|         if k==(len(left())+1): | ||||
|             temp5 = [pivot()] | ||||
|         else: | ||||
|             temp5 = qselect(left(), k) | ||||
|         temp4 = temp5 | ||||
|     return temp4 | ||||
| def _search(xs,k): | ||||
|     if xs==[]: | ||||
|         return xs | ||||
|     if xs[1]==k: | ||||
|         temp6 = xs | ||||
|     else: | ||||
|         if xs[1]>k: | ||||
|             temp8 = _search(xs[0], k) | ||||
|         else: | ||||
|             temp8 = _search(xs[2], k) | ||||
|         temp6 = temp8 | ||||
|     return temp6 | ||||
| def sorted(xs): | ||||
|     if xs==[]: | ||||
|         return xs | ||||
|     return sorted(xs[0])+[xs[1]]+sorted(xs[2]) | ||||
| def search(xs,k): | ||||
|     return len(_search(xs, k))!=0 | ||||
| def insert(xs,k): | ||||
|     return _insert(k, _search(xs, k)) | ||||
| def _insert(k,xs): | ||||
|     if k==[]: | ||||
|         return k | ||||
|     if len(xs)==0: | ||||
|         temp16 = xs | ||||
|         temp16.append([]) | ||||
|         temp17 = temp16 | ||||
|         temp17.append(k) | ||||
|         temp18 = temp17 | ||||
|         temp18.append([]) | ||||
|         temp15 = temp18 | ||||
|     else: | ||||
|         temp15 = xs | ||||
|     return temp15 | ||||
| ``` | ||||
| It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write. | ||||
| Even to get this code, I had to come up with hacks __in a language I created__. | ||||
| The first hack is to make the `qselect` function use the `xs == []` base | ||||
| case. This doesn't happen by default, because `qselect` doesn't return a list! | ||||
| To "fix" this, I made `qselect` return the number it found, wrapped in a | ||||
| list literal. This is not up to spec, and would require another function | ||||
| to unwrap this list. | ||||
| 
 | ||||
| While `qselect` was struggling with not having the base case, `insert` had | ||||
| a base case it didn't need: `insert` shouldn't return the list itself | ||||
| when it's empty, it should insert into it! However, when we use the `<<` | ||||
| list insertion operator, the language infers `insert` to be a list-returning | ||||
| function itself, so the generated base case returns the empty list as-is and inserting into it will always fail. So, we | ||||
| make a function `_insert`, which __takes the arguments in reverse__. | ||||
| The base case will still be generated, but the first argument (against | ||||
| which the base case is checked) will be a number, so the `k == []` check | ||||
| will always fail. | ||||
| 
 | ||||
| That concludes this post. I'll be working on more solutions to homework | ||||
| assignments in self-made languages, so keep an eye out! | ||||
|  | ||||