Add the first post in CS325 series

Danila Fedorin 2019-12-29 22:47:36 -08:00
parent a406fb0846
commit 19aa126025
3 changed files with 269 additions and 43 deletions

@ -17,4 +17,3 @@ sorted(xs) = sorted(xs[0]) ++ [xs[1]] ++ sorted(xs[2]);
search(xs, k) = |_search(xs, k)| != 0;
insert(xs, k) = _insert(k, _search(xs, k));
_insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs

@ -270,14 +270,13 @@ type Translator = Control.Monad.State.State (Map.Map String [String], Int)
currentTemp :: Translator String
currentTemp = do
(_, t) <- get
t <- gets snd
return $ "temp" ++ show t
incrementTemp :: Translator String
incrementTemp = do
(vs, t) <- get
put (vs, t+1)
return $ "temp" ++ show t
modify (second (+1))
currentTemp
hasLambda :: Expr -> Bool
hasLambda (ListLiteral es) = any hasLambda es

@ -1,7 +1,6 @@
---
title: A Language for an Assignment - Homework 1
date: 2019-12-27T23:27:09-08:00
draft: true
tags: ["Haskell", "Python", "Algorithms"]
---
@ -32,7 +31,7 @@ in our Programming Languages class. So the final goal ended up:
It may not be worth it to create a whole
{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
A general purpose language is one that's designed to be used in vairous
A general purpose language is one that's designed to be used in various
domains. For instance, C++ is a general-purpose language because it can
be used for embedded systems, GUI programs, and pretty much anything else.
This is in contrast to a domain-specific language, such as Game Maker Language,
@ -41,7 +40,7 @@ which is aimed at a much narrower set of uses.
but nowhere in the challenge did we say that it had to be general-purpose. In
fact, some interesting design thinking can go into designing a domain-specific
language for a particular assignment. So let's jump right into it, and make
a language for the the first homework assignment.
a language for the first homework assignment.
### Homework 1
There are two problems in Homework 1. Here they are, verbatim:
@ -95,7 +94,7 @@ C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture func
Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
but functions that don't terminate are undefined behavior. There's only one other way the function
returns, and that's with "1". Thus, clang optimzes the entire function to a single "return 1" call.
returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call.
{{< /sidenote >}} in C++:
we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
This makes this base case valid:
@ -240,42 +239,271 @@ lazily evaluated, ordered expressions. The whole `qselect` becomes:
We've now figured out all the language constructs. Let's start working on
some implementation!
#### Data Definitions
Let's start with defining the AST and other data types for our language:
#### Implementation
It would be silly of me to explain every detail of creating a language in Haskell
in this post; this is neither the purpose of the post, nor is it plausible
to do this without covering monads, parser combinators, grammars, abstract syntax
trees, and more. So, instead, I'll discuss the _interesting_ parts of the
implementation.
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}}
##### Temporary Variables
Our language is expression-based, yes. A function is a single,
arbitrarily complex expression (involving `if/else`, list
selectors, and more). So it would make sense to translate
a function to a single, arbitrarily complex Python expression.
However, the way we've designed our language makes it
not-so-suitable for converting to a single expression! For
instance, consider `xs[rand]`. We need to compute the list,
get its length, generate a random number, and then access
the corresponding element in the list. We use the list
here twice, and simply repeating the expression would not
be very smart: we'd be evaluating the list twice. So instead,
we'll use a variable, assign the list to that variable,
and then access that variable multiple times.
The `PossibleType` class will be used when we figure out if a function returns
a list or not, for our base case insertion rule. The `Selector` type
will hold a single line in the list selector we defined earlier, and
the `SelectorMarker` will indicate if the user added the `!` "remove from list"
marker at the end. To represent the various operators in our language, we create
the `Op` data type. Note that unlike Python, `++` (list concatenation) and
`+` (addition) are different operators in our language.
To be extra safe, let's use a fresh temporary variable
every time we need to store something. The simplest
way is to maintain a counter of how many temporary
variables we've already used, and generate a new variable
by prepending the word "temp" to that number. We start
with `temp0`, then `temp1`, and so on. To keep a counter,
we can use a state monad:
We then define valid expressions. Obviously, a variable (like `xs`), an
integer literal (like `1`) and a list literal (like `[]`) are allowed.
We also put in our selector, which consists of the expression on the
left, the list of selector branches (`[Selector]`) and the expression
of "what to actually do with the new variables". We also
add `if`-expressions (like we discussed), and function calls. Lastly,
we add binary operators like (`x+y`), the length operator (`|xs|`),
and the list access operator (`xs[0]`). We also make `#0` a part
of the expression syntax, even though it's only allowed inside
a list access.
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 269 269 >}}
Of course, we wouldn't want to write our language using
Haskell. We want to actually write a text file, like `hw1.lang`,
and then have our program translate that to Python. The first
step to that is __parsing__: we need to turn our language text
into the `Expr` structure we have.
Don't worry about the `Map.Map String [String]`; we'll get to that in a bit.
For now, all we have to worry about is the second element of the tuple,
the integer counting how many temporary variables we've used. We can
get the current temporary variable as follows:
#### Parsing
We'll be using `Parsec` for parsing. `Parsec` is a parsing library
based on
{{< sidenote "right" "monad-note" "monadic" >}}
Haskell is a language with more monad tutorials than
programmers. For this reason, I will resist the temptation
to explain what monads are. If you <em>don't</em> know
what they are, don't worry, there are plenty of other resources.
{{< /sidenote >}} parser combinators.
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 271 274 >}}
We can also get a fresh temporary variable like this:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 276 279 >}}
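For readers more comfortable with Python than with state monads, the counter can be pictured roughly as follows. This is only an analogy: the `TempCounter` class and its method names are made up for illustration and are not part of the project. Like the `modify` followed by `currentTemp` above, `fresh` bumps the counter first and then reads the name.

```python
class TempCounter:
    """Hypothetical stand-in for the Int half of the Translator state."""

    def __init__(self):
        self.count = 0

    def current(self):
        # Name of the temporary the counter currently points at.
        return "temp" + str(self.count)

    def fresh(self):
        # Bump the counter first, then read the name.
        self.count += 1
        return self.current()


counter = TempCounter()
names = [counter.current(), counter.fresh(), counter.fresh()]
```

The real translator threads this counter through the state monad instead of mutating an object, but the naming scheme is the same idea.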
Now, the
{{< sidenote "left" "" "code" >}}
Since we are translating an expression, we must have the result of
the translation yield a Python expression we can use in generating
larger Python expressions. However, as we've seen, we occasionally
have to use statements. Thus, the <code>translateExpr</code> function
returns a <code>Translator ([Py.PyStmt], Py.PyExpr)</code>.
{{< /sidenote >}} for generating a random list access looks like
{{< sidenote "right" "ast-note" "this:" >}}
The <code>Py.*</code> constructors are a part of a Python AST module I quickly
threw together. I won't showcase it here, but you can always look at the
source code for the blog (which includes this project)
<a href="https://dev.danilafe.com/Web-Projects/blog-static">here</a>.
{{< /sidenote >}}
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 364 369 >}}
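In Python terms, the shape of the code we want for `xs[rand]` is something like the sketch below. The helper `random_element` and its variable names are hypothetical (the translator emits statements into the enclosing function rather than a helper like this), but it shows why the temporary matters: the list expression is evaluated exactly once.

```python
import random


def random_element(make_list):
    # make_list stands in for an arbitrary list-producing expression.
    temp0 = make_list()                    # evaluate the list once
    temp1 = random.randrange(len(temp0))   # random valid index into it
    return temp0[temp1]


evaluations = []


def expensive_list():
    evaluations.append(1)  # record each evaluation of the "expression"
    return [10, 20, 30]


picked = random_element(expensive_list)
```

Had we repeated the expression instead of storing it, `expensive_list` would run once for `len` and once more for the indexing.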
##### Implementing "lazy evaluation"
Lazy evaluation in functional programs usually arises from
{{< sidenote "right" "graph-note" "graph reduction" >}}
Graph reduction, more specifically the <em>Spineless,
Tagless G-machine</em>, is at the core of the Glasgow Haskell
Compiler (GHC). Simon Peyton Jones' earlier book,
<em>Implementing Functional Languages: a tutorial</em>
details an earlier version of the G-machine.
{{< /sidenote >}}. However, Python is neither
functional nor graph-based, and we only lazily
evaluate list selectors. Thus, we'll have to do
some work to get our lazy evaluation to work as we desire.
Here's what I came up with:
1. It's difficult to insert Python statements where they are
needed: we'd have to figure out in which scope each variable
has already been declared, and in which scope it's yet
to be assigned.
2. Instead, we can use a Python dictionary, called `cache`,
and store computed versions of each variable in the cache.
3. It's pretty difficult to check if a variable
is in the cache, compute it if not, and then return the
result of the computation, all in one expression. The exception
is when that single expression is a call to a dedicated
function that takes no arguments, computes the expression if needed,
and uses the cache otherwise. We choose this route.
4. We have already promised that we'd evaluate all the selected
variables above a given variable before evaluating the variable
itself. So, each function will first call (and therefore
{{< sidenote "right" "force-note" "force" >}}
{{< todo >}}Explain forcing{{< /todo >}}
{{< /sidenote >}}) the functions
generated for variables declared above the function's own variable.
5. To keep track of all of this, we use the already-existing state monad
as a reader monad (that is, we clear the changes we make to the monad
after we're done translating the list selector). This is where the `Map.Map String [String]`
comes from.
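
Points 2 through 4 can be sketched in plain Python. This is a hand-written miniature, not translator output: the function `first_and_rest`, the variable names `first` and `rest`, and the `log` list are all invented for the example.

```python
def first_and_rest(xs):
    cache = {}
    log = []  # records when each variable is actually computed

    def first():
        if "first" not in cache:
            log.append("first")
            cache["first"] = xs.pop(0)
        return cache["first"]

    def rest():
        first()  # force the variable declared above before computing this one
        if "rest" not in cache:
            log.append("rest")
            cache["rest"] = list(xs)
        return cache["rest"]

    # Ask for `rest` before `first`, and ask for it twice.
    result = (rest(), first(), rest())
    return result, log


result, log = first_and_rest([3, 1, 2])
```

Even though `rest` is requested first, `first` is computed before it, and neither is computed more than once.
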
The `Map.Map String [String]` keeps track of variables that will be lazily computed,
and also of the dependencies of each variable (the variables that need
to be accessed before the variable itself). We compute such a map for
each selector as follows:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 337 337 >}}
We update the existing map using `Map.union`:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 338 338 >}}
And, after we're done generating expressions in the body of this selector,
we reset it to its previous value `vs`:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 341 341 >}}
We generate a single selector as follows:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 307 320 >}}
This generates a function definition statement, which we will examine in
the generated Python code later on.
Solving the problem this way also introduces another gotcha: sometimes,
a variable is produced by a function call, and other times the variable
is just a Python variable. We write this as follows:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 322 327 >}}
##### Special Case Insertion
This is a silly language for a single homework assignment. I'm not
planning to implement Hindley-Milner type inference, or anything
of that sort. For the purpose of this language, things will be
either a list, or not a list. And as long as a function __can__ return
a list, it can also return the list from its base case. Thus,
that's all we will try to figure out. The checking code is so
short that we can include the whole snippet at once:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 258 266 >}}
`mergePossibleType`
{{< sidenote "right" "bool-identity-note" "figures out" >}}
An observant reader will note that this is just a logical
OR function. It's not, however, good practice to use
booleans for types that have two constructors with no arguments.
Check out this <a href="https://programming-elm.com/blog/2019-05-20-solving-the-boolean-identity-crisis-part-1/">
Elm-based article</a> about this, which the author calls the
Boolean Identity Crisis.
{{< /sidenote >}}, given two possible types for an
expression, the final type for the expression.
There's only one real trick to this. Sometimes, like in
`_search`, the only time we return something _known_ to be a list, that
something is `xs`. Since we're making a list manipulation language,
let's __assume the first argument to the function is a list__, and
__use this information to determine expression types__. We guess
types in a very basic manner otherwise: if you use the concatenation
operator, or a list literal, then obviously we're working on a list.
If you're returning the first argument of the function, that's also
a list. Otherwise, it could be anything.
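
To make the guessing rules concrete, here is a rough Python rendition. It is a sketch under assumptions, not a port of the Haskell: the tuple encoding of expressions, the name `ANY` for the "could be anything" case, and the choice to merge the two branches of an `if` are mine.

```python
from enum import Enum


class PossibleType(Enum):
    LIST = "List"
    ANY = "Any"  # assumed name for "could be anything"


def merge_possible_type(a, b):
    # Behaves like a logical OR: a list on either side is enough.
    if PossibleType.LIST in (a, b):
        return PossibleType.LIST
    return PossibleType.ANY


def guess_type(expr, first_param):
    # Expressions are tuples of the form (kind, ...); a hypothetical encoding.
    kind = expr[0]
    if kind in ("list_literal", "concat"):
        return PossibleType.LIST
    if kind == "var":
        return PossibleType.LIST if expr[1] == first_param else PossibleType.ANY
    if kind == "if":
        _, _cond, then_branch, else_branch = expr
        return merge_possible_type(
            guess_type(then_branch, first_param),
            guess_type(else_branch, first_param),
        )
    return PossibleType.ANY


# Roughly the shape of `_search`: one branch returns `xs`, the other recurses.
search_body = ("if", ("eq",), ("var", "xs"), ("call", "_search"))
search_type = guess_type(search_body, "xs")
number_type = guess_type(("int", 1), "xs")
```

Because one branch returns the first argument, the whole function is judged to be able to return a list, which is all the base case rule needs.
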
My Haskell linter actually suggested a pretty clever way of writing
the whole "add a base case if this function returns a list" code.
Check it out:
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 305 >}}
Specifically, look at the line with `let fastReturn = ...`. It
uses a list comprehension: we take a parameter `p` from the list of
parameters `ps`, but only produce the statements for the base case
if the possible type computed using `p` is `List`.
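
The same trick works in Python, where a list comprehension with a guard yields either an empty list or a one-element list. The sketch below is an analogy rather than a translation of the Haskell: `fast_return`, `returns_list`, and the use of strings for statements are invented, and I'm assuming only the first parameter is considered.

```python
def fast_return(params, returns_list):
    # Zero or one statements: the comprehension is empty when the guard fails.
    return [f"if {p}==[]: return {p}" for p in params[:1] if returns_list(p)]


with_base_case = fast_return(["xs", "k"], lambda p: True)
without_base_case = fast_return(["xs", "k"], lambda p: False)
```

The result can be spliced directly in front of the rest of the function body, with no separate `if` needed in the generator.
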
### The Output
What kind of beast have we created? Take a look for yourself:
```Python
def qselect(xs,k):
    if xs==[]:
        return xs
    cache = {}
    def pivot():
        if ("pivot") not in (cache):
            cache["pivot"] = xs.pop(0)
        return cache["pivot"]
    def left():
        def temp2(arg):
            out = []
            for arg0 in arg:
                if arg0<=pivot():
                    out.append(arg0)
            return out
        pivot()
        if ("left") not in (cache):
            cache["left"] = temp2(xs)
        return cache["left"]
    def right():
        def temp3(arg):
            out = []
            for arg0 in arg:
                if arg0>pivot():
                    out.append(arg0)
            return out
        left()
        pivot()
        if ("right") not in (cache):
            cache["right"] = temp3(xs)
        return cache["right"]
    if k>(len(left())+1):
        temp4 = qselect(right(), k-len(left())-1)
    else:
        if k==(len(left())+1):
            temp5 = [pivot()]
        else:
            temp5 = qselect(left(), k)
        temp4 = temp5
    return temp4
def _search(xs,k):
    if xs==[]:
        return xs
    if xs[1]==k:
        temp6 = xs
    else:
        if xs[1]>k:
            temp8 = _search(xs[0], k)
        else:
            temp8 = _search(xs[2], k)
        temp6 = temp8
    return temp6
def sorted(xs):
    if xs==[]:
        return xs
    return sorted(xs[0])+[xs[1]]+sorted(xs[2])
def search(xs,k):
    return len(_search(xs, k))!=0
def insert(xs,k):
    return _insert(k, _search(xs, k))
def _insert(k,xs):
    if k==[]:
        return k
    if len(xs)==0:
        temp16 = xs
        temp16.append([])
        temp17 = temp16
        temp17.append(k)
        temp18 = temp17
        temp18.append([])
        temp15 = temp18
    else:
        temp15 = xs
    return temp15
```
It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write.
Even to get this code, I had to come up with hacks __in a language I created__.
The first hack is to make the `qselect` function use the `xs == []` base
case. This doesn't happen by default, because `qselect` doesn't return a list!
To "fix" this, I made `qselect` return the number it found, wrapped in a
list literal. This is not up to spec, and would require another function
to unwrap this list.
While `qselect` was struggling with not having the base case, `insert` had
a base case it didn't need: `insert` shouldn't return the list itself
when it's empty, it should insert into it! However, because we use the `<<`
list insertion operator, the language infers `insert` to be a list-returning
function itself and generates the base case, so inserting into an empty list
will always fail. So, we make a function `_insert`, which __takes the arguments in reverse__.
The base case will still be generated, but the first argument (against
which the base case is checked) will be a number, so the `k == []` check
will always fail.
That concludes this post. I'll be working on more solutions to homework
assignments in self-made languages, so keep an eye out!