Add the first post in CS325 series
parent
a406fb0846
commit
19aa126025
@ -17,4 +17,3 @@ sorted(xs) = sorted(xs[0]) ++ [xs[1]] ++ sorted(xs[2]);
|
||||
search(xs, k) = |_search(xs, k)| != 0;
|
||||
insert(xs, k) = _insert(k, _search(xs, k));
|
||||
_insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs
|
||||
|
||||
|
@ -270,14 +270,13 @@ type Translator = Control.Monad.State.State (Map.Map String [String], Int)
|
||||
|
||||
currentTemp :: Translator String
|
||||
currentTemp = do
|
||||
(_, t) <- get
|
||||
t <- gets snd
|
||||
return $ "temp" ++ show t
|
||||
|
||||
incrementTemp :: Translator String
|
||||
incrementTemp = do
|
||||
(vs, t) <- get
|
||||
put (vs, t+1)
|
||||
return $ "temp" ++ show t
|
||||
modify (second (+1))
|
||||
currentTemp
|
||||
|
||||
hasLambda :: Expr -> Bool
|
||||
hasLambda (ListLiteral es) = any hasLambda es
|
||||
|
@ -1,7 +1,6 @@
|
||||
---
|
||||
title: A Language for an Assignment - Homework 1
|
||||
date: 2019-12-27T23:27:09-08:00
|
||||
draft: true
|
||||
tags: ["Haskell", "Python", "Algorithms"]
|
||||
---
|
||||
|
||||
@ -32,7 +31,7 @@ in our Programming Languages class. So the final goal ended up:
|
||||
|
||||
It may not be worth it to create a whole
|
||||
{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
|
||||
A general purpose language is one that's designed to be used in vairous
|
||||
A general purpose language is one that's designed to be used in various
|
||||
domains. For instance, C++ is a general-purpose language because it can
|
||||
be used for embedded systems, GUI programs, and pretty much anything else.
|
||||
This is in contrast to a domain-specific language, such as Game Maker Language,
|
||||
@ -41,7 +40,7 @@ which is aimed at a much narrower set of uses.
|
||||
but nowhere in the challenge did we say that it had to be general-purpose. In
|
||||
fact, some interesting design thinking can go into designing a domain-specific
|
||||
language for a particular assignment. So let's jump right into it, and make
|
||||
a language for the the first homework assignment.
|
||||
a language for the first homework assignment.
|
||||
|
||||
### Homework 1
|
||||
There are two problems in Homework 1. Here they are, verbatim:
|
||||
@ -95,7 +94,7 @@ C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture func
|
||||
Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
|
||||
function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
|
||||
but functions that don't terminate are undefined behavior. There's only one other way the function
|
||||
returns, and that's with "1". Thus, clang optimzes the entire function to a single "return 1" call.
|
||||
returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call.
|
||||
{{< /sidenote >}} in C++:
|
||||
we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
|
||||
This makes this base case valid:
|
||||
@ -240,42 +239,271 @@ lazily evaluated, ordered expressions. The whole `qselect` becomes:
|
||||
We've now figured out all the language constructs. Let's start working on
|
||||
some implementation!
|
||||
|
||||
#### Data Definitions
|
||||
Let's start with defining the AST and other data types for our language:
|
||||
#### Implementation
|
||||
It would be silly of me to explain every detail of creating a language in Haskell
|
||||
in this post; this is neither the purpose of the post, nor is it plausible
|
||||
to do this without covering monads, parser combinators, grammars, abstract syntax
|
||||
trees, and more. So, instead, I'll discuss the _interesting_ parts of the
|
||||
implementation.
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}}
|
||||
##### Temporary Variables
|
||||
Our language is expression-based, yes. A function is a single,
|
||||
arbitrarily complex expression (involving `if/else`, list
|
||||
selectors, and more). So it would make sense to translate
|
||||
a function to a single, arbitrarily complex Python expression.
|
||||
However, the way we've designed our language makes it
|
||||
not-so-suitable for converting to a single expression! For
|
||||
instance, consider `xs[rand]`. We need to compute the list,
|
||||
get its length, generate a random number, and then access
|
||||
the corresponding element in the list. We use the list
|
||||
here twice, and simply repeating the expression would not
|
||||
be very smart: we'd be evaluating it twice. So instead,
|
||||
we'll use a variable, assign the list to that variable,
|
||||
and then access that variable multiple times.
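
To make the problem concrete, here is a rough sketch of the kind of Python we'd want for `xs[rand]`. The names `random_access`, `make_list`, and `temp0` are made up for illustration; this is not the translator's actual output.

```python
import random

def random_access(make_list):
    # Evaluate the list expression exactly once and keep it in a temporary.
    temp0 = make_list()
    # Both uses below refer to the temporary, not to the original expression.
    return temp0[random.randrange(len(temp0))]

calls = []

def make_list():
    # Record each evaluation so we can observe how often it happens.
    calls.append(1)
    return [10, 20, 30]

picked = random_access(make_list)
```

Even though the list is needed twice (once for its length, once for the access), `make_list` runs only once.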
|
||||
|
||||
The `PossibleType` class will be used when we figure out if a function returns
|
||||
a list or not, for our base case insertion rule. The `Selector` type
|
||||
will hold a single line in the list selector we defined earlier, and
|
||||
the `SelectorMarker` will indicate if the user added the `!` "remove from list"
|
||||
marker at the end. To represent the various operators in our language, we create
|
||||
the `Op` data type. Note that unlike Python, `++` (list concatenation) and
|
||||
`+` (addition) are different operators in our language.
|
||||
To be extra safe, let's use a fresh temporary variable
|
||||
every time we need to store something. The simplest
|
||||
way is to maintain a counter of how many temporary
|
||||
variables we've already used, and generate a new variable
|
||||
by prepending the word "temp" to that number. We start
|
||||
with `temp0`, then `temp1`, and so on. To keep a counter,
|
||||
we can use a state monad:
|
||||
|
||||
We then define valid expressions. Obviously, a variable (like `xs`), an
|
||||
integer literal (like `1`) and a list literal (like `[]`) are allowed.
|
||||
We also put in our selector, which consists of the expression on the
|
||||
left, the list of selector branches (`[Selector]`) and the expression
|
||||
of "what to actually do with the new variables". We also
|
||||
add `if`-expressions (like we discussed), and function calls. Lastly,
|
||||
we add binary operators like (`x+y`), the length operator (`|xs|`),
|
||||
and the list access operator (`xs[0]`). We also make `#0` a part
|
||||
of the expression syntax, even though it's only allowed inside
|
||||
a list access.
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 269 269 >}}
|
||||
|
||||
Of course, we wouldn't want to write our language using
|
||||
Haskell. We want to actually write a text file, like `hw1.lang`,
|
||||
and then have our program translate that to Python. The first
|
||||
step to that is __parsing__: we need to turn our language text
|
||||
into the `Expr` structure we have.
|
||||
Don't worry about the `Map.Map String [String]`; we'll get to that in a bit.
|
||||
For now, all we have to worry about is the second element of the tuple,
|
||||
the integer counting how many temporary variables we've used. We can
|
||||
get the current temporary variable as follows:
|
||||
|
||||
#### Parsing
|
||||
We'll be using `Parsec` for parsing. `Parsec` is a parsing library
|
||||
based on
|
||||
{{< sidenote "right" "monad-note" "monadic" >}}
|
||||
Haskell is a language with more monad tutorials than
|
||||
programmers. For this reason, I will resist the temptation
|
||||
to explain what monads are. If you <em>don't</em> know
|
||||
what they are, don't worry, there are plenty of other resources.
|
||||
{{< /sidenote >}} parser combinators.
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 271 274 >}}
|
||||
|
||||
We can also get a fresh temporary variable like this:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 276 279 >}}
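
If the state monad is unfamiliar, a loose Python analogue of the counter may help. It follows the "bump the counter, then report the current name" shape of `modify (second (+1))` followed by `currentTemp`. The `Translator` class and its method names are hypothetical stand-ins, not part of the project.

```python
class Translator:
    """A mutable stand-in for the (map, counter) state pair."""

    def __init__(self):
        self.lazy_vars = {}   # plays the role of Map.Map String [String]
        self.temp_count = 0   # the integer half of the state

    def current_temp(self):
        # Name of the temporary the counter currently points at.
        return "temp" + str(self.temp_count)

    def increment_temp(self):
        # Bump the counter first, then report the new current name.
        self.temp_count += 1
        return self.current_temp()

t = Translator()
names = [t.increment_temp(), t.increment_temp(), t.current_temp()]
```

Each call to `increment_temp` yields a name that has not been handed out before, while `current_temp` leaves the counter alone.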
|
||||
|
||||
Now, the
|
||||
{{< sidenote "left" "" "code" >}}
|
||||
Since we are translating an expression, we must have the result of
|
||||
the translation yield a Python expression we can use in generating
|
||||
larger Python expressions. However, as we've seen, we occasionally
|
||||
have to use statements. Thus, the <code>translateExpr</code> function
|
||||
returns a <code>Translator ([Py.PyStmt], Py.PyExpr)</code>.
|
||||
{{< /sidenote >}}for generating a random list access looks like
|
||||
{{< sidenote "right" "ast-note" "this:" >}}
|
||||
The <code>Py.*</code> constructors are a part of a Python AST module I quickly
|
||||
threw together. I won't showcase it here, but you can always look at the
|
||||
source code for the blog (which includes this project)
|
||||
<a href="https://dev.danilafe.com/Web-Projects/blog-static">here</a>.
|
||||
{{< /sidenote >}}
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 364 369 >}}
|
||||
|
||||
##### Implementing "lazy evaluation"
|
||||
Lazy evaluation in functional programs usually arises from
|
||||
{{< sidenote "right" "graph-note" "graph reduction" >}}
|
||||
Graph reduction, more specifically the <em>Spineless,
|
||||
Tagless G-machine</em> is at the core of the Glasgow Haskell
|
||||
Compiler (GHC). Simon Peyton Jones' earlier book,
|
||||
<em>Implementing Functional Languages: a tutorial</em>
|
||||
details an earlier version of the G-machine.
|
||||
{{< /sidenote >}}. However, Python is neither
|
||||
functional nor graph-based, and we only lazily
|
||||
evaluate list selectors. Thus, we'll have to do
|
||||
some work to get our lazy evaluation to work as we desire.
|
||||
Here's what I came up with:
|
||||
|
||||
1. It's difficult to insert Python statements where they are
|
||||
needed: we'd have to figure out in which scope each variable
|
||||
has already been declared, and in which scope it's yet
|
||||
to be assigned.
|
||||
2. Instead, we can use a Python dictionary, called `cache`,
|
||||
and store computed versions of each variable in the cache.
|
||||
3. It's pretty difficult to check if a variable
|
||||
is in the cache, compute it if not, and then return the
|
||||
result of the computation, in one expression. This is
|
||||
true, unless that single expression is a function call, and we have a dedicated
|
||||
function that takes no arguments, computes the expression if needed,
|
||||
and uses the cache otherwise. We choose this route.
|
||||
4. We have already promised that we'd evaluate all the selected
|
||||
variables above a given variable before evaluating the variable
|
||||
itself. So, each function will first call (and therefore
|
||||
{{< sidenote "right" "force-note" "force" >}}
|
||||
{{< todo >}}Explain forcing{{< /todo >}}
|
||||
{{< /sidenote >}}) the functions
|
||||
generated for variables declared above the function's own variable.
|
||||
5. To keep track of all of this, we use the already-existing state monad
|
||||
as a reader monad (that is, we clear the changes we make to the monad
|
||||
after we're done translating the list selector). This is where the `Map.Map String [String]`
|
||||
comes from.
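
Points 2 through 4 can be sketched by hand in a few lines of Python. The variable names `first` and `rest`, and the `log` list, are invented for this illustration; the real generated code appears later in the post.

```python
def select(xs):
    cache = {}
    log = []  # records the order in which variables are actually computed

    def first():
        if "first" not in cache:
            log.append("first")
            cache["first"] = xs.pop(0)
        return cache["first"]

    def rest():
        first()  # force the variable declared above before computing this one
        if "rest" not in cache:
            log.append("rest")
            cache["rest"] = list(xs)
        return cache["rest"]

    return first, rest, log

first, rest, log = select([3, 1, 2])
tail = rest()   # forces first, then computes rest
head = first()  # served from the cache, not recomputed
```

Asking for `rest` before `first` still removes the head of the list first, and nothing is computed more than once.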
|
||||
|
||||
The `Map.Map String [String]` keeps track of variables that will be lazily computed,
|
||||
and also of the dependencies of each variable (the variables that need
|
||||
to be accessed before the variable itself). We compute such a map for
|
||||
each selector as follows:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 337 337 >}}
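
In Python terms, the map could look something like the dictionary below. The helper name `selector_dependencies` and the ordering inside each dependency list are assumptions made for this sketch, not a transcription of the Haskell.

```python
def selector_dependencies(names):
    # Each variable depends on every variable declared above it.
    return {name: names[:i] for i, name in enumerate(names)}

deps = selector_dependencies(["pivot", "left", "right"])
```

Here `deps["right"]` lists both `pivot` and `left`, since both are declared above `right` in the selector.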
|
||||
|
||||
We update the existing map using `Map.union`:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 338 338 >}}
|
||||
|
||||
And, after we're done generating expressions in the body of this selector,
|
||||
we clear it to its previous value `vs`:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 341 341 >}}
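
The save, extend, and restore pattern is perhaps easier to see in an imperative sketch. Everything here is hypothetical: the `state` dictionary, the helper `with_selector_scope`, the choice that new entries win on a name clash, and the choice to restore only the map while leaving the temporary counter alone.

```python
def with_selector_scope(state, new_vars, translate_body):
    saved = state["lazy_vars"]  # the previous map, called `vs` in the prose
    # Merge the selector's variables in; on a name clash the new entry wins
    # (an assumption of this sketch).
    state["lazy_vars"] = {**saved, **new_vars}
    try:
        return translate_body(state)
    finally:
        # Restore the map only; the temp counter keeps its new value
        # (also an assumption of this sketch).
        state["lazy_vars"] = saved

state = {"lazy_vars": {}, "temp_count": 0}

def body(st):
    # Pretend to translate the selector body: use one temporary and
    # report which lazy variables were visible at the time.
    st["temp_count"] += 1
    return sorted(st["lazy_vars"])

seen = with_selector_scope(state, {"pivot": [], "left": ["pivot"]}, body)
```

Inside the body the selector's variables are visible; once translation of the selector finishes, the map is back to what it was.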
|
||||
|
||||
We generate a single selector as follows:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 307 320 >}}
|
||||
|
||||
This generates a function definition statement, which we will examine in
|
||||
the generated Python code later on.
|
||||
|
||||
Solving the problem this way also introduces another gotcha: sometimes,
|
||||
a variable is produced by a function call, and other times the variable
|
||||
is just a Python variable. We write this as follows:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 322 327 >}}
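
The decision itself is tiny. A sketch over plain strings, with the hypothetical helper `translate_var` standing in for the code that builds a Python AST node:

```python
def translate_var(name, lazy_vars):
    # Lazily computed variables are zero-argument functions, so call them.
    if name in lazy_vars:
        return name + "()"
    # Everything else (parameters, temporaries) is a plain Python name.
    return name

lazy = {"pivot": [], "left": ["pivot"]}
rendered = [translate_var(n, lazy) for n in ["pivot", "xs", "left"]]
```

This matches the shape of the generated code shown later, where `pivot()` and `left()` are calls but `xs` is not.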
|
||||
|
||||
##### Special Case Insertion
|
||||
This is a silly language for a single homework assignment. I'm not
|
||||
planning to implement Hindley-Milner type inference, or anything
|
||||
of that sort. For the purpose of this language, things will be
|
||||
either a list, or not a list. And as long as a function __can__ return
|
||||
a list, it can also return the list from its base case. Thus,
|
||||
that's all we will try to figure out. The checking code is so
|
||||
short that we can include the whole snippet at once:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 258 266 >}}
|
||||
|
||||
`mergePossibleType`
|
||||
{{< sidenote "right" "bool-identity-note" "figures out" >}}
|
||||
An observant reader will note that this is just a logical
|
||||
OR function. It's not, however, good practice to use
|
||||
booleans for types that have two constructors with no arguments.
|
||||
Check out this <a href="https://programming-elm.com/blog/2019-05-20-solving-the-boolean-identity-crisis-part-1/">
|
||||
Elm-based article</a> about this, which the author calls the
|
||||
Boolean Identity Crisis.
|
||||
{{< /sidenote >}}, given two possible types for an
|
||||
expression, the final type for the expression.
|
||||
|
||||
There's only one real trick to this. Sometimes, like in
|
||||
`_search`, the only time we return something _known_ to be a list, that
|
||||
something is `xs`. Since we're making a list manipulation language,
|
||||
let's __assume the first argument to the function is a list__, and
|
||||
__use this information to determine expression types__. We guess
|
||||
types in a very basic manner otherwise: If you use the concatenation
|
||||
operator, or a list literal, then obviously we're working on a list.
|
||||
If you're returning the first argument of the function, that's also
|
||||
a list. Otherwise, it could be anything.
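
As a rough illustration of these guessing rules, here is a Python sketch. The tuple-based expression encoding, the `ANY` constructor name, and both function names are assumptions made for this example; only `List` and `mergePossibleType` are named in the post.

```python
from enum import Enum

class PossibleType(Enum):
    LIST = "List"
    ANY = "Any"

def merge_possible_type(a, b):
    # If either branch can be a list, the whole expression can be.
    if PossibleType.LIST in (a, b):
        return PossibleType.LIST
    return PossibleType.ANY

def possible_type(expr, first_param):
    # Expressions are tuples tagged with a string; the shapes are assumptions.
    kind = expr[0]
    if kind in ("list", "concat"):  # list literal, or xs ++ ys
        return PossibleType.LIST
    if kind == "var":               # only the first parameter is assumed to be a list
        return PossibleType.LIST if expr[1] == first_param else PossibleType.ANY
    if kind == "if":                # ("if", cond, then_branch, else_branch)
        return merge_possible_type(possible_type(expr[2], first_param),
                                   possible_type(expr[3], first_param))
    return PossibleType.ANY

# Roughly the shape of _search: one branch returns xs, the other recurses.
search_body = ("if", ("eq",), ("var", "xs"), ("call", "_search"))
result = possible_type(search_body, "xs")
```

The recursive call tells us nothing, but the branch returning `xs` is enough to mark the whole function as possibly returning a list.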
|
||||
|
||||
My Haskell linter actually suggested a pretty clever way of writing
|
||||
the whole "add a base case if this function returns a list" code.
|
||||
Check it out:
|
||||
|
||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 305 >}}
|
||||
|
||||
Specifically, look at the line with `let fastReturn = ...`. It
|
||||
uses a list comprehension: we take a parameter `p` from the list of
|
||||
parameters `ps`, but only produce the statements for the base case
|
||||
if the possible type computed using `p` is `List`.
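
The same trick carries over to a Python comprehension. In this sketch, `fast_return` and `returns_list` are hypothetical names, statements are plain strings rather than AST nodes, and looking only at the first parameter is an assumption based on the rule described above.

```python
def fast_return(params, returns_list):
    # Produce the base-case statements only when the first parameter
    # leads to a list-typed body; otherwise produce nothing.
    return [
        stmt
        for p in params[:1]
        if returns_list(p)
        for stmt in ("if " + p + "==[]:", "    return " + p)
    ]

with_base_case = fast_return(["xs", "k"], lambda p: True)
without_base_case = fast_return(["xs", "k"], lambda p: False)
```

The comprehension yields either the full base case or an empty list, so the result can be prepended to the function body unconditionally.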
|
||||
|
||||
### The Output
|
||||
What kind of beast have we created? Take a look for yourself:
|
||||
```Python
def qselect(xs,k):
    if xs==[]:
        return xs
    cache = {}
    def pivot():
        if ("pivot") not in (cache):
            cache["pivot"] = xs.pop(0)
        return cache["pivot"]
    def left():
        def temp2(arg):
            out = []
            for arg0 in arg:
                if arg0<=pivot():
                    out.append(arg0)
            return out
        pivot()
        if ("left") not in (cache):
            cache["left"] = temp2(xs)
        return cache["left"]
    def right():
        def temp3(arg):
            out = []
            for arg0 in arg:
                if arg0>pivot():
                    out.append(arg0)
            return out
        left()
        pivot()
        if ("right") not in (cache):
            cache["right"] = temp3(xs)
        return cache["right"]
    if k>(len(left())+1):
        temp4 = qselect(right(), k-len(left())-1)
    else:
        if k==(len(left())+1):
            temp5 = [pivot()]
        else:
            temp5 = qselect(left(), k)
        temp4 = temp5
    return temp4
def _search(xs,k):
    if xs==[]:
        return xs
    if xs[1]==k:
        temp6 = xs
    else:
        if xs[1]>k:
            temp8 = _search(xs[0], k)
        else:
            temp8 = _search(xs[2], k)
        temp6 = temp8
    return temp6
def sorted(xs):
    if xs==[]:
        return xs
    return sorted(xs[0])+[xs[1]]+sorted(xs[2])
def search(xs,k):
    return len(_search(xs, k))!=0
def insert(xs,k):
    return _insert(k, _search(xs, k))
def _insert(k,xs):
    if k==[]:
        return k
    if len(xs)==0:
        temp16 = xs
        temp16.append([])
        temp17 = temp16
        temp17.append(k)
        temp18 = temp17
        temp18.append([])
        temp15 = temp18
    else:
        temp15 = xs
    return temp15
```
|
||||
It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write.
|
||||
Even to get this code, I had to come up with hacks __in a language I created__.
|
||||
The first hack is to make the `qselect` function use the `xs == []` base
|
||||
case. This doesn't happen by default, because `qselect` doesn't return a list!
|
||||
To "fix" this, I made `qselect` return the number it found, wrapped in a
|
||||
list literal. This is not up to spec, and would require another function
|
||||
to unwrap this list.
|
||||
|
||||
While `qselect` was struggling with not having the base case, `insert` had
|
||||
a base case it didn't need: `insert` shouldn't return the list itself
|
||||
when it's empty, it should insert into it! However, when we use the `<<`
|
||||
list insertion operator, the language infers `insert` to be a list-returning
|
||||
function itself, which means inserting into an empty list will always fail. So, we
|
||||
make a function `_insert`, which __takes the arguments in reverse__.
|
||||
The base case will still be generated, but the first argument (against
|
||||
which the base case is checked) will be a number, so the `k == []` check
|
||||
will always fail.
|
||||
|
||||
That concludes this post. I'll be working on more solutions to homework
|
||||
assignments in self-made languages, so keep an eye out!
|
||||
|