Add the first post in CS325 series
parent
a406fb0846
commit
19aa126025
|
@ -17,4 +17,3 @@ sorted(xs) = sorted(xs[0]) ++ [xs[1]] ++ sorted(xs[2]);
|
||||||
search(xs, k) = |_search(xs, k)| != 0;
|
search(xs, k) = |_search(xs, k)| != 0;
|
||||||
insert(xs, k) = _insert(k, _search(xs, k));
|
insert(xs, k) = _insert(k, _search(xs, k));
|
||||||
_insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs
|
_insert(k, xs) = if |xs| == 0 then xs << [] << k << [] else xs
|
||||||
|
|
||||||
|
|
|
@ -270,14 +270,13 @@ type Translator = Control.Monad.State.State (Map.Map String [String], Int)
|
||||||
|
|
||||||
currentTemp :: Translator String
|
currentTemp :: Translator String
|
||||||
currentTemp = do
|
currentTemp = do
|
||||||
(_, t) <- get
|
t <- gets snd
|
||||||
return $ "temp" ++ show t
|
return $ "temp" ++ show t
|
||||||
|
|
||||||
incrementTemp :: Translator String
|
incrementTemp :: Translator String
|
||||||
incrementTemp = do
|
incrementTemp = do
|
||||||
(vs, t) <- get
|
modify (second (+1))
|
||||||
put (vs, t+1)
|
currentTemp
|
||||||
return $ "temp" ++ show t
|
|
||||||
|
|
||||||
hasLambda :: Expr -> Bool
|
hasLambda :: Expr -> Bool
|
||||||
hasLambda (ListLiteral es) = any hasLambda es
|
hasLambda (ListLiteral es) = any hasLambda es
|
||||||
|
|
|
@ -1,7 +1,6 @@
|
||||||
---
|
---
|
||||||
title: A Language for an Assignment - Homework 1
|
title: A Language for an Assignment - Homework 1
|
||||||
date: 2019-12-27T23:27:09-08:00
|
date: 2019-12-27T23:27:09-08:00
|
||||||
draft: true
|
|
||||||
tags: ["Haskell", "Python", "Algorithms"]
|
tags: ["Haskell", "Python", "Algorithms"]
|
||||||
---
|
---
|
||||||
|
|
||||||
|
@ -32,7 +31,7 @@ in our Programming Languages class. So the final goal ended up:
|
||||||
|
|
||||||
It may not be worth it to create a whole
|
It may not be worth it to create a whole
|
||||||
{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
|
{{< sidenote "right" "general-purpose-note" "general-purpose" >}}
|
||||||
A general purpose language is one that's designed to be used in vairous
|
A general purpose language is one that's designed to be used in various
|
||||||
domains. For instance, C++ is a general-purpose language because it can
|
domains. For instance, C++ is a general-purpose language because it can
|
||||||
be used for embedded systems, GUI programs, and pretty much anything else.
|
be used for embedded systems, GUI programs, and pretty much anything else.
|
||||||
This is in contrast to a domain-specific language, such as Game Maker Language,
|
This is in contrast to a domain-specific language, such as Game Maker Language,
|
||||||
|
@ -41,7 +40,7 @@ which is aimed at a much narrower set of uses.
|
||||||
but nowhere in the challenge did we say that it had to be general-purpose. In
|
but nowhere in the challenge did we say that it had to be general-purpose. In
|
||||||
fact, some interesting design thinking can go into designing a domain-specific
|
fact, some interesting design thinking can go into designing a domain-specific
|
||||||
language for a particular assignment. So let's jump right into it, and make
|
language for a particular assignment. So let's jump right into it, and make
|
||||||
a language for the the first homework assignment.
|
a language for the first homework assignment.
|
||||||
|
|
||||||
### Homework 1
|
### Homework 1
|
||||||
There are two problems in Homework 1. Here they are, verbatim:
|
There are two problems in Homework 1. Here they are, verbatim:
|
||||||
|
@ -95,7 +94,7 @@ C++ optimizes the <a href="https://godbolt.org/z/3skK9j">Collatz Conjecture func
|
||||||
Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
|
Clang doesn't know whether or not the function will terminate (whether the Collatz Conjecture
|
||||||
function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
|
function terminates is an <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">unsolved problem</a>),
|
||||||
but functions that don't terminate are undefined behavior. There's only one other way the function
|
but functions that don't terminate are undefined behavior. There's only one other way the function
|
||||||
returns, and that's with "1". Thus, clang optimzes the entire function to a single "return 1" call.
|
returns, and that's with "1". Thus, clang optimizes the entire function to a single "return 1" call.
|
||||||
{{< /sidenote >}} in C++:
|
{{< /sidenote >}} in C++:
|
||||||
we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
|
we can do whatever we want. So, let's allow it to return `[]` in the `None` case.
|
||||||
This makes this base case valid:
|
This makes this base case valid:
|
||||||
|
@ -240,42 +239,271 @@ lazily evaluated, ordered expressions. The whole `qselect` becomes:
|
||||||
We've now figured out all the language constructs. Let's start working on
|
We've now figured out all the language constructs. Let's start working on
|
||||||
some implementation!
|
some implementation!
|
||||||
|
|
||||||
#### Data Definitions
|
#### Implementation
|
||||||
Let's start with defining the AST and other data types for our language:
|
It would be silly of me to explain every detail of creating a language in Haskell
|
||||||
|
in this post; this is neither the purpose of the post, nor is it plausible
|
||||||
|
to do this without covering monads, parser combinators, grammars, abstract syntax
|
||||||
|
trees, and more. So, instead, I'll discuss the _interesting_ parts of the
|
||||||
|
implementation.
|
||||||
|
|
||||||
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 14 52 >}}
|
##### Temporary Variables
|
||||||
|
Our language is expression-based, yes. A function is a single,
|
||||||
|
arbitrarily complex expression (involving `if/else`, list
|
||||||
|
selectors, and more). So it would make sense to translate
|
||||||
|
a function to a single, arbitrarily complex Python expression.
|
||||||
|
However, the way we've designed our language makes it
|
||||||
|
not-so-suitable for converting to a single expression! For
|
||||||
|
instance, consider `xs[rand]`. We need to compute the list,
|
||||||
|
get its length, generate a random number, and then access
|
||||||
|
the corresponding element in the list. We use the list
|
||||||
|
here twice, and simply repeating the expression would not
|
||||||
|
be very smart: we'd be evaluating twice. So instead,
|
||||||
|
we'll use a variable, assign the list to that variable,
|
||||||
|
and then access that variable multiple times.
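
To make the double-evaluation problem concrete, here is a rough Python sketch of the two ways `xs[rand]` could be translated. The names `make_list` and `temp0`, and the use of `random.randrange`, are assumptions made for illustration; this is not the translator's actual output.

```python
import random

calls = 0

def make_list():
    # Hypothetical stand-in for an arbitrary list-producing expression.
    global calls
    calls += 1
    return [10, 20, 30]

# Naive single-expression translation: the list expression is evaluated twice.
naive = make_list()[random.randrange(len(make_list()))]
naive_calls = calls

# Statement-based translation: evaluate once, store in a temporary, reuse it.
calls = 0
temp0 = make_list()
picked = temp0[random.randrange(len(temp0))]
```

In the second form the list is computed once, at the cost of needing a statement (the assignment) before the expression.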
|
||||||
|
|
||||||
The `PossibleType` class will be used when we figure out if a function returns
|
To be extra safe, let's use a fresh temporary variable
|
||||||
a list or not, for our base case insertion rule. The `Selector` type
|
every time we need to store something. The simplest
|
||||||
will hold a single line in the list selector we defined earlier, and
|
way is to simply maintain a counter of how many temporary
|
||||||
the `SelectorMarker` will indicate if the user added the `!` "remove from list"
|
variables we've already used, and generate a new variable
|
||||||
marker at the end. To represent the various operators in our language, we create
|
by prepending the word "temp" to that number. We start
|
||||||
the `Op` data type. Note that unlike Python, `++` (list concatenation) and
|
with `temp0`, then `temp1`, and so on. To keep a counter,
|
||||||
`+` (addition) are different operators in our language.
|
we can use a state monad:
|
||||||
|
|
||||||
We then define valid expressions. Obviously, a variable (like `xs`), an
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 269 269 >}}
|
||||||
integer literal (like `1`) and a list literal (like `[]`) are allowed.
|
|
||||||
We also put in our selector, which consists of the expression on the
|
|
||||||
left, the list of selector branches (`[Selector]`) and the expression
|
|
||||||
of "what to actually do with the new variables". We also
|
|
||||||
add `if`-expressions (like we discussed), and function calls. Lastly,
|
|
||||||
we add binary operators like (`x+y`), the length operator (`|xs|`),
|
|
||||||
and the list access operator (`xs[0]`). We also make `#0` a part
|
|
||||||
of the expression syntax, even though it's only allowed inside
|
|
||||||
a list access.
|
|
||||||
|
|
||||||
Of course, we wouldn't want to write our language using
|
Don't worry about the `Map.Map String [String]`, we'll get to that in a bit.
|
||||||
Haskell. We want to actually write a text file, like `hw1.lang`,
|
For now, all we have to worry about is the second element of the tuple,
|
||||||
and then have our program translate that to Python. The first
|
the integer counting how many temporary variables we've used. We can
|
||||||
step to that is __parsing__: we need to turn our language text
|
get the current temporary variable as follows:
|
||||||
into the `Expr` structure we have.
|
|
||||||
|
|
||||||
#### Parsing
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 271 274 >}}
|
||||||
We'll be using `Parsec` for parsing. `Parsec` is a parsing library
|
|
||||||
based on
|
We can also get a fresh temporary variable like this:
|
||||||
{{< sidenote "right" "monad-note" "monadic" >}}
|
|
||||||
Haskell is a language with more monad tutorials than
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 276 279 >}}
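
For readers who don't want to think in terms of the state monad, the same counter idea can be sketched in plain Python. The class `TempNames` and its starting value of 0 are assumptions; the only thing taken from the Haskell code is the order of operations, where the counter is bumped first and the current name is read afterwards.

```python
class TempNames:
    """Hypothetical counter-based generator of temporary variable names."""

    def __init__(self):
        self.count = 0  # assumed starting value

    def current(self):
        # Name built from the current counter value.
        return "temp" + str(self.count)

    def fresh(self):
        # Bump the counter first, then report the new current name.
        self.count += 1
        return self.current()

names = TempNames()
first = names.fresh()
second = names.fresh()
```

Every call to `fresh` yields a name that has not been handed out before, which is all the translator needs.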
|
||||||
programmers. For this reason, I will resist the temptation
|
|
||||||
to explain what monads are. If you <em>don't</em> know
|
Now, the
|
||||||
what they are, don't worry, there are plenty of other resources.
|
{{< sidenote "left" "" "code" >}}
|
||||||
{{< /sidenote >}} parser combinators.
|
Since we are translating an expression, we must have the result of
|
||||||
|
the translation yield a Python expression we can use in generating
|
||||||
|
larger Python expressions. However, as we've seen, we occasionally
|
||||||
|
have to use statements. Thus, the <code>translateExpr</code> function
|
||||||
|
returns a <code>Translator ([Py.PyStmt], Py.PyExpr)</code>.
|
||||||
|
{{< /sidenote >}}for generating a random list access looks like
|
||||||
|
{{< sidenote "right" "ast-note" "this:" >}}
|
||||||
|
The <code>Py.*</code> constructors are a part of a Python AST module I quickly
|
||||||
|
threw together. I won't showcase it here, but you can always look at the
|
||||||
|
source code for the blog (which includes this project)
|
||||||
|
<a href="https://dev.danilafe.com/Web-Projects/blog-static">here</a>.
|
||||||
|
{{< /sidenote >}}
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 364 369 >}}
|
||||||
|
|
||||||
|
##### Implementing "lazy evaluation"
|
||||||
|
Lazy evaluation in functional programs usually arises from
|
||||||
|
{{< sidenote "right" "graph-note" "graph reduction" >}}
|
||||||
|
Graph reduction, more specifically the <em>Spineless,
|
||||||
|
Tagless G-machine</em> is at the core of the Glasgow Haskell
|
||||||
|
Compiler (GHC). Simon Peyton Jones' earlier book,
|
||||||
|
<em>Implementing Functional Languages: a tutorial</em>
|
||||||
|
details an earlier version of the G-machine.
|
||||||
|
{{< /sidenote >}}. However, Python is neither
|
||||||
|
functional nor graph-based, and we only lazily
|
||||||
|
evaluate list selectors. Thus, we'll have to do
|
||||||
|
some work to get our lazy evaluation to work as we desire.
|
||||||
|
Here's what I came up with:
|
||||||
|
|
||||||
|
1. It's difficult to insert Python statements where they are
|
||||||
|
needed: we'd have to figure out in which scope each variable
|
||||||
|
has already been declared, and in which scope it's yet
|
||||||
|
to be assigned.
|
||||||
|
2. Instead, we can use a Python dictionary, called `cache`,
|
||||||
|
and store computed versions of each variable in the cache.
|
||||||
|
3. It's pretty difficult to check if a variable
|
||||||
|
is in the cache, compute it if not, and then return the
|
||||||
|
result of the computation, in one expression. This is
|
||||||
|
true, unless that single expression is a function call, and we have a dedicated
|
||||||
|
function that takes no arguments, computes the expression if needed,
|
||||||
|
and uses the cache otherwise. We choose this route.
|
||||||
|
4. We have already promised that we'd evaluate all the selected
|
||||||
|
variables above a given variable before evaluating the variable
|
||||||
|
itself. So, each function will first call (and therefore
|
||||||
|
{{< sidenote "right" "force-note" "force" >}}
|
||||||
|
{{< todo >}}Explain forcing{{< /todo >}}
|
||||||
|
{{< /sidenote >}}) the functions
|
||||||
|
generated for variables declared above the function's own variable.
|
||||||
|
5. To keep track of all of this, we use the already-existing state monad
|
||||||
|
as a reader monad (that is, we clear the changes we make to the monad
|
||||||
|
after we're done translating the list selector). This is where the `Map.Map String [String]`
|
||||||
|
comes from.
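
The scheme in points 2 through 4 can be sketched in a few lines of Python. This is a minimal, hand-written illustration with made-up variable names (`first`, `rest`) and a made-up wrapper `select`; it only assumes the `cache` dictionary and the "call the earlier functions first" rule described above.

```python
def select(xs):
    cache = {}  # computed values, keyed by variable name

    def first():
        # Compute on first use, then serve from the cache.
        if "first" not in cache:
            cache["first"] = xs.pop(0)
        return cache["first"]

    def rest():
        first()  # force the variable declared above before computing this one
        if "rest" not in cache:
            cache["rest"] = list(xs)
        return cache["rest"]

    return first, rest

first, rest = select([1, 2, 3])
tail = rest()   # forces first() too, so the head is popped before the copy is taken
head = first()  # already cached; nothing is popped a second time
```

Because each variable is a zero-argument function, using it anywhere in an expression is just a call, and the order of side effects such as `pop` stays fixed no matter which variable is used first.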
|
||||||
|
|
||||||
|
The `Map.Map String [String]` keeps track of variables that will be lazily computed,
|
||||||
|
and also of the dependencies of each variable (the variables that need
|
||||||
|
to be accessed before the variable itself). We compute such a map for
|
||||||
|
each selector as follows:
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 337 337 >}}
|
||||||
|
|
||||||
|
We update the existing map using `Map.union`:
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 338 338 >}}
|
||||||
|
|
||||||
|
And, after we're done generating expressions in the body of this selector,
|
||||||
|
we clear it to its previous value `vs`:
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 341 341 >}}
|
||||||
|
|
||||||
|
We generate a single selector as follows:
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 307 320 >}}
|
||||||
|
|
||||||
|
This generates a function definition statement, which we will examine in
|
||||||
|
generated Python code later on.
|
||||||
|
|
||||||
|
Solving the problem this way also introduces another gotcha: sometimes,
|
||||||
|
a variable is produced by a function call, and other times the variable
|
||||||
|
is just a Python variable. We write this as follows:
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 322 327 >}}
|
||||||
|
|
||||||
|
##### Special Case Insertion
|
||||||
|
This is a silly language for a single homework assignment. I'm not
|
||||||
|
planning to implement Hindley-Milner type inference, or anything
|
||||||
|
of that sort. For the purpose of this language, things will be
|
||||||
|
either a list, or not a list. And as long as a function __can__ return
|
||||||
|
a list, it can also return the list from its base case. Thus,
|
||||||
|
that's all we will try to figure out. The checking code is so
|
||||||
|
short that we can include the whole snippet at once:
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 258 266 >}}
|
||||||
|
|
||||||
|
`mergePossibleType`
|
||||||
|
{{< sidenote "right" "bool-identity-note" "figures out" >}}
|
||||||
|
An observant reader will note that this is just a logical
|
||||||
|
OR function. It's not, however, good practice to use
|
||||||
|
booleans for types that have two constructors with no arguments.
|
||||||
|
Check out this <a href="https://programming-elm.com/blog/2019-05-20-solving-the-boolean-identity-crisis-part-1/">
|
||||||
|
Elm-based article</a> about this, which the author calls the
|
||||||
|
Boolean Identity Crisis.
|
||||||
|
{{< /sidenote >}}, given two possible types for an
|
||||||
|
expression, the final type for the expression.
|
||||||
|
|
||||||
|
There's only one real trick to this. Sometimes, like in
|
||||||
|
`_search`, the only time we return something _known_ to be a list, that
|
||||||
|
something is `xs`. Since we're making a list manipulation language,
|
||||||
|
let's __assume the first argument to the function is a list__, and
|
||||||
|
__use this information to determine expression types__. We guess
|
||||||
|
types in a very basic manner otherwise: If you use the concatenation
|
||||||
|
operator, or a list literal, then obviously we're working on a list.
|
||||||
|
If you're returning the first argument of the function, that's also
|
||||||
|
a list. Otherwise, it could be anything.
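
The guessing rules are simple enough to restate as a short Python sketch. Everything here is hypothetical: the tuple encoding of expressions, the `ANY` name for the "could be anything" case, and the choice to merge the two branches of an `if`. It is meant to illustrate the rules in the paragraph above, not to mirror the Haskell code.

```python
from enum import Enum

class PossibleType(Enum):
    LIST = "List"
    ANY = "Any"  # assumed name for "could be anything"

def merge(a, b):
    # If either candidate is a list, the expression can be a list.
    return PossibleType.LIST if PossibleType.LIST in (a, b) else PossibleType.ANY

def guess(expr, first_arg):
    # expr uses a made-up tuple encoding, e.g. ("var", "xs") or ("if", c, t, e).
    kind = expr[0]
    if kind in ("list", "concat"):
        return PossibleType.LIST
    if kind == "var" and expr[1] == first_arg:
        return PossibleType.LIST
    if kind == "if":
        return merge(guess(expr[2], first_arg), guess(expr[3], first_arg))
    return PossibleType.ANY

body = ("if", ("var", "c"), ("var", "xs"), ("call", "_search"))
body_type = guess(body, "xs")
literal_type = guess(("int", 1), "xs")
```

Here `body` can return the first argument in one branch, so it is treated as list-returning even though the other branch is unknown.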
|
||||||
|
|
||||||
|
My Haskell linter actually suggested a pretty clever way of writing
|
||||||
|
the whole "add a base case if this function returns a list" code.
|
||||||
|
Check it out:
|
||||||
|
|
||||||
|
{{< codelines "Haskell" "cs325-langs/src/LanguageOne.hs" 299 305 >}}
|
||||||
|
|
||||||
|
Specifically, look at the line with `let fastReturn = ...`. It
|
||||||
|
uses a list comprehension: we take a parameter `p` from the list of
|
||||||
|
parameters `ps`, but only produce the statements for the base case
|
||||||
|
if the possible type computed using `p` is `List`.
|
||||||
|
|
||||||
|
### The Output
|
||||||
|
What kind of beast have we created? Take a look for yourself:
|
||||||
|
```Python
def qselect(xs,k):
    if xs==[]:
        return xs
    cache = {}
    def pivot():
        if ("pivot") not in (cache):
            cache["pivot"] = xs.pop(0)
        return cache["pivot"]
    def left():
        def temp2(arg):
            out = []
            for arg0 in arg:
                if arg0<=pivot():
                    out.append(arg0)
            return out
        pivot()
        if ("left") not in (cache):
            cache["left"] = temp2(xs)
        return cache["left"]
    def right():
        def temp3(arg):
            out = []
            for arg0 in arg:
                if arg0>pivot():
                    out.append(arg0)
            return out
        left()
        pivot()
        if ("right") not in (cache):
            cache["right"] = temp3(xs)
        return cache["right"]
    if k>(len(left())+1):
        temp4 = qselect(right(), k-len(left())-1)
    else:
        if k==(len(left())+1):
            temp5 = [pivot()]
        else:
            temp5 = qselect(left(), k)
        temp4 = temp5
    return temp4
def _search(xs,k):
    if xs==[]:
        return xs
    if xs[1]==k:
        temp6 = xs
    else:
        if xs[1]>k:
            temp8 = _search(xs[0], k)
        else:
            temp8 = _search(xs[2], k)
        temp6 = temp8
    return temp6
def sorted(xs):
    if xs==[]:
        return xs
    return sorted(xs[0])+[xs[1]]+sorted(xs[2])
def search(xs,k):
    return len(_search(xs, k))!=0
def insert(xs,k):
    return _insert(k, _search(xs, k))
def _insert(k,xs):
    if k==[]:
        return k
    if len(xs)==0:
        temp16 = xs
        temp16.append([])
        temp17 = temp16
        temp17.append(k)
        temp18 = temp17
        temp18.append([])
        temp15 = temp18
    else:
        temp15 = xs
    return temp15
```
|
||||||
|
It's...horrible! All the `tempX` variables, __three layers of nested function declarations__, hardcoded cache access. This is not something you'd ever want to write.
|
||||||
|
Even to get this code, I had to come up with hacks __in a language I created__.
|
||||||
|
The first hack is to make the `qselect` function use the `xs == []` base
|
||||||
|
case. This doesn't happen by default, because `qselect` doesn't return a list!
|
||||||
|
To "fix" this, I made `qselect` return the number it found, wrapped in a
|
||||||
|
list literal. This is not up to spec, and would require another function
|
||||||
|
to unwrap this list.
|
||||||
|
|
||||||
|
While `qselect` was struggling with not having the base case, `insert` had
|
||||||
|
a base case it didn't need: `insert` shouldn't return the list itself
|
||||||
|
when it's empty, it should insert into it! However, when we use the `<<`
|
||||||
|
list insertion operator, the language infers `insert` to be a list-returning
|
||||||
|
function itself and adds the base case; inserting into an empty list then always fails. So, we
|
||||||
|
make a function `_insert`, which __takes the arguments in reverse__.
|
||||||
|
The base case will still be generated, but the first argument (against
|
||||||
|
which the base case is checked) will be a number, so the `k == []` check
|
||||||
|
will always fail.
|
||||||
|
|
||||||
|
That concludes this post. I'll be working on more solutions to homework
|
||||||
|
assignments in self-made languages, so keep an eye out!
|
||||||
|
|