Add more content to post 12 draft.

This commit is contained in:
Danila Fedorin 2020-06-16 23:32:09 -07:00
parent ad1946e9fb
commit 9c4d7c514f
1 changed files with 185 additions and 0 deletions

@ -107,3 +107,188 @@ This is true, but why should we perform transformations on a malformed program?
{{< /sidenote >}} and can be transformed into a sequence of instructions just like any other global function. It has been pulled from its `where` (which, by the way, is pretty much equivalent to a `let/in`) to the top level.
This technique of replacing captured variables with arguments, and pulling closures into the global scope to aid compilation, is called [Lambda Lifting](https://en.wikipedia.org/wiki/Lambda_lifting). Its name is no coincidence - lambda functions need to undergo the same kind of transformation as our nested definitions (unlike nested definitions, though, lambda functions need to be named). This is why they are included in this post together with `let/in`!
### Implementation
Now that we understand what we have to do, it's time to jump straight into
doing it. First, we need to refactor our current code to allow for the changes
we're going to make; then, we can implement `let/in` expressions; finally,
we'll tackle lambda functions.
#### Infrastructure Changes
When finding captured variables, the notion of _free variables_ once again
becomes important. Recall that a free variable in an expression is a variable
that is defined outside of that expression. Consider, for example, the
expression:
```Haskell
let x = 5 in x + y
```
In this expression, `x` is _not_ a free variable, since it's defined
in the `let/in` expression. On the other hand, `y` _is_ a free variable,
since it's not defined locally.
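To make this concrete, here's a minimal sketch of a free variable computation over a tiny expression tree. The `expr` hierarchy below is hypothetical and far smaller than our compiler's `ast` classes. It also assumes that a `let` binds a single name, and that the name is in scope in its own definition (as it is in Haskell).

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical miniature AST; not the compiler's actual `ast` hierarchy.
struct expr {
    virtual ~expr() = default;
    // Adds every free variable of this expression to `into`.
    virtual void find_free(std::set<std::string>& into) const = 0;
};
using expr_ptr = std::unique_ptr<expr>;

struct expr_num : expr {
    int value;
    explicit expr_num(int v) : value(v) {}
    // A number mentions no variables at all.
    void find_free(std::set<std::string>&) const override {}
};

struct expr_var : expr {
    std::string name;
    explicit expr_var(std::string n) : name(std::move(n)) {}
    // On its own, a variable is always free.
    void find_free(std::set<std::string>& into) const override {
        into.insert(name);
    }
};

struct expr_add : expr {
    expr_ptr left, right;
    expr_add(expr_ptr l, expr_ptr r)
        : left(std::move(l)), right(std::move(r)) {}
    void find_free(std::set<std::string>& into) const override {
        left->find_free(into);
        right->find_free(into);
    }
};

struct expr_let : expr {
    std::string name;
    expr_ptr value, body;
    expr_let(std::string n, expr_ptr v, expr_ptr b)
        : name(std::move(n)), value(std::move(v)), body(std::move(b)) {}
    void find_free(std::set<std::string>& into) const override {
        // Collect into a local set first, so that erasing `name`
        // can't remove an entry that a caller already put in `into`.
        std::set<std::string> local;
        value->find_free(local);
        body->find_free(local);
        local.erase(name); // `name` is bound here, so it is not free.
        into.insert(local.begin(), local.end());
    }
};
```

Building the tree for `let x = 5 in x + y` and calling `find_free` on it leaves only `y` in the resulting set.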
The algorithm that we used for computing free variables was rather biased.
Previously, we only cared about the difference between a local variable
(defined somewhere in a function's body, or referring to one of the function's
parameters) and a global variable (referring to a function name). This shows in
our code for `find_free`. Consider, for example, this segment:
{{< codelines "C++" "compiler/11/ast.cpp" 33 36 >}}
We created bindings in our type environment whenever we saw a new variable
being introduced, which led us to only count variables that we did not bind
_anywhere_ as 'free'. This approach is no longer sufficient. Consider,
for example, the following Haskell code:
```Haskell {linenos=table}
someFunction x =
let
y = x + 5
in
x*y
```
We can see that the variable `x` is introduced on line 1.
Thus, our current algorithm will happily store `x` in an environment,
and not count it as free. But clearly, the definition of `y` on line 3
captures `x`! If we were to lift `y` into global scope, we would need
to pass `x` to it as an argument. To fix this, we have to separate the creation
and assignment of type environments from free variable detection. Why
don't we start with `ast` and its descendants? Our signatures become:
```C++
void ast::find_free(std::set<std::string>& into);
type_ptr ast::typecheck(type_mgr& mgr, type_env_ptr& env);
```
For the most part, the code remains unchanged. We avoid
using `env` (and `this->env`), and default to marking
any variable as a free variable:
{{< codelines "C++" "compiler/12/ast.cpp" 39 41 >}}
Since we no longer use the environment, we resort to an
alternative method of removing bound variables. Here's
`ast_case::find_free`:
{{< codelines "C++" "compiler/12/ast.cpp" 169 181 >}}
For each branch, we find the free variables. However, we
want to avoid marking variables that were introduced through
pattern matching as free (they are not). Thus, we use `pattern::find_variables`
to see which of the variables were bound by that pattern,
and remove them from the list of free variables. We
can then safely add the list of free variables in the branch to the overall
list of free variables. Other `ast` descendants experience largely
cosmetic changes (such as the removal of the `env` parameter).
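The bookkeeping for a single `case` expression can be sketched roughly as follows. This is not the code from `ast.cpp`: `branch_sketch` and `free_in_case` are hypothetical names, and each branch is assumed to already know which variables its pattern binds and which variables its body mentions.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one branch of a case expression.
struct branch_sketch {
    std::set<std::string> pattern_variables; // bound by the pattern
    std::set<std::string> body_variables;    // mentioned by the body
};

// Computes the free variables of a whole case expression, given the free
// variables of the expression being matched on and all of the branches.
std::set<std::string> free_in_case(
        const std::set<std::string>& scrutinee_free,
        const std::vector<branch_sketch>& branches) {
    std::set<std::string> into = scrutinee_free;
    for (const auto& branch : branches) {
        // Start from everything the branch body mentions...
        std::set<std::string> free_in_branch = branch.body_variables;
        // ...and drop whatever the pattern introduced.
        for (const auto& bound : branch.pattern_variables) {
            free_in_branch.erase(bound);
        }
        // What remains is free in the enclosing expression, too.
        into.insert(free_in_branch.begin(), free_in_branch.end());
    }
    return into;
}
```

The per-branch set matters: erasing pattern variables directly from the overall set could remove a variable that is genuinely free in a different branch.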
Of course, we must implement `find_variables` for each of our `pattern`
subclasses. Here's what I got for `pattern_var`:
{{< codelines "C++" "compiler/12/ast.cpp" 402 404 >}}
And here's an equally terse implementation for `pattern_constr`:
{{< codelines "C++" "compiler/12/ast.cpp" 417 419 >}}
We also want to update `definition_defn` with this change. Our signatures
become:
```C++
void definition_defn::find_free();
void definition_defn::insert_types(type_mgr& mgr, type_env_ptr& env, visibility v);
```
We'll get to the `visibility` parameter later. The implementations
are fairly simple. Just like `ast_case`, we want to erase each function's
parameters from its list of free variables:
{{< codelines "C++" "compiler/12/definition.cpp" 13 18 >}}
Since `find_free` no longer creates any type bindings or environments,
this functionality is shouldered by `insert_types`:
{{< codelines "C++" "compiler/12/definition.cpp" 20 32 >}}
Now that free variables are properly computed, we are able to move on
to bigger and better things.
#### Nested Definitions
At present, our code for typechecking the whole program is located in
`main.cpp`:
{{< codelines "C++" "compiler/11/main.cpp" 43 61 >}}
This piece of code goes on, but we now want it to be more general. Soon, `let/in`
expressions will bring with them definitions that are inside other definitions,
which will not be reachable at the top level. The fundamental topological
sorting algorithm, though, will remain the same. We can abstract a series
of definitions that need to be ordered and then typechecked into a new struct,
`definition_group`:
{{< codelines "C++" "compiler/12/definition.hpp" 73 83 >}}
This will be exactly like a list of `defn`/`data` definitions we have at the
top level, except now, it can also occur in other places, like `let/in`
expressions. Once again, ignore for the moment the `visibility` field.
The way we defined function ordering requires some extra work from
`definition_group`. Recall that conceptually, functions can only depend
on other functions defined in the same `let/in` expression, or, more generally,
in the same `definition_group`. This means that we now classify free variables
in definitions into two categories: free variables that refer to "nearby"
definitions (i.e. definitions in the same group) and free variables that refer
to "far away" definitions. The "nearby" variables will be used to do
topological ordering, while the "far away" variables can be passed along
further up, perhaps into an enclosing `let/in` expression (for which "nearby"
variables aren't actually free, since they are bound in the `let`). We
implement this partitioning of variables in `definition_group::find_free`:
{{< codelines "C++" "compiler/12/definition.cpp" 94 105 >}}
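Stripped of everything else, the partitioning idea might look something like the sketch below. The `defn_sketch` struct and `partition_free` function are hypothetical simplifications; the only assumption carried over from the real code is that each definition keeps a set of its "nearby" variables.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a function definition.
struct defn_sketch {
    std::string name;
    std::set<std::string> free_variables;   // computed beforehand
    std::set<std::string> nearby_variables; // filled in below
};

// Splits each definition's free variables into "nearby" ones (defined in
// this same group) and "far away" ones, which are reported through `into`
// so that an enclosing expression can deal with them.
void partition_free(std::map<std::string, defn_sketch>& group,
                    std::set<std::string>& into) {
    for (auto& entry : group) {
        defn_sketch& def = entry.second;
        for (const auto& var : def.free_variables) {
            if (group.find(var) != group.end()) {
                // Refers to a sibling definition: matters for ordering.
                def.nearby_variables.insert(var);
            } else {
                // Defined somewhere else: pass it further up.
                into.insert(var);
            }
        }
    }
}
```

For a group containing `f` (which mentions `g` and `x`) and `g` (which mentions `f` and `y`), the sketch records `g` as nearby for `f`, `f` as nearby for `g`, and reports `x` and `y` as the group's own free variables.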
Notice that we have added a new `nearby_variables` field to `definition_defn`.
This is used on line 101, and will be once again used in `definition_group::typecheck`. Speaking of `typecheck`, let's look at its definition:
{{< codelines "C++" "compiler/12/definition.cpp" 107 145 >}}
This function is a little long, but conceptually, each `for` loop
contains a step of the process:
* The first loop declares all data types, so that constructors can
  be verified to properly reference them.
* The second loop creates all the data type constructors.
* The third loop adds edges to our dependency graph.
* The fourth loop performs typechecking on the now-ordered groups of mutually
  recursive functions.
    * The first inner loop inserts the types of all the functions into the environment.
    * The second inner loop actually performs typechecking.
    * The third inner loop makes as many things polymorphic as possible.
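To get a feel for the ordering that the third and fourth loops rely on, here's a self-contained sketch that turns a map of "nearby" dependencies into ordered groups of mutually recursive functions. It uses Tarjan's strongly connected components algorithm, which is not necessarily how our compiler's own graph code does it; `dependency_map` and `ordered_groups` are hypothetical names.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each function name to the sibling functions it refers to.
using dependency_map = std::map<std::string, std::set<std::string>>;

// Returns groups of mutually recursive functions, ordered so that
// every group comes after all of the groups it depends on.
std::vector<std::vector<std::string>> ordered_groups(
        const dependency_map& deps) {
    std::map<std::string, int> index, lowlink;
    std::set<std::string> on_stack;
    std::vector<std::string> stack;
    std::vector<std::vector<std::string>> groups;
    int next_index = 0;

    std::function<void(const std::string&)> visit =
        [&](const std::string& name) {
            index[name] = lowlink[name] = next_index++;
            stack.push_back(name);
            on_stack.insert(name);
            for (const auto& dep : deps.at(name)) {
                // Ignore names that aren't defined in this group.
                if (deps.find(dep) == deps.end()) continue;
                if (index.find(dep) == index.end()) {
                    visit(dep);
                    lowlink[name] = std::min(lowlink[name], lowlink[dep]);
                } else if (on_stack.count(dep)) {
                    lowlink[name] = std::min(lowlink[name], index[dep]);
                }
            }
            // `name` is the root of a component: pop the whole component.
            if (lowlink[name] == index[name]) {
                std::vector<std::string> group;
                std::string member;
                do {
                    member = stack.back();
                    stack.pop_back();
                    on_stack.erase(member);
                    group.push_back(member);
                } while (member != name);
                std::sort(group.begin(), group.end());
                groups.push_back(group);
            }
        };

    for (const auto& entry : deps) {
        if (index.find(entry.first) == index.end()) visit(entry.first);
    }
    return groups;
}
```

Because a component is only emitted once everything reachable from it has been emitted, dependencies always come first. If `even` and `odd` refer to each other and `main` refers to `even`, the two mutually recursive functions end up in one group, and that group precedes the one containing `main`.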
We can now adjust our `parser.y` to use a `definition_group` instead of
two global vectors. First, we declare a global `definition_group`:
{{< codelines "C++" "compiler/12/parser.y" 10 10 >}}
Then, we adjust `definitions` to create `definition_group`s:
{{< codelines "text" "compiler/12/parser.y" 59 68 >}}
We can now adjust `main.cpp` to use the global `definition_group`. Among
other changes (such as removing `extern` references to `vector`s, and updating
function signatures) we also update the `typecheck_program` function:
{{< codelines "C++" "compiler/12/main.cpp" 41 49 >}}
Now, our code is ready for typechecking nested definitions, but not for
compiling them. The main thing that we still have to address is the addition
of new definitions to the global scope. Let's take a look at that next.
#### Global Definitions
We want every function, regardless of whether or not it was declared in the global
scope, to be processed and converted to LLVM code. The LLVM code conversion
takes several steps. First, the function's AST is translated into G-machine
instructions, which we covered in [part 5]({{< relref "05_compiler_execution.md" >}}),
by a process we covered in [part 6]({{< relref "06_compiler_compilation.md" >}}).
Then, an LLVM function is created for every function, and registered globally.
Finally, the G-machine instructions are converted