Add more content to post 12 draft.
parent ad1946e9fb, commit 9c4d7c514f

{{< /sidenote >}} and can be transformed into a sequence of instructions just like any other global function. It has been pulled from its `where` (which, by the way, is pretty much equivalent to a `let/in`) to the top level.

This technique of replacing captured variables with arguments, and pulling closures into the global scope to aid compilation, is called [Lambda Lifting](https://en.wikipedia.org/wiki/Lambda_lifting). Its name is no coincidence - lambda functions need to undergo the same kind of transformation as our nested definitions (unlike nested definitions, though, lambda functions need to be named). This is why they are included in this post together with `let/in`!

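To make the idea concrete outside of our compiler, here is a small C++ sketch of lambda lifting done by hand. The function names (`sum_with_offset`, `add_x_lifted`, and so on) are made up for this illustration and don't appear anywhere in the compiler's source. The only point is that the captured variable `x` becomes an explicit parameter once the closure's body moves to the top level.

```C++
#include <cassert>

// Before lifting: the lambda `add_x` captures `x` from the enclosing scope.
int sum_with_offset(int x, int a, int b) {
    auto add_x = [x](int n) { return n + x; };
    return add_x(a) + add_x(b);
}

// After lifting: the body of the lambda is now a top-level function,
// and the captured variable `x` is passed in as an extra first argument.
int add_x_lifted(int x, int n) {
    return n + x;
}

// The enclosing function calls the lifted function, supplying `x` itself
// at each call site instead of relying on a capture.
int sum_with_offset_lifted(int x, int a, int b) {
    return add_x_lifted(x, a) + add_x_lifted(x, b);
}
```

Both versions compute the same result; the lifted one simply has no nested function left for a compiler to worry about.
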
### Implementation
Now that we understand what we have to do, it's time to jump straight into
doing it. First, we need to refactor our current code to allow for the changes
we're going to make; then, we can implement `let/in` expressions; finally,
we'll tackle lambda functions.

#### Infrastructure Changes
When finding captured variables, the notion of _free variables_ once again
becomes important. Recall that a free variable in an expression is a variable
that is defined outside of that expression. Consider, for example, the
expression:

```Haskell
let x = 5 in x + y
```

In this expression, `x` is _not_ a free variable, since it's defined
in the `let/in` expression. On the other hand, `y` _is_ a free variable,
since it's not defined locally.

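As a quick sanity check of this definition, here is a toy C++ sketch that computes the free variables of `let x = 5 in x + y`. The `expr` hierarchy below is a hypothetical miniature invented for this example, not the compiler's actual `ast` classes, and it assumes (as in Haskell) that the name bound by `let` is visible both in the bound expression and in the body.

```C++
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical miniature expression tree (not the compiler's `ast`).
struct expr {
    virtual ~expr() = default;
    virtual void find_free(std::set<std::string>& into) const = 0;
};
using expr_ptr = std::unique_ptr<expr>;

struct expr_int : expr {
    int value;
    explicit expr_int(int v) : value(v) {}
    // A literal mentions no variables at all.
    void find_free(std::set<std::string>&) const override {}
};

struct expr_var : expr {
    std::string name;
    explicit expr_var(std::string n) : name(std::move(n)) {}
    // By default, every variable we see is reported as free.
    void find_free(std::set<std::string>& into) const override { into.insert(name); }
};

struct expr_add : expr {
    expr_ptr left, right;
    expr_add(expr_ptr l, expr_ptr r) : left(std::move(l)), right(std::move(r)) {}
    void find_free(std::set<std::string>& into) const override {
        left->find_free(into);
        right->find_free(into);
    }
};

struct expr_let : expr {
    std::string name;
    expr_ptr bound, body;
    expr_let(std::string n, expr_ptr b, expr_ptr in)
        : name(std::move(n)), bound(std::move(b)), body(std::move(in)) {}
    void find_free(std::set<std::string>& into) const override {
        // Collect candidates locally, then drop the name this `let` binds.
        std::set<std::string> local;
        bound->find_free(local);
        body->find_free(local);
        local.erase(name);
        into.insert(local.begin(), local.end());
    }
};

// Builds `let x = 5 in x + y` and returns its free variables.
std::set<std::string> free_variables_of_example() {
    expr_ptr e = std::make_unique<expr_let>(
        "x", std::make_unique<expr_int>(5),
        std::make_unique<expr_add>(std::make_unique<expr_var>("x"),
                                   std::make_unique<expr_var>("y")));
    std::set<std::string> result;
    e->find_free(result);
    return result;
}
```

Calling `free_variables_of_example()` yields a set containing only `y`, which matches the reasoning above.
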
The algorithm that we used for computing free variables was rather biased.
Previously, we only cared about the difference between a local variable
(defined somewhere in a function's body, or referring to one of the function's
parameters) and a global variable (referring to a function name). This shows in
our code for `find_free`. Consider, for example, this segment:

{{< codelines "C++" "compiler/11/ast.cpp" 33 36 >}}

We created bindings in our type environment whenever we saw a new variable
being introduced, which led us to only count variables that we did not bind
_anywhere_ as 'free'. This approach is no longer sufficient. Consider,
for example, the following Haskell code:

```Haskell {linenos=table}
someFunction x =
    let
        y = x + 5
    in
        x*y
```

We can see that the variable `x` is introduced on line 1.
Thus, our current algorithm will happily store `x` in an environment,
and not count it as free. But clearly, the definition of `y` on line 3
captures `x`! If we were to lift `y` into global scope, we would need
to pass `x` to it as an argument. To fix this, we have to separate the creation
and assignment of type environments from free variable detection. Why
don't we start with `ast` and its descendants? Our signatures become:

```C++
void ast::find_free(std::set<std::string>& into);
type_ptr ast::typecheck(type_mgr& mgr, type_env_ptr& env);
```

For the most part, the code remains unchanged. We avoid
using `env` (and `this->env`), and default to marking
any variable as a free variable:

{{< codelines "C++" "compiler/12/ast.cpp" 39 41 >}}

Since we no longer use the environment, we resort to an
alternative method of removing bound variables. Here's
`ast_case::find_free`:

{{< codelines "C++" "compiler/12/ast.cpp" 169 181 >}}

For each branch, we find the free variables. However, we
want to avoid marking variables that were introduced through
pattern matching as free (they are not). Thus, we use `pattern::find_variables`
to see which of the variables were bound by that pattern,
and remove them from the list of free variables. We
can then safely add the list of free variables in the branch to the overall
list of free variables. Other `ast` descendants experience largely
cosmetic changes (such as the removal of the `env` parameter).

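The per-branch bookkeeping can be sketched on its own with plain sets. This is only an illustration of the steps described above: `branch_sketch` and `find_free_in_branches` are hypothetical names, and each branch's free variables are assumed to be known up front rather than computed from a real `ast`.

```C++
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one branch of a case expression.
struct branch_sketch {
    std::vector<std::string> pattern_variables; // names bound by the pattern
    std::set<std::string> body_free;            // free variables of the branch body
};

void find_free_in_branches(const std::vector<branch_sketch>& branches,
                           std::set<std::string>& into) {
    for (const auto& branch : branches) {
        // Start from everything that looks free inside the branch body...
        std::set<std::string> free_in_branch = branch.body_free;
        // ...then remove whatever the pattern itself introduced.
        for (const auto& bound : branch.pattern_variables) {
            free_in_branch.erase(bound);
        }
        // What remains really is free, so it joins the overall result.
        into.insert(free_in_branch.begin(), free_in_branch.end());
    }
}

// Models the branches of: case l of { Nil -> z; Cons x xs -> x + y }
std::set<std::string> example_case_free() {
    branch_sketch nil_branch;
    nil_branch.body_free = {"z"};

    branch_sketch cons_branch;
    cons_branch.pattern_variables = {"x", "xs"};
    cons_branch.body_free = {"x", "y"};

    std::set<std::string> result;
    find_free_in_branches({nil_branch, cons_branch}, result);
    return result;
}
```

In the example, `x` is bound by the `Cons x xs` pattern and is dropped, while `y` and `z` survive as free variables of the branches.
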
Of course, we must implement `find_variables` for each of our `pattern`
subclasses. Here's what I got for `pattern_var`:

{{< codelines "C++" "compiler/12/ast.cpp" 402 404 >}}

And here's an equally terse implementation for `pattern_constr`:

{{< codelines "C++" "compiler/12/ast.cpp" 417 419 >}}

We also want to update `definition_defn` with this change. Our signatures
become:

```C++
void definition_defn::find_free();
void definition_defn::insert_types(type_mgr& mgr, type_env_ptr& env, visibility v);
```

We'll get to the `visibility` parameter later. The implementations
are fairly simple. Just like `ast_case`, we want to erase each function's
parameters from its list of free variables:

{{< codelines "C++" "compiler/12/definition.cpp" 13 18 >}}

Since `find_free` no longer creates any type bindings or environments,
this functionality is shouldered by `insert_types`:

{{< codelines "C++" "compiler/12/definition.cpp" 20 32 >}}

Now that free variables are properly computed, we are able to move on
to bigger and better things.

#### Nested Definitions
At present, our code for typechecking the whole program is located in
`main.cpp`:

{{< codelines "C++" "compiler/11/main.cpp" 43 61 >}}

This piece of code goes on. We now want this to be more general. Soon, `let/in`
expressions will bring with them definitions that are inside other definitions,
which will not be reachable at the top level. The fundamental topological
sorting algorithm, though, will remain the same. We can abstract a series
of definitions that need to be ordered and then typechecked into a new struct,
`definition_group`:

{{< codelines "C++" "compiler/12/definition.hpp" 73 83 >}}

This will be exactly like a list of `defn`/`data` definitions we have at the
top level, except now, it can also occur in other places, like `let/in`
expressions. Once again, ignore for the moment the `visibility` field.

The way we defined function ordering requires some extra work from
`definition_group`. Recall that conceptually, functions can only depend
on other functions defined in the same `let/in` expression, or, more generally,
in the same `definition_group`. This means that we now classify free variables
in definitions into two categories: free variables that refer to "nearby"
definitions (i.e. definitions in the same group) and free variables that refer
to "far away" definitions. The "nearby" variables will be used to do
topological ordering, while the "far away" variables can be passed along
further up, perhaps into an enclosing `let/in` expression (for which "nearby"
variables aren't actually free, since they are bound in the `let`). We
implement this partitioning of variables in `definition_group::find_free`:

{{< codelines "C++" "compiler/12/definition.cpp" 94 105 >}}

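Stripped of everything else, the "nearby" versus "far away" split looks roughly like the following sketch. The types `defn_sketch` and `group_sketch` and the function `partition_free` are hypothetical simplifications made up for this example: each definition's free variables are assumed to have been computed already, and a group is modelled as a plain map from names to definitions.

```C++
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a function definition.
struct defn_sketch {
    std::set<std::string> free_variables;   // assumed to be computed already
    std::set<std::string> nearby_variables; // filled in by partition_free
};

// Hypothetical stand-in for a group: definitions keyed by name.
using group_sketch = std::map<std::string, defn_sketch>;

void partition_free(group_sketch& group, std::set<std::string>& far_away) {
    for (auto& entry : group) {
        defn_sketch& def = entry.second;
        for (const auto& var : def.free_variables) {
            if (group.find(var) != group.end()) {
                // Defined in this very group: used later for ordering.
                def.nearby_variables.insert(var);
            } else {
                // Defined elsewhere: free from the group's point of view.
                far_away.insert(var);
            }
        }
    }
}

// Two mutually recursive definitions that also mention outside names.
group_sketch example_group() {
    group_sketch group;
    group["f"].free_variables = {"g", "x"};
    group["g"].free_variables = {"f", "y"};
    return group;
}
```

For `example_group()`, `f` and `g` end up in each other's `nearby_variables`, while `x` and `y` are reported as far away, ready to be handled by whatever encloses the group.
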
Notice that we have added a new `nearby_variables` field to `definition_defn`.
This is used on line 101, and will be once again used in `definition_group::typecheck`. Speaking of `typecheck`, let's look at its definition:

{{< codelines "C++" "compiler/12/definition.cpp" 107 145 >}}

This function is a little long, but conceptually, each `for` loop
contains a step of the process:

* The first loop declares all data types, so that constructors can
be verified to properly reference them.
* The second loop creates all the data type constructors.
* The third loop adds edges to our dependency graph.
* The fourth loop performs typechecking on the now-ordered groups of mutually
recursive functions.
    * The first inner loop inserts the types of all the functions into the environment.
    * The second inner loop actually performs typechecking.
    * The third inner loop makes as many things polymorphic as possible.

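The ordering that the third and fourth loops rely on can be illustrated in isolation. The sketch below is a stand-in, not the compiler's own dependency graph code: it uses Tarjan's strongly connected components algorithm over a hypothetical `dependency_map` (each name mapped to the "nearby" names it uses), and returns groups of mutually recursive names so that every group comes after the groups it depends on.

```C++
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: each function name maps to the nearby names it uses.
using dependency_map = std::map<std::string, std::set<std::string>>;

// Tarjan's algorithm. With edges pointing from a function to its
// dependencies, components are emitted dependencies-first.
std::vector<std::vector<std::string>> order_groups(const dependency_map& deps) {
    std::map<std::string, int> index, lowlink;
    std::set<std::string> on_stack;
    std::vector<std::string> stack;
    std::vector<std::vector<std::string>> groups;
    int next_index = 0;

    std::function<void(const std::string&)> visit = [&](const std::string& name) {
        index[name] = lowlink[name] = next_index++;
        stack.push_back(name);
        on_stack.insert(name);
        for (const auto& dep : deps.at(name)) {
            if (deps.find(dep) == deps.end()) continue; // not part of this group
            if (index.find(dep) == index.end()) {
                visit(dep);
                lowlink[name] = std::min(lowlink[name], lowlink[dep]);
            } else if (on_stack.count(dep)) {
                lowlink[name] = std::min(lowlink[name], index[dep]);
            }
        }
        if (lowlink[name] == index[name]) {
            // `name` is the root of a component: pop its members off the stack.
            std::vector<std::string> group;
            std::string member;
            do {
                member = stack.back();
                stack.pop_back();
                on_stack.erase(member);
                group.push_back(member);
            } while (member != name);
            std::sort(group.begin(), group.end()); // stable output for readers
            groups.push_back(group);
        }
    };

    for (const auto& entry : deps) {
        if (index.find(entry.first) == index.end()) visit(entry.first);
    }
    return groups;
}

// `even` and `odd` are mutually recursive; `main` uses `even` and `helper`.
dependency_map example_dependencies() {
    dependency_map deps;
    deps["even"] = {"odd"};
    deps["odd"] = {"even"};
    deps["helper"] = {};
    deps["main"] = {"even", "helper"};
    return deps;
}
```

With `example_dependencies()`, the `even`/`odd` pair forms one group that is ordered before `main`, which is the property the fourth loop needs: by the time a group is typechecked, everything it refers to already has a type.
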
We can now adjust our `parser.y` to use a `definition_group` instead of
two global vectors. First, we declare a global `definition_group`:

{{< codelines "C++" "compiler/12/parser.y" 10 10 >}}

Then, we adjust `definitions` to create `definition_group`s:

{{< codelines "text" "compiler/12/parser.y" 59 68 >}}

We can now adjust `main.cpp` to use the global `definition_group`. Among
other changes (such as removing `extern` references to `vector`s, and updating
function signatures) we also update the `typecheck_program` function:

{{< codelines "C++" "compiler/12/main.cpp" 41 49 >}}

Now, our code is ready for typechecking nested definitions, but not for
compiling them. The main thing that we still have to address is the addition
of new definitions to the global scope. Let's take a look at that next.

#### Global Definitions
We want every function, regardless of whether or not it was declared in the global
scope, to be processed and converted to LLVM code. The LLVM code conversion
takes several steps. First, the function's AST is translated into G-machine
instructions, which we covered in [part 5]({{< relref "05_compiler_execution.md" >}}),
by a process we covered in [part 6]({{< relref "06_compiler_compilation.md" >}}).
Then, an LLVM function is created for every function, and registered globally.
Finally, the G-machine instructions are converted