Add more implementation content to part 12.

Danila Fedorin 2020-06-18 22:29:38 -07:00
parent 9c4d7c514f
commit f6ca13d6dc
1 changed files with 180 additions and 4 deletions


This is true, but why should we perform transformations on a malformed program? Typechecking before pulling functions to the global scope saves us that work, and breaking down one dependency-searching problem (which is \(O(n^3)\) with Warshall's algorithm) into smaller, independent problems may even lead to better performance, since solving several small cubic problems is cheaper than solving one large one. Furthermore, typechecking before program transformations will help us come up with more helpful error messages.
{{< /sidenote >}} and can be transformed into a sequence of instructions just like any other global function. It has been pulled from its `where` (which, by the way, is pretty much equivalent to a `let/in`) to the top level.
Now, see how `addSingle` became `(addSingle n)`? If we chose to rewrite the
program this way, we'd have to find-and-replace every instance of `addSingle`
in the function body, which would be tedious and require us to keep
track of shadowed variables and the like. Also, what if we used a local
definition twice in the original piece of code? How about something like this:
```Haskell {linenos=table}
fourthPower x = square * square
    where
        square = x * x
```
Applying the strategy we saw above, we get:
```Haskell {linenos=table}
fourthPower x = (square x) * (square x)
square x = x * x
```
This is valid, except that in our evaluation model, the two instances
of `(square x)` will be built independently of one another, and thus,
will not be shared. This, in turn, will mean that `square` will be called
twice, which is not what we would expect from looking at the original program.
This isn't good. Instead, why don't we keep the `where`, but modify it
as follows:
```Haskell {linenos=table}
fourthPower x = square * square
    where square = square' x
square' x = x * x
```
This time, assuming we can properly implement `where`, the call to
`square' x` should only occur once. Though I've been using `where`,
which leads to less clutter in Haskell code, the exact same approach applies
to `let/in`, and that's what we'll be using in our language.
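To see the sharing problem concretely, we can model lazy values in a few lines of C++. This is a hypothetical sketch, not our runtime: a `thunk` here is simply a cached `std::function`, and `make_square`, `fourth_power_unshared`, and `fourth_power_shared` are made-up names. Building `(square x)` twice creates two thunks, so the multiplication runs twice; binding it once with `where` (or `let/in`) and using it twice runs it only once.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>

// Hypothetical model of a lazily evaluated value: the computation runs
// at most once, and its result is cached for every later force().
struct thunk {
    std::function<int()> compute;
    std::optional<int> cached;

    int force() {
        if (!cached) cached = compute();
        return *cached;
    }
};

// Counts how many times the body of `square` actually executes.
int square_calls = 0;

// Builds a fresh, unevaluated thunk for `square x`.
std::shared_ptr<thunk> make_square(int x) {
    return std::make_shared<thunk>(
        thunk{[x] { ++square_calls; return x * x; }, std::nullopt});
}

// fourthPower x = (square x) * (square x): two independently built thunks.
int fourth_power_unshared(int x) {
    auto left = make_square(x);
    auto right = make_square(x);
    return left->force() * right->force();
}

// fourthPower x = square * square where square = square' x: one thunk, used twice.
int fourth_power_shared(int x) {
    auto square = make_square(x);
    return square->force() * square->force();
}
```

Starting from `square_calls = 0`, calling `fourth_power_unshared(3)` leaves the counter at 2, while `fourth_power_shared(3)` leaves it at 1; both return 81.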
This technique of replacing captured variables with arguments, and pulling closures into the global scope to aid compilation, is called [Lambda Lifting](https://en.wikipedia.org/wiki/Lambda_lifting). Its name is no coincidence: lambda functions need to undergo the same kind of transformation as our nested definitions (unlike nested definitions, though, lambda functions first need to be given names). This is why they are included in this post together with `let/in`!
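The first ingredient of lambda lifting is finding the captured variables, which are exactly the free variables of the nested definition's body. Here is a minimal C++ sketch over a hypothetical toy AST; `expr`, `var`, `app`, `lam`, and `free_vars` are illustrative names, not the compiler's own classes, and the real implementation may gather this information differently.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// A hypothetical toy AST with variables, applications, and one-parameter
// lambdas. It is only for illustration, not the compiler's own AST.
struct expr;
using expr_ptr = std::shared_ptr<expr>;

struct expr {
    enum class kind { var, app, lam } tag;
    std::string name;  // variable name (var) or parameter name (lam)
    expr_ptr left;     // function (app) or body (lam)
    expr_ptr right;    // argument (app)
};

expr_ptr var(std::string n) {
    return std::make_shared<expr>(expr{expr::kind::var, std::move(n), nullptr, nullptr});
}
expr_ptr app(expr_ptr f, expr_ptr a) {
    return std::make_shared<expr>(expr{expr::kind::app, "", std::move(f), std::move(a)});
}
expr_ptr lam(std::string p, expr_ptr body) {
    return std::make_shared<expr>(expr{expr::kind::lam, std::move(p), std::move(body), nullptr});
}

// Adds to `out` every variable in `e` that is not bound by a lambda
// inside `e`; `bound` holds the parameters currently in scope.
void free_vars(const expr_ptr& e, std::set<std::string>& bound, std::set<std::string>& out) {
    switch (e->tag) {
    case expr::kind::var:
        if (bound.count(e->name) == 0) out.insert(e->name);
        break;
    case expr::kind::app:
        free_vars(e->left, bound, out);
        free_vars(e->right, bound, out);
        break;
    case expr::kind::lam: {
        // Only unbind the parameter afterwards if this lambda bound it,
        // so that a shadowed outer parameter stays in scope.
        bool inserted = bound.insert(e->name).second;
        free_vars(e->left, bound, out);
        if (inserted) bound.erase(e->name);
        break;
    }
    }
}

// Convenience overload: the captured (free) variables of `e`.
std::set<std::string> free_vars(const expr_ptr& e) {
    std::set<std::string> bound, out;
    free_vars(e, bound, out);
    return out;
}
```

For `x * x`, written here as `((*) x) x`, this reports both `*` and `x`. A lambda lifter would additionally discard names that refer to global definitions, such as `*`, since those don't need to be passed as arguments.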
### Implementation
The main thing that we still have to address is the addition
of new definitions to the global scope. Let's take a look at that next.
#### Global Definitions
We want every function (and even non-function definitions that capture surrounding
variables), regardless of whether or not it was declared in the global scope,
to be processed and converted to LLVM code. The LLVM code conversion takes
several steps. First, the function's AST is translated into G-machine
instructions, which we covered in [part 5]({{< relref "05_compiler_execution.md" >}}),
by a process we covered in [part 6]({{< relref "06_compiler_compilation.md" >}}).
Then, an LLVM function is created for every function, and registered globally.
Finally, the G-machine instructions are converted into LLVM IR, which is
inserted into the previously created functions. These things
can't be done in a single pass: at the very least, we can't start translating
G-machine instructions into LLVM IR until functions are globally declared,
because we would otherwise have no means of referencing other functions. It
makes sense to me, then, to pull out all the 'global' definitions into
a single top-level list (perhaps somewhere in `main.cpp`).
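The need for two passes shows up even in a toy model. In the hypothetical C++ sketch below, `toy_function`, `two_passes`, and `one_pass` are made-up names, and an integer id stands in for a declared LLVM function. When one function refers to another that comes later in the list, a single pass cannot resolve the reference, while declaring everything first succeeds.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a global function: its name and the names of
// the global functions its body refers to.
struct toy_function {
    std::string name;
    std::vector<std::string> references;
};

// "Generated code" for each function: the ids of the functions it refers
// to. An id stands in for a handle to a declared LLVM function.
using generated = std::map<std::string, std::vector<int>>;

// Looks up a function that must already have been declared.
int resolve(const std::map<std::string, int>& declared, const std::string& name) {
    auto it = declared.find(name);
    if (it == declared.end()) throw std::runtime_error("undeclared function: " + name);
    return it->second;
}

// Pass 1 declares every function; pass 2 generates every body.
generated two_passes(const std::vector<toy_function>& fs) {
    std::map<std::string, int> declared;
    for (const auto& f : fs) declared.emplace(f.name, static_cast<int>(declared.size()));

    generated out;
    for (const auto& f : fs)
        for (const auto& r : f.references) out[f.name].push_back(resolve(declared, r));
    return out;
}

// A single pass that declares and generates each function in turn.
generated one_pass(const std::vector<toy_function>& fs) {
    std::map<std::string, int> declared;
    generated out;
    for (const auto& f : fs) {
        declared.emplace(f.name, static_cast<int>(declared.size()));
        for (const auto& r : f.references) out[f.name].push_back(resolve(declared, r));
    }
    return out;
}
```

With a `main` that refers to a later `helper`, `two_passes` maps `main` to `helper`'s id (1), while `one_pass` throws `std::runtime_error`.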
Let's start implementing this with a new `global_scope` struct. This
struct will contain all of the global function and constructor definitions:
{{< codelines "C++" "compiler/12/global_scope.hpp" 42 55 >}}
This struct will allow us to keep track of all the global definitions,
emitting them as we go, and then coming back to them as necessary.
There are also signs of another piece of functionality: `occurence_count`
and `mangle_name`. These two will be used to handle duplicate names.
We cannot have two global functions named the same thing, but we can
easily imagine a situation in which two separate `let/in` expressions define
a variable like `x`, which then needs to be lifted to the global scope. We
resolve such conflicts by slightly changing ("mangling") the name of
one of the resulting global definitions. We allow the first global definition
to be named the same as it was originally (in our example, this would be `x`).
However, if we detect that a global definition `x` already exists (we
track this using `occurence_count`), we rename it to `x_1`. Subsequent
global definitions will end up being named `x_2`, `x_3`, and so on.
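The renaming rule is small enough to sketch directly. The following C++ is a hypothetical, standalone version: `name_mangler` is a made-up wrapper, while the field and method names mirror `occurence_count` and `mangle_name` from the text; the actual implementation may differ in detail.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch of the mangling scheme described above: the first
// definition keeps its name, later ones get _1, _2, ... appended.
struct name_mangler {
    // How many global definitions have used each original name so far.
    std::map<std::string, int> occurence_count;

    std::string mangle_name(const std::string& n) {
        int& count = occurence_count[n];  // value-initialized to 0 for new names
        std::string result = (count == 0) ? n : n + "_" + std::to_string(count);
        ++count;
        return result;
    }
};
```

This scheme only guarantees unique names if no user-written name can itself look like `x_1`; a language whose identifiers may contain an underscore followed by digits would need an extra check.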
Alright, let's take a look at `global_function` and `global_constructor`.
Here's the former:
{{< codelines "C++" "compiler/12/global_scope.hpp" 11 27 >}}
There's nothing really surprising here: all of the fields
are reminiscent of `definition_defn`, though some type-related variables
are missing. We also include the three compilation-related methods,
`compile`, `declare_llvm`, and `generate_llvm`, which were previously in `definition_defn`. Let's look at `global_constructor` now:
{{< codelines "C++" "compiler/12/global_scope.hpp" 29 40 >}}
This maps pretty closely to a single `definition_data::constructor`.
There's a difference here that is not clear at a glance, though. Whereas
the `name` in a `definition_defn` or `definition_data` refers to the
name as given by the user in the code, the `name` of a `global_function`
or `global_constructor` has gone through mangling, and thus, should be
unique.
Let's now look at the implementation of `global_scope`'s methods. The methods
`add_function` and `add_constructor` are pretty straightforward. Here's
the former:
{{< codelines "C++" "compiler/12/global_scope.cpp" 39 43 >}}
And here's the latter:
{{< codelines "C++" "compiler/12/global_scope.cpp" 45 49 >}}
In both of these functions, we return a reference to the new global
definition we created. This helps us access the mangled `name` field,
and, in the case of `global_function`, inspect the `ast_ptr` that represents
its body.
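Returning a reference is only safe if the definition doesn't move when more definitions are added later. One way to guarantee that, sketched below under the assumption that definitions are stored as `std::unique_ptr`s (the names `toy_global_function` and `toy_global_scope` are hypothetical, and mangling is left out), is to heap-allocate each definition, so that only the owning pointers move when the vector grows.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, stripped-down global function: only a (mangled) name and
// parameters; the real struct also holds the body and LLVM handles.
struct toy_global_function {
    std::string name;
    std::vector<std::string> params;
};

struct toy_global_scope {
    // Assumption: each definition lives in its own heap allocation, so a
    // reference returned by add_function survives later push_backs, even
    // though the vector's own storage may be reallocated.
    std::vector<std::unique_ptr<toy_global_function>> functions;

    // Name mangling is omitted here; `n` is taken to be already unique.
    toy_global_function& add_function(std::string n, std::vector<std::string> ps) {
        functions.push_back(std::make_unique<toy_global_function>(
            toy_global_function{std::move(n), std::move(ps)}));
        return *functions.back();
    }
};
```

Had the vector stored `toy_global_function` objects directly, a `push_back` that triggered reallocation would leave previously returned references dangling.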
Next, we have `global_scope::compile` and `global_scope::generate_llvm`,
which encapsulate these operations on all global definitions. Their
implementations are very straightforward, and are similar to the
`gen_llvm` function we used to have in our `main.cpp`:
{{< codelines "C++" "compiler/12/global_scope.cpp" 51 67 >}}
Finally, we have `mangle_name`, which takes care of potentially duplicate
variable names:
{{< codelines "C++" "compiler/12/global_scope.cpp" 69 83 >}}
Let's move on to the global definition structs.
The `compile`, `declare_llvm`, and `generate_llvm` methods for
`global_function` are pretty much the same as those that we used to have
in `definition_defn`:
{{< codelines "C++" "compiler/12/global_scope.cpp" 4 24 >}}
The same is true for `global_constructor` and its method `generate_llvm`:
{{< codelines "C++" "compiler/12/global_scope.cpp" 26 37 >}}
Recall that in this case, we need not have two methods for declaring
and generating LLVM, since constructors don't reference other constructors,
and are always generated before any function definitions.
#### Translation
While collecting all of the definitions into a global list, we can
also do some program transformations. Let's return to our earlier example:
```Haskell {linenos=table}
fourthPower x = square * square
    where
        square = x * x
```
We said it should be translated into something like:
```Haskell {linenos=table}
fourthPower x = square * square
    where square = square' x
square' x = x * x
```
In our language, the original program above would be:
```text {linenos=table}
defn fourthPower x = {
    let {
        defn square = { x * x }
    } in {
        square * square
    }
}
```
And the translated version would be:
```text {linenos=table}
defn fourthPower x = {
    let {
        defn square = { square' x }
    } in {
        square * square
    }
}
defn square' x = { x * x }
```
Setting aside for the moment the naming of `square'` and `square`, we observe
that we want to perform the following operations:
1. Move the body of the original definition of `square` into its own
global definition, adding all the captured variables as arguments.
2. Replace the right-hand side of the local definition inside the `let/in` expression
with an application of the new global definition to the variables it captures.
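These two steps can be sketched in C++ over a hypothetical toy AST. `expr`, `defn`, `collect_captured`, and `lift` are illustrative names, not the compiler's classes; the new global's name is passed in explicitly (sidestepping the naming question, as above), and the set of global names is assumed to be known so that references like `*` are not mistaken for captured variables.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST with only variables and applications, which is
// enough for bodies like `x * x`, written here as ((*) x) x.
struct expr;
using expr_ptr = std::shared_ptr<expr>;
struct expr {
    bool is_var;
    std::string name;  // used when is_var
    expr_ptr fn, arg;  // used when !is_var
};
expr_ptr var(std::string n) { return std::make_shared<expr>(expr{true, std::move(n), nullptr, nullptr}); }
expr_ptr app(expr_ptr f, expr_ptr a) { return std::make_shared<expr>(expr{false, "", std::move(f), std::move(a)}); }

// A definition of the form `defn name params = { body }`.
struct defn {
    std::string name;
    std::vector<std::string> params;
    expr_ptr body;
};

// Collects the variables in `e` that are not in `skip` (params and globals).
void collect_captured(const expr_ptr& e, const std::set<std::string>& skip, std::set<std::string>& out) {
    if (e->is_var) {
        if (skip.count(e->name) == 0) out.insert(e->name);
    } else {
        collect_captured(e->fn, skip, out);
        collect_captured(e->arg, skip, out);
    }
}

// Lifts `local` out of its let/in into a new global called `global_name`.
// Step 1: the global takes the captured variables, then the original params.
// Step 2: `local` becomes the global applied to the captured variables.
defn lift(defn& local, const std::string& global_name, const std::set<std::string>& globals) {
    std::set<std::string> skip(globals);
    skip.insert(local.params.begin(), local.params.end());
    std::set<std::string> captured;
    collect_captured(local.body, skip, captured);

    defn lifted{global_name, std::vector<std::string>(captured.begin(), captured.end()), local.body};
    lifted.params.insert(lifted.params.end(), local.params.begin(), local.params.end());

    expr_ptr call = var(global_name);
    for (const auto& v : captured) call = app(call, var(v));
    local.params.clear();
    local.body = call;
    return lifted;
}
```

Applied to `square`, this produces `defn square' x = { x * x }` and rewrites the local definition to `defn square = { square' x }`, matching the translated program above. Captured variables come first in the lifted parameter list, so partially applying the global to them yields a function that still expects the original parameters.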