Finish implementation description in part 12.

Danila Fedorin 2020-06-20 20:46:54 -07:00
parent 21851e3a9c
commit c496be1031


@ -574,69 +574,257 @@
The observant reader will have noticed that we have a new method: `translate`.
This is a new method for all `ast` descendants, and will implement the
steps of moving definitions to the global scope and transforming the
program. Before we get to it, though, let's look at the other relevant
pieces of code for `ast_let` and `ast_lambda`. First, their grammar
rules in `parser.y`:
{{< codelines "text" "compiler/12/parser.y" 107 115 >}}
This is pretty similar to the rest of the grammar, so I will give this no
further explanation. Next, their `find_free` and `typecheck` code.
We can start with `ast_let`:
{{< codelines "C++" "compiler/12/ast.cpp" 275 289 >}}
As you can see, `ast_let::find_free` works in a similar manner to `ast_case::find_free`.
It finds the free variables in the `in` node as well as in each of the definitions
(taking advantage of the fact that `definition_group::find_free` populates the
given set with "far away" free variables). It then filters out any variables bound in
the `let` from the set of free variables in `in`, and returns the result.
Typechecking in `ast_let` relies on `definition_group::typecheck`, which holds
all of the required functionality for checking the new definitions.
Once the definitions are typechecked, we use their type information to
typecheck the `in` part of the expression (passing `definitions.env` to the
call to `typecheck` to make the new definitions visible).
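To make the free variable rule for `let/in` concrete, here is a minimal standalone sketch. It does not use the compiler's classes: `let_free_variables` and its parameters are hypothetical names of my own, and plain `std::set<std::string>` values stand in for the sets that the real `find_free` methods populate.

```C++
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper, not part of the compiler. All three inputs are
// assumed to have been computed already:
//   bound    - names introduced by the definitions in the let
//   defs_far - "far away" free variables of those definitions
//   in_free  - free variables of the `in` expression
std::set<std::string> let_free_variables(
        const std::set<std::string>& bound,
        const std::set<std::string>& defs_far,
        const std::set<std::string>& in_free) {
    // Variables that escape the definitions are free in the whole expression.
    std::set<std::string> result = defs_far;
    // Variables of the `in` part are free unless the let itself binds them.
    for (const std::string& name : in_free) {
        if (bound.find(name) == bound.end()) result.insert(name);
    }
    return result;
}

// Usage on the running example: `square` is bound by the let, while `x`
// comes from the enclosing function.
void example_let_free_variables() {
    std::set<std::string> free_vars = let_free_variables({"square"}, {"x"}, {"square"});
    assert((free_vars == std::set<std::string>{"x"}));
}
```

The real method works on AST nodes and fills a set passed in by reference, but the set arithmetic is the same idea.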
Next, we look at `ast_lambda`:
{{< codelines "C++" "compiler/12/ast.cpp" 344 366 >}}
Again, `ast_lambda::find_free` works similarly to `definition_defn`, stripping
the variables expected by the function from the body's list of free variables.
Also like `definition_defn`, this new node remembers the free variables in
its body, which we will later use for lifting.
Typechecking in this node also proceeds similarly to `definition_defn`. We create
new type variables for each parameter and for the return value, and build up
a function type called `full_type`. We then typecheck the body using the
new environment (which now includes the variables), and return the function type we came up with.
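The way `full_type` gets built up can be sketched in isolation. The `simple_type` structure below is a stand-in I made up for this example (the compiler's own type classes are richer), and `build_full_type` and `show` are hypothetical helpers. The point is only the fold from the right: starting with the return type, each parameter type is wrapped around the result as a new arrow.

```C++
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the compiler's types (an assumption of this
// sketch): a type is either a named variable or an arrow between two types.
struct simple_type {
    std::string var;                          // used by type variables
    std::shared_ptr<simple_type> left, right; // used by arrows
};
using simple_type_ptr = std::shared_ptr<simple_type>;

simple_type_ptr make_var(const std::string& name) {
    return std::make_shared<simple_type>(simple_type{name, nullptr, nullptr});
}

simple_type_ptr make_arrow(simple_type_ptr l, simple_type_ptr r) {
    return std::make_shared<simple_type>(simple_type{"", std::move(l), std::move(r)});
}

// Builds p1 -> (p2 -> (... -> ret)) by walking the parameters in reverse.
simple_type_ptr build_full_type(const std::vector<simple_type_ptr>& params,
                                simple_type_ptr ret) {
    simple_type_ptr full_type = std::move(ret);
    for (auto it = params.rbegin(); it != params.rend(); ++it) {
        full_type = make_arrow(*it, full_type);
    }
    return full_type;
}

// Renders a type with explicit parentheses around every arrow.
std::string show(const simple_type_ptr& t) {
    assert(t != nullptr);
    if (!t->left) return t->var;
    return "(" + show(t->left) + " -> " + show(t->right) + ")";
}
```

For two parameters with type variables `a` and `b` and a return type `r`, `show` renders the result as `(a -> (b -> r))`.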
#### Translation
While collecting all of the definitions into a global list, we can
also do some program transformations. Recalling the transformations
we described earlier, let's return to our example:
```Haskell {linenos=table}
fourthPower x = square * square
    where
        square = x * x
```
We said it should be translated into something like:
```Haskell {linenos=table}
fourthPower x = square * square
    where square = square' x
square' x = x * x
```
In our language, the original program above would be:
```text {linenos=table}
defn fourthPower x = {
    let {
        defn square = { x * x }
    } in {
        square * square
    }
}
```
And the translated version would be:
```text {linenos=table}
defn fourthPower x = {
    let {
        defn square = { square' x }
    } in {
        square * square
    }
}
defn square' x = { x * x }
```
Setting aside for the moment the naming of `square'` and `square`, we observe
that we want to perform the following operations:
1. Move the body of the original definition into its own
global definition, adding all the captured variables as arguments.
2. Replace the right hand side of the `let/in` expression with an application
of the global definition to the variables it requires.
We will implement these in a new `translate` method, with the following
signature:
```C++
void ast::translate(global_scope& scope);
```
The `scope` parameter and its `add_function` and `add_constructor` methods will
be used to add definitions to the global scope. Each AST node will also
use this method to implement the second step. Currently, only
`ast_let` and `ast_lambda` will need to modify themselves - all other
nodes will simply recursively call this method on their children. Let's jump
straight into implementing this method for `ast_let`:
{{< codelines "C++" "compiler/12/ast.cpp" 291 316 >}}
Since data type definitions don't really depend on anything else, we process
them first. This amounts to simply calling the `definition_data::into_globals`
method, which itself simply calls `global_scope::add_constructor`:
{{< codelines "C++" "compiler/12/definition.cpp" 86 92 >}}
Note how `into_globals` updates the mangled name of its constructor
via `set_mangled_name`. This will help us decide which global
function to call during code generation. More on that later.
Starting with line 295, we start processing the function definitions
in the `let/in` expression. We remember how many arguments were
explicitly added to the function definition, and then call the
definition's `into_global` method. This method is implemented
as follows:
{{< codelines "C++" "compiler/12/definition.cpp" 40 49 >}}
First, this method collects all the non-global free variables in
its body, which will need to be passed to the global definition
as arguments. It then combines this list with the arguments
the user explicitly added to it, recursively translates
its body, and creates a new global definition using `add_function`.
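The parameter bookkeeping described here can be sketched on its own. The helpers `lifted_parameters` and `captured_count` are hypothetical names, and I am assuming for the sake of the example that the set of global names is available as a plain `std::set` (the compiler asks its type environment instead).

```C++
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: the parameter list of the lifted global function is
// the captured (non-global free) variables followed by the explicit ones.
std::vector<std::string> lifted_parameters(
        const std::set<std::string>& free_variables,
        const std::set<std::string>& globals,
        const std::vector<std::string>& explicit_params) {
    std::vector<std::string> all_params;
    for (const std::string& name : free_variables) {
        // Globals are reachable from anywhere, so they need not be passed in.
        if (globals.count(name) != 0) continue;
        all_params.push_back(name);
    }
    all_params.insert(all_params.end(), explicit_params.begin(), explicit_params.end());
    return all_params;
}

// The number of captured variables is recovered by subtraction, which works
// because the captured variables were placed at the front of the list.
std::size_t captured_count(const std::vector<std::string>& all_params,
                           std::size_t explicit_count) {
    assert(all_params.size() >= explicit_count);
    return all_params.size() - explicit_count;
}
```

For a definition with explicit parameter `y` whose body mentions a local `x` and a global `add`, this yields the parameter list `x, y` and a captured count of one.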
We return to `ast_let::translate` at line 299. Here,
we determine how many variables ended up being captured, by
subtracting the number of explicit parameters from the total
number of parameters the new global definition has. This number,
combined with the fact that we added all the 'implicit' arguments
to the function to the beginning of the list, will let us
iterate over all implicit arguments, creating a chain of partial
function applications.
But how do we build the application? We could use the mangled name
of the function, but this seems inelegant, especially since we
already keep track of mangling information in `type_env`. Instead,
we create a new, local environment, in which we place an updated
binding for the function, marking it global, and setting
its mangled name to one generated by `global_scope`. This work is done
on lines 301-303. We create a reference to the global function
using the new environment on lines 305 and 306, and apply it to
all the implicit arguments on lines 307-313. Finally, we
add the new 'basic' equation into `translated_definitions`.
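Here is a rough, self-contained sketch of building such a chain of applications. The tiny `expr` hierarchy is invented for this example and is much simpler than the compiler's `ast` classes. In particular, it refers to the global function by a plain name rather than through a mangled environment. It uses `std::make_unique`, so it assumes C++14.

```C++
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Invented miniature AST: just variables and applications.
struct expr {
    virtual ~expr() = default;
    virtual std::string show() const = 0;
};
using expr_ptr = std::unique_ptr<expr>;

struct var_expr : expr {
    std::string name;
    explicit var_expr(std::string n) : name(std::move(n)) {}
    std::string show() const override { return name; }
};

struct app_expr : expr {
    expr_ptr left, right;
    app_expr(expr_ptr l, expr_ptr r) : left(std::move(l)), right(std::move(r)) {}
    std::string show() const override {
        return "(" + left->show() + " " + right->show() + ")";
    }
};

// Applies the global function to its first `captured` parameters, one at a
// time, producing ((f x1) x2) ... for the implicit arguments only.
expr_ptr apply_to_captured(const std::string& global_name,
                           const std::vector<std::string>& params,
                           std::size_t captured) {
    assert(captured <= params.size());
    expr_ptr result = std::make_unique<var_expr>(global_name);
    for (std::size_t i = 0; i < captured; ++i) {
        result = std::make_unique<app_expr>(std::move(result),
                                            std::make_unique<var_expr>(params[i]));
    }
    return result;
}
```

With the global name `square'`, the parameter list `x`, and one captured variable, `show` renders the result as `(square' x)`, which is the right-hand side we wanted for `square`.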
Let's take a look at translating `ast_lambda` next:
{{< codelines "C++" "compiler/12/ast.cpp" 368 392 >}}
Once again, on lines 369-375 we find all the arguments to the
global definition. On lines 377-382 we create a new global
function and a mangled environment, and start creating the
chain of function applications. On lines 384-390, we actually
create the arguments and apply the function to them. Finally,
on line 391, we store this new chain of applications in the
`translated` field.
#### Compilation
There's still another piece of the puzzle missing, and
that's how we're going to compile `let/in` expressions into
G-machine instructions. We have allowed these expressions
to be recursive, and maybe even mutually recursive. This
worked fine with global definitions; instead of specifying
where on the stack we can find the reference to a global
function, we just created a new global node, and called
it good. Things are different now, though, because the definitions
we're referencing aren't _just_ global functions; they are partial
applications of a global function. And to reference themselves,
or their neighbors, they have to have a handle on their own nodes. We do this
using an instruction that we foreshadowed in part 5, but didn't use
until just now: __Alloc__.
__Alloc__ creates placeholder nodes on the stack. These nodes
are indirections, the same kind that we use for lazy evaluation
and sharing elsewhere. We create an indirection node for every
definition that we then build; when an expression needs access
to a definition, we give it the indirection node. After
building the partial application graph for an expression,
we use __Update__, making the corresponding indirection
point to this new graph. This way, the 'handle' to a
definition is always accessible, and once the definition's expression
is built, the handle correctly points to it. Here's the implementation:
{{< codelines "C++" "compiler/12/ast.cpp" 319 332 >}}
First, we create the __Alloc__ instruction. Then, we update
our environment to map each definition name to a location
within the newly allocated batch of nodes. Since we iterate
the definitions in order, 'pushing' them into our environment,
we end up with the convention of having the later definitions
closer to the top of the G-machine stack. Thus, when we
iterate the definitions again, this time to compile their
bodies, we have to do so starting with the highest offset,
and working our way down to __Update__-ing the top of the stack.
Once the definitions have been compiled, we proceed to compiling
the `in` part of the expression as normal, using our updated
environment. Finally, we use __Slide__ to get rid of the definition
graphs, cleaning up the stack.
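The order of instructions can be easier to see with the bodies abstracted away. The following sketch is not the compiler's code: `compile_let_sketch` is a hypothetical function that emits instruction names as strings, with placeholders such as `<build a>` standing in for whatever instructions construct each definition's graph.

```C++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: lists the instructions for a let/in expression with
// the given definitions, in the order described in the text.
std::vector<std::string> compile_let_sketch(
        const std::vector<std::string>& definition_names) {
    std::vector<std::string> code;
    const std::size_t count = definition_names.size();

    // Reserve one placeholder (indirection) node per definition.
    code.push_back("Alloc " + std::to_string(count));

    // The first definition was pushed first, so it sits deepest on the
    // stack and is updated with the highest offset.
    std::size_t offset = count;
    for (const std::string& name : definition_names) {
        --offset;
        code.push_back("<build " + name + ">");
        code.push_back("Update " + std::to_string(offset));
    }
    assert(offset == 0); // the last definition updates the top of the stack

    code.push_back("<build in>");
    // Remove the definition graphs from under the result.
    code.push_back("Slide " + std::to_string(count));
    return code;
}
```

For two definitions `a` and `b`, this produces `Alloc 2`, `<build a>`, `Update 1`, `<build b>`, `Update 0`, `<build in>`, `Slide 2`.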
Compiling the `ast_lambda` is far more straightforward. We just
compile the resulting partial application as we normally would have:
{{< codelines "C++" "compiler/12/ast.cpp" 393 395 >}}
One more thing. Let's adopt the convention of storing __mangled__
names into the environment. This way, rather than looking up
mangled names only for global functions, which would be a 'gotcha'
for anyone working on the compiler, we will always use the mangled
names during compilation. To make this change, we make sure that
`ast_case` also uses `mangled_name`:
{{< codelines "C++" "compiler/12/ast.cpp" 228 228 >}}
We also update the logic for `ast_lid::compile` to use the mangled
name information:
{{< codelines "C++" "compiler/12/ast.cpp" 52 58 >}}
#### Fixing Type Generalization
This is a rather serious bug that made its way into the codebase
since part 10. Recall that we can only generalize type variables
that are free in the environment. Thus far, we haven't done that,
and we really should: I ran into incorrectly inferred types
in my first test of the `let/in` language feature.
We need to make our code capable of finding free variables in the
type environment. This requires the `type_mgr`, which associates
with type variables the real types they represent, if any. We
thus create methods with signatures as follows:
```C++
void type_env::find_free(const type_mgr& mgr, std::set<std::string>& into) const;
void type_env::find_free_except(const type_mgr& mgr, const std::string& avoid,
std::set<std::string>& into) const;
```
Why `find_free_except`? When generalizing a variable whose type was already
stored in the environment, all the type variables we could generalize would
not be 'free'. If they only occur in the type we're generalizing, though,
we shouldn't let that stop us! Thus, when finding free type variables, we will
avoid looking at the particular variable whose type is being generalized. The
implementations of the two methods are straightforward:
{{< codelines "C++" "compiler/12/type_env.cpp" 4 18 >}}
Note that `find_free_except` calls `find_free` in its recursive call. This
is not a bug: we _do_ want to include free type variables from bindings
that have the same name as the variable we're generalizing, but aren't found
in the same scope. As far as we're concerned, they're different variables!
The two methods use another `find_free` method which we add to `type_mgr`:
{{< codelines "C++" "compiler/12/type.cpp" 206 213 >}}
Finally, `generalize` makes sure not to use variables that it finds free:
{{< codelines "C++" "compiler/12/type_env.cpp" 68 81 >}}
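To see why `find_free_except` skips only the current scope's binding, here is a heavily simplified model. `scope_sketch` is a made-up stand-in for `type_env`: I assume each binding directly lists the type variables in its type, so no `type_mgr` is involved, and `generalizable` is a hypothetical helper that returns the variables we would be allowed to quantify over.

```C++
#include <cassert>
#include <map>
#include <set>
#include <string>

// Made-up stand-in for `type_env`: names map straight to the type
// variables occurring in their types.
struct scope_sketch {
    std::map<std::string, std::set<std::string>> bindings;
    const scope_sketch* parent = nullptr;

    void find_free(std::set<std::string>& into) const {
        if (parent) parent->find_free(into);
        for (const auto& binding : bindings) {
            into.insert(binding.second.begin(), binding.second.end());
        }
    }

    void find_free_except(const std::string& avoid,
                          std::set<std::string>& into) const {
        // Outer scopes are searched in full: a binding with the same name
        // out there is a different variable.
        if (parent) parent->find_free(into);
        for (const auto& binding : bindings) {
            if (binding.first == avoid) continue;
            into.insert(binding.second.begin(), binding.second.end());
        }
    }

    // Type variables of `name` that appear nowhere else in the environment.
    std::set<std::string> generalizable(const std::string& name) const {
        auto it = bindings.find(name);
        assert(it != bindings.end());
        std::set<std::string> env_free;
        find_free_except(name, env_free);
        std::set<std::string> result;
        for (const std::string& var : it->second) {
            if (env_free.count(var) == 0) result.insert(var);
        }
        return result;
    }
};
```

If `f` mentions type variables `a` and `b`, and an outer binding `x` also mentions `a`, only `b` may be generalized. With no other bindings around, both `a` and `b` may be.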
#### Putting It All Together
All that's left is to tie the parts we've created into one coherent whole
in `main.cpp`. First of all, since we moved all of the LLVM-related
code into `global_scope`, we can safely replace that functionality
in `main.cpp` with a method call:
{{< codelines "C++" "compiler/12/main.cpp" 121 132 >}}
On the other hand, we need top-level logic to handle `definition_group`s.
This is pretty straightforward, and the main trick is to remember to
update the function's mangled name. Right now, depending on the choice
of mangling algorithm, it's possible even for top-level functions to
have their names changed, and we must account for that. The whole code is:
{{< codelines "C++" "compiler/12/main.cpp" 52 62 >}}
Finally, we call `global_scope`'s methods in `main()`:
{{< codelines "C++" "compiler/12/main.cpp" 148 151 >}}
That's it! Please note that I've mentioned or hinted at minor changes to the
codebase. Detailing every single change this late into the project is
needlessly time-consuming and verbose; Gitea reports that I've made 677
insertions into and 215 deletions from the code. As always, I provide
the [source code for the compiler](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/12), and you can also take a look at the
[Gitea-generated diff](https://dev.danilafe.com/Web-Projects/blog-static/compare/1905601aaa96d11c771eae9c56bb9fc105050cda...21851e3a9c552383ee8c4bc878ea06e7d28c333e)
at the time of writing. If you want to follow along, feel free to check
them out!
### Running Our Programs
It's important to test all the language features that we just added. This
includes recursive definitions, nested function dependency cycles, and
uses of lambda functions. Some of the following examples will be rather
silly, but they should do a good job of checking that everything works
as we expect.