Finish implementation description in part 12.
This commit is contained in:
parent
21851e3a9c
commit
c496be1031
@ -574,69 +574,257 @@
The observant reader will have noticed that we have a new method: `translate`.
|
||||
This is a new method for all `ast` descendants, and will implement the
|
||||
steps of moving definitions to the global scope and transforming the
|
||||
program. Before we get to it, though, let's look at the other relevant
pieces of code for `ast_let` and `ast_lambda`. First, their grammar
rules in `parser.y`:
|
||||
|
||||
{{< codelines "text" "compiler/12/parser.y" 107 115 >}}
|
||||
|
||||
This is pretty similar to the rest of the grammar, so I will give this no
further explanation. Next, their `find_free` and `typecheck` code.
We can start with `ast_let`:
|
||||
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 275 289 >}}
|
||||
|
||||
|
||||
As you can see, `ast_let::find_free` works in a similar manner to `ast_case::find_free`.
|
||||
It finds the free variables in the `in` node as well as in each of the definitions
|
||||
(taking advantage of the fact that `definition_group::find_free` populates the
|
||||
given set with "far away" free variables). It then filters out any variables bound in
|
||||
the `let` from the set of free variables in `in`, and returns the result.
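
To make the filtering step concrete, here is a minimal standalone sketch. It does not use the compiler's own classes: `let_free_variables` and its three parameters are hypothetical names, and the sets are assumed to have been computed already by the definitions and by the `in` node.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper, not part of the compiler.
//   bound:     names defined by the let
//   defs_free: "far away" free variables reported by the definitions
//   in_free:   free variables found in the `in` expression
std::set<std::string> let_free_variables(
        const std::set<std::string>& bound,
        const std::set<std::string>& defs_free,
        const std::set<std::string>& in_free) {
    // Everything the definitions need from outside is free in the let.
    std::set<std::string> result = defs_free;
    // From the `in` part, keep only names the let does not bind itself.
    for (const auto& name : in_free) {
        if (bound.count(name) == 0) result.insert(name);
    }
    return result;
}
```

For a `let` that binds `square`, whose definitions need `x`, and whose `in` part mentions `square` and `y`, this yields the set containing `x` and `y`.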
|
||||
|
||||
Typechecking in `ast_let` relies on `definition_group::typecheck`, which holds
|
||||
all of the required functionality for checking the new definitions.
|
||||
Once the definitions are typechecked, we use their type information to
|
||||
typecheck the `in` part of the expression (passing `definitions.env` to the
|
||||
call to `typecheck` to make the new definitions visible).
|
||||
|
||||
Next, we look at `ast_lambda`:
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 344 366 >}}
|
||||
|
||||
Again, `ast_lambda::find_free` works similarly to `definition_defn`, stripping
|
||||
the variables expected by the function from the body's list of free variables.
|
||||
Also like `definition_defn`, this new node remembers the free variables in
|
||||
its body, which we will later use for lifting.
|
||||
|
||||
Typechecking in this node also proceeds similarly to `definition_defn`. We create
|
||||
new type variables for each parameter and for the return value, and build up
|
||||
a function type called `full_type`. We then typecheck the body using the
|
||||
new environment (which now includes the variables), and return the function type we came up with.
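
As a rough illustration of how such a function type can be assembled, consider the sketch below. The `type`, `type_var` and `type_arr` structs here are simplified stand-ins written for this example, and `build_full_type` is a hypothetical helper; the real code also registers the parameters in the environment and asks `type_mgr` for fresh type variables.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the compiler's type classes (assumption).
struct type {
    virtual ~type() = default;
    virtual std::string show() const = 0;
};
using type_ptr = std::shared_ptr<type>;

struct type_var : type {
    std::string name;
    explicit type_var(std::string n) : name(std::move(n)) {}
    std::string show() const override { return name; }
};

struct type_arr : type {
    type_ptr left, right;
    type_arr(type_ptr l, type_ptr r) : left(std::move(l)), right(std::move(r)) {}
    std::string show() const override {
        return "(" + left->show() + " -> " + right->show() + ")";
    }
};

// Hypothetical helper: wraps the return type in one arrow per parameter,
// starting from the last parameter so that the first one ends up outermost.
type_ptr build_full_type(const std::vector<type_ptr>& param_types,
                         type_ptr return_type) {
    assert(return_type != nullptr);
    type_ptr full_type = std::move(return_type);
    for (auto it = param_types.rbegin(); it != param_types.rend(); ++it) {
        full_type = std::make_shared<type_arr>(*it, full_type);
    }
    return full_type;
}
```

With parameter type variables `a` and `b` and return type variable `r`, the result prints as `(a -> (b -> r))`, which is the curried shape we expect for a two-argument lambda.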
|
||||
|
||||
#### Translation
|
||||
While collecting all of the definitions into a global list, we can
also do some program transformations. Let's return to our earlier example:
|
||||
|
||||
```Haskell {linenos=table}
|
||||
fourthPower x = square * square
|
||||
where
|
||||
square = x * x
|
||||
```
|
||||
|
||||
We said it should be translated into something like:
|
||||
|
||||
```Haskell {linenos=table}
|
||||
fourthPower x = square * square
|
||||
where square = square' x
|
||||
square' x = x * x
|
||||
```
|
||||
|
||||
In our language, the original program above would be:
|
||||
|
||||
```text {linenos=table}
|
||||
defn fourthPower x = {
|
||||
let {
|
||||
defn square = { x * x }
|
||||
} in {
|
||||
square * square
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And the translated version would be:
|
||||
|
||||
```text {linenos=table}
|
||||
defn fourthPower x = {
|
||||
let {
|
||||
defn square = { square' x }
|
||||
} in {
|
||||
square * square
|
||||
}
|
||||
}
|
||||
defn square' x = { x * x }
|
||||
```
|
||||
|
||||
Setting aside for the moment the naming of `square'` and `square`, we observe
|
||||
that we want to perform the following operations:
|
||||
|
||||
1. Move the body of the original definition into its own
global definition, adding all the captured variables as arguments.
2. Replace the right-hand side of the definition inside the `let/in` expression
with an application of the global definition to the variables it requires.
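
The two steps above can be sketched on plain strings, before we involve any AST nodes. Everything in this block is hypothetical: `lifted_definition` and `lift` are names made up for the illustration, and appending `'` stands in for whatever name the real mangling scheme would produce.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result of lifting one local definition.
struct lifted_definition {
    std::string global_definition;  // step 1: the new top-level definition
    std::string local_replacement;  // step 2: the new right-hand side
};

lifted_definition lift(const std::string& name,
                       const std::vector<std::string>& captured,
                       const std::vector<std::string>& params,
                       const std::string& body) {
    // Placeholder for a real mangled name (assumption).
    const std::string global_name = name + "'";
    std::string global = "defn " + global_name;
    std::string replacement = global_name;
    // Captured variables become the leading parameters of the global
    // definition, and the local definition applies the global to them.
    for (const auto& var : captured) {
        global += " " + var;
        replacement += " " + var;
    }
    // The user's explicit parameters follow the captured ones.
    for (const auto& param : params) global += " " + param;
    global += " = { " + body + " }";
    return { global, replacement };
}
```

Lifting `square` with the captured variable `x`, no explicit parameters, and the body `x * x` gives the global definition `defn square' x = { x * x }` and the replacement right-hand side `square' x`.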
|
||||
|
||||
We will implement these in a new `translate` method, with the following
|
||||
signature:
|
||||
|
||||
```C++
|
||||
void ast::translate(global_scope& scope);
|
||||
```
|
||||
|
||||
The `scope` parameter and its `add_function` and `add_constructor` methods will
be used to add definitions to the global scope. Each AST node will also
use this method to implement the second step. Currently, only
`ast_let` and `ast_lambda` will need to modify themselves; all other
nodes will simply call this method recursively on their children. Let's jump
straight into implementing this method for `ast_let`:
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 291 316 >}}
|
||||
|
||||
Since data type definitions don't really depend on anything else, we process
them first. This amounts to simply calling the `definition_data::into_globals`
method, which itself simply calls `global_scope::add_constructor`:
|
||||
|
||||
{{< codelines "C++" "compiler/12/definition.cpp" 86 92 >}}
|
||||
|
||||
Note how `into_globals` updates the mangled name of its constructor
|
||||
via `set_mangled_name`. This will help us decide which global
|
||||
function to call during code generation. More on that later.
|
||||
|
||||
On line 295, we begin processing the function definitions
in the `let/in` expression. We remember how many arguments were
explicitly added to the function definition, and then call the
definition's `into_global` method. This method is implemented
as follows:
|
||||
|
||||
{{< codelines "C++" "compiler/12/definition.cpp" 40 49 >}}
|
||||
|
||||
First, this method collects all the non-global free variables in
its body, which will need to be passed to the global definition
as arguments. It then combines this list with the arguments
the user explicitly added to it, recursively translates
its body, and creates a new global definition using `add_function`.
|
||||
|
||||
We return to `ast_let::translate` at line 299. Here,
we determine how many variables ended up being captured by
subtracting the number of explicit parameters from the total
number of parameters the new global definition has. This number,
combined with the fact that we added all the 'implicit' arguments
to the beginning of the function's parameter list, will let us
iterate over all implicit arguments, creating a chain of partial
function applications.
|
||||
|
||||
But how do we build the application? We could use the mangled name
of the function, but this seems inelegant, especially since we
already keep track of mangling information in `type_env`. Instead,
we create a new, local environment, in which we place an updated
binding for the function, marking it global, and setting
its mangled name to one generated by `global_scope`. This work is done
on lines 301-303. We create a reference to the global function
using the new environment on lines 305 and 306, and apply it to
all the implicit arguments on lines 307-313. Finally, we
add the new 'basic' equation into `translated_definitions`.
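
The counting trick is easy to get wrong by one, so here is a small standalone sketch of it. The `expr` struct and the `make_ref`, `make_app`, `apply_captured` and `show` helpers are hypothetical simplifications made up for this example; in the compiler, the nodes carry environments and types as well.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal expression tree (assumption): a node is either a variable
// reference (left is null) or an application of left to right.
struct expr;
using expr_ptr = std::shared_ptr<expr>;
struct expr {
    std::string name;
    expr_ptr left, right;
};

expr_ptr make_ref(const std::string& name) {
    return std::make_shared<expr>(expr{name, nullptr, nullptr});
}

expr_ptr make_app(expr_ptr left, expr_ptr right) {
    return std::make_shared<expr>(expr{"", std::move(left), std::move(right)});
}

// all_params lists the captured variables first, then the explicit ones.
expr_ptr apply_captured(const std::string& mangled_name,
                        const std::vector<std::string>& all_params,
                        std::size_t explicit_count) {
    assert(explicit_count <= all_params.size());
    const std::size_t captured_count = all_params.size() - explicit_count;
    expr_ptr result = make_ref(mangled_name);
    // Apply the global function to each captured variable in order,
    // building a left-nested chain of partial applications.
    for (std::size_t i = 0; i < captured_count; ++i) {
        result = make_app(result, make_ref(all_params[i]));
    }
    return result;
}

std::string show(const expr_ptr& e) {
    if (!e->left) return e->name;
    return "(" + show(e->left) + " " + show(e->right) + ")";
}
```

For a global function with parameters `x`, `y` and `a`, of which only `a` was written by the user, the chain prints as `((f_1 x) y)`: the explicit parameter is left for the caller to supply. The name `f_1` is only an example of a mangled name.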
|
||||
|
||||
Let's take a look at translating `ast_lambda` next:
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 368 392 >}}
|
||||
|
||||
Once again, on lines 369-375 we find all the arguments to the
|
||||
global definition. On lines 377-382 we create a new global
|
||||
function and a mangled environment, and start creating the
|
||||
chain of function applications. On lines 384-390, we actually
|
||||
create the arguments and apply the function to them. Finally,
|
||||
on line 391, we store this new chain of applications in the
|
||||
`translated` field.
|
||||
|
||||
#### Compilation
|
||||
There's still another piece of the puzzle missing, and
|
||||
that's how we're going to compile `let/in` expressions into
|
||||
G-machine instructions. We have allowed these expressions
|
||||
to be recursive, and maybe even mutually recursive. This
|
||||
worked fine with global definitions; instead of specifying
|
||||
where on the stack we can find the reference to a global
|
||||
function, we just created a new global node, and called
|
||||
it good. Things are different now, though, because the definitions
|
||||
we're referencing aren't _just_ global functions; they are partial
|
||||
applications of a global function. And to reference themselves,
|
||||
or their neighbors, they have to have a handle on their own nodes. We do this
|
||||
using an instruction that we foreshadowed in part 5, but didn't use
|
||||
until just now: __Alloc__.
|
||||
|
||||
__Alloc__ creates placeholder nodes on the stack. These nodes
|
||||
are indirections, the same kind that we use for lazy evaluation
|
||||
and sharing elsewhere. We create an indirection node for every
|
||||
definition that we then build; when an expression needs access
|
||||
to a definition, we give it the indirection node. After
|
||||
building the partial application graph for an expression,
|
||||
we use __Update__, making the corresponding indirection
|
||||
point to this new graph. This way, the 'handle' to a
|
||||
definition is always accessible, and once the definition's expression
|
||||
is built, the handle correctly points to it. Here's the implementation:
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 319 332 >}}
|
||||
|
||||
First, we create the __Alloc__ instruction. Then, we update
|
||||
our environment to map each definition name to a location
|
||||
within the newly allocated batch of nodes. Since we iterate
|
||||
the definitions in order, 'pushing' them into our environment,
|
||||
we end up with the convention of having the later definitions
|
||||
closer to the top of the G-machine stack. Thus, when we
|
||||
iterate the definitions again, this time to compile their
|
||||
bodies, we have to do so starting with the highest offset,
|
||||
and working our way down to __Update__-ing the top of the stack.
|
||||
Once the definitions have been compiled, we proceed to compiling
|
||||
the `in` part of the expression as normal, using our updated
|
||||
environment. Finally, we use __Slide__ to get rid of the definition
|
||||
graphs, cleaning up the stack.
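
To see the stack discipline in action, here is a toy model of just the three instructions involved. It is a sketch under stated assumptions, not the runtime from the earlier parts: `gnode`, `machine` and `resolve` are hypothetical names, offsets are counted from the top of the stack, and `update` is assumed to pop the freshly built graph before indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct gnode;
using gnode_ptr = std::shared_ptr<gnode>;
struct gnode {
    std::string label;  // payload of an ordinary node
    gnode_ptr target;   // non-null once the node is an indirection
};

struct machine {
    std::vector<gnode_ptr> stack;  // stack.back() is the top

    gnode_ptr& at(std::size_t offset) {
        assert(offset < stack.size());
        return stack[stack.size() - 1 - offset];
    }
    // Alloc: push `count` placeholder nodes.
    void alloc(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            stack.push_back(std::make_shared<gnode>());
        }
    }
    // Stand-in for compiling a definition body: push a finished graph.
    void push_node(const std::string& label) {
        stack.push_back(std::make_shared<gnode>(gnode{label, nullptr}));
    }
    // Update: pop the graph, then point the placeholder at `offset` to it.
    void update(std::size_t offset) {
        gnode_ptr graph = at(0);
        stack.pop_back();
        at(offset)->target = graph;
    }
    // Slide: keep the top node, drop the `count` nodes beneath it.
    void slide(std::size_t count) {
        gnode_ptr top = at(0);
        stack.pop_back();
        assert(count <= stack.size());
        stack.resize(stack.size() - count);
        stack.push_back(top);
    }
};

// Follow indirections until an ordinary node is reached.
const gnode& resolve(const gnode_ptr& node) {
    const gnode* current = node.get();
    while (current->target) current = current->target.get();
    return *current;
}
```

For two definitions, we would call `alloc(2)`, build the first definition's graph and `update(1)`, build the second and `update(0)`, then build the `in` expression and `slide(2)`. Any handle taken from the stack right after `alloc` resolves to the finished graph once the matching `update` has run, which is exactly what lets a definition refer to itself or to its neighbors.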
|
||||
|
||||
Compiling the `ast_lambda` is far more straightforward. We just
|
||||
compile the resulting partial application as we normally would have:
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 393 395 >}}
|
||||
|
||||
One more thing. Let's adopt the convention of storing __mangled__
|
||||
names into the environment. This way, rather than looking up
|
||||
mangled names only for global functions, which would be a 'gotcha'
|
||||
for anyone working on the compiler, we will always use the mangled
|
||||
names during compilation. To make this change, we make sure that
|
||||
`ast_case` also uses `mangled_name`:
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 228 228 >}}
|
||||
|
||||
We also update the logic for `ast_lid::compile` to use the mangled
|
||||
name information:
|
||||
|
||||
{{< codelines "C++" "compiler/12/ast.cpp" 52 58 >}}
|
||||
|
||||
#### Fixing Type Generalization
|
||||
Our type generalization has had a rather serious bug
since part 10. Recall that we can only generalize type variables
that are free in the environment. Thus far, we haven't checked for that,
and we really should: I ran into incorrectly inferred types
in my first test of the `let/in` language feature.
|
||||
|
||||
We need to make our code capable of finding free variables in the
|
||||
type environment. This requires the `type_mgr`, which associates
|
||||
with type variables the real types they represent, if any. We
|
||||
thus create methods with signatures as follows:
|
||||
|
||||
```C++
|
||||
void type_env::find_free(const type_mgr& mgr, std::set<std::string>& into) const;
|
||||
void type_env::find_free_except(const type_mgr& mgr, const std::string& avoid,
|
||||
std::set<std::string>& into) const;
|
||||
```
|
||||
|
||||
Why `find_free_except`? When generalizing a variable whose type was already
|
||||
stored in the environment, all the type variables we could generalize would
|
||||
not be 'free'. If they only occur in the type we're generalizing, though,
|
||||
we shouldn't let that stop us! Thus, when finding free type variables, we will
|
||||
avoid looking at the particular variable whose type is being generalized. The
|
||||
implementations of the two methods are straightforward:
|
||||
|
||||
{{< codelines "C++" "compiler/12/type_env.cpp" 4 18 >}}
|
||||
|
||||
Note that `find_free_except` calls `find_free` in its recursive call. This
|
||||
is not a bug: we _do_ want to include free type variables from bindings
|
||||
that have the same name as the variable we're generalizing, but aren't found
|
||||
in the same scope. As far as we're concerned, they're different variables!
|
||||
The two methods use another `find_free` method which we add to `type_mgr`:
|
||||
|
||||
{{< codelines "C++" "compiler/12/type.cpp" 206 213 >}}
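
The scoping subtlety in `find_free_except` is easier to see in a stripped-down model. The sketch below is an assumption-laden simplification: `env_sketch` is a hypothetical struct, there is no `type_mgr`, and each binding stores the set of type variables free in its type directly instead of a real type.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified environment: name -> free type variables.
struct env_sketch {
    std::map<std::string, std::set<std::string>> names;
    const env_sketch* parent = nullptr;

    void find_free(std::set<std::string>& into) const {
        if (parent) parent->find_free(into);
        for (const auto& entry : names) {
            into.insert(entry.second.begin(), entry.second.end());
        }
    }

    void find_free_except(const std::string& avoid,
                          std::set<std::string>& into) const {
        // Deliberately find_free, not find_free_except: a binding with the
        // same name in an outer scope is a different variable.
        if (parent) parent->find_free(into);
        for (const auto& entry : names) {
            if (entry.first == avoid) continue;
            into.insert(entry.second.begin(), entry.second.end());
        }
    }
};
```

If an outer scope binds `x` with free type variable `a`, and the inner scope binds `x` with `b` and `y` with `c`, then `find_free_except("x", ...)` on the inner scope collects `a` and `c`. Only the inner `x`, the one being generalized, is skipped.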
|
||||
|
||||
Finally, `generalize` makes sure not to use variables that it finds free:
|
||||
|
||||
{{< codelines "C++" "compiler/12/type_env.cpp" 68 81 >}}
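
Reduced to its essence, the check is a set difference: we may quantify over the type variables that occur in the type but are not free elsewhere in the environment. The helper below is a hypothetical sketch of that idea on plain sets of names, not the compiler's `generalize`, which works on real types and type schemes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: which type variables are safe to generalize?
//   free_in_type: type variables occurring in the type being generalized
//   free_in_env:  type variables free in the rest of the environment
std::vector<std::string> quantifiable_variables(
        const std::set<std::string>& free_in_type,
        const std::set<std::string>& free_in_env) {
    std::vector<std::string> result;
    // std::set iterates in sorted order, as std::set_difference requires.
    std::set_difference(free_in_type.begin(), free_in_type.end(),
                        free_in_env.begin(), free_in_env.end(),
                        std::back_inserter(result));
    return result;
}
```

For a type mentioning `a` and `b` in an environment where `b` is still free, only `a` may be quantified.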
|
||||
|
||||
#### Putting It All Together
|
||||
All that's left is to tie the parts we've created into one coherent whole
|
||||
in `main.cpp`. First of all, since we moved all of the LLVM-related
|
||||
code into `global_scope`, we can safely replace that functionality
|
||||
in `main.cpp` with a method call:
|
||||
|
||||
{{< codelines "C++" "compiler/12/main.cpp" 121 132 >}}
|
||||
|
||||
On the other hand, we need top-level logic to handle `definition_group`s.
This is pretty straightforward, and the main trick is to remember to
update the function's mangled name. Right now, depending on the choice
of mangling algorithm, it's possible even for top-level functions to
have their names changed, and we must account for that. The whole code is:
|
||||
|
||||
{{< codelines "C++" "compiler/12/main.cpp" 52 62 >}}
|
||||
|
||||
Finally, we call `global_scope`'s methods in `main()`:
|
||||
|
||||
{{< codelines "C++" "compiler/12/main.cpp" 148 151 >}}
|
||||
|
||||
That's it! Please note that I've mentioned or hinted at minor changes to the
|
||||
codebase. Detailing every single change this late into the project is
|
||||
needlessly time consuming and verbose; Gitea reports that I've made 677
|
||||
insertions into and 215 deletions from the code. As always, I provide
|
||||
the [source code for the compiler](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/12), and you can also take a look at the
|
||||
[Gitea-generated diff](https://dev.danilafe.com/Web-Projects/blog-static/compare/1905601aaa96d11c771eae9c56bb9fc105050cda...21851e3a9c552383ee8c4bc878ea06e7d28c333e)
|
||||
at the time of writing. If you want to follow along, feel free to check
|
||||
them out!
|
||||
|
||||
### Running Our Programs
|
||||
It's important to test all the language features that we just added. This
|
||||
includes recursive definitions, nested function dependency cycles, and
|
||||
uses of lambda functions. Some of the following examples will be rather
|
||||
silly, but they should do a good job of checking that everything works
|
||||
as we expect.
|
||||
|