Finish implementation description in part 12.
parent
21851e3a9c
commit
c496be1031
|
@ -574,69 +574,257 @@ The
|
||||||
observant reader will have noticed that we have a new method: `translate`.
|
observant reader will have noticed that we have a new method: `translate`.
|
||||||
This is a new method for all `ast` descendants, and will implement the
|
This is a new method for all `ast` descendants, and will implement the
|
||||||
steps of moving definitions to the global scope and transforming the
|
steps of moving definitions to the global scope and transforming the
|
||||||
program. Before we get to it, though, let's quickly see the parsing
|
program. Before we get to it, though, let's look at the other relevant
|
||||||
rules for `ast_let` and `ast_lambda`:
|
pieces of code for `ast_let` and `ast_lambda`. First, their grammar
|
||||||
|
rules in `parser.y`:
|
||||||
|
|
||||||
{{< codelines "text" "compiler/12/parser.y" 107 115 >}}
|
{{< codelines "text" "compiler/12/parser.y" 107 115 >}}
|
||||||
|
|
||||||
This is pretty similar to the rest of the grammar, so I will give this no
|
This is pretty similar to the rest of the grammar, so I will give this no
|
||||||
further explanation.
|
further explanation. Next, their `find_free` and `typecheck` code.
|
||||||
|
We can start with `ast_let`:
|
||||||
|
|
||||||
{{< todo >}}
|
{{< codelines "C++" "compiler/12/ast.cpp" 275 289 >}}
|
||||||
Explain typechecking for lambda functions and let/in expressions.
|
|
||||||
{{< /todo >}}
|
|
||||||
|
|
||||||
{{< todo >}}
|
As you can see, `ast_let::find_free` works in a similar manner to `ast_case::find_free`.
|
||||||
Explain free variable detection for lambda functions and let/in expressions.
|
It finds the free variables in the `in` node as well as in each of the definitions
|
||||||
{{< /todo >}}
|
(taking advantage of the fact that `definition_group::find_free` populates the
|
||||||
|
given set with "far away" free variables). It then filters out any variables bound in
|
||||||
|
the `let` from the set of free variables in `in`, and returns the result.
|
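The filtering step described above can be pictured with a small, self-contained sketch. The `toy_let` struct and `toy_find_free` function are hypothetical stand-ins, not the compiler's actual `ast_let`; they only model which names each part of the expression mentions. As a simplification, the sketch filters the combined set of names, instead of relying on `definition_group::find_free` to report only the "far away" variables.

```C++
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a let/in expression: every definition maps its name
// to the variables its body mentions, and `in_vars` holds the variables
// mentioned by the `in` expression.
struct toy_let {
    std::map<std::string, std::set<std::string>> definitions;
    std::set<std::string> in_vars;
};

// Collects every mentioned variable, then drops the names bound by the let.
std::set<std::string> toy_find_free(const toy_let& let) {
    std::set<std::string> candidates = let.in_vars;
    for (const auto& def : let.definitions) {
        candidates.insert(def.second.begin(), def.second.end());
    }

    std::set<std::string> free_vars;
    for (const auto& name : candidates) {
        // A name defined by this let is bound, not free.
        if (let.definitions.find(name) == let.definitions.end()) {
            free_vars.insert(name);
        }
    }
    return free_vars;
}
```

For a `let` that defines `square` in terms of `x` and uses `square` in its `in` part, this model reports `x` as the only free variable.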
||||||
|
|
||||||
|
Typechecking in `ast_let` relies on `definition_group::typecheck`, which holds
|
||||||
|
all of the required functionality for checking the new definitions.
|
||||||
|
Once the definitions are typechecked, we use their type information to
|
||||||
|
typecheck the `in` part of the expression (passing `definitions.env` to the
|
||||||
|
call to `typecheck` to make the new definitions visible).
|
||||||
|
|
||||||
|
Next, we look at `ast_lambda`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/ast.cpp" 344 366 >}}
|
||||||
|
|
||||||
|
Again, `ast_lambda::find_free` works similarly to `definition_defn`, stripping
|
||||||
|
the variables expected by the function from the body's list of free variables.
|
||||||
|
Also like `definition_defn`, this new node remembers the free variables in
|
||||||
|
its body, which we will later use for lifting.
|
||||||
|
|
||||||
|
Typechecking in this node also proceeds similarly to `definition_defn`. We create
|
||||||
|
new type variables for each parameter and for the return value, and build up
|
||||||
|
a function type called `full_type`. We then typecheck the body using the
|
||||||
|
new environment (which now includes the variables), and return the function type we came up with.
|
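As a rough illustration of how such a function type is assembled, here is a minimal sketch that works on strings instead of real type objects. The names `toy_type_mgr` and `toy_lambda_type` are made up for this example, and the sketch only covers building the type; typechecking the body is left out.

```C++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fresh-name generator, standing in for the type manager.
struct toy_type_mgr {
    int next = 0;
    std::string new_type_name() { return "t" + std::to_string(next++); }
};

// Assigns a fresh type variable to each parameter and to the return value,
// then folds them into a curried function type, written as a string.
std::string toy_lambda_type(toy_type_mgr& mgr, std::size_t param_count) {
    std::vector<std::string> param_types;
    for (std::size_t i = 0; i < param_count; i++) {
        param_types.push_back(mgr.new_type_name());
    }

    std::string full_type = mgr.new_type_name(); // the return type
    // Walk the parameters right to left so the arrows nest to the right.
    for (auto it = param_types.rbegin(); it != param_types.rend(); ++it) {
        full_type = *it + " -> " + full_type;
    }
    return full_type;
}
```

With two parameters, the parameters receive the first two fresh variables and the return value receives the third.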
||||||
|
|
||||||
#### Translation
|
#### Translation
|
||||||
While collecting all of the definitions into a global list, we can
|
Recalling the transformations we described earlier, we can observe two
|
||||||
also do some program transformations. Let's return to our earlier example:
|
major steps to what we have to do:
|
||||||
|
|
||||||
```Haskell {linenos=table}
|
1. Move the body of the original definition into its own
|
||||||
fourthPower x = square * square
|
|
||||||
where
|
|
||||||
square = x * x
|
|
||||||
```
|
|
||||||
|
|
||||||
We said it should be translated into something like:
|
|
||||||
|
|
||||||
```Haskell {linenos=table}
|
|
||||||
fourthPower x = square * square
|
|
||||||
where square = square' x
|
|
||||||
square' x = x * x
|
|
||||||
```
|
|
||||||
|
|
||||||
In our language, the original program above would be:
|
|
||||||
|
|
||||||
```text {linenos=table}
|
|
||||||
defn fourthPower x = {
|
|
||||||
let {
|
|
||||||
defn square = { x * x }
|
|
||||||
} in {
|
|
||||||
square * square
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
And the translated version would be:
|
|
||||||
|
|
||||||
```text {linenos=table}
|
|
||||||
defn fourthPower x = {
|
|
||||||
let {
|
|
||||||
defn square = { square' x }
|
|
||||||
} in {
|
|
||||||
square * square
|
|
||||||
}
|
|
||||||
}
|
|
||||||
defn square' x = { x * x }
|
|
||||||
```
|
|
||||||
|
|
||||||
Setting aside for the moment the naming of `square'` and `square`, we observe
|
|
||||||
that we want to perform the following operations:
|
|
||||||
|
|
||||||
1. Move the body of the original definition of `square` into its own
|
|
||||||
global definition, adding all the captured variables as arguments.
|
global definition, adding all the captured variables as arguments.
|
||||||
2. Replace the right hand side of the `let/in` expression with an application
|
2. Replace the right hand side of the `let/in` expression with an application
|
||||||
of the global definition to the variables it requires.
|
of the global definition to the variables it requires.
|
||||||
|
|
||||||
|
We will implement these in a new `translate` method, with the following
|
||||||
|
signature:
|
||||||
|
|
||||||
|
```C++
|
||||||
|
void ast::translate(global_scope& scope);
|
||||||
|
```
|
||||||
|
|
||||||
|
The `scope` parameter and its `add_function` and `add_constructor` methods will
|
||||||
|
be used to add definitions to the global scope. Each AST node will also
|
||||||
|
use this method to implement the second step. Currently, only
|
||||||
|
`ast_let` and `ast_lambda` will need to modify themselves - all other
|
||||||
|
nodes will simply recursively call this method on their children. Let's jump
|
||||||
|
straight into implementing this method for `ast_let`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/ast.cpp" 291 316 >}}
|
||||||
|
|
||||||
|
Since data type definitions don't really depend on anything else, we process
|
||||||
|
them first. This amounts to simply calling the `definition_data::into_globals`
|
||||||
|
method, which itself simply calls `global_scope::add_constructor`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/definition.cpp" 86 92 >}}
|
||||||
|
|
||||||
|
Note how `into_globals` updates the mangled name of its constructor
|
||||||
|
via `set_mangled_name`. This will help us decide which global
|
||||||
|
function to call during code generation. More on that later.
|
||||||
|
|
||||||
|
Starting on line 295, we begin processing the function definitions
|
||||||
|
in the `let/in` expression. We remember how many arguments were
|
||||||
|
explicitly added to the function definition, and then call the
|
||||||
|
definition's `into_global` method. This method is implemented
|
||||||
|
as follows:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/definition.cpp" 40 49 >}}
|
||||||
|
|
||||||
|
First, this method collects all the non-global free variables in
|
||||||
|
its body, which will need to be passed to the global definition
|
||||||
|
as arguments. It then combines this list with the arguments
|
||||||
|
the user explicitly added to it, recursively translates
|
||||||
|
its body, and creates a new global definition using `add_function`.
|
||||||
|
|
||||||
|
We return to `ast_let::translate` at line 299. Here,
|
||||||
|
we determine how many variables ended up being captured, by
|
||||||
|
subtracting the number of explicit parameters from the total
|
||||||
|
number of parameters the new global definition has. This number,
|
||||||
|
combined with the fact that we added all the 'implicit' arguments
|
||||||
|
to the function to the beginning of the list, will let us
|
||||||
|
iterate over all implicit arguments, creating a chain of partial
|
||||||
|
function applications.
|
||||||
|
|
||||||
|
But how do we build the application? We could use the mangled name
|
||||||
|
of the function, but this seems inelegant, especially since we
|
||||||
|
already keep track of mangling information in `type_env`. Instead,
|
||||||
|
we create a new, local environment, in which we place an updated
|
||||||
|
binding for the function, marking it global, and setting
|
||||||
|
its mangled name to one generated by `global_scope`. This work is done
|
||||||
|
on lines 301-303. We create a reference to the global function
|
||||||
|
using the new environment on lines 305 and 306, and apply it to
|
||||||
|
all the implicit arguments on lines 307-313. Finally, we
|
||||||
|
add the new 'basic' equation into `translated_definitions`.
|
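To make the shape of that chain concrete, here is a minimal sketch using a toy expression tree. The types and helpers (`toy_expr`, `toy_var`, `toy_app`, `toy_apply_captured`, `toy_show`) are invented for this illustration and are much simpler than the compiler's `ast` classes; in particular, the sketch refers to the global function by a plain name, where the real code uses an environment carrying mangling information.

```C++
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node: a variable reference when `left` is null,
// and a function application otherwise.
struct toy_expr {
    std::string name;
    std::unique_ptr<toy_expr> left;
    std::unique_ptr<toy_expr> right;
};
using toy_ptr = std::unique_ptr<toy_expr>;

toy_ptr toy_var(const std::string& name) {
    toy_ptr e(new toy_expr());
    e->name = name;
    return e;
}

toy_ptr toy_app(toy_ptr left, toy_ptr right) {
    toy_ptr e(new toy_expr());
    e->left = std::move(left);
    e->right = std::move(right);
    return e;
}

// Applies the lifted global function to each captured variable in turn,
// producing a left-nested chain of applications.
toy_ptr toy_apply_captured(const std::string& global_name,
                           const std::vector<std::string>& captured) {
    toy_ptr chain = toy_var(global_name);
    for (const auto& name : captured) {
        chain = toy_app(std::move(chain), toy_var(name));
    }
    return chain;
}

// Renders the tree with explicit parentheses, for inspection.
std::string toy_show(const toy_expr& e) {
    if (!e.left) return e.name;
    return "(" + toy_show(*e.left) + " " + toy_show(*e.right) + ")";
}
```

A definition that captures nothing stays a bare reference to the global function, while each captured variable adds one more application around it.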
||||||
|
|
||||||
|
Let's take a look at translating `ast_lambda` next:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/ast.cpp" 368 392 >}}
|
||||||
|
|
||||||
|
Once again, on lines 369-375 we find all the arguments to the
|
||||||
|
global definition. On lines 377-382 we create a new global
|
||||||
|
function and a mangled environment, and start creating the
|
||||||
|
chain of function applications. On lines 384-390, we actually
|
||||||
|
create the arguments and apply the function to them. Finally,
|
||||||
|
on line 391, we store this new chain of applications in the
|
||||||
|
`translated` field.
|
||||||
|
|
||||||
|
#### Compilation
|
||||||
|
There's still another piece of the puzzle missing, and
|
||||||
|
that's how we're going to compile `let/in` expressions into
|
||||||
|
G-machine instructions. We have allowed these expressions
|
||||||
|
to be recursive, and maybe even mutually recursive. This
|
||||||
|
worked fine with global definitions; instead of specifying
|
||||||
|
where on the stack we can find the reference to a global
|
||||||
|
function, we just created a new global node, and called
|
||||||
|
it good. Things are different now, though, because the definitions
|
||||||
|
we're referencing aren't _just_ global functions; they are partial
|
||||||
|
applications of a global function. And to reference themselves,
|
||||||
|
or their neighbors, they have to have a handle on their own nodes. We do this
|
||||||
|
using an instruction that we foreshadowed in part 5, but didn't use
|
||||||
|
until just now: __Alloc__.
|
||||||
|
|
||||||
|
__Alloc__ creates placeholder nodes on the stack. These nodes
|
||||||
|
are indirections, the same kind that we use for lazy evaluation
|
||||||
|
and sharing elsewhere. We create an indirection node for every
|
||||||
|
definition that we then build; when an expression needs access
|
||||||
|
to a definition, we give it the indirection node. After
|
||||||
|
building the partial application graph for an expression,
|
||||||
|
we use __Update__, making the corresponding indirection
|
||||||
|
point to this new graph. This way, the 'handle' to a
|
||||||
|
definition is always accessible, and once the definition's expression
|
||||||
|
is built, the handle correctly points to it. Here's the implementation:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/ast.cpp" 319 332 >}}
|
||||||
|
|
||||||
|
First, we create the __Alloc__ instruction. Then, we update
|
||||||
|
our environment to map each definition name to a location
|
||||||
|
within the newly allocated batch of nodes. Since we iterate
|
||||||
|
the definitions in order, 'pushing' them into our environment,
|
||||||
|
we end up with the convention of having the later definitions
|
||||||
|
closer to the top of the G-machine stack. Thus, when we
|
||||||
|
iterate the definitions again, this time to compile their
|
||||||
|
bodies, we have to do so starting with the highest offset,
|
||||||
|
and working our way down to __Update__-ing the top of the stack.
|
||||||
|
Once the definitions have been compiled, we proceed to compiling
|
||||||
|
the `in` part of the expression as normal, using our updated
|
||||||
|
environment. Finally, we use __Slide__ to get rid of the definition
|
||||||
|
graphs, cleaning up the stack.
|
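The order of the emitted instructions can be sketched as follows. This is only an illustration: instructions are plain strings, `Build <name>` is a made-up placeholder for whatever instructions construct a definition's graph, and the sketch assumes that the operand of __Update__ is counted after the freshly built node has been popped.

```C++
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of compiling a let/in expression with the given
// definition names; returns the instruction sequence as text.
std::vector<std::string> toy_compile_let(const std::vector<std::string>& definitions) {
    std::vector<std::string> into;
    const int count = static_cast<int>(definitions.size());

    // Reserve one placeholder (indirection) node per definition.
    into.push_back("Alloc " + std::to_string(count));

    // The first definition sits deepest in the stack, so it has the highest
    // offset; the last definition is at the top, with offset 0.
    int offset = count - 1;
    for (const auto& name : definitions) {
        into.push_back("Build " + name);
        into.push_back("Update " + std::to_string(offset--));
    }

    // Compile the `in` part, then remove the definition nodes beneath it.
    into.push_back("Build in");
    into.push_back("Slide " + std::to_string(count));
    return into;
}
```

For two definitions `a` and `b`, the sequence begins with `Alloc 2`, updates offset 1 for `a` and offset 0 for `b`, and ends with `Slide 2`.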
||||||
|
|
||||||
|
Compiling the `ast_lambda` is far more straightforward. We just
|
||||||
|
compile the resulting partial application as we normally would have:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/ast.cpp" 393 395 >}}
|
||||||
|
|
||||||
|
One more thing. Let's adopt the convention of storing __mangled__
|
||||||
|
names into the environment. This way, rather than looking up
|
||||||
|
mangled names only for global functions, which would be a 'gotcha'
|
||||||
|
for anyone working on the compiler, we will always use the mangled
|
||||||
|
names during compilation. To make this change, we make sure that
|
||||||
|
`ast_case` also uses `mangled_name`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/ast.cpp" 228 228 >}}
|
||||||
|
|
||||||
|
We also update the logic for `ast_lid::compile` to use the mangled
|
||||||
|
name information:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/ast.cpp" 52 58 >}}
|
||||||
|
|
||||||
|
#### Fixing Type Generalization
|
||||||
|
This is a rather serious bug that made its way into the codebase
|
||||||
|
since part 10. Recall that we can only generalize type variables
|
||||||
|
that are not free in the environment. Thus far, we haven't enforced that,
|
||||||
|
and we really should: I ran into incorrectly inferred types
|
||||||
|
in my first test of the `let/in` language feature.
|
||||||
|
|
||||||
|
We need to make our code capable of finding free variables in the
|
||||||
|
type environment. This requires the `type_mgr`, which associates
|
||||||
|
with type variables the real types they represent, if any. We
|
||||||
|
thus create methods with signatures as follows:
|
||||||
|
|
||||||
|
```C++
|
||||||
|
void type_env::find_free(const type_mgr& mgr, std::set<std::string>& into) const;
|
||||||
|
void type_env::find_free_except(const type_mgr& mgr, const std::string& avoid,
|
||||||
|
std::set<std::string>& into) const;
|
||||||
|
```
|
||||||
|
|
||||||
|
Why `find_free_except`? When generalizing a variable whose type was already
|
||||||
|
stored in the environment, all the type variables we could generalize would
|
||||||
|
be found free in the environment. If they only occur in the type we're generalizing, though,
|
||||||
|
we shouldn't let that stop us! Thus, when finding free type variables, we will
|
||||||
|
avoid looking at the particular variable whose type is being generalized. The
|
||||||
|
implementations of the two methods are straightforward:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/type_env.cpp" 4 18 >}}
|
||||||
|
|
||||||
|
Note that `find_free_except` calls `find_free` in its recursive call. This
|
||||||
|
is not a bug: we _do_ want to include free type variables from bindings
|
||||||
|
that have the same name as the variable we're generalizing, but aren't found
|
||||||
|
in the same scope. As far as we're concerned, they're different variables!
|
||||||
|
The two methods use another `find_free` method which we add to `type_mgr`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/type.cpp" 206 213 >}}
|
||||||
|
|
||||||
|
Finally, `generalize` makes sure not to use variables that it finds free:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/type_env.cpp" 68 81 >}}
|
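The interplay between `find_free_except` and `generalize` can be illustrated with a flat, single-scope model. Everything here is a simplifying assumption: `toy_env` maps each name directly to the set of type variables in its type, there is no parent scope, and there is no `type_mgr` resolving variables to real types.

```C++
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Hypothetical environment: variable name -> type variables in its type.
using toy_env = std::map<std::string, std::set<std::string>>;

// Gathers the type variables of every binding except `avoid`.
std::set<std::string> toy_find_free_except(const toy_env& env,
                                           const std::string& avoid) {
    std::set<std::string> into;
    for (const auto& binding : env) {
        if (binding.first == avoid) continue;
        into.insert(binding.second.begin(), binding.second.end());
    }
    return into;
}

// Returns the type variables of `name` that may be quantified: those that
// do not occur in the type of any other binding in the environment.
std::set<std::string> toy_generalize(const toy_env& env, const std::string& name) {
    const std::set<std::string>& type_vars = env.at(name);
    const std::set<std::string> blocked = toy_find_free_except(env, name);

    std::set<std::string> result;
    std::set_difference(type_vars.begin(), type_vars.end(),
                        blocked.begin(), blocked.end(),
                        std::inserter(result, result.begin()));
    return result;
}
```

If `f` has a type mentioning `a` and `b`, and another binding `x` also mentions `a`, then only `b` is generalized. Without the "except" rule, `f`'s own binding would block both variables.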
||||||
|
|
||||||
|
#### Putting It All Together
|
||||||
|
All that's left is to tie the parts we've created into one coherent whole
|
||||||
|
in `main.cpp`. First of all, since we moved all of the LLVM-related
|
||||||
|
code into `global_scope`, we can safely replace that functionality
|
||||||
|
in `main.cpp` with a method call:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/main.cpp" 121 132 >}}
|
||||||
|
|
||||||
|
On the other hand, we need top-level logic to handle `definition_group`s.
|
||||||
|
This is pretty straightforward, and the main trick is to remember to
|
||||||
|
update the function's mangled name. Right now, depending on the choice
|
||||||
|
of mangling algorithm, it's possible even for top-level functions to
|
||||||
|
have their names changed, and we must account for that. The whole code is:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/main.cpp" 52 62 >}}
|
||||||
|
|
||||||
|
Finally, we call `global_scope`'s methods in `main()`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/12/main.cpp" 148 151 >}}
|
||||||
|
|
||||||
|
That's it! Please note that I've mentioned or hinted at minor changes to the
|
||||||
|
codebase. Detailing every single change this late into the project is
|
||||||
|
needlessly time consuming and verbose; Gitea reports that I've made 677
|
||||||
|
insertions into and 215 deletions from the code. As always, I provide
|
||||||
|
the [source code for the compiler](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/12), and you can also take a look at the
|
||||||
|
[Gitea-generated diff](https://dev.danilafe.com/Web-Projects/blog-static/compare/1905601aaa96d11c771eae9c56bb9fc105050cda...21851e3a9c552383ee8c4bc878ea06e7d28c333e)
|
||||||
|
at the time of writing. If you want to follow along, feel free to check
|
||||||
|
them out!
|
||||||
|
|
||||||
|
### Running Our Programs
|
||||||
|
It's important to test all the language features that we just added. This
|
||||||
|
includes recursive definitions, nested function dependency cycles, and
|
||||||
|
uses of lambda functions. Some of the following examples will be rather
|
||||||
|
silly, but they should do a good job of checking that everything works
|
||||||
|
as we expect.
|
||||||
|
|