Add more content to part 12.

This commit is contained in:
Danila Fedorin 2020-06-19 02:22:08 -07:00
parent 600d5b91ea
commit 21851e3a9c
1 changed files with 172 additions and 0 deletions

View File

@ -418,6 +418,178 @@ Recall that in this case, we need not have two methods for declaring
and generating LLVM, since constructors don't reference other constructors,
and are always generated before any function definitions.
#### Visibility
Should we really be turning _all_ free variables in a function definition
into arguments? Consider the following piece of Haskell code:
```Haskell {linenos=table}
add x y = x + y
mul x y = x * y
something = mul (add 1 3) 3
```
In the definition of `something`, `mul` and `add` occur free.
A very naive lifting algorithm might be tempted to rewrite such a program
as follows:
```Haskell {linenos=table}
add x y = x + y
mul x y = x * y
something' add mul = mul (add 1 3) 3
something = something' add mul
```
But that's absurd! Not only are `add` and `mul` available globally,
but such a rewrite generates another definition with free variables,
which means we didn't really improve our program in any way. From this
example, we can see that we don't want to be turning reference to global
variables into function parameters. But how can we tell if a variable
we're trying to operate on is global or not? I propose a flag in our
`type_env`, which we'll augment to be used as a symbol table. To do
this, we update the implementation of `type_env` to map variables to
values of a struct `variable_data`:
{{< codelines "C++" "compiler/12/type_env.hpp" 13 22 >}}
The `visibility` enum is defined as follows:
{{< codelines "C++" "compiler/12/type_env.hpp" 10 10 >}}
As you can see from the above snippet, we also added a `mangled_name` field
to the new `variable_data` struct. We will be using this field shortly. We
also add a few methods to our `type_env`, and end up with the following:
{{< codelines "C++" "compiler/12/type_env.hpp" 31 44 >}}
We will come back to `find_free` and `find_free_except`, as well as
`set_mangled_name` and `get_mangled_name`. For now, we just adjust `bind` to
take a visibility parameter that defaults to `local`, and implement
`is_global`:
{{< codelines "C++" "compiler/12/type_env.cpp" 27 32 >}}
Remember the `visibility::global` in `parser.y`? This is where that comes in.
Specifically, we recall that `definition_defn::insert_types` is responsible
for placing function types into the environment, making them accessible
during typechecking later. At this time, we already need to know whether
or not the definitions are global or local (so that we can create the binding).
Thus, we add `visibility` as a parameter to `insert_types`:
{{< codelines "C++" "compiler/12/definition.hpp" 44 44 >}}
Since we are now moving from manually wrangling definitions towards using
`definition_group`, we make it so that the group itself provides this
argument. To do this, we add the `visibility` field from before to it,
and set it in the parser. One more thing: since constructors never
capture variables, we can always move them straight to the global
scope, and thus, we'll always mark them with `visibility::global`.
#### Managing Mangled Names
Just mangling names is not enough. Consider the following program:
```text {linenos=table}
defn packOne x = {
let {
data Packed a = { Pack a }
} in {
Pack x
}
}
defn packTwo x = {
let {
data Packed a = { Pack a }
} in {
Pack x
}
}
```
{{< sidenote "right" "lifting-types-note" "Lifting the data type declarations" >}}
We are actually not <em>quite</em> doing something like the following snippet.
The reason for this is that we don't mangle the names for types. I pointed
out this potential issue in a sidenote in the previous post. Since the size
of this post is already balooning, I will not deal with this issue here.
Even at the end of this post, our compiler will not be able to distinguish
between the two <code>Packed</code> types. We will hopefully get to it later.
{{< /sidenote >}} and their constructors into the global
scope gives us something like:
``` {linenos=table}
data Packed a = { Pack a }
data Packed_1 a = { Pack_1 a }
defn packOne x = { Pack x }
defn packTwo x = { Pack_1 x }
```
Notice that we had to rename one of the calls to `Pack` to be a call to
be `Pack_1`. To actually change our AST to reference `Pack_1`, we'd have
to traverse the whole tree, and make sure to keep track of definitions
that could shadow `Pack` further down. This is cumbersome. Instead, we
can mark a variable as referring to a mangled version of itself, and
access this information when needed. To do this, we add the `mangled_name`
field to the `variable_data` struct as we've seen above, and implement
the `set_mangled_name` and `get_mangled_name` methods. The former:
{{< codelines "C++" "compiler/12/type_env.cpp" 34 37 >}}
And the latter:
{{< codelines "C++" "compiler/12/type_env.cpp" 39 45 >}}
We don't allow the `set_mangled_name` to affect variables that are declared
above the receiving `type_env`, and use the empty string as a 'none' value.
Now, when lifting data type constructors, we'll be able to use
`set_mangled_name` to make sure constructor calls are made correctly. We
will also be able to use this in other cases, like the translation
of local function definitions.
#### New AST Nodes
Finally, it's time for us to add new AST nodes to our language.
Specifically, these nodes are `ast_let` (for `let/in` expressions)
and `ast_lambda` for lambda functions. We declare them as follows:
{{< codelines "C++" "compiler/12/ast.hpp" 131 166 >}}
In `ast_let`, the `definitions` field corresponds to the original definitions
given by the user in the program, and the `in` field corresponds to the
expression which uses these definitions. In the process of lifting, though,
we eventually transfer each of the definitions to the global scope, replacing
their right hand sides with partial applications. After this transformation,
all the data type definitions are effectively gone, and all the function
definitions are converted into the simple form `x = f a1 ... an`. We hold
these post-transformation equations in the `translated_definitions` field,
and it's them that we compile in this node's `compile` method.
In `ast_lambda`, we allow multiple parameters (like Haskell's `\x y -> x + y`).
We store these parameters in the `params` field, and we store the lambda's
expression in the `body` field. Just like `definition_defn`,
the `ast_lambda` node maintains a separate environment in which its children
have been bound, and a list of variables that occur freely in its body. The
former is used for typechecking, while the latter is used for lifting.
Finally, the `translated` field holds the lambda function's form
after its body has been transformed into a global function. Similarly to
`ast_let`, this node will be in the form `f a1 ... an`.
The
observant reader will have noticed that we have a new method: `translate`.
This is a new method for all `ast` descendants, and will implement the
steps of moving definitions to the global scope and transforming the
program. Before we get to it, though, let's quickly see the parsing
rules for `ast_let` and `ast_lambda`:
{{< codelines "text" "compiler/12/parser.y" 107 115 >}}
This is pretty similar to the rest of the grammar, so I will give this no
further explanation.
{{< todo >}}
Explain typechecking for lambda functions and let/in expressions.
{{< /todo >}}
{{< todo >}}
Explain free variable detection for lambda functions and let/in expressions.
{{< /todo >}}
#### Translation
While collecting all of the definitions into a global list, we can
also do some program transformations. Let's return to our earlier example: