Add more content to part 12.
parent 600d5b91ea
commit 21851e3a9c
@@ -418,6 +418,178 @@ Recall that in this case, we need not have two methods for declaring
and generating LLVM, since constructors don't reference other constructors,
and are always generated before any function definitions.

#### Visibility
Should we really be turning _all_ free variables in a function definition
into arguments? Consider the following piece of Haskell code:

```Haskell {linenos=table}
add x y = x + y
mul x y = x * y
something = mul (add 1 3) 3
```

In the definition of `something`, `mul` and `add` occur free.
A very naive lifting algorithm might be tempted to rewrite such a program
as follows:

```Haskell {linenos=table}
add x y = x + y
mul x y = x * y
something' add mul = mul (add 1 3) 3
something = something' add mul
```

But that's absurd! Not only are `add` and `mul` available globally,
but such a rewrite generates another definition with free variables,
which means we didn't really improve our program in any way. From this
example, we can see that we don't want to be turning references to global
variables into function parameters. But how can we tell if a variable
we're trying to operate on is global or not? I propose a flag in our
`type_env`, which we'll augment to be used as a symbol table. To do
this, we update the implementation of `type_env` to map variables to
values of a struct `variable_data`:

{{< codelines "C++" "compiler/12/type_env.hpp" 13 22 >}}

The `visibility` enum is defined as follows:

{{< codelines "C++" "compiler/12/type_env.hpp" 10 10 >}}

As you can see from the above snippet, we also added a `mangled_name` field
to the new `variable_data` struct. We will be using this field shortly. We
also add a few methods to our `type_env`, and end up with the following:

{{< codelines "C++" "compiler/12/type_env.hpp" 31 44 >}}

We will come back to `find_free` and `find_free_except`, as well as
`set_mangled_name` and `get_mangled_name`. For now, we just adjust `bind` to
take a visibility parameter that defaults to `local`, and implement
`is_global`:

{{< codelines "C++" "compiler/12/type_env.cpp" 27 32 >}}
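
To make the idea concrete outside of the compiler, here is a minimal, self-contained
sketch of a symbol table with a visibility flag. It is not the code from
`type_env.hpp`: the environment is stripped down (no types, a raw parent pointer),
and `lift_candidates` is a hypothetical helper introduced only to illustrate how
`is_global` could decide which free variables need to become parameters.

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the post's types; assumptions, not the real code.
enum class visibility { global, local };

struct variable_data {
    visibility vis;
};

struct type_env {
    const type_env* parent = nullptr;
    std::map<std::string, variable_data> names;

    // Bindings are local unless the caller says otherwise.
    void bind(const std::string& name, visibility v = visibility::local) {
        names[name].vis = v;
    }

    // The innermost binding wins; names bound nowhere are not global.
    bool is_global(const std::string& name) const {
        auto it = names.find(name);
        if (it != names.end()) return it->second.vis == visibility::global;
        return parent != nullptr && parent->is_global(name);
    }
};

// Hypothetical helper: keep only the free variables that are not global,
// since only those would have to be turned into extra parameters.
std::vector<std::string> lift_candidates(const type_env& env,
                                         const std::vector<std::string>& free_vars) {
    std::vector<std::string> result;
    for (const auto& name : free_vars) {
        if (!env.is_global(name)) result.push_back(name);
    }
    return result;
}
```

With `add` and `mul` bound as `visibility::global` in an outer environment, asking
for the lift candidates of `something`'s free variables yields an empty list, which
is exactly the behavior the naive algorithm above lacked. Note that a local binding
that shadows a global name is reported as non-global in this sketch.
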

Remember the `visibility::global` in `parser.y`? This is where that comes in.
Specifically, we recall that `definition_defn::insert_types` is responsible
for placing function types into the environment, making them accessible
during typechecking later. At that point, we already need to know whether
the definitions are global or local (so that we can create the binding).
Thus, we add `visibility` as a parameter to `insert_types`:

{{< codelines "C++" "compiler/12/definition.hpp" 44 44 >}}

Since we are now moving from manually wrangling definitions towards using
`definition_group`, we make it so that the group itself provides this
argument. To do this, we add the `visibility` field from before to it,
and set it in the parser. One more thing: since constructors never
capture variables, we can always move them straight to the global
scope, and thus, we'll always mark them with `visibility::global`.

#### Managing Mangled Names
Just mangling names is not enough. Consider the following program:

```text {linenos=table}
defn packOne x = {
    let {
        data Packed a = { Pack a }
    } in {
        Pack x
    }
}
defn packTwo x = {
    let {
        data Packed a = { Pack a }
    } in {
        Pack x
    }
}
```

{{< sidenote "right" "lifting-types-note" "Lifting the data type declarations" >}}
We are actually not <em>quite</em> doing something like the following snippet.
The reason for this is that we don't mangle the names for types. I pointed
out this potential issue in a sidenote in the previous post. Since the size
of this post is already ballooning, I will not deal with this issue here.
Even at the end of this post, our compiler will not be able to distinguish
between the two <code>Packed</code> types. We will hopefully get to it later.
{{< /sidenote >}} and their constructors into the global
scope gives us something like:

``` {linenos=table}
data Packed a = { Pack a }
data Packed_1 a = { Pack_1 a }
defn packOne x = { Pack x }
defn packTwo x = { Pack_1 x }
```

Notice that we had to rename one of the calls to `Pack` to be a call to
`Pack_1`. To actually change our AST to reference `Pack_1`, we'd have
to traverse the whole tree, and make sure to keep track of definitions
that could shadow `Pack` further down. This is cumbersome. Instead, we
can mark a variable as referring to a mangled version of itself, and
access this information when needed. To do this, we add the `mangled_name`
field to the `variable_data` struct as we've seen above, and implement
the `set_mangled_name` and `get_mangled_name` methods. The former:

{{< codelines "C++" "compiler/12/type_env.cpp" 34 37 >}}

And the latter:

{{< codelines "C++" "compiler/12/type_env.cpp" 39 45 >}}

We don't allow `set_mangled_name` to affect variables that are declared
above the receiving `type_env`, and use the empty string as a 'none' value.
Now, when lifting data type constructors, we'll be able to use
`set_mangled_name` to make sure constructor calls are made correctly. We
will also be able to use this in other cases, like the translation
of local function definitions.
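
The two rules just described (only the receiving environment is modified, and the
empty string means "no mangled name") can be sketched in isolation as follows.
This is a simplified model rather than the code from `type_env.cpp`: the
environment stores nothing but the mangled name, and falling back to the original
name when no mangled name was recorded is an assumption made for this example.

```cpp
#include <map>
#include <string>

// Simplified stand-in for the post's variable_data; an assumption.
struct variable_data {
    std::string mangled_name; // the empty string means "no mangled name"
};

struct type_env {
    const type_env* parent = nullptr;
    std::map<std::string, variable_data> names;

    void bind(const std::string& name) { names[name] = variable_data{}; }

    // Only names bound in *this* environment are affected; a name that
    // lives in a parent environment is silently left alone.
    void set_mangled_name(const std::string& name, const std::string& mangled) {
        auto it = names.find(name);
        if (it != names.end()) it->second.mangled_name = mangled;
    }

    // Look the name up through the parent chain. If the innermost binding
    // has no mangled name (or the name is unbound), return the name itself.
    std::string get_mangled_name(const std::string& name) const {
        auto it = names.find(name);
        if (it != names.end()) {
            return it->second.mangled_name.empty() ? name
                                                   : it->second.mangled_name;
        }
        return parent != nullptr ? parent->get_mangled_name(name) : name;
    }
};
```

In the `packOne`/`packTwo` example, the environment of `packTwo`'s `let` would
bind `Pack` and record `Pack_1` as its mangled name, so any reference to `Pack`
inside that `let` resolves to `Pack_1` without the AST being rewritten, while
references elsewhere still resolve to `Pack`.
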

#### New AST Nodes
Finally, it's time for us to add new AST nodes to our language.
Specifically, these nodes are `ast_let` (for `let/in` expressions)
and `ast_lambda` (for lambda functions). We declare them as follows:

{{< codelines "C++" "compiler/12/ast.hpp" 131 166 >}}

In `ast_let`, the `definitions` field corresponds to the original definitions
given by the user in the program, and the `in` field corresponds to the
expression which uses these definitions. In the process of lifting, though,
we eventually transfer each of the definitions to the global scope, replacing
their right hand sides with partial applications. After this transformation,
all the data type definitions are effectively gone, and all the function
definitions are converted into the simple form `x = f a1 ... an`. We hold
these post-transformation equations in the `translated_definitions` field,
and these are what we compile in this node's `compile` method.

In `ast_lambda`, we allow multiple parameters (like Haskell's `\x y -> x + y`).
We store these parameters in the `params` field, and we store the lambda's
expression in the `body` field. Just like `definition_defn`,
the `ast_lambda` node maintains a separate environment in which its children
have been bound, and a list of variables that occur freely in its body. The
former is used for typechecking, while the latter is used for lifting.
Finally, the `translated` field holds the lambda function's form
after its body has been transformed into a global function. Similarly to
`ast_let`, this node will be in the form `f a1 ... an`.
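
To illustrate what "the form `f a1 ... an`" means structurally, here is a small
sketch that builds such a partial application as a chain of left-nested
applications. The `expr` type and the `var`, `app`, `partial_application` and
`show` functions are hypothetical and exist only for this example; they are far
simpler than the `ast` classes declared above.

```cpp
#include <memory>
#include <string>
#include <vector>

// A deliberately tiny expression type: either a variable reference
// (left and right are null) or an application of left to right.
struct expr;
using expr_ptr = std::shared_ptr<expr>;

struct expr {
    std::string name;
    expr_ptr left;
    expr_ptr right;
};

expr_ptr var(const std::string& name) {
    auto e = std::make_shared<expr>();
    e->name = name;
    return e;
}

expr_ptr app(expr_ptr left, expr_ptr right) {
    auto e = std::make_shared<expr>();
    e->left = std::move(left);
    e->right = std::move(right);
    return e;
}

// Apply the lifted global function to each captured variable in order,
// producing ((f a1) a2) ... an.
expr_ptr partial_application(const std::string& global_name,
                             const std::vector<std::string>& captured) {
    expr_ptr result = var(global_name);
    for (const auto& name : captured) {
        result = app(result, var(name));
    }
    return result;
}

// Render an expression, parenthesizing arguments that are applications.
std::string show(const expr_ptr& e) {
    if (!e->left) return e->name;
    std::string arg = e->right->left ? "(" + show(e->right) + ")"
                                     : show(e->right);
    return show(e->left) + " " + arg;
}
```

For a lambda whose body was lifted to a global function named, say, `lambda_1`
and which captures `x` and `y`, `show(partial_application("lambda_1", {"x", "y"}))`
produces `lambda_1 x y`. A lambda that captures nothing reduces to a plain
reference to the global function.
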

The observant reader will have noticed that we have a new method: `translate`.
This is a new method for all `ast` descendants, and will implement the
steps of moving definitions to the global scope and transforming the
program. Before we get to it, though, let's quickly see the parsing
rules for `ast_let` and `ast_lambda`:

{{< codelines "text" "compiler/12/parser.y" 107 115 >}}

This is pretty similar to the rest of the grammar, so I will give this no
further explanation.

{{< todo >}}
Explain typechecking for lambda functions and let/in expressions.
{{< /todo >}}

{{< todo >}}
Explain free variable detection for lambda functions and let/in expressions.
{{< /todo >}}

#### Translation
While collecting all of the definitions into a global list, we can
also do some program transformations. Let's return to our earlier example: