Add more content to part 12.
parent 600d5b91ea
commit 21851e3a9c
@@ -418,6 +418,178 @@ Recall that in this case, we need not have two methods for declaring
and generating LLVM, since constructors don't reference other constructors,
and are always generated before any function definitions.

#### Visibility
Should we really be turning _all_ free variables in a function definition
into arguments? Consider the following piece of Haskell code:

```Haskell {linenos=table}
add x y = x + y
mul x y = x * y
something = mul (add 1 3) 3
```

In the definition of `something`, `mul` and `add` occur free.
A very naive lifting algorithm might be tempted to rewrite such a program
as follows:

```Haskell {linenos=table}
add x y = x + y
mul x y = x * y
something' add mul = mul (add 1 3) 3
something = something' add mul
```

But that's absurd! Not only are `add` and `mul` available globally,
but such a rewrite generates another definition with free variables,
which means we didn't really improve our program in any way. From this
example, we can see that we don't want to be turning references to global
variables into function parameters. But how can we tell if a variable
we're trying to operate on is global or not? I propose a flag in our
`type_env`, which we'll augment to be used as a symbol table. To do
this, we update the implementation of `type_env` to map variables to
values of a struct `variable_data`:

{{< codelines "C++" "compiler/12/type_env.hpp" 13 22 >}}

The `visibility` enum is defined as follows:

{{< codelines "C++" "compiler/12/type_env.hpp" 10 10 >}}

As you can see from the above snippet, we also added a `mangled_name` field
to the new `variable_data` struct. We will be using this field shortly. We
also add a few methods to our `type_env`, and end up with the following:

{{< codelines "C++" "compiler/12/type_env.hpp" 31 44 >}}

We will come back to `find_free` and `find_free_except`, as well as
`set_mangled_name` and `get_mangled_name`. For now, we just adjust `bind` to
take a visibility parameter that defaults to `local`, and implement
`is_global`:

{{< codelines "C++" "compiler/12/type_env.cpp" 27 32 >}}
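Since the snippets above are pulled in from the repository, here is a rough, self-contained sketch of how these pieces might fit together. Only `variable_data`, `visibility`, `mangled_name`, `bind`, and `is_global` are named in the text; the `type` stand-in, the `ty` and `vis` field names, the raw `parent` pointer, and the choice to search parent environments in `is_global` are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the compiler's real type representation.
struct type { std::string name; };
using type_ptr = std::shared_ptr<type>;

enum class visibility { global, local };

// What the environment stores for each variable (field names assumed).
struct variable_data {
    type_ptr ty;
    visibility vis;
    std::string mangled_name; // empty means "no mangled name"
};

struct type_env {
    std::map<std::string, variable_data> names;
    type_env* parent; // assumed: enclosing scope, or nullptr at the top level

    explicit type_env(type_env* p = nullptr) : parent(p) {}

    // Bindings are local unless the caller says otherwise.
    void bind(const std::string& name, type_ptr t,
              visibility v = visibility::local) {
        assert(t != nullptr); // every binding needs a type
        names[name] = variable_data{ std::move(t), v, "" };
    }

    // Assumed behavior: the nearest binding decides; unknown names are not global.
    bool is_global(const std::string& name) const {
        auto it = names.find(name);
        if (it != names.end()) return it->second.vis == visibility::global;
        return parent ? parent->is_global(name) : false;
    }
};
```

With this sketch, a top-level `add` bound with `visibility::global` is reported as global even when queried from a nested environment, while a parameter `x` bound with the default stays local.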

Remember the `visibility::global` in `parser.y`? This is where that comes in.
Specifically, we recall that `definition_defn::insert_types` is responsible
for placing function types into the environment, making them accessible
during typechecking later. At this time, we already need to know whether
the definitions are global or local (so that we can create the binding).
Thus, we add `visibility` as a parameter to `insert_types`:

{{< codelines "C++" "compiler/12/definition.hpp" 44 44 >}}

Since we are now moving from manually wrangling definitions towards using
`definition_group`, we make it so that the group itself provides this
argument. To do this, we add the `visibility` field from before to it,
and set it in the parser. One more thing: since constructors never
capture variables, we can always move them straight to the global
scope, and thus, we'll always mark them with `visibility::global`.
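To see why the flag matters for lifting, here is a minimal sketch of the decision it enables: out of all the free variables in a definition, only the non-global ones need to become extra parameters. The helper `captured_variables` is hypothetical, and `is_global` is passed in as a callback standing in for a lookup in `type_env`; neither signature comes from the compiler itself.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: keep only the free variables that are not global,
// since globals can be referenced directly from a lifted definition.
std::vector<std::string> captured_variables(
        const std::set<std::string>& free_variables,
        const std::function<bool(const std::string&)>& is_global) {
    assert(is_global); // the callback must actually hold a function

    std::vector<std::string> captured;
    for (const auto& name : free_variables) {
        if (!is_global(name)) captured.push_back(name);
    }
    return captured;
}
```

For `something`, whose free variables are `add` and `mul`, the result is empty, so no extra parameters are introduced and the absurd rewrite above is avoided.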

#### Managing Mangled Names
Just mangling names is not enough. Consider the following program:

```text {linenos=table}
defn packOne x = {
    let {
        data Packed a = { Pack a }
    } in {
        Pack x
    }
}
defn packTwo x = {
    let {
        data Packed a = { Pack a }
    } in {
        Pack x
    }
}
```

{{< sidenote "right" "lifting-types-note" "Lifting the data type declarations" >}}
We are actually not <em>quite</em> doing something like the following snippet.
The reason for this is that we don't mangle the names for types. I pointed
out this potential issue in a sidenote in the previous post. Since the size
of this post is already ballooning, I will not deal with this issue here.
Even at the end of this post, our compiler will not be able to distinguish
between the two <code>Packed</code> types. We will hopefully get to it later.
{{< /sidenote >}} and their constructors into the global
scope gives us something like:

``` {linenos=table}
data Packed a = { Pack a }
data Packed_1 a = { Pack_1 a }
defn packOne x = { Pack x }
defn packTwo x = { Pack_1 x }
```

Notice that we had to rename one of the calls to `Pack` to be a call to
`Pack_1`. To actually change our AST to reference `Pack_1`, we'd have
to traverse the whole tree, and make sure to keep track of definitions
that could shadow `Pack` further down. This is cumbersome. Instead, we
can mark a variable as referring to a mangled version of itself, and
access this information when needed. To do this, we add the `mangled_name`
field to the `variable_data` struct as we've seen above, and implement
the `set_mangled_name` and `get_mangled_name` methods. The former:

{{< codelines "C++" "compiler/12/type_env.cpp" 34 37 >}}

And the latter:

{{< codelines "C++" "compiler/12/type_env.cpp" 39 45 >}}

We don't allow `set_mangled_name` to affect variables that are declared
above the receiving `type_env`, and use the empty string as a 'none' value.
Now, when lifting data type constructors, we'll be able to use
`set_mangled_name` to make sure constructor calls are made correctly. We
will also be able to use this in other cases, like the translation
of local function definitions.
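As a rough illustration of the behavior just described, here is a self-contained sketch of the two methods on a stripped-down environment. The text confirms that `set_mangled_name` only touches variables bound in the receiving environment and that the empty string means 'none'; the fallback to the original name, the lookup through parent environments in `get_mangled_name`, the raw `parent` pointer, and the simplified `bind` are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Only the field relevant to mangling is kept in this sketch.
struct variable_data {
    std::string mangled_name; // empty string means "no mangled name"
};

struct type_env {
    std::map<std::string, variable_data> names;
    type_env* parent; // assumed: enclosing scope, or nullptr at the top level

    explicit type_env(type_env* p = nullptr) : parent(p) {}

    // Simplified: the real bind also records a type and a visibility.
    void bind(const std::string& name) { names[name] = variable_data{ "" }; }

    // Only variables bound in *this* environment can be renamed.
    void set_mangled_name(const std::string& name, const std::string& mangled) {
        assert(!mangled.empty()); // the empty string is reserved for "none"
        auto it = names.find(name);
        if (it != names.end()) it->second.mangled_name = mangled;
    }

    // Assumed: use the nearest binding, and fall back to the plain name.
    std::string get_mangled_name(const std::string& name) const {
        auto it = names.find(name);
        if (it != names.end()) {
            return it->second.mangled_name.empty() ? name
                                                   : it->second.mangled_name;
        }
        if (parent) return parent->get_mangled_name(name);
        return name;
    }
};
```

In the `packOne` and `packTwo` example, the environment of the second `let` would bind `Pack` and then mark it as `Pack_1`, so references inside `packTwo` resolve to the renamed constructor without any traversal of the tree.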

#### New AST Nodes
Finally, it's time for us to add new AST nodes to our language.
Specifically, these nodes are `ast_let` (for `let/in` expressions)
and `ast_lambda` (for lambda functions). We declare them as follows:

{{< codelines "C++" "compiler/12/ast.hpp" 131 166 >}}

In `ast_let`, the `definitions` field corresponds to the original definitions
given by the user in the program, and the `in` field corresponds to the
expression which uses these definitions. In the process of lifting, though,
we eventually transfer each of the definitions to the global scope, replacing
their right hand sides with partial applications. After this transformation,
all the data type definitions are effectively gone, and all the function
definitions are converted into the simple form `x = f a1 ... an`. We hold
these post-transformation equations in the `translated_definitions` field,
and these are what we compile in this node's `compile` method.
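To make the `x = f a1 ... an` form a little more concrete, here is a deliberately simplified, text-only sketch of what happens to a single local function. It works on strings rather than on AST nodes, so it is not how the compiler does it; `lifted_definition`, `join`, and `lift_local` are hypothetical names, and placing the captured variables before the original parameters is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Text-only result of lifting one local definition (hypothetical).
struct lifted_definition {
    std::string global_definition; // the new top-level function
    std::string replacement;       // what remains in the let: x = f a1 ... an
};

// Joins words with single spaces.
std::string join(const std::vector<std::string>& words) {
    std::string result;
    for (const auto& word : words) {
        if (!result.empty()) result += " ";
        result += word;
    }
    return result;
}

// Moves a local function to the global scope. Captured variables become
// leading parameters, and the local name becomes a partial application.
lifted_definition lift_local(
        const std::string& name,
        const std::string& global_name,
        const std::vector<std::string>& captured,
        const std::vector<std::string>& params,
        const std::string& body) {
    assert(!global_name.empty()); // the lifted function needs a name

    std::vector<std::string> lhs = { global_name };
    lhs.insert(lhs.end(), captured.begin(), captured.end());
    lhs.insert(lhs.end(), params.begin(), params.end());

    std::vector<std::string> application = { global_name };
    application.insert(application.end(), captured.begin(), captured.end());

    return { join(lhs) + " = " + body, name + " = " + join(application) };
}
```

For a local `addX y = x + y` that captures `x`, this produces a global `addX_1 x y = x + y` and leaves `addX = addX_1 x` behind, which is the shape of equation that `translated_definitions` is described as holding.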

In `ast_lambda`, we allow multiple parameters (like Haskell's `\x y -> x + y`).
We store these parameters in the `params` field, and we store the lambda's
expression in the `body` field. Just like `definition_defn`,
the `ast_lambda` node maintains a separate environment in which its children
have been bound, and a list of variables that occur freely in its body. The
former is used for typechecking, while the latter is used for lifting.
Finally, the `translated` field holds the lambda function's form
after its body has been transformed into a global function. As with
`ast_let`, this translated form looks like `f a1 ... an`.

The observant reader will have noticed that we have a new method: `translate`.
This is a new method for all `ast` descendants, and will implement the
steps of moving definitions to the global scope and transforming the
program. Before we get to it, though, let's quickly see the parsing
rules for `ast_let` and `ast_lambda`:

{{< codelines "text" "compiler/12/parser.y" 107 115 >}}

This is pretty similar to the rest of the grammar, so I will give this no
further explanation.

{{< todo >}}
Explain typechecking for lambda functions and let/in expressions.
{{< /todo >}}

{{< todo >}}
Explain free variable detection for lambda functions and let/in expressions.
{{< /todo >}}

#### Translation
While collecting all of the definitions into a global list, we can
also do some program transformations. Let's return to our earlier example: