Work on writing up the rest of part 8 in compiler series
parent 2994f8983d
commit 1a8a1c3052
|
@ -54,7 +54,6 @@ a `Module` object, which represents some collection of code and declarations
|
||||||
|
|
||||||
{{< codeblock "C++" "compiler/08/llvm_context.hpp" >}}
|
{{< codeblock "C++" "compiler/08/llvm_context.hpp" >}}
|
||||||
|
|
||||||
{{< todo >}} Consistently name context / state.{{< /todo >}}
|
|
||||||
{{< todo >}} Explain creation functions. {{< /todo >}}
|
{{< todo >}} Explain creation functions. {{< /todo >}}
|
||||||
|
|
||||||
We include the LLVM context, builder, and module as members
|
We include the LLVM context, builder, and module as members
|
||||||
|
@ -82,35 +81,58 @@ an `llvm::LinkageType`, the name of the function, and the module
|
||||||
in which the function is declared. Since we only have one
|
in which the function is declared. Since we only have one
|
||||||
module (the one we initialized in the constructor) that's
|
module (the one we initialized in the constructor) that's
|
||||||
the module we pass in. The name of the function is the same
|
the module we pass in. The name of the function is the same
|
||||||
as its name in the runtime, and the linkage type is always
|
as its name in the runtime. The linkage type is a little
|
||||||
external. The only remaining parameter is
|
more complicated - it tells LLVM the "visibility" of a function.
|
||||||
the `llvm::FunctionType`, which is created using code like:
|
"Private" or "Internal" would hide this function from the linker
|
||||||
|
(like `static` functions in C). However, we want to do the opposite: our
|
||||||
|
generated functions should be accessible from other code.
|
||||||
|
Thus, our linkage type is "External".
|
||||||
|
|
||||||
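The C analogy above can be sketched in plain C++. This is only an illustration of internal versus external linkage at the language level, not code from the compiler; both function names are made up for the example.

```C++
// Internal linkage: `static` hides this function from the linker,
// so only code in this translation unit can call it.
static int hidden_helper() {
    return 41;
}

// External linkage (the default for non-static functions): other
// object files can refer to this function by name.
int visible_function() {
    return hidden_helper() + 1;
}
```

Our generated functions need to behave like `visible_function`, which is why we ask LLVM for external linkage.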
{{< todo >}} Why external? {{< /todo >}}
|
The only remaining parameter is the `llvm::FunctionType`, which
|
||||||
|
is created using code like:
|
||||||
|
|
||||||
```C++
|
```C++
|
||||||
llvm::FunctionType::get(return_type, {param_type_1, param_type_2, ...}, is_variadic)
|
llvm::FunctionType::get(return_type, {param_type_1, param_type_2, ...}, is_variadic)
|
||||||
```
|
```
|
||||||
|
|
||||||
Declaring all the functions and types in our runtime is mostly
|
Declaring all the functions and types in our runtime is mostly
|
||||||
just tedious. Here are a few lines from `create_types()`, from
|
just tedious. Here are a few lines from `create_functions()`, which
|
||||||
|
give a very good idea of the rest of that method:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/08/llvm_context.cpp" 47 60 >}}
|
||||||
|
|
||||||
|
Similarly, here are a few lines from `create_types()`, from
|
||||||
which you can extrapolate the rest:
|
which you can extrapolate the rest:
|
||||||
|
|
||||||
{{< codelines "C++" "compiler/08/llvm_context.cpp" 7 11 >}}
|
{{< codelines "C++" "compiler/08/llvm_context.cpp" 7 11 >}}
|
||||||
|
|
||||||
{{< todo >}} Also show struct body setters. {{< /todo >}}
|
We also tell LLVM the contents of our structs, so that
|
||||||
|
we may later reference specific fields. This is just like
|
||||||
|
forward declaration - we can forward declare a struct
|
||||||
|
in C/C++, but unless we also declare its contents,
|
||||||
|
we can't access what's inside. Below is the code
|
||||||
|
for specifying the body of `node_base` and `node_app`.
|
||||||
|
|
||||||
Similarly, here are a few lines from `create_functions()`, which
|
{{< codelines "C++" "compiler/08/llvm_context.cpp" 19 26 >}}
|
||||||
give a very good idea of the rest of that method:
|
|
||||||
|
|
||||||
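The forward declaration analogy can be shown in plain C++. This is a sketch: the field layout of `node_base` and `node_app` below is an assumption made for illustration, and `pass_through` and `left_of` are hypothetical helpers.

```C++
#include <cassert>

// Forward declaration: the name exists, but the contents are unknown.
struct node_app;

// Passing pointers around is fine with only the forward declaration.
node_app* pass_through(node_app* app) {
    return app;
}

// Assumed layouts, for illustration only.
struct node_base {
    int tag;
};

struct node_app {
    node_base base;
    node_base* left;
    node_base* right;
};

// Accessing a field only compiles once the struct body is known.
node_base* left_of(node_app* app) {
    assert(app != nullptr);
    return app->left;
}
```

If `left_of` were moved above the definition of `node_app`, the compiler would reject it. Setting the struct body in LLVM serves the same purpose.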
{{< codelines "C++" "compiler/08/llvm_context.cpp" 20 27 >}}
|
There's still more functionality packed into `llvm_context`.
|
||||||
|
Let's next take a look into `custom_function`, and
|
||||||
|
the `create_custom_function` method. Why do we need
|
||||||
|
these?
|
||||||
|
|
||||||
This completes our implementation of the context.
|
This isn't the end of our `llvm_context` class: it also
|
||||||
|
has a variety of `create_*` methods! Let's take a look
|
||||||
|
at their signatures. Most return either `void`,
|
||||||
|
`llvm::ConstantInt*`, or `llvm::Value*`. Since
|
||||||
|
`llvm::ConstantInt` is a subclass of `llvm::Value`, let's
|
||||||
|
just treat it as simply an `llvm::Value*` while trying
|
||||||
|
to understand these methods.
|
||||||
|
|
||||||
|
So, what is `llvm::Value`? To answer this question, let's
|
||||||
|
first understand how the LLVM IR works.
|
||||||
|
|
||||||
### LLVM IR
|
### LLVM IR
|
||||||
It's now time to look at generating actual code for each G-machine instruction.
|
An important property of LLVM IR is that it is in __Static Single Assignment__
|
||||||
Before we do this, we need to get a little bit of an understanding of what LLVM
|
|
||||||
IR is like. An important property of LLVM IR is that it is in __Static Single Assignment__
|
|
||||||
(SSA) form. This means that each variable can only be assigned to once. For instance,
|
(SSA) form. This means that each variable can only be assigned to once. For instance,
|
||||||
if we use `<-` to represent assignment, the following program is valid:
|
if we use `<-` to represent assignment, the following program is valid:
|
||||||
|
|
||||||
|
@ -140,13 +162,26 @@ x2 <- x1 + 1
|
||||||
In practice, LLVM's C++ API can take care of versioning variables on its own, by
|
In practice, LLVM's C++ API can take care of versioning variables on its own, by
|
||||||
auto-incrementing numbers associated with each variable we use.
|
auto-incrementing numbers associated with each variable we use.
|
||||||
|
|
||||||
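To make the idea of versioning concrete, here is a minimal sketch of a name generator that hands out a fresh numbered name for every assignment. `ssa_namer` is a hypothetical helper written for this example. It is not part of LLVM, and LLVM's own numbering scheme may differ.

```C++
#include <string>
#include <unordered_map>

// Hypothetical helper: gives each new assignment to a variable
// a fresh, numbered name, so that no name is assigned twice.
class ssa_namer {
    std::unordered_map<std::string, int> versions;

public:
    // Returns the name to use for the next assignment to `base`.
    std::string fresh(const std::string& base) {
        int version = versions[base]++;
        return base + std::to_string(version);
    }
};
```

Asking for `x` three times yields `x0`, `x1`, and `x2`. Each of these is a distinct variable as far as SSA is concerned.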
We need not get too deep into the specifics of LLVM IR's textual
|
Assigned to each variable is `llvm::Value`. The LLVM documentation states:
|
||||||
representation, since we will largely be working with the C++
|
|
||||||
API to interact with it. We do, however, need to understand one more
|
> It is the base class of all values computed by a program that may be used as operands to other values.
|
||||||
concept from the world of compiler design: __basic blocks__. A basic
|
|
||||||
block is a sequence of instructions that are guaranteed to be executed
|
It's important to understand that `llvm::Value` __does not store the result of the computation__.
|
||||||
one after another. This means that a basic block cannot have
|
Rather, it represents how something may be computed. `1` is a value because it is computed by
|
||||||
an if/else, jump, or any other type of control flow anywhere
|
just returning. `x + 1` is a value because it is computed by adding the value inside of
|
||||||
|
`x` to 1. Since we cannot modify a variable once we've declared it, we will
|
||||||
|
keep assigning intermediate results to new variables, constructing new values
|
||||||
|
out of values that we've already specified.
|
||||||
|
|
||||||
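One way to picture a value that describes a computation rather than storing its result is a small expression tree. The classes below are toy stand-ins written for this example, not LLVM classes. LLVM never evaluates values like this, and the `evaluate` method exists only to show that the description can be run later.

```C++
#include <memory>
#include <utility>

// Toy stand-ins for illustration only; these are not LLVM classes.
struct value {
    virtual ~value() = default;
    // Compute the result, given a concrete number for the variable `x`.
    virtual int evaluate(int x) const = 0;
};

// A constant is computed by just returning its number.
struct constant : value {
    int number;
    explicit constant(int n) : number(n) {}
    int evaluate(int) const override { return number; }
};

// The variable `x` is computed by looking up its current number.
struct variable_x : value {
    int evaluate(int x) const override { return x; }
};

// A sum is computed from two values that were specified earlier.
struct add : value {
    std::shared_ptr<value> left, right;
    add(std::shared_ptr<value> l, std::shared_ptr<value> r)
        : left(std::move(l)), right(std::move(r)) {}
    int evaluate(int x) const override {
        return left->evaluate(x) + right->evaluate(x);
    }
};

// Builds the description of `x + 1` without computing anything yet.
std::shared_ptr<value> make_x_plus_one() {
    return std::make_shared<add>(std::make_shared<variable_x>(),
                                 std::make_shared<constant>(1));
}
```

The tree returned by `make_x_plus_one` holds no number. It produces one only when we supply a number for `x`.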
|
This somewhat elucidates what the `create_*` functions do: `create_i8` creates an 8-bit integer
|
||||||
|
value, and `create_pop` creates a value that is computed by calling
|
||||||
|
our runtime `stack_pop` function.
|
||||||
|
|
||||||
|
Before we move on to look at the implementations of these functions,
|
||||||
|
we need to understand another concept from the world of compiler design:
|
||||||
|
__basic blocks__. A basic block is a sequence of instructions that
|
||||||
|
are guaranteed to be executed one after another. This means that a
|
||||||
|
basic block cannot have an if/else, jump, or any other type of control flow anywhere
|
||||||
except at the end. If control flow could appear inside the basic block,
|
except at the end. If control flow could appear inside the basic block,
|
||||||
there would be opportunity for execution of some, but not all,
|
there would be opportunity for execution of some, but not all,
|
||||||
instructions in the block, violating the definition. Every time
|
instructions in the block, violating the definition. Every time
|
||||||
|
@ -155,7 +190,74 @@ Writing control flow involves creating several blocks, with each
|
||||||
block serving as the destination of a potential jump. We will
|
block serving as the destination of a potential jump. We will
|
||||||
see this used to compile the Jump instruction.
|
see this used to compile the Jump instruction.
|
||||||
|
|
||||||
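As a rough illustration, here is an ordinary C++ function with comments marking where a compiler might draw basic block boundaries. The block names are invented for this example, and a real compiler may split the code differently.

```C++
int absolute_value(int x) {
    // Block "entry": straight-line code that ends in a conditional jump.
    bool is_negative = x < 0;
    if (is_negative) {
        // Block "negate": one possible destination of the jump.
        return -x;
    }
    // Block "keep": the other possible destination of the jump.
    return x;
}
```

Control flow appears only at the end of a block. Nothing inside a block can be skipped once the block is entered.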
### Generating LLVM
|
### Generating LLVM IR
|
||||||
|
Now that we understand what `llvm::Value` is, and have a vague
|
||||||
|
understanding of how LLVM is structured, let's take a look at
|
||||||
|
the implementations of the `create_*` functions. The simplest
|
||||||
|
is `create_i8`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/08/llvm_context.cpp" 150 152 >}}
|
||||||
|
|
||||||
|
Not much to see here. We create an instance of the `llvm::ConstantInt` class,
|
||||||
|
from the actual integer given to the method. As we said before,
|
||||||
|
`llvm::ConstantInt` is a subclass of `llvm::Value`. Next up, let's look
|
||||||
|
at `create_pop`:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/08/llvm_context.cpp" 160 163 >}}
|
||||||
|
|
||||||
|
We first retrieve an `llvm::Function` associated with `stack_pop`
|
||||||
|
from our map, and then use `llvm::IRBuilder::CreateCall` to insert
|
||||||
|
a value that represents a function call into the currently
|
||||||
|
selected basic block (the builder's state is what
|
||||||
|
dictates what the "selected basic block" is). `CreateCall`
|
||||||
|
takes as parameters the function we want to call (`stack_pop`,
|
||||||
|
which we store into the `pop_f` variable), as well as the arguments
|
||||||
|
to the function (for which we pass `f->arg_begin()`).
|
||||||
|
|
||||||
|
Hold on. What the heck is `arg_begin()`? Why do we take a function
|
||||||
|
as a parameter to this method? The answer is fairly simple: this
|
||||||
|
method is used when we are
|
||||||
|
generating a function with signature `void f_(struct stack* s)`
|
||||||
|
(we discussed the signature in the previous post). The
|
||||||
|
parameter that we give to `create_pop` is this function we're
|
||||||
|
generating, and `arg_begin()` gets the value that represents
|
||||||
|
the first parameter to our function - `s`! Since `stack_pop`
|
||||||
|
takes a stack, we need to give it the stack we're working on,
|
||||||
|
and so we use `f->arg_begin()` to access it.
|
||||||
|
|
||||||
|
Most of the other functions follow this exact pattern, with small
|
||||||
|
deviations. However, another function uses a more complicated LLVM
|
||||||
|
instruction:
|
||||||
|
|
||||||
|
{{< codelines "C++" "compiler/08/llvm_context.cpp" 202 209 >}}
|
||||||
|
|
||||||
|
`unwrap_num` is used to cast a given node pointer to a pointer
|
||||||
|
to a number node, and then return the integer value from
|
||||||
|
that number node. It starts fairly innocently: we ask
|
||||||
|
LLVM for the type of a pointer to a `node_num` struct,
|
||||||
|
and then use `CreatePointerCast` to create a value
|
||||||
|
that is the same node pointer we're given, but now interpreted
|
||||||
|
as a number node pointer. We now have to access
|
||||||
|
the `value` field of our node. `CreateGEP` helps us with
|
||||||
|
this: given a pointer to a node, and two offsets
|
||||||
|
`n` and `k`, it effectively performs the following:
|
||||||
|
|
||||||
|
```C++
|
||||||
|
&(num_pointer[n].kth_field)
|
||||||
|
```
|
||||||
|
|
||||||
|
The first offset, then, gives an index into the "array"
|
||||||
|
represented by the pointer, while the second offset
|
||||||
|
gives the index of the field we want to access. We
|
||||||
|
want to dereference the pointer (`num_pointer[0]`),
|
||||||
|
and we want the second field (`1`, when counting from 0).
|
||||||
|
Thus, we call `CreateGEP` with these offsets and our pointer.
|
||||||
|
|
||||||
|
This still leaves us with a pointer to a number, rather
|
||||||
|
than the number itself. To dereference the pointer, we use
|
||||||
|
`CreateLoad`. This gives us the value of the number node,
|
||||||
|
which we promptly return.
|
||||||
|
|
||||||
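For comparison, here is roughly what the same three steps look like in plain C++. This is a sketch rather than the generated code: the struct layouts are assumptions (a `node_num` whose first field is its `node_base` and whose second field is `value`), and `unwrap_num_plain` is a hypothetical name.

```C++
#include <cassert>
#include <cstdint>

// Assumed layouts, for illustration only.
struct node_base {
    int32_t tag;
};

struct node_num {
    node_base base;  // field 0
    int32_t value;   // field 1
};

int32_t unwrap_num_plain(node_base* node) {
    assert(node != nullptr);
    // Like CreatePointerCast: same address, now viewed as a number node.
    node_num* num_pointer = reinterpret_cast<node_num*>(node);
    // Like CreateGEP with offsets 0 and 1: the address of field 1
    // of element 0 in the "array" behind the pointer.
    int32_t* value_pointer = &(num_pointer[0].value);
    // Like CreateLoad: read the number stored at that address.
    return *value_pointer;
}
```

The cast is only safe when the node really is a number node, which is the same requirement the LLVM version relies on.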
Let's envision a `gen_llvm` method on the `instruction` struct.
|
Let's envision a `gen_llvm` method on the `instruction` struct.
|
||||||
We need access to all the other functions from our runtime,
|
We need access to all the other functions from our runtime,
|
||||||
such as `stack_init`, and functions from our program such
|
such as `stack_init`, and functions from our program such
|
||||||
|
@ -187,5 +289,3 @@ virtual void gen_llvm(llvm_context&, llvm::Function*) const;
|
||||||
```
|
```
|
||||||
|
|
||||||
{{< todo >}} Fix pointer type inconsistencies. {{< /todo >}}
|
{{< todo >}} Fix pointer type inconsistencies. {{< /todo >}}
|
||||||
{{< todo >}} Create + backport Pop instruction {{< /todo >}}
|
|
||||||
{{< todo >}} Explain forcing normal evaluation in binary operator {{< /todo >}}
|
|
||||||
|
|