|
|
|
@ -54,8 +54,6 @@ a `Module` object, which represents some collection of code and declarations
|
|
|
|
|
|
|
|
|
|
{{< codeblock "C++" "compiler/08/llvm_context.hpp" >}}
|
|
|
|
|
|
|
|
|
|
{{< todo >}} Explain creation functions. {{< /todo >}}
|
|
|
|
|
|
|
|
|
|
We include the LLVM context, builder, and module as members
|
|
|
|
|
of the context struct. Since the builder and the module need
|
|
|
|
|
the context, we initialize them in the constructor, where they
|
|
|
|
@ -118,10 +116,46 @@ for specifying the body of `node_base` and `node_app`.
|
|
|
|
|
There's still more functionality packed into `llvm_context`.
|
|
|
|
|
Let's next take a look into `custom_function`, and
|
|
|
|
|
the `create_custom_function` method. Why do we need
|
|
|
|
|
these?
|
|
|
|
|
these? To highlight the need for the custom class,
|
|
|
|
|
let's take a look at `instruction_pushglobal` which
|
|
|
|
|
occurs at the G-machine level, and then at `alloc_global`,
|
|
|
|
|
which will be a function call generated as part of
|
|
|
|
|
the PushGlobal instruction. `instruction_pushglobal`'s
|
|
|
|
|
only member variable is `name`, which stands for
|
|
|
|
|
the name of the global function it's referencing. However,
|
|
|
|
|
`alloc_global` requires an arity argument! We can
|
|
|
|
|
try to get this information from the `llvm::Function`
|
|
|
|
|
corresponding to the global we're trying to reference,
|
|
|
|
|
but this doesn't get us anywhere: as far as LLVM
|
|
|
|
|
is concerned, any global function only takes one
|
|
|
|
|
parameter, the stack. The rest of the parameters
|
|
|
|
|
are given through that stack, and their number cannot
|
|
|
|
|
be easily deduced from the function alone.
|
|
|
|
|
|
|
|
|
|
Instead, we decide to store global functions together
|
|
|
|
|
with their arity. We thus create a class to combine
|
|
|
|
|
these two things (`custom_function`), define
|
|
|
|
|
a map from global function names to instances
|
|
|
|
|
of `custom_function`, and add a convenience method
|
|
|
|
|
(`create_custom_function`) that takes care of
|
|
|
|
|
constructing an `llvm::Function` object, creating
|
|
|
|
|
a `custom_function`, and storing it in the map.
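To make the idea concrete, here is a minimal sketch of the name-to-arity bookkeeping described above. It deliberately avoids LLVM: `fake_function` is a hypothetical stand-in for `llvm::Function`, and `custom_function_sketch` and `context_sketch` are illustrative names, not the real classes. The `f_` prefix on generated names is also an assumption, based on names like `f_main`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for llvm::Function, so the sketch compiles without LLVM.
struct fake_function {
    std::string name;
};

// Pairs a function with the arity that the function alone cannot tell us.
struct custom_function_sketch {
    fake_function* function;
    int32_t arity;
};

struct context_sketch {
    // Owns the stand-in functions (in the real code, the LLVM module owns them).
    std::map<std::string, std::unique_ptr<fake_function>> functions;
    // Maps a global's name to its function and arity.
    std::map<std::string, custom_function_sketch> custom_functions;

    // Creates the function, wraps it together with its arity,
    // and registers the pair under the global's name.
    custom_function_sketch& create_custom_function(const std::string& name, int32_t arity) {
        auto fn = std::make_unique<fake_function>(fake_function{"f_" + name});
        fake_function* raw = fn.get();
        functions[name] = std::move(fn);
        custom_functions[name] = custom_function_sketch{raw, arity};
        return custom_functions[name];
    }
};
```

With a map like this, an instruction that only knows a global's name can look up both the function and the arity that `alloc_global` expects.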
|
|
|
|
|
|
|
|
|
|
The implementation for `custom_function` is
|
|
|
|
|
straightforward:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/llvm_context.cpp" 234 252 >}}
|
|
|
|
|
|
|
|
|
|
We create a function type, then a function, and finally
|
|
|
|
|
initialize a `custom_function`. There's one thing
|
|
|
|
|
we haven't seen yet in this function, which is the
|
|
|
|
|
`BasicBlock` class. We'll get to what basic blocks
|
|
|
|
|
are shortly, but for now it's sufficient to
|
|
|
|
|
know that the basic block gives us a place to
|
|
|
|
|
insert code.
|
|
|
|
|
|
|
|
|
|
This isn't the end of our `llvm_context` class: it also
|
|
|
|
|
has a variety of `create_*` methods! Let's take a look
|
|
|
|
|
has a variety of other `create_*` methods! Let's take a look
|
|
|
|
|
at their signatures. Most return either `void`,
|
|
|
|
|
`llvm::ConstantInt*`, or `llvm::Value*`. Since
|
|
|
|
|
`llvm::ConstantInt*` is a subclass of `llvm::Value*`, let's
|
|
|
|
@ -168,7 +202,7 @@ Assigned to each variable is `llvm::Value`. The LLVM documentation states:
|
|
|
|
|
|
|
|
|
|
It's important to understand that `llvm::Value` __does not store the result of the computation__.
|
|
|
|
|
It rather represents how something may be computed. 1 is a value because it is computed by
|
|
|
|
|
just returning. `x + 1` is a value because it is computed by adding the value inside of
|
|
|
|
|
just returning 1. `x + 1` is a value because it is computed by adding the value inside of
|
|
|
|
|
`x` to 1. Since we cannot modify a variable once we've declared it, we will
|
|
|
|
|
keep assigning intermediate results to new variables, constructing new values
|
|
|
|
|
out of values that we've already specified.
|
|
|
|
@ -251,36 +285,27 @@ represented by the pointer, while the second offset
|
|
|
|
|
gives the index of the field we want to access. We
|
|
|
|
|
want to dereference the pointer (`num_pointer[0]`),
|
|
|
|
|
and we want the second field (`1`, when counting from 0).
|
|
|
|
|
Thus, we call CreateGEP with these offsets and our pointers.
|
|
|
|
|
Thus, we call `CreateGEP` with these offsets and our pointers.
|
|
|
|
|
|
|
|
|
|
This still leaves us with a pointer to a number, rather
|
|
|
|
|
than the number itself. To dereference the pointer, we use
|
|
|
|
|
`CreateLoad`. This gives us the value of the number node,
|
|
|
|
|
which we promptly return.
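For intuition, the two steps can be mirrored in plain C++. This is only an analogy, not the LLVM code: the struct layout below is an assumption about how a number node is laid out (a header in field 0, the value in field 1), and the `_sketch` names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of a number node: a base header followed by the value.
struct node_base_sketch {
    int32_t tag;
};

struct node_num_sketch {
    node_base_sketch base; // field 0
    int32_t value;         // field 1
};

// Plain C++ analogue of the GEP-then-load sequence.
int32_t unwrap_num_sketch(node_num_sketch* num_pointer) {
    // Like GEP: compute the address of field 1 of num_pointer[0].
    // No memory is read here.
    int32_t* value_pointer = &num_pointer[0].value;
    // Like a load: actually read the integer stored at that address.
    return *value_pointer;
}
```

The split matters: computing the field's address and reading from it are separate instructions in LLVM IR, even though C++ lets us write them as one expression.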
|
|
|
|
|
|
|
|
|
|
Let's envision a `gen_llvm` method on the `instruction` struct.
|
|
|
|
|
We need access to all the other functions from our runtime,
|
|
|
|
|
such as `stack_init`, and functions from our program such
|
|
|
|
|
as `f_custom_function`. Thus, we need access to our
|
|
|
|
|
`llvm_context`. The current basic block is part
|
|
|
|
|
of the builder, which is part of the context, so that's
|
|
|
|
|
also taken care of. There's only one more thing that we will
|
|
|
|
|
need, and that's access to the `llvm::Function` that's
|
|
|
|
|
currently being compiled. To understand why, consider
|
|
|
|
|
the signature of `f_main` from the previous post:
|
|
|
|
|
This concludes our implementation of the `llvm_context` -
|
|
|
|
|
it's time to move on to the G-machine instructions.
|
|
|
|
|
|
|
|
|
|
```C
|
|
|
|
|
void f_main(struct stack*);
|
|
|
|
|
```
|
|
|
|
|
### G-machine Instructions to LLVM IR
|
|
|
|
|
|
|
|
|
|
The function takes a stack as a parameter. What if
|
|
|
|
|
we want to use this stack in a method call, like
|
|
|
|
|
`stack_push(s, node)`? We need to have access to the
|
|
|
|
|
LLVM representation of the stack parameter. The easiest
|
|
|
|
|
way to do this is to use `llvm::Function::arg_begin()`,
|
|
|
|
|
which gives the first argument of the function. We thus
|
|
|
|
|
carry the function pointer throughout our code generation
|
|
|
|
|
methods.
|
|
|
|
|
Let's now envision a `gen_llvm` method on the `instruction` struct,
|
|
|
|
|
which will turn the still-abstract G-machine instruction
|
|
|
|
|
into tangible, close-to-metal LLVM IR. As we've seen
|
|
|
|
|
in our implementation of `llvm_context`, to access the stack, we need access to the first
|
|
|
|
|
argument of the function we're generating. Thus, we need this method
|
|
|
|
|
to accept the function whose instructions are
|
|
|
|
|
being converted to LLVM. We also pass in the
|
|
|
|
|
`llvm_context`, since it contains the LLVM builder,
|
|
|
|
|
context, module, and a map of globally declared functions.
|
|
|
|
|
|
|
|
|
|
With these things in mind, here's the signature for `gen_llvm`:
|
|
|
|
|
|
|
|
|
@ -288,4 +313,267 @@ With these things in mind, here's the signature for `gen_llvm`:
|
|
|
|
|
virtual void gen_llvm(llvm_context&, llvm::Function*) const;
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
{{< todo >}} Fix pointer type inconsistencies. {{< /todo >}}
|
|
|
|
|
Let's get right to it! `instruction_pushint` gives us an easy
|
|
|
|
|
start:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/instruction.cpp" 17 19 >}}
|
|
|
|
|
|
|
|
|
|
We create an LLVM integer constant with the value of
|
|
|
|
|
our integer, and push it onto the stack.
|
|
|
|
|
|
|
|
|
|
`instruction_push` is equally terse:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/instruction.cpp" 37 39 >}}
|
|
|
|
|
|
|
|
|
|
We simply peek at the value of the stack at the given
|
|
|
|
|
offset (an integer of the same size as `size_t`, which
|
|
|
|
|
we create using `create_size`). Once we have the
|
|
|
|
|
result of the peek, we push it onto the stack.
|
|
|
|
|
|
|
|
|
|
`instruction_pushglobal` is more involved. Let's take a look:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/instruction.cpp" 26 30 >}}
|
|
|
|
|
|
|
|
|
|
First, we retrieve the `custom_function` associated with
|
|
|
|
|
the given global name. We then create an LLVM integer
|
|
|
|
|
constant representing the arity of the function,
|
|
|
|
|
and then push onto the stack the result of `alloc_global`,
|
|
|
|
|
giving it the function and arity just like it expects.
|
|
|
|
|
|
|
|
|
|
`instruction_pop` is also short, and doesn't require much
|
|
|
|
|
further explanation:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/instruction.cpp" 46 48 >}}
|
|
|
|
|
|
|
|
|
|
Some other instructions, such as `instruction_update`,
|
|
|
|
|
`instruction_pack`, `instruction_split`, `instruction_slide`,
|
|
|
|
|
`instruction_alloc` and `instruction_eval` are equally simple,
|
|
|
|
|
and we omit them for the purpose of brevity.
|
|
|
|
|
|
|
|
|
|
What remains are two "meaty" functions, `instruction_jump` and
|
|
|
|
|
`instruction_binop`. Let's start with the former:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/instruction.cpp" 101 123 >}}
|
|
|
|
|
|
|
|
|
|
This is the one and only function in which we have to take
|
|
|
|
|
care of control flow. Conceptually, depending on the tag
|
|
|
|
|
of the `node_data` at the top of the stack, we want
|
|
|
|
|
to pick one of many branches and jump to it.
|
|
|
|
|
As we discussed, a basic block has to be executed in
|
|
|
|
|
its entirety; since the branches of a case expression
|
|
|
|
|
are mutually exclusive (only one of them is executed in any given case),
|
|
|
|
|
we have to create a separate basic block for each branch.
|
|
|
|
|
Given these blocks, we then want to branch to the correct one
|
|
|
|
|
using the tag of the node on top of the stack.
|
|
|
|
|
|
|
|
|
|
This is exactly what we do in this function. We first peek
|
|
|
|
|
at the node on top of the stack, and use `CreateGEP` through
|
|
|
|
|
`unwrap_data_tag` to get access to its tag. What we then
|
|
|
|
|
need is LLVM's switch instruction, created using `CreateSwitch`.
|
|
|
|
|
We must provide the switch with a "default" case in case
|
|
|
|
|
the tag value is something we don't recognize. To do this,
|
|
|
|
|
we create a "safety" `BasicBlock`. With this new safety
|
|
|
|
|
block in hand, we're able to call `CreateSwitch`, giving it
|
|
|
|
|
the tag value to switch on, the safety block to default to,
|
|
|
|
|
and the expected number of branches (to optimize memory allocation).
|
|
|
|
|
|
|
|
|
|
Next, we create a vector of blocks, and for each branch,
|
|
|
|
|
we append to it a corresponding block `branch_block`, into
|
|
|
|
|
which we insert the LLVM IR corresponding to the
|
|
|
|
|
instructions of the branch. No matter the branch we take,
|
|
|
|
|
we eventually want to come back to the same basic block,
|
|
|
|
|
which will perform the usual function cleanup via Update and Slide.
|
|
|
|
|
We re-use the safety block for this, and use `CreateBr` at the
|
|
|
|
|
end of each `branch_block` to perform an unconditional jump.
|
|
|
|
|
|
|
|
|
|
After we create each of the blocks, we use the `tag_mappings`
|
|
|
|
|
to add cases to the switch instruction, using `addCase`. Finally,
|
|
|
|
|
we set the builder's insertion point to the safety block,
|
|
|
|
|
meaning that the next instructions will insert their
|
|
|
|
|
LLVM IR into that block. Since we have all branches
|
|
|
|
|
jump to the safety block at the end, this means that
|
|
|
|
|
no matter which branch we take in the case expression,
|
|
|
|
|
we will still execute the subsequent instructions as expected.
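The control flow described above can be simulated without LLVM. The sketch below is a toy model under stated assumptions: blocks are represented by their names, `run_jump_sketch` is a hypothetical helper, and the tag-to-branch map plays the role of `tag_mappings`. It returns the order in which blocks would run for a given tag.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns the names of the "blocks" that run, in order, for a given tag.
std::vector<std::string> run_jump_sketch(
        int tag,
        const std::map<int, std::size_t>& tag_mappings,
        const std::vector<std::string>& branch_blocks) {
    std::vector<std::string> trace;
    auto found = tag_mappings.find(tag);
    if (found != tag_mappings.end()) {
        // A known tag: the switch picks the matching branch block.
        trace.push_back(branch_blocks.at(found->second));
    }
    // Known tags branch here unconditionally after their block finishes;
    // unknown tags arrive here directly through the switch's default case.
    trace.push_back("safety");
    return trace;
}
```

Whichever tag we pass in, the trace ends in the safety block, which is why instructions emitted after the jump land in the right place.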
|
|
|
|
|
|
|
|
|
|
Let's now look at `instruction_binop`:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/instruction.cpp" 139 150 >}}
|
|
|
|
|
|
|
|
|
|
In this instruction, we pop and unwrap two integers from
|
|
|
|
|
the stack (assuming they are integers). Depending on
|
|
|
|
|
the type of operation the instruction is set to, we
|
|
|
|
|
then push the result of the corresponding LLVM
|
|
|
|
|
instruction. `PLUS` calls LLVM's `CreateAdd` to insert
|
|
|
|
|
addition, `MINUS` calls `CreateSub`, and so on. No matter
|
|
|
|
|
what the operation was, we push the result onto the stack.
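As a rough model of what the generated code does at runtime, here is a sketch that operates on a plain stack of integers instead of emitting LLVM IR. The names `binop_sketch`, `TIMES`, and `DIVIDE` are assumptions, as is the operand order (the first value popped is treated as the left operand). The division-by-zero check is an addition for the sketch only.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

enum class binop_sketch { PLUS, MINUS, TIMES, DIVIDE };

// Pops two integers, applies the operation, and pushes the result.
void apply_binop_sketch(std::vector<int>& stack, binop_sketch op) {
    if (stack.size() < 2) {
        throw std::runtime_error("binop needs two operands");
    }
    // Assumption: the first pop yields the left operand.
    int left = stack.back();
    stack.pop_back();
    int right = stack.back();
    stack.pop_back();

    int result = 0;
    switch (op) {
        case binop_sketch::PLUS:  result = left + right; break;
        case binop_sketch::MINUS: result = left - right; break;
        case binop_sketch::TIMES: result = left * right; break;
        case binop_sketch::DIVIDE:
            // Guard added for this sketch only.
            if (right == 0) throw std::runtime_error("division by zero");
            result = left / right;
            break;
    }
    // No matter the operation, the result goes back onto the stack.
    stack.push_back(result);
}
```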
|
|
|
|
|
|
|
|
|
|
That's all for our instructions! We're so very close now. Let's
|
|
|
|
|
move on to compiling definitions.
|
|
|
|
|
|
|
|
|
|
### Definitions to LLVM IR
|
|
|
|
|
As with typechecking, to allow for mutually recursive functions,
|
|
|
|
|
we need to be able to reference each global function from any other function.
|
|
|
|
|
We then take the same approach as before, going in two passes.
|
|
|
|
|
This leads to two new methods for `definition`:
|
|
|
|
|
|
|
|
|
|
```C++
|
|
|
|
|
virtual void gen_llvm_first(llvm_context& ctx) = 0;
|
|
|
|
|
virtual void gen_llvm_second(llvm_context& ctx) = 0;
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The first pass is intended to register all functions into
|
|
|
|
|
the `llvm_context`, making them visible to other functions.
|
|
|
|
|
The second pass is used to actually generate the code for
|
|
|
|
|
each function, now having access to all the other global
|
|
|
|
|
functions. Let's see the implementation for `gen_llvm_first`
|
|
|
|
|
for `definition_defn`:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/definition.cpp" 58 60 >}}
|
|
|
|
|
|
|
|
|
|
Since `create_custom_function` already creates a function
|
|
|
|
|
__and__ registers it with `llvm_context`, this is
|
|
|
|
|
all we need. Note that we created a new member variable
|
|
|
|
|
for `definition_defn` which stores this newly created
|
|
|
|
|
function. In the second pass, we will populate this
|
|
|
|
|
function with LLVM IR from the definition's instructions.
|
|
|
|
|
|
|
|
|
|
We actually create functions for each of the constructors
|
|
|
|
|
of data types, but they're quite special: all they do is
|
|
|
|
|
pack their arguments! Since they don't need access to
|
|
|
|
|
the other global functions, we might as well create
|
|
|
|
|
their bodies then and there:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/definition.cpp" 101 112 >}}
|
|
|
|
|
|
|
|
|
|
Like in `definition_defn`, we use `create_custom_function`.
|
|
|
|
|
However, we then use `SetInsertPoint` to configure our builder to insert code into
|
|
|
|
|
the newly created function (which already has a `BasicBlock`,
|
|
|
|
|
thanks to that one previously unexplained line in `create_custom_function`!).
|
|
|
|
|
Since we decided to only include the Pack instruction, we generate
|
|
|
|
|
a call to it directly using `create_pack`. We follow this
|
|
|
|
|
up with `CreateRetVoid`, which tells LLVM that this is
|
|
|
|
|
the end of the function, and that it is now safe to return
|
|
|
|
|
from it.
|
|
|
|
|
|
|
|
|
|
Great! We now implement the second pass of `gen_llvm`. In
|
|
|
|
|
the case of `definition_defn`, we do almost exactly
|
|
|
|
|
what we did in the first pass of `definition_data`:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/definition.cpp" 62 68 >}}
|
|
|
|
|
|
|
|
|
|
As for `definition_data`, we have nothing to do in the
|
|
|
|
|
second pass. We're done!
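To see why two passes are enough for mutual recursion, consider this small model. It does not use LLVM; `defn_sketch`, `registry_sketch`, and the two pass functions are hypothetical names, and a "definition" is reduced to a name, an arity, and the list of globals its body refers to.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct defn_sketch {
    std::string name;
    int arity;
    std::vector<std::string> calls; // globals referenced by the body
};

using registry_sketch = std::map<std::string, int>; // name -> arity

// First pass: register every function so that all names are visible.
void first_pass_sketch(const std::vector<defn_sketch>& defs, registry_sketch& registry) {
    for (const auto& def : defs) {
        registry[def.name] = def.arity;
    }
}

// Second pass: "generate" each body, resolving every referenced global.
// Returns the number of references resolved; throws if a name is unknown.
int second_pass_sketch(const std::vector<defn_sketch>& defs, const registry_sketch& registry) {
    int resolved = 0;
    for (const auto& def : defs) {
        for (const auto& callee : def.calls) {
            if (registry.find(callee) == registry.end()) {
                throw std::out_of_range("unknown global: " + callee);
            }
            ++resolved;
        }
    }
    return resolved;
}
```

If two functions call each other, a single pass over the definitions would hit the first body before the second function exists. Registering everything up front removes that ordering problem.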
|
|
|
|
|
|
|
|
|
|
### Getting Results
|
|
|
|
|
We're almost there. Two things remain. The first: our implementation
|
|
|
|
|
of `ast_binop` implements each binary operation as simply a function call:
|
|
|
|
|
`+` calls `f_plus`, and so on. But so far, we have not implemented
|
|
|
|
|
`f_plus`, or any other binary operator function. We do this
|
|
|
|
|
in `main.cpp`, creating a function `gen_llvm_internal_op`:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/main.cpp" 70 83 >}}
|
|
|
|
|
|
|
|
|
|
We create a simple function body. We then append G-machine
|
|
|
|
|
instructions that take each argument, evaluate it,
|
|
|
|
|
and then perform the corresponding binary operation.
|
|
|
|
|
With these instructions in the body, we insert
|
|
|
|
|
them into a new function, just like we did in our code
|
|
|
|
|
for `definition_defn` and `definition_data`.
|
|
|
|
|
|
|
|
|
|
Finally, we write our `gen_llvm` function that we will
|
|
|
|
|
call from `main`:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/main.cpp" 125 141 >}}
|
|
|
|
|
|
|
|
|
|
It first creates the functions for
|
|
|
|
|
`+`, `-`, `*`, and `/`. Then, it calls the first
|
|
|
|
|
pass of `gen_llvm` on all definitions, followed
|
|
|
|
|
by the second pass. Lastly, it uses LLVM's built-in
|
|
|
|
|
functionality to print out the generated IR in
|
|
|
|
|
our module, and then uses a function `output_llvm`
|
|
|
|
|
to create an object file ready for linking.
|
|
|
|
|
|
|
|
|
|
To be very honest, I took the `output_llvm` function
|
|
|
|
|
almost entirely from instructional material for my university's
|
|
|
|
|
compilers course. The gist of it, though, is: we determine
|
|
|
|
|
the target architecture and platform, specify a "generic" CPU,
|
|
|
|
|
create a default set of options, and then generate an object file.
|
|
|
|
|
Here it is:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/main.cpp" 85 123 >}}
|
|
|
|
|
|
|
|
|
|
We now add a `generate_llvm` call to `main`.
|
|
|
|
|
|
|
|
|
|
Are we there?
|
|
|
|
|
|
|
|
|
|
Let's try to compile our first example, `works1.txt`. The
|
|
|
|
|
file:
|
|
|
|
|
|
|
|
|
|
{{< rawblock "compiler/08/examples/works1.txt" >}}
|
|
|
|
|
|
|
|
|
|
We run the following commands in our build directory:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
./compiler < ../examples/works1.txt
|
|
|
|
|
gcc -no-pie main.c program.o
|
|
|
|
|
./a.out
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Nothing happens. How anticlimactic! Our runtime has no way of
|
|
|
|
|
printing out the result of the evaluation. Let's change that:
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/08/runtime.c" 157 183 >}}
|
|
|
|
|
|
|
|
|
|
Rerunning our commands, we get:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Result: 326
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The correct result! Let's try it with `works2.txt`:
|
|
|
|
|
|
|
|
|
|
{{< rawblock "compiler/08/examples/works2.txt" >}}
|
|
|
|
|
|
|
|
|
|
And again, we get the right answer:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Result: 326
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
This is child's play, though. Let's try with something
|
|
|
|
|
more complicated, like `works3.txt`:
|
|
|
|
|
|
|
|
|
|
{{< rawblock "compiler/08/examples/works3.txt" >}}
|
|
|
|
|
|
|
|
|
|
Once again, our program does exactly what we intended:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Result: 3
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Alright, this is neat, but we haven't yet confirmed that
|
|
|
|
|
lazy evaluation works. How about we try it with
|
|
|
|
|
`works5.txt`:
|
|
|
|
|
|
|
|
|
|
{{< rawblock "compiler/08/examples/works5.txt" >}}
|
|
|
|
|
|
|
|
|
|
Yet again, the program works:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Result: 9
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
At last, we have a working compiler!
|
|
|
|
|
|
|
|
|
|
While this is a major victory, we are not yet
|
|
|
|
|
finished with the compiler altogether. While
|
|
|
|
|
we allocate nodes whenever we need them, we
|
|
|
|
|
have not once uttered the phrase `free` in our
|
|
|
|
|
runtime. Our language works, but we have no way
|
|
|
|
|
of comparing numbers, no lambdas, no `let/in`.
|
|
|
|
|
In the next several posts, we will improve
|
|
|
|
|
our compiler to properly free unused memory
|
|
|
|
|
using a __garbage collector__, implement
|
|
|
|
|
lambda functions using __lambda lifting__,
|
|
|
|
|
and implement `let/in` expressions. See
|
|
|
|
|
you there!
|
|
|
|
|