blog-static/content/blog/08_compiler_llvm.md

---
title: Compiling a Functional Language Using C++, Part 8 - LLVM
date: 2019-10-30T22:16:22-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
description: "In this post, we enable our compiler to convert G-machine instructions to LLVM IR, which finally allows us to generate working executables."
---

We don't want a compiler that can only generate code for a single
platform. Our language should work on macOS, Windows, and Linux,
on x86\_64, ARM, and maybe some other architectures. We also
don't want to manually implement the compiler for each platform,
dealing with the specifics of each architecture and operating
system.

This is where LLVM comes in. LLVM (which stands for __Low Level Virtual Machine__),
is a project which presents us with a kind of generic assembly language,
an __Intermediate Representation__ (IR). It also provides tooling to compile the
IR into platform-specific instructions, as well as to apply a host of various
optimizations. We can thus translate our G-machine instructions to LLVM,
and then use LLVM to generate machine code, which gets us to our ultimate
goal of compiling our language.

We start with adding LLVM to our CMake project.
{{< codelines "CMake" "compiler/08/CMakeLists.txt" 7 7 >}}

LLVM is a huge project, and has many components. We don't need
most of them. We do need the core libraries, the x86 assembly
generator, and x86 assembly parser. I'm
not sure why we need the last one, but I ran into linking
errors without them. We find the required link targets
for these components using this CMake command:

{{< codelines "CMake" "compiler/08/CMakeLists.txt" 19 20 >}}

Finally, we add the new include directories, link targets,
and definitions to our compiler executable:

{{< codelines "CMake" "compiler/08/CMakeLists.txt" 39 41 >}}

Great, we have the infrastructure updated to work with LLVM. It's
now time to start using the LLVM API to compile our G-machine instructions
into assembly. We start with `LLVMContext`. The LLVM documentation states:

> This is an important class for using LLVM in a threaded context.
> It (opaquely) owns and manages the core "global" data of LLVM's core infrastructure, including the type and constant uniquing tables. 

We will have exactly one instance of such a class in our program.

Additionally, we want an `IRBuilder`, which will help us generate IR instructions,
placing them into basic blocks (more on that in a bit). Also, we want
a `Module` object, which represents some collection of code and declarations
(perhaps like a C++ source file). Let's keep these things in our own
`llvm_context` class. Here's what that looks like:

{{< codeblock "C++" "compiler/08/llvm_context.hpp" >}}

We include the LLVM context, builder, and module as members
of the context struct. Since the builder and the module need
the context, we initialize them in the constructor, where they
can safely reference it.

Besides these fields, we added
a few others, namely the `functions` and `struct_types` maps,
and the various `llvm::Type` subclasses such as `stack_type`.
We did this because we want to be able to call our runtime
functions (and use our runtime structs) from LLVM. To generate
a function call from LLVM, we need to have access to an
`llvm::Function` object. We thus want to have an `llvm::Function`
object for each runtime function we want to call. We could declare
a member variable in our `llvm_context` for each runtime function,
but it's easier to leave this to be an implementation
detail, and only have a dynamically created map between runtime
function names and their corresponding `llvm::Function` objects.

We populate the maps and other type-related variables in the
two methods, `create_functions()` and `create_types()`. To
create an `llvm::Function`, we must provide an `llvm::FunctionType`,
an `llvm::LinkageType`, the name of the function, and the module
in which the function is declared. Since we only have one
module (the one we initialized in the constructor) that's
the module we pass in. The name of the function is the same
as its name in the runtime. The linkage type is a little
more complicated - it tells LLVM the "visibility" of a function.
"Private" or "Internal" would hide this function from the linker
(like `static` functions in C). However, we want to do the opposite: our
generated functions should be accessible from other code.
Thus, our linkage type is "External".

The only remaining parameter is the `llvm::FunctionType`, which
is created using code like:

```C++
llvm::FunctionType::get(return_type, {param_type_1, param_type_2, ...}, is_variadic)
```

Declaring all the functions and types in our runtime is mostly
just tedious. Here are a few lines from `create_functions()`, which
give a very good idea of the rest of that method:

{{< codelines "C++" "compiler/08/llvm_context.cpp" 47 60 >}}

Similarly, here are a few lines from `create_types()`, from
which you can extrapolate the rest:

{{< codelines "C++" "compiler/08/llvm_context.cpp" 7 11 >}}

We also tell LLVM the contents of our structs, so that
we may later reference specific fields. This is just like
forward declaration - we can forward declare a struct
in C/C++, but unless we also declare its contents,
we can't access what's inside. Below is the code
for specifying the body of `node_base` and `node_app`.

{{< codelines "C++" "compiler/08/llvm_context.cpp" 19 26 >}}

There's still more functionality packed into `llvm_context`.
Let's next take a look into `custom_function`, and
the `create_custom_function` method. Why do we need
these? To highlight the need for the custom class,
let's take a look at `instruction_pushglobal` which
occurs at the G-machine level, and then at `alloc_global`,
which will be a function call generated as part of
the PushGlobal instruction. `instruction_pushglobal`'s
only member variable is `name`, which stands for
the name of the global function it's referencing. However,
`alloc_global` requires an arity argument! We can
try to get this information from the `llvm::Function`
corresponding to the global we're trying to reference,
but this doesn't get us anywhere: as far as LLVM
is concerned, any global function only takes one
parameter, the stack. The rest of the parameters
are given through that stack, and their number cannot
be easily deduced from the function alone.

Instead, we decide to store global functions together
with their arity. We thus create a class to combine
these two things (`custom_function`), define
a map from global function names to instances
of `custom_function`, and add a convenience method
(`create_custom_function`) that takes care of
constructing an `llvm::Function` object, creating
a `custom_function`, and storing it in the map.

The implementation for `custom_function` is 
straightforward:

{{< codelines "C++" "compiler/08/llvm_context.cpp" 234 252 >}}

We create a function type, then a function, and finally
initialize a `custom_function`. There's one thing
we haven't seen yet in this function, which is the
`BasicBlock` class. We'll get to what basic blocks
are shortly, but for now it's sufficient to
know that the basic block gives us a place to
insert code.

This isn't the end of our `llvm_context` class: it also
has a variety of other `create_*` methods! Let's take a look
at their signatures. Most return either `void`,
`llvm::ConstantInt*`, or `llvm::Value*`. Since
`llvm::ConstantInt*` is a subclass of `llvm::Value*`, let's
just treat it as simply an `llvm::Value*` while trying
to understand these methods.

So, what is `llvm::Value`? To answer this question, let's
first understand how the LLVM IR works.

### LLVM IR
An important property of LLVM IR is that it is in __Single Static Assignment__
(SSA) form. This means that each variable can only be assigned to once. For instance,
if we use `<-` to represent assignment, the following program is valid:

```
x <- 1
y <- 2
z <- x + y
```

However, the following program is __not__ valid:

```
x <- 1
x <- x + 1
```

But what if we __do__ want to modify a variable `x`?
We can declare another "version" of `x` every time we modify it.
For instance, if we wanted to increment `x` twice, we'd do this:

```
x <- 1
x1 <- x + 1
x2 <- x1 + 1
```

In practice, LLVM's C++ API can take care of versioning variables on its own, by
auto-incrementing numbers associated with each variable we use.

Assigned to each variable is `llvm::Value`. The LLVM documentation states:

> It is the base class of all values computed by a program that may be used as operands to other values.

It's important to understand that `llvm::Value` __does not store the result of the computation__.
It rather represents how something may be computed. 1 is a value because it computed by
just returning 1. `x + 1` is a value because it is computed by adding the value inside of
`x` to 1. Since we cannot modify a variable once we've declared it, we will
keep assigning intermediate results to new variables, constructing new values
out of values that we've already specified.

This somewhat elucidates what the `create_*` functions do: `create_i8` creates an 8-bit integer
value, and `create_pop` creates a value that is computed by calling
our runtime `stack_pop` function.

Before we move on to look at the implementations of these functions,
we need to understand another concept from the world of compiler design:
__basic blocks__. A basic block is a sequence of instructions that
are guaranteed to be executed one after another. This means that a
basic block cannot have an if/else, jump, or any other type of control flow anywhere
except at the end. If control flow could appear inside the basic block,
there would be opporunity for execution of some, but not all,
instructions in the block, violating the definition. Every time
we add an IR instruction in LLVM, we add it to a basic block.
Writing control flow involves creating several blocks, with each
block serving as the destination of a potential jump. We will
see this used to compile the Jump instruction.

### Generating LLVM IR
Now that we understand what `llvm::Value` is, and have a vague
understanding of how LLVM is structured, let's take a look at
the implementations of the `create_*` functions. The simplest
is `create_i8`:

{{< codelines "C++" "compiler/08/llvm_context.cpp" 150 152 >}}

Not much to see here. We create an instance of the `llvm::ConstantInt` class,
from the actual integer given to the method. As we said before,
`llvm::ConstantInt` is a subclass of `llvm::Value`. Next up, let's look
at `create_pop`:

{{< codelines "C++" "compiler/08/llvm_context.cpp" 160 163 >}}

We first retrieve an `llvm::Function` associated with `stack_pop`
from our map, and then use `llvm::IRBuilder::CreateCall` to insert
a value that represents a function call into the currently
selected basic block (the builder's state is what
dictates what the "selected basic block" is). `CreateCall`
takes as parameters the function we want to call (`stack_pop`,
which we store into the `pop_f` variable), as well as the arguments
to the function (for which we pass `f->arg_begin()`).

Hold on. What the heck is `arg_begin()`? Why do we take a function
as a paramter to this method? The answer is fairly simple: this 
method is used when we are
generating a function with signature `void f_(struct stack* s)`
(we discussed the signature in the previous post). The
parameter that we give to `create_pop` is this function we're
generating, and `arg_begin()` gets the value that represents
the first parameter to our function - `s`! Since `stack_pop`
takes a stack, we need to give it the stack we're working on,
and so we use `f->arg_begin()` to access it.

Most of the other functions follow this exact pattern, with small
deviations. However, another function uses a more complicated LLVM
instruction:

{{< codelines "C++" "compiler/08/llvm_context.cpp" 202 209 >}}

`unwrap_num` is used to cast a given node pointer to a pointer
to a number node, and then return the integer value from
that number node. It starts fairly innocently: we ask
LLVM for the type of a pointer to a `node_num` struct,
and then use `CreatePointerCast` to create a value
that is the same node pointer we're given, but now interpreted
as a number node pointer. We now have to access
the `value` field of our node. `CreateGEP` helps us with
this: given a pointer to a node, and two offsets
`n` and `k`, it effectively performs the following:

```C++
&(num_pointer[n]->kth_field)
```

The first offset, then, gives an index into the "array"
represented by the pointer, while the second offset
gives the index of the field we want to access. We
want to dereference the pointer (`num_pointer[0]`),
and we want the second field (`1`, when counting from 0).
Thus, we call `CreateGEP` with these offsets and our pointers.

This still leaves us with a pointer to a number, rather
than the number itself. To dereference the pointer, we use
`CreateLoad`. This gives us the value of the number node,
which we promptly return.

This concludes our implementation of the `llvm_context` -
it's time to move on to the G-machine instructions.

### G-machine Instructions to LLVM IR

Let's now envision a `gen_llvm` method on the `instruction` struct,
which will turn the still-abstract G-machine instruction
into tangible, close-to-metal LLVM IR. As we've seen
in our implementation of `llvm_context`, to access the stack, we need access to the first
argument of the function we're generating. Thus, we need this method
to accept the function whose instructions are
being converted to LLVM. We also pass in the
`llvm_context`, since it contains the LLVM builder,
context, module, and a map of globally declared functions.

With these things in mind, here's the signature for `gen_llvm`:

```C++
virtual void gen_llvm(llvm_context&, llvm::Function*) const;
```

Let's get right to it! `instruction_pushint` gives us an easy
start:

{{< codelines "C++" "compiler/08/instruction.cpp" 17 19 >}}

We create an LLVM integer constant with the value of
our integer, and push it onto the stack.

`instruction_push` is equally terse:

{{< codelines "C++" "compiler/08/instruction.cpp" 37 39 >}}

We simply peek at the value of the stack at the given
offset (an integer of the same size as `size_t`, which
we create using `create_size`). Once we have the
result of the peek, we push it onto the stack.

`instruction_pushglobal` is more involved. Let's take a look:

{{< codelines "C++" "compiler/08/instruction.cpp" 26 30 >}}

First, we retrive the `custom_function` associated with
the given global name. We then create an LLVM integer
constant representing the arity of the function,
and then push onto the stack the result of `alloc_global`,
giving it the function and arity just like it expects.

`instruction_pop` is also short, and doesn't require much
further explanation: 

{{< codelines "C++" "compiler/08/instruction.cpp" 46 48 >}}

Some other instructions, such as `instruction_update`,
`instruction_pack`, `instruction_split`, `instruction_slide`,
`instruction_alloc` and `instruction_eval` are equally as simple,
and we omit them for the purpose of brevity.

What remains are two "meaty" functions, `instruction_jump` and
`instruction_binop`. Let's start with the former:

{{< codelines "C++" "compiler/08/instruction.cpp" 101 123 >}}

This is the one and only function in which we have to take
care of control flow. Conceptually, depending on the tag
of the `node_data` at the top of the stack, we want
to pick one of many branches and jump to it. 
As we discussed, a basic block has to be executed in
its entirety; since the branches of a case expression
are mutually exclusive (only one of them is executed in any given case),
we have to create a separate basic block for each branch.
Given these blocks, we then want to branch to the correct one
using the tag of the node on top of the stack.

This is exactly what we do in this function. We first peek
at the node on top of the stack, and use `CreateGEP` through
`unwrap_data_tag` to get access to its tag. What we then
need is LLVM's switch instruction, created using `CreateSwitch`.
We must provide the switch with a "default" case in case
the tag value is something we don't recognize. To do this,
we create a "safety" `BasicBlock`. With this new safety
block in hand, we're able to call `CreateSwitch`, giving it
the tag value to switch on, the safety block to default to,
and the expected number of branches (to optimize memory allocation).

Next, we create a vector of blocks, and for each branch,
we append to it a corresponding block `branch_block`, into
which we insert the LLVM IR corresponding to the
instructions of the branch. No matter the branch we take,
we eventually want to come back to the same basic block,
which will perform the usual function cleanup via Update and Slide.
We re-use the safety block for this, and use `CreateBr` at the
end of each `branch_block` to perform an unconditional jump.

After we create each of the blocks, we use the `tag_mappings`
to add cases to the switch instruction, using `addCase`. Finally,
we set the builder's insertion point to the safety block,
meaning that the next instructions will insert their
LLVM IR into that block. Since we have all branches
jump to the safety block at the end, this means that
no matter which branch we take in the case expression, 
we will still execute the subsequent instructions as expected.

Let's now look at `instruction_binop`:

{{< codelines "C++" "compiler/08/instruction.cpp" 139 150 >}}

In this instruction, we pop and unwrap two integers from
the stack (assuming they are integers). Depending on
the type of operation the instruction is set to, we
then push the result of the corresponding LLVM
instruction. `PLUS` calls LLVM's `CreateAdd` to insert
addition, `MINUS` calls `CreateSub`, and so on. No matter
what the operation was, we push the result onto the stack.

That's all for our instructions! We're so very close now. Let's
move on to compiling definitions.

### Definitions to LLVM IR
As with typechecking, to allow for mutually recursive functions,
we need to be able each global function from any other function.
We then take the same approah as before, going in two passes.
This leads to two new methods for `definition`:

```C++
virtual void gen_llvm_first(llvm_context& ctx) = 0;
virtual void gen_llvm_second(llvm_context& ctx) = 0;
```

The first pass is intended to register all functions into
the `llvm_context`, making them visible to other functions.
The second pass is used to actually generate the code for
each function, now having access to all the other global
functions. Let's see the implementation for `gen_llvm_first`
for `definition_defn`:

{{< codelines "C++" "compiler/08/definition.cpp" 58 60 >}}

Since `create_custom_function` already creates a function
__and__ registers it with `llvm_context`, this is
all we need. Note that we created a new member variable
for `definition_defn` which stores this newly created
function. In the second pass, we will populate this
function with LLVM IR from the definition's instructions.

We actually create functions for each of the constructors
of data types, but they're quite special: all they do is
pack their arguments! Since they don't need access to
the other global functions, we might as well create
their bodies then and there:

{{< codelines "C++" "compiler/08/definition.cpp" 101 112 >}}

Like in `definition_defn`, we use `create_custom_function`.
However, we then use `SetInsertPoint` to configure our builder to insert code into
the newly created function (which already has a `BasicBlock`,
thanks to that one previously unexplained line in `create_custom_function`!).
Since we decided to only include the Pack instruction, we generate
a call to it directly using `create_pack`. We follow this
up with `CreateRetVoid`, which tells LLVM that this is
the end of the function, and that it is now safe to return
from it.

Great! We now implement the second pass of `gen_llvm`. In
the case of `definition_defn`, we do almost exactly
what we did in the first pass of `definition_data`:

{{< codelines "C++" "compiler/08/definition.cpp" 62 68 >}}

As for `definition_data`, we have nothing to do in the
second pass. We're done!

### Getting Results
We're almost there. Two things remain. The first: our implementation
of `ast_binop`, implement each binary operation as simply a function call:
`+` calls `f_plus`, and so on. But so far, we have not implemented
`f_plus`, or any other binary operator function. We do this
in `main.cpp`, creating a function `gen_llvm_internal_op`:

{{< codelines "C++" "compiler/08/main.cpp" 70 83 >}}

We create a simple function body. We then append G-machine
instructions that take each argument, evaluate it,
and then perform the corresponding binary operation.
With these instructions in the body, we insert
them into a new function, just like we did in our code
for `definition_defn` and `definition_data`.

Finally, we write our `gen_llvm` function that we will
call from `main`:

{{< codelines "C++" "compiler/08/main.cpp" 125 141 >}}

It first creates the functions for
`+`, `-`, `*`, and `/`. Then, it calls the first
pass of `gen_llvm` on all definitions, followed
by the second pass. Lastly, it uses LLVM's built-in
functionality to print out the generated IR in
our module, and then uses a function `output_llvm`
to create an object file ready for linking.

To be very honest, I took the `output_llvm` function
almost entirely from instructional material for my university's
compilers course. The gist of it, though, is: we determine
the target architecture and platform, specify a "generic" CPU,
create a default set of options, and then generate an object file.
Here it is: 

{{< codelines "C++" "compiler/08/main.cpp" 85 123 >}}

We now add a `generate_llvm` call to `main`.

Are we there?

Let's try to compile our first example, `works1.txt`. The
file:

{{< rawblock "compiler/08/examples/works1.txt" >}}

We run the following commands in our build directory:

```
./compiler < ../examples/work1.txt
gcc -no-pie ../runtime.c program.o
./a.out
```

Nothing happens. How anticlimactic! Our runtime has no way of
printing out the result of the evaluation. Let's change that:

{{< codelines "C++" "compiler/08/runtime.c" 157 183 >}}

Rerunning our commands, we get:

```
Result: 326
```

The correct result! Let's try it with `works2.txt`:

{{< rawblock "compiler/08/examples/works2.txt" >}}

And again, we get the right answer:

```
Result: 326
```

This is child's play, though. Let's try with something
more complicated, like `works3.txt`:

{{< rawblock "compiler/08/examples/works3.txt" >}}

Once again, our program does exactly what we intended:

```
Result: 3
```

Alright, this is neat, but we haven't yet confirmed that
lazy evaluation works. How about we try it with
`works5.txt`:

{{< rawblock "compiler/08/examples/works5.txt" >}}

Yet again, the program works:

```
Result: 9
```

At last, we have a working compiler!

While this is a major victory, we are not yet
finished with the compiler altogether. While
we allocate nodes whenever we need them, we
have not once uttered the phrase `free` in our
runtime. Our language works, but we have no way
of comparing numbers, no lambdas, no `let/in`.
In the next several posts, we will improve
our compiler to properly free unused memory
usign a __garbage collector__, implement
lambda functions using __lambda lifting__,
and use our Alloc instruction to implement `let/in` expressions.
We get started on the first of these tasks in
[Part 9 - Garbage Collection]({{< relref "09_compiler_garbage_collection.md" >}}).