blog-static/content/blog/08_compiler_llvm.md

2.4 KiB

title date draft tags
Compiling a Functional Language Using C++, Part 8 - LLVM 2019-10-30T22:16:22-07:00 true
C and C++
Functional Languages
Compilers

We don't want a compiler that can only generate code for a single platform. Our language should work on macOS, Windows, and Linux, on x86_64, ARM, and maybe some other architectures. We also don't want to manually implement the compiler for each platform, dealing with the specifics of each architecture and operating system.

This is where LLVM comes in. LLVM (which stands for Low Level Virtual Machine), is a project which presents us with a kind of generic assembly language, an Intermediate Representation (IR). It also provides tooling to compile the IR into platform-specific instructions, as well as to apply a host of various optimizations. We can thus translate our G-machine instructions to LLVM, and then use LLVM to generate machine code, which gets us to our ultimate goal of compiling our language.

We start with adding LLVM to our CMake project. {{< codelines "CMake" "compiler/08/CMakeLists.txt" 7 7 >}}

LLVM is a huge project, and has many components. We don't need most of them. We do need the core libraries, the x86 assembly generator, and x86 assembly parser. I'm not sure why we need the last one, but I ran into linking errors without them. We find the required link targets for these components using this CMake command:

{{< codelines "CMake" "compiler/08/CMakeLists.txt" 19 20 >}}

Finally, we add the new include directories, link targets, and definitions to our compiler executable:

{{< codelines "CMake" "compiler/08/CMakeLists.txt" 39 41 >}}

Great, we have the infrastructure updated to work with LLVM. It's now time to start using the LLVM API to compile our G-machine instructions into assembly. We start with LLVMContext. The LLVM documentation states:

This is an important class for using LLVM in a threaded context. It (opaquely) owns and manages the core "global" data of LLVM's core infrastructure, including the type and constant uniquing tables.

We will have exactly one instance of such a class in our program.

Additionally, we want an IRBuilder, which will help us generate IR instructions, placing them into basic blocks (more on that in a bit). Also, we want a Module object, which represents some collection of code and declarations (perhaps like a C++ source file). Let's keep these things in our own llvm_state class.