blog-static/content/blog/08_compiler_llvm.md

54 lines
2.4 KiB
Markdown
Raw Normal View History

---
title: Compiling a Functional Language Using C++, Part 8 - LLVM
date: 2019-10-30T22:16:22-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"]
---
We don't want a compiler that can only generate code for a single
platform. Our language should work on macOS, Windows, and Linux,
on x86\_64, ARM, and maybe some other architectures. We also
don't want to manually implement the compiler for each platform,
dealing with the specifics of each architecture and operating
system.
This is where LLVM comes in. LLVM (which stands for __Low Level Virtual Machine__),
is a project which presents us with a kind of generic assembly language,
an __Intermediate Representation__ (IR). It also provides tooling to compile the
IR into platform-specific instructions, as well as to apply a host of various
optimizations. We can thus translate our G-machine instructions to LLVM,
and then use LLVM to generate machine code, which gets us to our ultimate
goal of compiling our language.
We start with adding LLVM to our CMake project.
{{< codelines "CMake" "compiler/08/CMakeLists.txt" 7 7 >}}
LLVM is a huge project, and has many components. We don't need
most of them. We do need the core libraries, the x86 assembly
generator, and x86 assembly parser. I'm
not sure why we need the last one, but I ran into linking
errors without them. We find the required link targets
for these components using this CMake command:
{{< codelines "CMake" "compiler/08/CMakeLists.txt" 19 20 >}}
Finally, we add the new include directories, link targets,
and definitions to our compiler executable:
{{< codelines "CMake" "compiler/08/CMakeLists.txt" 39 41 >}}
Great, we have the infrastructure updated to work with LLVM. It's
now time to start using the LLVM API to compile our G-machine instructions
into assembly. We start with `LLVMContext`. The LLVM documentation states:
> This is an important class for using LLVM in a threaded context.
> It (opaquely) owns and manages the core "global" data of LLVM's core infrastructure, including the type and constant uniquing tables.
We will have exactly one instance of such a class in our program.
Additionally, we want an `IRBuilder`, which will help us generate IR instructions,
placing them into basic blocks (more on that in a bit). Also, we want
a `Module` object, which represents some collection of code and declarations
(perhaps like a C++ source file). Let's keep these things in our own
`llvm_state` class.