580 lines
		
	
	
		
			23 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			580 lines
		
	
	
		
			23 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ---
 | |
| title: Compiling a Functional Language Using C++, Part 8 - LLVM
 | |
| date: 2019-10-30T22:16:22-07:00
 | |
| tags: ["C and C++", "Functional Languages", "Compilers"]
 | |
| ---
 | |
| 
 | |
| We don't want a compiler that can only generate code for a single
 | |
| platform. Our language should work on macOS, Windows, and Linux,
 | |
| on x86\_64, ARM, and maybe some other architectures. We also
 | |
| don't want to manually implement the compiler for each platform,
 | |
| dealing with the specifics of each architecture and operating
 | |
| system.
 | |
| 
 | |
| This is where LLVM comes in. LLVM (which stands for __Low Level Virtual Machine__),
 | |
| is a project which presents us with a kind of generic assembly language,
 | |
| an __Intermediate Representation__ (IR). It also provides tooling to compile the
 | |
| IR into platform-specific instructions, as well as to apply a host of various
 | |
| optimizations. We can thus translate our G-machine instructions to LLVM,
 | |
| and then use LLVM to generate machine code, which gets us to our ultimate
 | |
| goal of compiling our language.
 | |
| 
 | |
| We start with adding LLVM to our CMake project.
 | |
| {{< codelines "CMake" "compiler/08/CMakeLists.txt" 7 7 >}}
 | |
| 
 | |
| LLVM is a huge project, and has many components. We don't need
 | |
| most of them. We do need the core libraries, the x86 assembly
 | |
| generator, and x86 assembly parser. I'm
 | |
| not sure why we need the last one, but I ran into linking
 | |
| errors without them. We find the required link targets
 | |
| for these components using this CMake command:
 | |
| 
 | |
| {{< codelines "CMake" "compiler/08/CMakeLists.txt" 19 20 >}}
 | |
| 
 | |
| Finally, we add the new include directories, link targets,
 | |
| and definitions to our compiler executable:
 | |
| 
 | |
| {{< codelines "CMake" "compiler/08/CMakeLists.txt" 39 41 >}}
 | |
| 
 | |
| Great, we have the infrastructure updated to work with LLVM. It's
 | |
| now time to start using the LLVM API to compile our G-machine instructions
 | |
| into assembly. We start with `LLVMContext`. The LLVM documentation states:
 | |
| 
 | |
| > This is an important class for using LLVM in a threaded context.
 | |
| > It (opaquely) owns and manages the core "global" data of LLVM's core infrastructure, including the type and constant uniquing tables. 
 | |
| 
 | |
| We will have exactly one instance of such a class in our program.
 | |
| 
 | |
| Additionally, we want an `IRBuilder`, which will help us generate IR instructions,
 | |
| placing them into basic blocks (more on that in a bit). Also, we want
 | |
| a `Module` object, which represents some collection of code and declarations
 | |
| (perhaps like a C++ source file). Let's keep these things in our own
 | |
| `llvm_context` class. Here's what that looks like:
 | |
| 
 | |
| {{< codeblock "C++" "compiler/08/llvm_context.hpp" >}}
 | |
| 
 | |
| We include the LLVM context, builder, and module as members
 | |
| of the context struct. Since the builder and the module need
 | |
| the context, we initialize them in the constructor, where they
 | |
| can safely reference it.
 | |
| 
 | |
| Besides these fields, we added
 | |
| a few others, namely the `functions` and `struct_types` maps,
 | |
| and the various `llvm::Type` subclasses such as `stack_type`.
 | |
| We did this because we want to be able to call our runtime
 | |
| functions (and use our runtime structs) from LLVM. To generate
 | |
| a function call from LLVM, we need to have access to an
 | |
| `llvm::Function` object. We thus want to have an `llvm::Function`
 | |
| object for each runtime function we want to call. We could declare
 | |
| a member variable in our `llvm_context` for each runtime function,
 | |
| but it's easier to leave this to be an implementation
 | |
| detail, and only have a dynamically created map between runtime
 | |
| function names and their corresponding `llvm::Function` objects.
 | |
| 
 | |
| We populate the maps and other type-related variables in the
 | |
| two methods, `create_functions()` and `create_types()`. To
 | |
| create an `llvm::Function`, we must provide an `llvm::FunctionType`,
 | |
| an `llvm::LinkageType`, the name of the function, and the module
 | |
| in which the function is declared. Since we only have one
 | |
| module (the one we initialized in the constructor) that's
 | |
| the module we pass in. The name of the function is the same
 | |
| as its name in the runtime. The linkage type is a little
 | |
| more complicated - it tells LLVM the "visibility" of a function.
 | |
| "Private" or "Internal" would hide this function from the linker
 | |
| (like `static` functions in C). However, we want to do the opposite: our
 | |
| generated functions should be accessible from other code.
 | |
| Thus, our linkage type is "External".
 | |
| 
 | |
| The only remaining parameter is the `llvm::FunctionType`, which
 | |
| is created using code like:
 | |
| 
 | |
| ```C++
 | |
| llvm::FunctionType::get(return_type, {param_type_1, param_type_2, ...}, is_variadic)
 | |
| ```
 | |
| 
 | |
| Declaring all the functions and types in our runtime is mostly
 | |
| just tedious. Here are a few lines from `create_functions()`, which
 | |
| give a very good idea of the rest of that method:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/llvm_context.cpp" 47 60 >}}
 | |
| 
 | |
| Similarly, here are a few lines from `create_types()`, from
 | |
| which you can extrapolate the rest:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/llvm_context.cpp" 7 11 >}}
 | |
| 
 | |
| We also tell LLVM the contents of our structs, so that
 | |
| we may later reference specific fields. This is just like
 | |
| forward declaration - we can forward declare a struct
 | |
| in C/C++, but unless we also declare its contents,
 | |
| we can't access what's inside. Below is the code
 | |
| for specifying the body of `node_base` and `node_app`.
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/llvm_context.cpp" 19 26 >}}
 | |
| 
 | |
| There's still more functionality packed into `llvm_context`.
 | |
| Let's next take a look into `custom_function`, and
 | |
| the `create_custom_function` method. Why do we need
 | |
| these? To highlight the need for the custom class,
 | |
| let's take a look at `instruction_pushglobal` which
 | |
| occurs at the G-machine level, and then at `alloc_global`,
 | |
| which will be a function call generated as part of
 | |
| the PushGlobal instruction. `instruction_pushglobal`'s
 | |
| only member variable is `name`, which stands for
 | |
| the name of the global function it's referencing. However,
 | |
| `alloc_global` requires an arity argument! We can
 | |
| try to get this information from the `llvm::Function`
 | |
| corresponding to the global we're trying to reference,
 | |
| but this doesn't get us anywhere: as far as LLVM
 | |
| is concerned, any global function only takes one
 | |
| parameter, the stack. The rest of the parameters
 | |
| are given through that stack, and their number cannot
 | |
| be easily deduced from the function alone.
 | |
| 
 | |
| Instead, we decide to store global functions together
 | |
| with their arity. We thus create a class to combine
 | |
| these two things (`custom_function`), define
 | |
| a map from global function names to instances
 | |
| of `custom_function`, and add a convenience method
 | |
| (`create_custom_function`) that takes care of
 | |
| constructing an `llvm::Function` object, creating
 | |
| a `custom_function`, and storing it in the map.
 | |
| 
 | |
| The implementation for `custom_function` is 
 | |
| straightforward:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/llvm_context.cpp" 234 252 >}}
 | |
| 
 | |
| We create a function type, then a function, and finally
 | |
| initialize a `custom_function`. There's one thing
 | |
| we haven't seen yet in this function, which is the
 | |
| `BasicBlock` class. We'll get to what basic blocks
 | |
| are shortly, but for now it's sufficient to
 | |
| know that the basic block gives us a place to
 | |
| insert code.
 | |
| 
 | |
| This isn't the end of our `llvm_context` class: it also
 | |
| has a variety of other `create_*` methods! Let's take a look
 | |
| at their signatures. Most return either `void`,
 | |
| `llvm::ConstantInt*`, or `llvm::Value*`. Since
 | |
| `llvm::ConstantInt*` is a subclass of `llvm::Value*`, let's
 | |
| just treat it as simply an `llvm::Value*` while trying
 | |
| to understand these methods.
 | |
| 
 | |
| So, what is `llvm::Value`? To answer this question, let's
 | |
| first understand how the LLVM IR works.
 | |
| 
 | |
| ### LLVM IR
 | |
| An important property of LLVM IR is that it is in __Single Static Assignment__
 | |
| (SSA) form. This means that each variable can only be assigned to once. For instance,
 | |
| if we use `<-` to represent assignment, the following program is valid:
 | |
| 
 | |
| ```
 | |
| x <- 1
 | |
| y <- 2
 | |
| z <- x + y
 | |
| ```
 | |
| 
 | |
| However, the following program is __not__ valid:
 | |
| 
 | |
| ```
 | |
| x <- 1
 | |
| x <- x + 1
 | |
| ```
 | |
| 
 | |
| But what if we __do__ want to modify a variable `x`?
 | |
| We can declare another "version" of `x` every time we modify it.
 | |
| For instance, if we wanted to increment `x` twice, we'd do this:
 | |
| 
 | |
| ```
 | |
| x <- 1
 | |
| x1 <- x + 1
 | |
| x2 <- x1 + 1
 | |
| ```
 | |
| 
 | |
| In practice, LLVM's C++ API can take care of versioning variables on its own, by
 | |
| auto-incrementing numbers associated with each variable we use.
 | |
| 
 | |
| Assigned to each variable is `llvm::Value`. The LLVM documentation states:
 | |
| 
 | |
| > It is the base class of all values computed by a program that may be used as operands to other values.
 | |
| 
 | |
| It's important to understand that `llvm::Value` __does not store the result of the computation__.
 | |
| It rather represents how something may be computed. 1 is a value because it computed by
 | |
| just returning 1. `x + 1` is a value because it is computed by adding the value inside of
 | |
| `x` to 1. Since we cannot modify a variable once we've declared it, we will
 | |
| keep assigning intermediate results to new variables, constructing new values
 | |
| out of values that we've already specified.
 | |
| 
 | |
| This somewhat elucidates what the `create_*` functions do: `create_i8` creates an 8-bit integer
 | |
| value, and `create_pop` creates a value that is computed by calling
 | |
| our runtime `stack_pop` function.
 | |
| 
 | |
| Before we move on to look at the implementations of these functions,
 | |
| we need to understand another concept from the world of compiler design:
 | |
| __basic blocks__. A basic block is a sequence of instructions that
 | |
| are guaranteed to be executed one after another. This means that a
 | |
| basic block cannot have an if/else, jump, or any other type of control flow anywhere
 | |
| except at the end. If control flow could appear inside the basic block,
 | |
| there would be opporunity for execution of some, but not all,
 | |
| instructions in the block, violating the definition. Every time
 | |
| we add an IR instruction in LLVM, we add it to a basic block.
 | |
| Writing control flow involves creating several blocks, with each
 | |
| block serving as the destination of a potential jump. We will
 | |
| see this used to compile the Jump instruction.
 | |
| 
 | |
| ### Generating LLVM IR
 | |
| Now that we understand what `llvm::Value` is, and have a vague
 | |
| understanding of how LLVM is structured, let's take a look at
 | |
| the implementations of the `create_*` functions. The simplest
 | |
| is `create_i8`:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/llvm_context.cpp" 150 152 >}}
 | |
| 
 | |
| Not much to see here. We create an instance of the `llvm::ConstantInt` class,
 | |
| from the actual integer given to the method. As we said before,
 | |
| `llvm::ConstantInt` is a subclass of `llvm::Value`. Next up, let's look
 | |
| at `create_pop`:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/llvm_context.cpp" 160 163 >}}
 | |
| 
 | |
| We first retrieve an `llvm::Function` associated with `stack_pop`
 | |
| from our map, and then use `llvm::IRBuilder::CreateCall` to insert
 | |
| a value that represents a function call into the currently
 | |
| selected basic block (the builder's state is what
 | |
| dictates what the "selected basic block" is). `CreateCall`
 | |
| takes as parameters the function we want to call (`stack_pop`,
 | |
| which we store into the `pop_f` variable), as well as the arguments
 | |
| to the function (for which we pass `f->arg_begin()`).
 | |
| 
 | |
| Hold on. What the heck is `arg_begin()`? Why do we take a function
 | |
| as a paramter to this method? The answer is fairly simple: this 
 | |
| method is used when we are
 | |
| generating a function with signature `void f_(struct stack* s)`
 | |
| (we discussed the signature in the previous post). The
 | |
| parameter that we give to `create_pop` is this function we're
 | |
| generating, and `arg_begin()` gets the value that represents
 | |
| the first parameter to our function - `s`! Since `stack_pop`
 | |
| takes a stack, we need to give it the stack we're working on,
 | |
| and so we use `f->arg_begin()` to access it.
 | |
| 
 | |
| Most of the other functions follow this exact pattern, with small
 | |
| deviations. However, another function uses a more complicated LLVM
 | |
| instruction:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/llvm_context.cpp" 202 209 >}}
 | |
| 
 | |
| `unwrap_num` is used to cast a given node pointer to a pointer
 | |
| to a number node, and then return the integer value from
 | |
| that number node. It starts fairly innocently: we ask
 | |
| LLVM for the type of a pointer to a `node_num` struct,
 | |
| and then use `CreatePointerCast` to create a value
 | |
| that is the same node pointer we're given, but now interpreted
 | |
| as a number node pointer. We now have to access
 | |
| the `value` field of our node. `CreateGEP` helps us with
 | |
| this: given a pointer to a node, and two offsets
 | |
| `n` and `k`, it effectively performs the following:
 | |
| 
 | |
| ```C++
 | |
| &(num_pointer[n]->kth_field)
 | |
| ```
 | |
| 
 | |
| The first offset, then, gives an index into the "array"
 | |
| represented by the pointer, while the second offset
 | |
| gives the index of the field we want to access. We
 | |
| want to dereference the pointer (`num_pointer[0]`),
 | |
| and we want the second field (`1`, when counting from 0).
 | |
| Thus, we call `CreateGEP` with these offsets and our pointers.
 | |
| 
 | |
| This still leaves us with a pointer to a number, rather
 | |
| than the number itself. To dereference the pointer, we use
 | |
| `CreateLoad`. This gives us the value of the number node,
 | |
| which we promptly return.
 | |
| 
 | |
| This concludes our implementation of the `llvm_context` -
 | |
| it's time to move on to the G-machine instructions.
 | |
| 
 | |
| ### G-machine Instructions to LLVM IR
 | |
| 
 | |
| Let's now envision a `gen_llvm` method on the `instruction` struct,
 | |
| which will turn the still-abstract G-machine instruction
 | |
| into tangible, close-to-metal LLVM IR. As we've seen
 | |
| in our implementation of `llvm_context`, to access the stack, we need access to the first
 | |
| argument of the function we're generating. Thus, we need this method
 | |
| to accept the function whose instructions are
 | |
| being converted to LLVM. We also pass in the
 | |
| `llvm_context`, since it contains the LLVM builder,
 | |
| context, module, and a map of globally declared functions.
 | |
| 
 | |
| With these things in mind, here's the signature for `gen_llvm`:
 | |
| 
 | |
| ```C++
 | |
| virtual void gen_llvm(llvm_context&, llvm::Function*) const;
 | |
| ```
 | |
| 
 | |
| Let's get right to it! `instruction_pushint` gives us an easy
 | |
| start:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/instruction.cpp" 17 19 >}}
 | |
| 
 | |
| We create an LLVM integer constant with the value of
 | |
| our integer, and push it onto the stack.
 | |
| 
 | |
| `instruction_push` is equally terse:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/instruction.cpp" 37 39 >}}
 | |
| 
 | |
| We simply peek at the value of the stack at the given
 | |
| offset (an integer of the same size as `size_t`, which
 | |
| we create using `create_size`). Once we have the
 | |
| result of the peek, we push it onto the stack.
 | |
| 
 | |
| `instruction_pushglobal` is more involved. Let's take a look:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/instruction.cpp" 26 30 >}}
 | |
| 
 | |
| First, we retrive the `custom_function` associated with
 | |
| the given global name. We then create an LLVM integer
 | |
| constant representing the arity of the function,
 | |
| and then push onto the stack the result of `alloc_global`,
 | |
| giving it the function and arity just like it expects.
 | |
| 
 | |
| `instruction_pop` is also short, and doesn't require much
 | |
| further explanation: 
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/instruction.cpp" 46 48 >}}
 | |
| 
 | |
| Some other instructions, such as `instruction_update`,
 | |
| `instruction_pack`, `instruction_split`, `instruction_slide`,
 | |
| `instruction_alloc` and `instruction_eval` are equally as simple,
 | |
| and we omit them for the purpose of brevity.
 | |
| 
 | |
| What remains are two "meaty" functions, `instruction_jump` and
 | |
| `instruction_binop`. Let's start with the former:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/instruction.cpp" 101 123 >}}
 | |
| 
 | |
| This is the one and only function in which we have to take
 | |
| care of control flow. Conceptually, depending on the tag
 | |
| of the `node_data` at the top of the stack, we want
 | |
| to pick one of many branches and jump to it. 
 | |
| As we discussed, a basic block has to be executed in
 | |
| its entirety; since the branches of a case expression
 | |
| are mutually exclusive (only one of them is executed in any given case),
 | |
| we have to create a separate basic block for each branch.
 | |
| Given these blocks, we then want to branch to the correct one
 | |
| using the tag of the node on top of the stack.
 | |
| 
 | |
| This is exactly what we do in this function. We first peek
 | |
| at the node on top of the stack, and use `CreateGEP` through
 | |
| `unwrap_data_tag` to get access to its tag. What we then
 | |
| need is LLVM's switch instruction, created using `CreateSwitch`.
 | |
| We must provide the switch with a "default" case in case
 | |
| the tag value is something we don't recognize. To do this,
 | |
| we create a "safety" `BasicBlock`. With this new safety
 | |
| block in hand, we're able to call `CreateSwitch`, giving it
 | |
| the tag value to switch on, the safety block to default to,
 | |
| and the expected number of branches (to optimize memory allocation).
 | |
| 
 | |
| Next, we create a vector of blocks, and for each branch,
 | |
| we append to it a corresponding block `branch_block`, into
 | |
| which we insert the LLVM IR corresponding to the
 | |
| instructions of the branch. No matter the branch we take,
 | |
| we eventually want to come back to the same basic block,
 | |
| which will perform the usual function cleanup via Update and Slide.
 | |
| We re-use the safety block for this, and use `CreateBr` at the
 | |
| end of each `branch_block` to perform an unconditional jump.
 | |
| 
 | |
| After we create each of the blocks, we use the `tag_mappings`
 | |
| to add cases to the switch instruction, using `addCase`. Finally,
 | |
| we set the builder's insertion point to the safety block,
 | |
| meaning that the next instructions will insert their
 | |
| LLVM IR into that block. Since we have all branches
 | |
| jump to the safety block at the end, this means that
 | |
| no matter which branch we take in the case expression, 
 | |
| we will still execute the subsequent instructions as expected.
 | |
| 
 | |
| Let's now look at `instruction_binop`:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/instruction.cpp" 139 150 >}}
 | |
| 
 | |
| In this instruction, we pop and unwrap two integers from
 | |
| the stack (assuming they are integers). Depending on
 | |
| the type of operation the instruction is set to, we
 | |
| then push the result of the corresponding LLVM
 | |
| instruction. `PLUS` calls LLVM's `CreateAdd` to insert
 | |
| addition, `MINUS` calls `CreateSub`, and so on. No matter
 | |
| what the operation was, we push the result onto the stack.
 | |
| 
 | |
| That's all for our instructions! We're so very close now. Let's
 | |
| move on to compiling definitions.
 | |
| 
 | |
| ### Definitions to LLVM IR
 | |
| As with typechecking, to allow for mutually recursive functions,
 | |
| we need to be able each global function from any other function.
 | |
| We then take the same approah as before, going in two passes.
 | |
| This leads to two new methods for `definition`:
 | |
| 
 | |
| ```C++
 | |
| virtual void gen_llvm_first(llvm_context& ctx) = 0;
 | |
| virtual void gen_llvm_second(llvm_context& ctx) = 0;
 | |
| ```
 | |
| 
 | |
| The first pass is intended to register all functions into
 | |
| the `llvm_context`, making them visible to other functions.
 | |
| The second pass is used to actually generate the code for
 | |
| each function, now having access to all the other global
 | |
| functions. Let's see the implementation for `gen_llvm_first`
 | |
| for `definition_defn`:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/definition.cpp" 58 60 >}}
 | |
| 
 | |
| Since `create_custom_function` already creates a function
 | |
| __and__ registers it with `llvm_context`, this is
 | |
| all we need. Note that we created a new member variable
 | |
| for `definition_defn` which stores this newly created
 | |
| function. In the second pass, we will populate this
 | |
| function with LLVM IR from the definition's instructions.
 | |
| 
 | |
| We actually create functions for each of the constructors
 | |
| of data types, but they're quite special: all they do is
 | |
| pack their arguments! Since they don't need access to
 | |
| the other global functions, we might as well create
 | |
| their bodies then and there:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/definition.cpp" 101 112 >}}
 | |
| 
 | |
| Like in `definition_defn`, we use `create_custom_function`.
 | |
| However, we then use `SetInsertPoint` to configure our builder to insert code into
 | |
| the newly created function (which already has a `BasicBlock`,
 | |
| thanks to that one previously unexplained line in `create_custom_function`!).
 | |
| Since we decided to only include the Pack instruction, we generate
 | |
| a call to it directly using `create_pack`. We follow this
 | |
| up with `CreateRetVoid`, which tells LLVM that this is
 | |
| the end of the function, and that it is now safe to return
 | |
| from it.
 | |
| 
 | |
| Great! We now implement the second pass of `gen_llvm`. In
 | |
| the case of `definition_defn`, we do almost exactly
 | |
| what we did in the first pass of `definition_data`:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/definition.cpp" 62 68 >}}
 | |
| 
 | |
| As for `definition_data`, we have nothing to do in the
 | |
| second pass. We're done!
 | |
| 
 | |
| ### Getting Results
 | |
| We're almost there. Two things remain. The first: our implementation
 | |
| of `ast_binop`, implement each binary operation as simply a function call:
 | |
| `+` calls `f_plus`, and so on. But so far, we have not implemented
 | |
| `f_plus`, or any other binary operator function. We do this
 | |
| in `main.cpp`, creating a function `gen_llvm_internal_op`:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/main.cpp" 70 83 >}}
 | |
| 
 | |
| We create a simple function body. We then append G-machine
 | |
| instructions that take each argument, evaluate it,
 | |
| and then perform the corresponding binary operation.
 | |
| With these instructions in the body, we insert
 | |
| them into a new function, just like we did in our code
 | |
| for `definition_defn` and `definition_data`.
 | |
| 
 | |
| Finally, we write our `gen_llvm` function that we will
 | |
| call from `main`:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/main.cpp" 125 141 >}}
 | |
| 
 | |
| It first creates the functions for
 | |
| `+`, `-`, `*`, and `/`. Then, it calls the first
 | |
| pass of `gen_llvm` on all definitions, followed
 | |
| by the second pass. Lastly, it uses LLVM's built-in
 | |
| functionality to print out the generated IR in
 | |
| our module, and then uses a function `output_llvm`
 | |
| to create an object file ready for linking.
 | |
| 
 | |
| To be very honest, I took the `output_llvm` function
 | |
| almost entirely from instructional material for my university's
 | |
| compilers course. The gist of it, though, is: we determine
 | |
| the target architecture and platform, specify a "generic" CPU,
 | |
| create a default set of options, and then generate an object file.
 | |
| Here it is: 
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/main.cpp" 85 123 >}}
 | |
| 
 | |
| We now add a `generate_llvm` call to `main`.
 | |
| 
 | |
| Are we there?
 | |
| 
 | |
| Let's try to compile our first example, `works1.txt`. The
 | |
| file:
 | |
| 
 | |
| {{< rawblock "compiler/08/examples/works1.txt" >}}
 | |
| 
 | |
| We run the following commands in our build directory:
 | |
| 
 | |
| ```
 | |
| ./compiler < ../examples/work1.txt
 | |
| gcc -no-pie ../runtime.c program.o
 | |
| ./a.out
 | |
| ```
 | |
| 
 | |
| Nothing happens. How anticlimactic! Our runtime has no way of
 | |
| printing out the result of the evaluation. Let's change that:
 | |
| 
 | |
| {{< codelines "C++" "compiler/08/runtime.c" 157 183 >}}
 | |
| 
 | |
| Rerunning our commands, we get:
 | |
| 
 | |
| ```
 | |
| Result: 326
 | |
| ```
 | |
| 
 | |
| The correct result! Let's try it with `works2.txt`:
 | |
| 
 | |
| {{< rawblock "compiler/08/examples/works2.txt" >}}
 | |
| 
 | |
| And again, we get the right answer:
 | |
| 
 | |
| ```
 | |
| Result: 326
 | |
| ```
 | |
| 
 | |
| This is child's play, though. Let's try with something
 | |
| more complicated, like `works3.txt`:
 | |
| 
 | |
| {{< rawblock "compiler/08/examples/works3.txt" >}}
 | |
| 
 | |
| Once again, our program does exactly what we intended:
 | |
| 
 | |
| ```
 | |
| Result: 3
 | |
| ```
 | |
| 
 | |
| Alright, this is neat, but we haven't yet confirmed that
 | |
| lazy evaluation works. How about we try it with
 | |
| `works5.txt`:
 | |
| 
 | |
| {{< rawblock "compiler/08/examples/works5.txt" >}}
 | |
| 
 | |
| Yet again, the program works:
 | |
| 
 | |
| ```
 | |
| Result: 9
 | |
| ```
 | |
| 
 | |
| At last, we have a working compiler!
 | |
| 
 | |
| While this is a major victory, we are not yet
 | |
| finished with the compiler altogether. While
 | |
| we allocate nodes whenever we need them, we
 | |
| have not once uttered the phrase `free` in our
 | |
| runtime. Our language works, but we have no way
 | |
| of comparing numbers, no lambdas, no `let/in`.
 | |
| In the next several posts, we will improve
 | |
| our compiler to properly free unused memory
 | |
| usign a __garbage collector__, implement
 | |
| lambda functions using __lambda lifting__,
 | |
| and use our Alloc instruction to implement `let/in` expressions.
 | |
| We get started on the first of these tasks in
 | |
| [Part 9 - Garbage Collection]({{< relref "09_compiler_garbage_collection.md" >}}).
 |