blog-static/content/blog/07_compiler_runtime.md

162 lines
7.1 KiB
Markdown

---
title: Compiling a Functional Language Using C++, Part 7 - Runtime
date: 2019-08-06T14:26:38-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"]
---
Wikipedia has the following definition for a __runtime__:
> A [runtime] primarily implements portions of an execution model.
We know what our execution model is! We talked about it in Part 5 - it's the
lazy graph reduction we've specified. Creating and manipulating
graph nodes is slightly above hardware level, and all programs in our
functional language will rely on such manipulation (it's how they run!). Furthermore,
most G-machine instructions are also above hardware level (especially unwind!).
Push and Slide and other instructions are pretty complex.
Most computers aren't stack machines. We'll have to implement
our own stack, and whenever a graph-building function will want to modify
the stack, it will have to call library routines for our stack implementation:
```C
void stack_push(struct stack* s, struct node_s* n);
struct node_s* stack_slide(struct stack* s, size_t c);
/* other stack operations */
```
Furthermore, we observe that Unwind does a lot of the heavy lifting in our
G-machine definition. After we build the graph,
Unwind is what picks it apart and performs function calls. Furthermore,
Unwind pushes Unwind back on the stack: once you've hit it,
you're continuing to Unwind until you reach a function call. This
effectively means we can implement Unwind as a loop:
```C
while(1) {
// Check for Unwind's first rule
// Check for Unwind's second rule
// ...
}
```
In this implementation, Unwind is in charge. We won't need to insert
the Unwind operations at the end of our generated functions, and you
may have noticed we've already been following this strategy in our
implementation of the G-machine compilation.
We can start working on an implementation of the runtime right now,
beginning with the nodes:
{{< codelines "C++" "compiler/07/runtime.h" 4 50 >}}
We have a variety of different nodes that can be on the stack, but without
the magic of C++'s `vtable` and RTTI, we have to take care of the bookkeeping
ourselves. We add an enum, `node_tag`, which we will use to indicate what
type of node we're looking at. We also add a "base class" `node_base`, which
contains the fields that all nodes must contain (only `tag` at the moment).
We then add to the beginning of each node struct a member of type
`node_base`. With this, a pointer to a node struct can be interpreted as a pointer
to `node_base`, which is our lowest common denominator. To go back, we
check the `tag` of `node_base`, and cast the pointer appropriately. This way,
we mimic inheritance, in a very basic manner.
We also add an `alloc_node`, which allocates a region of memory big enough
to be any node. We do this because we sometimes mutate nodes (replacing
expressions with the results of their evaluation), changing their type.
We then want to be able to change a node without reallocating memory.
Since the biggest node we have is `node_app`, that's the one we choose.
Finally, to make it easier to create nodes from our generated code,
we add helper functions like `alloc_num`, which allocate a given
node type, and set its tag and member fields appropriately. We
don't include such a function for `node_data`, since this
node will be created only in one possible way.
Here's the implementation:
{{< codelines "C" "compiler/07/runtime.c" 6 40 >}}
We now move on to implement some stack operations. Let's list them:
* `stack_init` and `stack_free` - one allocates memory for the stack,
the other releases it.
* `stack_push`, `stack_pop` and `stack_peek` - the classic stack operations.
We have `_peek` to take an offset, so we can peek relative to the top of the stack.
* `stack_popn` - pop off some number of nodes instead of one.
* `stack_slide` - the slide we specified in the semantics. Keeps the top, deletes the
next several nodes.
* `stack_update` - turns the node at the offset into an indirection to the result,
which we will use for lazy evaluation (modifying expressions with their reduced forms).
* `stack_alloc` - allocate indirection nodes on the stack. We will use this later.
* `stack_pack` and `stack_split` - Wrap and unwrap constructors on the stack.
We declare these in a header:
{{< codelines "C" "compiler/07/runtime.h" 52 68 >}}
And implement them as follows:
{{< codelines "C" "compiler/07/runtime.c" 42 116 >}}
Let's now talk about how this will connect to the code we generate. To get
a quick example, consider the `node_global` struct that we have declared above.
It has a member `function`, which is a __function pointer__ to a function
that takes a stack and returns void.
When we finally generate machine code for each of the functions
we have in our program, it will be made up of sequences of G-machine
operations expressed using assembly instructions. These instructions will still
have to manipulate the G-machine stack (they still represent G-machine operations!),
and thus, the resulting assembly subroutine will take as parameter a stack. It will
then construct the function's graph on that stack, as we've already seen. Thus,
we express a compiled top-level function as a subroutine that takes a stack,
and returns void. A global node holds in it the pointer to the function that it will call.
When our program will start, it will assume that there exists a top-level
function `f_main` that takes 0 parameters. It will take that function, call it
to produce the initial graph, and then let the unwind loop take care of the evaluation.
Thus, our program will initially look like this:
{{< codelines "C" "compiler/07/runtime.c" 154 159 >}}
As we said, we expect an externally-declared subroutine `f_main`. We construct
a global node for `f_main` with arity 0, and then start the execution using a function `eval`.
What's `eval`, though? It's the function that will take care of creating
a new stack, and evaluating the node that is passed to it using
our unwind loop. `eval` itself is pretty terse:
{{< codelines "C" "compiler/07/runtime.c" 144 152 >}}
We create a fresh program stack, start it off with whatever node
we want to evaluate, and have `unwind` take care of the rest.
`unwind` is a direct implementation of the rules from Part 5:
{{< codelines "C" "compiler/07/runtime.c" 118 142 >}}
We can now come up with some simple programs. Let's try
writing out, by hand, `main = { 320 + 6 }`. We end up with:
{{< codeblock "C" "compiler/07/examples/runtime1.c" >}}
If we add to the bottom of our `main` the following code:
```C
printf("%d\n", ((struct node_num*) result)->value);
```
And compile and run our code:
```
gcc runtime.c examples/runtime1.c
./a.out
```
We get the output `326`, which is exactly correct!
We now have a common set of functions and declarations
that serve to support the code we generate from our compiler.
Although this time, we wrote out `f_main` by hand, we will soon
use LLVM to generate code for `f_main` and more. Once we get
that going, we be able to compile our code!
Next time, we will start work on converting our G-machine instructions
into machine code. We will set up LLVM and get our very first
fully functional compiled programs in Part 8 - LLVM.