---
title: Compiling a Functional Language Using C++, Part 7 - Runtime
date: 2019-08-06T14:26:38-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
description: "In this post, we implement the supporting code that will be shared between all executables our compiler will create."
---
Wikipedia has the following definition for a __runtime__:

> A [runtime] primarily implements portions of an execution model.

We know what our execution model is! We talked about it in Part 5 - it's the
lazy graph reduction we've specified. Creating and manipulating
graph nodes is slightly above hardware level, and all programs in our
functional language will rely on such manipulation (it's how they run!). Furthermore,
most G-machine instructions are also above hardware level (especially unwind!).
Push, Slide, and the other instructions are pretty complex, too.
Most computers aren't stack machines. We'll have to implement
our own stack, and whenever a graph-building function wants to modify
the stack, it will have to call library routines from our stack implementation:

```C
void stack_push(struct stack* s, struct node_s* n);
struct node_s* stack_slide(struct stack* s, size_t c);
/* other stack operations */
```

Furthermore, we observe that Unwind does a lot of the heavy lifting in our
G-machine definition. After we build the graph,
Unwind is what picks it apart and performs function calls. What's more,
Unwind pushes Unwind back on the stack: once you've hit it,
you keep unwinding until you reach a function call. This
effectively means we can implement Unwind as a loop:

```C
while(1) {
    // Check for Unwind's first rule
    // Check for Unwind's second rule
    // ...
}
```

In this implementation, Unwind is in charge. We won't need to insert
the Unwind operations at the end of our generated functions, and you
may have noticed we've already been following this strategy in our
implementation of the G-machine compilation.

We can start working on an implementation of the runtime right now,
beginning with the nodes:

{{< codelines "C++" "compiler/07/runtime.h" 4 50 >}}

We have a variety of different nodes that can be on the stack, but without
the magic of C++'s `vtable` and RTTI, we have to take care of the bookkeeping
ourselves. We add an enum, `node_tag`, which we will use to indicate what
type of node we're looking at. We also add a "base class" `node_base`, which
contains the fields that all nodes must contain (only `tag` at the moment).
We then add to the beginning of each node struct a member of type
`node_base`. With this, a pointer to a node struct can be interpreted as a pointer
to `node_base`, which is our lowest common denominator. To go back, we
check the `tag` of `node_base`, and cast the pointer appropriately. This way,
we mimic inheritance, in a very basic manner.

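To make the "base class" trick concrete, here is a minimal sketch of it in isolation. The tag names, the fields of `node_num`, and the helper `num_value` are assumptions made for illustration; the real definitions are the ones in `runtime.h`.

```C
#include <assert.h>
#include <stdint.h>

/* Hypothetical tags; the actual enum in runtime.h may differ. */
enum node_tag { NODE_APP, NODE_NUM };

/* Fields shared by every node. */
struct node_base {
    enum node_tag tag;
};

/* Each concrete node starts with a node_base member, so a pointer to the
   node is also a valid pointer to its base. */
struct node_num {
    struct node_base base;
    int32_t value;
};

struct node_app {
    struct node_base base;
    struct node_base* left;
    struct node_base* right;
};

/* "Downcast": check the tag first, then reinterpret the pointer. */
int32_t num_value(struct node_base* n) {
    assert(n->tag == NODE_NUM);
    return ((struct node_num*) n)->value;
}
```

The cast is only safe because `base` is the first member of every node struct: C guarantees that a pointer to a struct, suitably converted, points to its first member.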
We also add an `alloc_node`, which allocates a region of memory big enough
to hold any node. We do this because we sometimes mutate nodes (replacing
expressions with the results of their evaluation), changing their type.
We then want to be able to change a node without reallocating memory.
Since the biggest node we have is `node_app`, that's the size we choose.

Finally, to make it easier to create nodes from our generated code,
we add helper functions like `alloc_num`, which allocate a given
node type, and set its tag and member fields appropriately. We
don't include such a function for `node_data`, since this
node will only be created in one way.

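As a rough sketch of why every allocation is sized for the largest node, the following shows an application node being overwritten in place. The names `alloc_node` and `alloc_num` come from the text above; `alloc_app`, `node_ind`, `make_ind`, and the tag names are assumptions for illustration and may not match the real runtime.

```C
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum node_tag { NODE_APP, NODE_NUM, NODE_IND };

struct node_base { enum node_tag tag; };
struct node_app { struct node_base base; struct node_base* left; struct node_base* right; };
struct node_num { struct node_base base; int32_t value; };
struct node_ind { struct node_base base; struct node_base* next; };

/* Every node gets as much memory as the largest variant (node_app here),
   so any node can later be turned into any other kind of node. */
struct node_base* alloc_node(void) {
    struct node_base* n = malloc(sizeof(struct node_app));
    assert(n != NULL);
    return n;
}

struct node_num* alloc_num(int32_t value) {
    struct node_num* n = (struct node_num*) alloc_node();
    n->base.tag = NODE_NUM;
    n->value = value;
    return n;
}

struct node_app* alloc_app(struct node_base* l, struct node_base* r) {
    struct node_app* n = (struct node_app*) alloc_node();
    n->base.tag = NODE_APP;
    n->left = l;
    n->right = r;
    return n;
}

/* Hypothetical helper: overwrite an existing node so that it becomes an
   indirection to `target`, without allocating new memory. */
void make_ind(struct node_base* n, struct node_base* target) {
    struct node_ind* ind = (struct node_ind*) n;
    ind->base.tag = NODE_IND;
    ind->next = target;
}
```

Anything that held a pointer to the application node now sees an indirection to its result instead, which is what lets an evaluated expression be shared.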
Here's the implementation:
{{< codelines "C" "compiler/07/runtime.c" 6 40 >}}

We now move on to implement some stack operations. Let's list them:

* `stack_init` and `stack_free` - one allocates memory for the stack,
  the other releases it.
* `stack_push`, `stack_pop` and `stack_peek` - the classic stack operations.
  `stack_peek` takes an offset, so we can peek relative to the top of the stack.
* `stack_popn` - pops off some number of nodes instead of one.
* `stack_slide` - the slide we specified in the semantics. Keeps the top, deletes the
  next several nodes.
* `stack_update` - turns the node at the offset into an indirection to the result,
  which we will use for lazy evaluation (replacing expressions with their reduced forms).
* `stack_alloc` - allocates indirection nodes on the stack. We will use this later.
* `stack_pack` and `stack_split` - wrap and unwrap constructors on the stack.

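To pin down what "offset" and "slide" mean in the list above, here is a small self-contained sketch of a few of these operations. The struct layout, the initial capacity, the growth strategy, and the exact signatures are all assumptions for illustration and are not necessarily those of the real runtime.

```C
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node_base { int tag; }; /* stand-in for the real node type */

struct stack {
    size_t capacity;
    size_t count;
    struct node_base** data;
};

void stack_init(struct stack* s) {
    s->capacity = 4;
    s->count = 0;
    s->data = malloc(sizeof(*s->data) * s->capacity);
    assert(s->data != NULL);
}

void stack_free(struct stack* s) {
    free(s->data);
    s->data = NULL;
    s->capacity = s->count = 0;
}

void stack_push(struct stack* s, struct node_base* n) {
    if (s->count == s->capacity) {
        /* Out of room: double the backing array. */
        struct node_base** grown =
            realloc(s->data, sizeof(*s->data) * s->capacity * 2);
        assert(grown != NULL);
        s->data = grown;
        s->capacity *= 2;
    }
    s->data[s->count++] = n;
}

struct node_base* stack_pop(struct stack* s) {
    assert(s->count > 0);
    return s->data[--s->count];
}

/* Offset 0 is the top of the stack, 1 is the node below it, and so on. */
struct node_base* stack_peek(struct stack* s, size_t o) {
    assert(s->count > o);
    return s->data[s->count - o - 1];
}

/* Keep the top node and discard the n nodes directly beneath it. */
void stack_slide(struct stack* s, size_t n) {
    assert(s->count > n);
    s->data[s->count - n - 1] = s->data[s->count - 1];
    s->count -= n;
}
```

With four nodes `a b c d` pushed in that order, sliding by 2 leaves `a d`: the top survives, and the two nodes under it are dropped.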
We declare these in a header:
{{< codelines "C" "compiler/07/runtime.h" 52 68 >}}

And implement them as follows:
{{< codelines "C" "compiler/07/runtime.c" 42 116 >}}

Let's now talk about how this will connect to the code we generate. For
a quick example, consider the `node_global` struct that we have declared above.
It has a member `function`, which is a __function pointer__ to a function
that takes a stack and returns void.

When we finally generate machine code for each of the functions
we have in our program, it will be made up of sequences of G-machine
operations expressed using assembly instructions. These instructions will still
have to manipulate the G-machine stack (they still represent G-machine operations!),
and thus, the resulting assembly subroutine will take a stack as a parameter. It will
then construct the function's graph on that stack, as we've already seen. Thus,
we express a compiled top-level function as a subroutine that takes a stack,
and returns void. A global node holds the pointer to the function that it will call.

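The shape of such a node can be sketched as follows. The `function` member is described in the text; the `arity` field's type, the toy `struct stack`, and the names `f_example` and `call_global` are assumptions made so the example can stand on its own.

```C
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the runtime's stack; it only counts calls here. */
struct stack { int calls; };

enum node_tag { NODE_GLOBAL };
struct node_base { enum node_tag tag; };

struct node_global {
    struct node_base base;
    int32_t arity;
    /* Pointer to a compiled top-level function: takes a stack, returns void. */
    void (*function)(struct stack*);
};

/* Hypothetical "compiled" function. A real one would build its graph on
   the stack; this one just records that it ran. */
void f_example(struct stack* s) {
    s->calls++;
}

/* Calling through a global node is an ordinary indirect call. */
void call_global(struct node_global* g, struct stack* s) {
    assert(g->base.tag == NODE_GLOBAL);
    g->function(s);
}
```

Because every compiled function has the same signature, the runtime can call any of them through the node without knowing which function it is.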
When our program starts, it will assume that there exists a top-level
function `f_main` that takes 0 parameters. It will take that function, call it
to produce the initial graph, and then let the unwind loop take care of the evaluation.

Thus, our program will initially look like this:
{{< codelines "C" "compiler/07/runtime.c" 154 159 >}}

As we said, we expect an externally-declared subroutine `f_main`. We construct
a global node for `f_main` with arity 0, and then start the execution using a function `eval`.
What's `eval`, though? It's the function that will take care of creating
a new stack, and evaluating the node that is passed to it using
our unwind loop. `eval` itself is pretty terse:

{{< codelines "C" "compiler/07/runtime.c" 144 152 >}}

We create a fresh program stack, start it off with whatever node
we want to evaluate, and have `unwind` take care of the rest.

`unwind` is a direct implementation of the rules from Part 5:

{{< codelines "C" "compiler/07/runtime.c" 118 142 >}}

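For readers who want to trace the loop on a tiny graph, here is a simplified, self-contained sketch of an unwind loop in the same spirit. It is not the runtime's code: the fixed-size stack, the struct layouts, the decision to simply stop when a global has too few arguments, and the hand-written `f_id` (standing in for a compiled `id x = x`) are all assumptions.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum node_tag { NODE_APP, NODE_NUM, NODE_GLOBAL, NODE_IND };
struct node_base { enum node_tag tag; };
struct stack { size_t count; struct node_base* data[64]; };

struct node_app { struct node_base base; struct node_base* left; struct node_base* right; };
struct node_num { struct node_base base; int32_t value; };
struct node_global { struct node_base base; size_t arity; void (*function)(struct stack*); };
struct node_ind { struct node_base base; struct node_base* next; };

void unwind(struct stack* s) {
    while (1) {
        struct node_base* top = s->data[s->count - 1];
        if (top->tag == NODE_APP) {
            /* Application: descend into the function position. */
            assert(s->count < 64);
            s->data[s->count++] = ((struct node_app*) top)->left;
        } else if (top->tag == NODE_IND) {
            /* Indirection: replace it with the node it points to. */
            s->data[s->count - 1] = ((struct node_ind*) top)->next;
        } else if (top->tag == NODE_GLOBAL) {
            struct node_global* g = (struct node_global*) top;
            if (s->count <= g->arity) break; /* too few arguments: stop */
            /* Overwrite the global and the application nodes above the root
               with the arguments, so the first argument ends up on top. */
            for (size_t i = 1; i <= g->arity; i++) {
                s->data[s->count - i] =
                    ((struct node_app*) s->data[s->count - i - 1])->right;
            }
            g->function(s);
        } else {
            break; /* a number: nothing left to reduce */
        }
    }
}

/* Hypothetical body of `id x = x`. On entry the argument is on top and the
   root of the application is just below it. */
void f_id(struct stack* s) {
    struct node_base* arg = s->data[s->count - 1];
    struct node_ind* root = (struct node_ind*) s->data[s->count - 2];
    root->base.tag = NODE_IND; /* overwrite the root with an indirection */
    root->next = arg;
    s->count -= 1;             /* pop the argument, leaving the root on top */
}
```

Starting from a stack holding only the application `id 7`, the loop pushes `id`, swaps it for the argument, calls `f_id`, follows the resulting indirection, and stops with the number node on top.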
We can now come up with some simple programs. Let's try
writing out, by hand, `main = { 320 + 6 }`. We end up with:

{{< codeblock "C" "compiler/07/examples/runtime1.c" >}}

If we add to the bottom of our `main` the following code:
```C
printf("%d\n", ((struct node_num*) result)->value);
```

And compile and run our code:
```
gcc runtime.c examples/runtime1.c
./a.out
```

We get the output `326`, which is exactly correct!

We now have a common set of functions and declarations
that serve to support the code we generate from our compiler.
Although this time we wrote out `f_main` by hand, we will soon
use LLVM to generate code for `f_main` and more. Once we get
that going, we will be able to compile our code!

Next time, we will start work on converting our G-machine instructions
into machine code. We will set up LLVM and get our very first
fully functional compiled programs in [Part 8 - LLVM]({{< relref "08_compiler_llvm.md" >}}).