---
title: Compiling a Functional Language Using C++, Part 7 - Runtime
date: 2019-08-06T14:26:38-07:00
tags: ["C", "C++", "Functional Languages", "Compilers"]
series: "Compiling a Functional Language using C++"
description: "In this post, we implement the supporting code that will be shared between all executables our compiler will create."
---
Wikipedia has the following definition for a __runtime__:

> A [runtime] primarily implements portions of an execution model.

We know what our execution model is! We talked about it in Part 5: it's the
lazy graph reduction we've specified. Creating and manipulating
graph nodes is slightly above hardware level, and all programs in our
functional language will rely on such manipulation (it's how they run!). Likewise,
most G-machine instructions are above hardware level (especially Unwind!).

Push, Slide, and the other instructions are pretty complex.
Most computers aren't stack machines, so we'll have to implement
our own stack, and whenever a graph-building function wants to modify
the stack, it will have to call library routines from our stack implementation:

```C
#include <stddef.h> /* size_t */
struct stack;       /* our stack type, defined later */
struct node_base;   /* the common "base" of all nodes, defined later */
void stack_push(struct stack* s, struct node_base* n);
struct node_base* stack_slide(struct stack* s, size_t c);
/* other stack operations */
```

We also observe that Unwind does a lot of the heavy lifting in our
G-machine definition. After we build the graph,
Unwind is what picks it apart and performs function calls. Moreover,
Unwind's rules leave Unwind as the next instruction to run: once you've hit it,
you keep unwinding until you reach a function call. This
effectively means we can implement Unwind as a loop:

```C
while(1) {
    // Check for Unwind's first rule
    // Check for Unwind's second rule
    // ...
}
```

In this implementation, Unwind is in charge. We won't need to insert
Unwind instructions at the end of our generated functions; you
may have noticed we've already been following this strategy in our
implementation of the G-machine compilation.

We can start working on an implementation of the runtime right now,
beginning with the nodes:

{{< codelines "C" "compiler/07/runtime.h" 4 50 >}}

We have a variety of different nodes that can be on the stack, but without
the magic of C++'s `vtable` and RTTI, we have to take care of the bookkeeping
ourselves. We add an enum, `node_tag`, which we will use to indicate what
type of node we're looking at. We also add a "base class" `node_base`, which
contains the fields that all nodes must contain (only `tag` at the moment).
We then add to the beginning of each node struct a member of type
`node_base`. With this, a pointer to a node struct can be interpreted as a pointer
to `node_base`, which is our lowest common denominator. This is legal because C
guarantees there is no padding before a struct's first member, so a pointer
to a struct, suitably converted, points to that first member. To go back, we
check the `tag` of `node_base` and cast the pointer appropriately. This way,
we mimic inheritance in a very basic manner.

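To make the tag-plus-cast idea concrete, here is a minimal sketch with just two node kinds. The definitions are illustrative (modeled on the names in the prose, not copied from `runtime.h`), and `as_num` and `count_nums` are hypothetical helpers: the first checks the tag before "downcasting", the second dispatches on the tag to walk a graph purely through `node_base` pointers.

```C
#include <assert.h>
#include <stdint.h>

/* Illustrative node types, modeled on the prose (not copied from runtime.h). */
enum node_tag { NODE_APP, NODE_NUM };

struct node_base {
    enum node_tag tag; /* which "subclass" this node really is */
};

struct node_num {
    struct node_base base; /* must come first, so the struct can pose as a node_base */
    int32_t value;
};

struct node_app {
    struct node_base base; /* must come first, as above */
    struct node_base* left;
    struct node_base* right;
};

/* Hypothetical checked "downcast": only reinterpret the pointer after checking the tag. */
struct node_num* as_num(struct node_base* n) {
    assert(n->tag == NODE_NUM);
    return (struct node_num*) n;
}

/* Hypothetical helper: walk a graph through node_base pointers, dispatching on
 * the tag, and count the number nodes it contains. */
int count_nums(struct node_base* n) {
    switch (n->tag) {
    case NODE_NUM:
        return 1;
    case NODE_APP: {
        struct node_app* app = (struct node_app*) n;
        return count_nums(app->left) + count_nums(app->right);
    }
    }
    return 0; /* unreachable for well-formed nodes */
}
```

Because `base` is the first member, `&app.base` and `(struct node_base*) &app` are the same address, which is what makes the cast back safe once the tag has been checked.
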
We also add an `alloc_node` function, which allocates a region of memory big enough
to hold any node. We do this because we sometimes mutate nodes (replacing
expressions with the results of their evaluation), changing their type.
We want to be able to change a node without reallocating memory, since other
nodes may already hold pointers to it.
Since the biggest node we have is `node_app`, that's the size we allocate.

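Here is a minimal sketch of why sizing every allocation to `node_app` matters. The struct layouts are assumptions for illustration, and `make_indirection` is a hypothetical helper: it overwrites an existing node in place, so every pointer that already referred to that node now reaches the evaluated result.

```C
#include <stdint.h>
#include <stdlib.h>

/* Illustrative node types (a subset; layouts are assumptions). */
enum node_tag { NODE_APP, NODE_NUM, NODE_IND };
struct node_base { enum node_tag tag; };
struct node_app { struct node_base base; struct node_base* left; struct node_base* right; };
struct node_num { struct node_base base; int32_t value; };
struct node_ind { struct node_base base; struct node_base* next; };

/* Every node gets a block as large as the biggest node type (node_app here),
 * so any node can later be overwritten in place by any other node type. */
struct node_base* alloc_node(void) {
    struct node_base* n = malloc(sizeof(struct node_app));
    if (n == NULL) abort(); /* out of memory: give up */
    return n;
}

/* Hypothetical helper: overwrite `n` in place with an indirection to `result`.
 * The address of `n` doesn't change, so every existing pointer to it now
 * leads to the result. */
void make_indirection(struct node_base* n, struct node_base* result) {
    struct node_ind* ind = (struct node_ind*) n;
    ind->base.tag = NODE_IND;
    ind->next = result;
}
```

If `alloc_node` instead used each node type's own size, turning a small node into a larger one in place would write past the end of its allocation.
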
Finally, to make it easier to create nodes from our generated code,
we add helper functions like `alloc_num`, which allocate a node of a given
type and set its tag and member fields appropriately. We
don't include such a function for `node_data`, since this
node will be created in only one way.

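A sketch of what such helpers might look like, assuming the simplified layouts below. `alloc_num` is named in the text, but the signature used here is a guess; `alloc_app` is a hypothetical companion for application nodes. Each helper grabs a maximally-sized block from `alloc_node` and then fills in the tag and fields in one call.

```C
#include <stdint.h>
#include <stdlib.h>

/* Illustrative definitions, mirroring the description above (not runtime.h itself). */
enum node_tag { NODE_APP, NODE_NUM };
struct node_base { enum node_tag tag; };
struct node_app { struct node_base base; struct node_base* left; struct node_base* right; };
struct node_num { struct node_base base; int32_t value; };

struct node_base* alloc_node(void) {
    struct node_base* n = malloc(sizeof(struct node_app)); /* biggest node */
    if (n == NULL) abort();
    return n;
}

/* Allocate a number node and set its tag and value (signature assumed). */
struct node_num* alloc_num(int32_t value) {
    struct node_num* n = (struct node_num*) alloc_node();
    n->base.tag = NODE_NUM;
    n->value = value;
    return n;
}

/* Hypothetical: allocate an application node of `left` to `right`. */
struct node_app* alloc_app(struct node_base* left, struct node_base* right) {
    struct node_app* n = (struct node_app*) alloc_node();
    n->base.tag = NODE_APP;
    n->left = left;
    n->right = right;
    return n;
}
```

Generated code can then build `320 6`-style graphs with a couple of calls instead of repeating the tag bookkeeping at every allocation site.
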
Here's the implementation:
{{< codelines "C" "compiler/07/runtime.c" 6 40 >}}

We now move on to implement some stack operations. Let's list them:

* `stack_init` and `stack_free` - one allocates memory for the stack,
the other releases it.
* `stack_push`, `stack_pop` and `stack_peek` - the classic stack operations.
`stack_peek` takes an offset, so we can peek relative to the top of the stack.
* `stack_popn` - pop off some number of nodes instead of one.
* `stack_slide` - the slide we specified in the semantics. Keeps the top node and deletes the
next several nodes beneath it.
* `stack_update` - turns the node at the given offset into an indirection to the result,
which we will use for lazy evaluation (replacing expressions with their reduced forms).
* `stack_alloc` - allocate indirection nodes on the stack. We will use this later.
* `stack_pack` and `stack_split` - wrap and unwrap constructors on the stack.

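The core operations in this list can be sketched as a growable array of node pointers. This is a sketch under assumptions rather than the post's `runtime.c`: the field names, the initial capacity of 4, the doubling strategy, and the `void` return type of `stack_slide` are all choices made for this example.

```C
#include <assert.h>
#include <stdlib.h>

struct node_base; /* only stored by pointer here, so an incomplete type suffices */

/* A minimal growable, array-backed stack of node pointers. */
struct stack {
    size_t size;             /* capacity of `data` */
    size_t count;            /* number of nodes currently on the stack */
    struct node_base** data; /* data[count - 1] is the top */
};

void stack_init(struct stack* s) {
    s->size = 4;
    s->count = 0;
    s->data = malloc(sizeof(*s->data) * s->size);
    if (s->data == NULL) abort();
}

void stack_free(struct stack* s) {
    free(s->data);
}

void stack_push(struct stack* s, struct node_base* n) {
    if (s->count == s->size) { /* full: double the capacity */
        struct node_base** bigger = realloc(s->data, sizeof(*s->data) * s->size * 2);
        if (bigger == NULL) abort();
        s->data = bigger;
        s->size *= 2;
    }
    s->data[s->count++] = n;
}

struct node_base* stack_pop(struct stack* s) {
    assert(s->count > 0);
    return s->data[--s->count];
}

/* Offset 0 is the top, offset 1 the node just below it, and so on. */
struct node_base* stack_peek(struct stack* s, size_t o) {
    assert(s->count > o);
    return s->data[s->count - o - 1];
}

void stack_popn(struct stack* s, size_t n) {
    assert(s->count >= n);
    s->count -= n;
}

/* Keep the top node, delete the n nodes directly beneath it. */
void stack_slide(struct stack* s, size_t n) {
    assert(s->count > n);
    s->data[s->count - n - 1] = s->data[s->count - 1];
    s->count -= n;
}
```

Offsets count down from the top, so `stack_peek(s, 0)` is always the most recently pushed node, however tall the stack has grown.
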
We declare these in a header:
{{< codelines "C" "compiler/07/runtime.h" 52 68 >}}

And implement them as follows:
{{< codelines "C" "compiler/07/runtime.c" 42 116 >}}

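`stack_update` and `stack_alloc` are the two operations most tied to laziness, so here is a separate sketch of them under stated assumptions. It follows the textbook G-machine rule for Update n (pop the result, then overwrite the node at offset n of the remaining stack with an indirection to it); the post's exact offset convention may differ. The fixed-capacity stack and simplified node types are only there to keep the example short. In the textbook G-machine, Alloc creates placeholder indirections for recursive `let` bindings.

```C
#include <assert.h>
#include <stdlib.h>

/* Illustrative types; the real definitions live in runtime.h. */
enum node_tag { NODE_NUM, NODE_IND };
struct node_base { enum node_tag tag; };
struct node_num { struct node_base base; int value; };
struct node_ind { struct node_base base; struct node_base* next; };

#define STACK_MAX 64 /* fixed capacity keeps the sketch short */
struct stack { size_t count; struct node_base* data[STACK_MAX]; };

void stack_push(struct stack* s, struct node_base* n) {
    assert(s->count < STACK_MAX);
    s->data[s->count++] = n;
}

/* Pop the result off the top, then overwrite the node now at offset o
 * (0 = new top) with an indirection to that result. Assumes that node's
 * memory is large enough to hold a node_ind. */
void stack_update(struct stack* s, size_t o) {
    assert(s->count > o + 1);
    struct node_base* result = s->data[--s->count];
    struct node_ind* ind = (struct node_ind*) s->data[s->count - o - 1];
    ind->base.tag = NODE_IND;
    ind->next = result;
}

/* Push o fresh indirection nodes that don't point anywhere yet; a later
 * stack_update can fill them in. */
void stack_alloc(struct stack* s, size_t o) {
    for (size_t i = 0; i < o; i++) {
        struct node_ind* ind = malloc(sizeof(struct node_ind));
        if (ind == NULL) abort();
        ind->base.tag = NODE_IND;
        ind->next = NULL;
        stack_push(s, &ind->base);
    }
}
```

The key property is that `stack_update` changes the node's contents but not its address, so anything else in the graph that shares the node sees the reduced form without being re-evaluated.
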
Let's now talk about how this will connect to the code we generate. For
a quick example, consider the `node_global` struct that we declared above.
It has a member `function`, which is a __function pointer__ to a function
that takes a stack and returns void.

When we finally generate machine code for each of the functions
we have in our program, it will be made up of sequences of G-machine
operations expressed using assembly instructions. These instructions will still
have to manipulate the G-machine stack (they still represent G-machine operations!),
so the resulting assembly subroutine will take a stack as a parameter. It will
then construct the function's graph on that stack, as we've already seen. Thus,
we express a compiled top-level function as a subroutine that takes a stack
and returns void. A global node holds a pointer to the function that it will call.

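A minimal sketch of a global node carrying such a function pointer. The layout and the toy fixed-size stack are assumptions; `f_answer` is a hypothetical stand-in for a compiled top-level function, and `call_global` (also hypothetical) shows that the caller only needs the pointer, not the function's identity.

```C
#include <stddef.h>
#include <stdint.h>

/* Illustrative types; only the `function` member's shape comes from the text. */
enum node_tag { NODE_NUM, NODE_GLOBAL };
struct node_base { enum node_tag tag; };
struct node_num { struct node_base base; int32_t value; };

struct stack { size_t count; struct node_base* data[16]; }; /* toy fixed-size stack */

struct node_global {
    struct node_base base;
    int32_t arity;                   /* number of arguments expected */
    void (*function)(struct stack*); /* compiled code: takes a stack, returns void */
};

/* Hypothetical compiled 0-argument function: like generated code, it
 * communicates only by pushing onto the stack it is given. */
static struct node_num forty_two = { { NODE_NUM }, 42 };
void f_answer(struct stack* s) {
    s->data[s->count++] = &forty_two.base;
}

/* The caller doesn't need to know which function a global node wraps. */
void call_global(struct node_global* g, struct stack* s) {
    g->function(s);
}
```

This is the indirection the runtime relies on: it can invoke any compiled function through a global node, handing it the current stack.
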
When our program starts, it will assume that there exists a top-level
function `f_main` that takes 0 parameters. It will call that function
to produce the initial graph, and then let the unwind loop take care of the evaluation.

Thus, our program will initially look like this:
{{< codelines "C" "compiler/07/runtime.c" 154 159 >}}

As we said, we expect an externally-declared subroutine `f_main`. We construct
a global node for `f_main` with arity 0, and then start the execution using a function `eval`.
What's `eval`, though? It's the function that takes care of creating
a new stack and evaluating the node that is passed to it using
our unwind loop. `eval` itself is pretty terse:

{{< codelines "C" "compiler/07/runtime.c" 144 152 >}}

We create a fresh program stack, start it off with whatever node
we want to evaluate, and have `unwind` take care of the rest.

`unwind` is a direct implementation of the rules from Part 5:

{{< codelines "C" "compiler/07/runtime.c" 118 142 >}}

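To tie `eval` and `unwind` together, here is a compact, self-contained sketch of the loop. It is not the post's `runtime.c`: the node layouts, the fixed-size stack, and the argument rearrangement (which follows the usual G-machine presentation: each application on the spine is replaced by its right child, leaving the root below the arguments) are assumptions. It also assumes every function is applied to at least as many arguments as its arity.

```C
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative definitions (a subset of the node types described earlier). */
enum node_tag { NODE_APP, NODE_NUM, NODE_GLOBAL, NODE_IND };
struct node_base { enum node_tag tag; };
struct node_app { struct node_base base; struct node_base *left, *right; };
struct node_num { struct node_base base; int32_t value; };
struct node_ind { struct node_base base; struct node_base* next; };
struct stack { size_t count; struct node_base* data[256]; }; /* fixed size, for brevity */
struct node_global {
    struct node_base base;
    int32_t arity;
    void (*function)(struct stack*);
};

void stack_push(struct stack* s, struct node_base* n) {
    assert(s->count < 256);
    s->data[s->count++] = n;
}

struct node_base* stack_peek(struct stack* s, size_t o) {
    assert(s->count > o);
    return s->data[s->count - o - 1];
}

void unwind(struct stack* s) {
    while (1) {
        struct node_base* top = stack_peek(s, 0);
        if (top->tag == NODE_APP) {
            /* Application: keep walking down the spine via the left child. */
            stack_push(s, ((struct node_app*) top)->left);
        } else if (top->tag == NODE_IND) {
            /* Indirection: replace it with the node it points to. */
            s->data[s->count - 1] = ((struct node_ind*) top)->next;
        } else if (top->tag == NODE_GLOBAL) {
            /* Function: replace the global and the spine nodes above the root
             * with the arguments (right children), then run the function's code. */
            struct node_global* g = (struct node_global*) top;
            size_t arity = (size_t) g->arity;
            assert(s->count > arity);
            for (size_t i = 1; i <= arity; i++) {
                struct node_app* app = (struct node_app*) s->data[s->count - i - 1];
                s->data[s->count - i] = app->right;
            }
            g->function(s);
        } else {
            break; /* A number: nothing left to evaluate. */
        }
    }
}

/* Evaluate a node on a fresh stack and return whatever ends up on top.
 * The stack is heap-allocated so nested evals don't exhaust the C stack. */
struct node_base* eval(struct node_base* n) {
    struct stack* s = malloc(sizeof(*s));
    if (s == NULL) abort();
    s->count = 0;
    stack_push(s, n);
    unwind(s);
    struct node_base* result = stack_peek(s, 0);
    free(s);
    return result;
}
```

Unlike a full lazy runtime, this sketch never overwrites the root of an application with its result, so no work is shared between evaluations; it only shows the shape of the loop. A hand-written `f_add` in the spirit of the `320 + 6` example could call `eval` on its two arguments and push their sum.
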
We can now come up with some simple programs. Let's try
writing out, by hand, `main = { 320 + 6 }`. We end up with:

{{< codeblock "C" "compiler/07/examples/runtime1.c" >}}

If we add the following code to the bottom of our `main`:
```C
/* printf is declared in <stdio.h> */
printf("%d\n", ((struct node_num*) result)->value);
```

Then compile and run our code:
```
gcc runtime.c examples/runtime1.c
./a.out
```

We get the output `326`, which is exactly correct!

We now have a common set of functions and declarations
that support the code generated by our compiler.
Although this time we wrote out `f_main` by hand, we will soon
use LLVM to generate code for `f_main` and more. Once we get
that going, we'll be able to compile our code!

Next time, we will start work on converting our G-machine instructions
into machine code. We will set up LLVM and get our very first
fully functional compiled programs in [Part 8 - LLVM]({{< relref "08_compiler_llvm.md" >}}).