Add beginning of part 6 of compiler series

2019-09-04 21:10:06 -07:00 · 2019-09-04 21:10:06 -07:00 · 44abf877b2
commit 44abf877b2
parent 1fdccd6fae
1 changed files with 91 additions and 0 deletions
--- a/content/blog/06_compiler_semantics.md
+++ b/content/blog/06_compiler_semantics.md
@@ -0,0 +1,91 @@
 ---
 title: Compiling a Functional Language Using C++, Part 6 - Compilation
 date: 2019-08-06T14:26:38-07:00
 draft: true
 tags: ["C and C++", "Functional Languages", "Compilers"]
 ---
 In the previous post, we defined a machine for graph reduction,
 called a G-machine. However, this machine is still not particularly
 connected to __our__ language. In this post, we will give
 meanings to programs in our language in the context of
 this G-machine. We will define a __compilation scheme__,
 which will be a set of rules that tell us how to
 translate programs in our language into G-machine instructions.
 To mirror _Implementing Functional Languages: a tutorial_, we'll
 call this compilation scheme \\(\\mathcal{C}\\), and write it
 as \\(\\mathcal{C} ⟦e⟧ = i\\), meaning "the expression \\(e\\)
 compiles to the instructions \\(i\\)".
 To follow our route from the typechecking, let's start
 with compiling expressions that are numbers. It's pretty easy:
 $$
 \\mathcal{C} ⟦n⟧ = [\\text{PushInt} \\; n]
 $$
 Here, we compiled a number expression to a list of
 instructions with only one element - PushInt.
 Just like when we did typechecking, let's
 move on to compiling function applications. As
 we informally stated in the previous post, since
 the thing we're applying has to be on top,
 we want to compile it last:
 $$
 \\mathcal{C} ⟦e\_1 \; e\_2⟧ = \\mathcal{C} ⟦e\_2⟧ ⧺ \\mathcal{C} ⟦e\_1⟧ ⧺ [\\text{MkApp}]
 $$
 Here, we used the \\(⧺\\) operator to represent the concatenation of two
 lists. Otherwise, this should be pretty intuitive - we first run the instructions
 to create the parameter, then we run the instructions to create the function,
 and finally, we combine them using MkApp.
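
The two rules so far can be sketched in C++. This is only a minimal illustration: the `Instruction`, `Expr`, `Num`, and `App` types below are hypothetical names made up for this example, not the data structures the series actually uses. Each `compile` method appends its instructions to an output list, which plays the role of \\(⧺\\).

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical instruction representation: a tag plus an integer operand.
enum class Op { PushInt, MkApp };

struct Instruction {
    Op op;
    int arg; // only meaningful for PushInt
};

// Hypothetical expression tree covering only numbers and applications.
struct Expr {
    virtual ~Expr() = default;
    // Appends the instructions for this expression to `into`.
    virtual void compile(std::vector<Instruction>& into) const = 0;
};

using ExprPtr = std::unique_ptr<Expr>;

struct Num : Expr {
    int value;
    explicit Num(int v) : value(v) {}

    // C[[n]] = [PushInt n]
    void compile(std::vector<Instruction>& into) const override {
        into.push_back({Op::PushInt, value});
    }
};

struct App : Expr {
    ExprPtr left, right;
    App(ExprPtr l, ExprPtr r) : left(std::move(l)), right(std::move(r)) {
        assert(left && right);
    }

    // C[[e1 e2]] = C[[e2]] ++ C[[e1]] ++ [MkApp]
    void compile(std::vector<Instruction>& into) const override {
        right->compile(into); // the parameter is built first
        left->compile(into);  // the function is built last, so it ends up on top
        into.push_back({Op::MkApp, 0});
    }
};
```

Appending to a shared vector avoids building and concatenating many small lists, while the order of the calls inside `App::compile` mirrors the order of the terms in the rule.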
 It's variables that once again force us to adjust our strategy. If our
 program is well-typed, we know our variable will be on the stack:
 our definition of Unwind makes it so for functions, and we will
 define our case expression compilation scheme to match. However,
 we still need to know __where__ on the stack each variable is,
 and this changes as the stack is modified.
 To accommodate this, we define an environment, \\(\\rho\\),
 to be a partial function mapping variable names to their
 offsets on the stack. We write \\(\\rho = [x \\rightarrow n, y \\rightarrow m]\\)
 to say "the environment \\(\\rho\\) maps variable \\(x\\) to stack offset \\(n\\),
 and variable \\(y\\) to stack offset \\(m\\)". We also write \\(\\rho \; x\\) to
 say "look up \\(x\\) in \\(\\rho\\)", since \\(\\rho\\) is a function. Finally,
 to help with the ever-changing stack, we define an augmented environment
 \\(\\rho^{+n}\\), such that \\(\\rho^{+n} \; x = \\rho \; x + n\\). In words,
 this basically means "\\(\\rho^{+n}\\) has all the variables from \\(\\rho\\),
 but their addresses are incremented by \\(n\\)". We now pass \\(\\rho\\)
 in to \\(\\mathcal{C}\\) together with the expression \\(e\\). Let's
 rewrite our first two rules. For numbers:
 $$
 \\mathcal{C} ⟦n⟧ \; \\rho = [\\text{PushInt} \\; n]
 $$
 For function application:
 $$
 \\mathcal{C} ⟦e\_1 \; e\_2⟧ \; \\rho = \\mathcal{C} ⟦e\_2⟧ \; \\rho ⧺ \\mathcal{C} ⟦e\_1⟧ \; \\rho^{+1} ⧺ [\\text{MkApp}]
 $$
 Notice how in that last rule, we passed in \\(\\rho^{+1}\\) when compiling the function's expression. This is because
 running the instructions for \\(e\_2\\) leaves the function's parameter on top of the stack. Whatever
 was at the top of the stack (and thus had offset 0) is now the second element from the top (offset 1). The
 same is true for everything else that was on the stack. So, we increment the environment accordingly.
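
One possible way to represent \\(\\rho\\) and \\(\\rho^{+n}\\) in C++ is sketched below. The `Env` class and its member names are assumptions made for this example. It stores the base offsets in a map, together with a single shift that is added on every lookup, so that creating \\(\\rho^{+n}\\) does not require rewriting every entry.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical environment: base offsets plus a pending shift.
class Env {
    std::map<std::string, int> offsets_;
    int shift_ = 0;

public:
    Env() = default;
    explicit Env(std::map<std::string, int> offsets)
        : offsets_(std::move(offsets)) {}

    // rho is a partial function, so some names have no offset at all.
    bool contains(const std::string& name) const {
        return offsets_.count(name) != 0;
    }

    // rho x: the current stack offset of `name`.
    // Throws std::out_of_range if rho is not defined for `name`.
    int lookup(const std::string& name) const {
        auto it = offsets_.find(name);
        if (it == offsets_.end()) {
            throw std::out_of_range("unbound variable: " + name);
        }
        return it->second + shift_;
    }

    // rho^{+n}: the same variables, each offset incremented by n.
    Env shifted(int n) const {
        assert(n >= 0);
        Env result = *this;
        result.shift_ += n;
        return result;
    }
};
```

With `rho` holding `x` at offset 0, `rho.shifted(1).lookup("x")` gives 1, which is the offset the function application rule needs after the parameter has been pushed.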
 With the environment, the variable rule is simple:
 $$
 \\mathcal{C} ⟦x⟧ \; \\rho = [\\text{Push} \\; (\\rho \; x)]
 $$
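
As a quick check of how the variable and application rules interact, suppose \\(\\rho = [x \\rightarrow 0, y \\rightarrow 1]\\) (names and offsets picked purely for illustration) and compile the application \\(x \; y\\):
$$
\\mathcal{C} ⟦x \; y⟧ \; \\rho = \\mathcal{C} ⟦y⟧ \; \\rho ⧺ \\mathcal{C} ⟦x⟧ \; \\rho^{+1} ⧺ [\\text{MkApp}] = [\\text{Push} \\; 1, \\text{Push} \\; 1, \\text{MkApp}]
$$
The two Push instructions have the same operand but refer to different things: the first one fetches \\(y\\) from offset 1, and by the time the second one runs, \\(x\\) has moved from offset 0 to offset \\(0 + 1 = 1\\).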
 One more thing. If we run across a function name, we want to
 use PushGlobal rather than Push. Defining \\(f\\) to be a name
 of a global function, we capture this using the following rule:
 $$
 \\mathcal{C} ⟦f⟧ \; \\rho = [\\text{PushGlobal} \\; f]
 $$
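
Putting the four rules together, a compiler for this fragment might look roughly like the sketch below. All of the type and function names here (`Instruction`, `Env`, `shifted`, `Num`, `Var`, `App`) are hypothetical. The sketch also assumes that any name missing from \\(\\rho\\) refers to a global function, which is one simple way to tell the two kinds of names apart, not necessarily the one the final compiler will use.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Op { PushInt, Push, PushGlobal, MkApp };

struct Instruction {
    Op op;
    int num;          // operand of PushInt and Push, 0 otherwise
    std::string name; // operand of PushGlobal, empty otherwise
};

// rho: local variable names mapped to stack offsets.
using Env = std::map<std::string, int>;

// rho^{+n}: every offset incremented by n.
Env shifted(const Env& rho, int n) {
    Env result;
    for (const auto& entry : rho) {
        result[entry.first] = entry.second + n;
    }
    return result;
}

struct Expr {
    virtual ~Expr() = default;
    // Appends the instructions for this expression to `into`.
    virtual void compile(const Env& rho, std::vector<Instruction>& into) const = 0;
};

using ExprPtr = std::unique_ptr<Expr>;

struct Num : Expr {
    int value;
    explicit Num(int v) : value(v) {}

    // C[[n]] rho = [PushInt n]
    void compile(const Env&, std::vector<Instruction>& into) const override {
        into.push_back({Op::PushInt, value, ""});
    }
};

struct Var : Expr {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}

    void compile(const Env& rho, std::vector<Instruction>& into) const override {
        auto it = rho.find(name);
        if (it != rho.end()) {
            // C[[x]] rho = [Push (rho x)]
            into.push_back({Op::Push, it->second, ""});
        } else {
            // Assumption: a name that is not local is a global function.
            // C[[f]] rho = [PushGlobal f]
            into.push_back({Op::PushGlobal, 0, name});
        }
    }
};

struct App : Expr {
    ExprPtr left, right;
    App(ExprPtr l, ExprPtr r) : left(std::move(l)), right(std::move(r)) {
        assert(left && right);
    }

    // C[[e1 e2]] rho = C[[e2]] rho ++ C[[e1]] rho^{+1} ++ [MkApp]
    void compile(const Env& rho, std::vector<Instruction>& into) const override {
        right->compile(rho, into);
        left->compile(shifted(rho, 1), into); // the parameter now sits on top
        into.push_back({Op::MkApp, 0, ""});
    }
};
```

For example, compiling the application of a global `f` to a local `x` with `x` at offset 0 appends Push 0, then PushGlobal f, then MkApp. Copying the map in `shifted` keeps the sketch short; a real implementation would likely want something cheaper.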
 Next up, case expressions. These are a bit more complex: there are several
 branches, each of which will have its own environment.