From 44abf877b289e0a8e42e893117be3f4236539f9b Mon Sep 17 00:00:00 2001
From: Danila Fedorin
Date: Wed, 4 Sep 2019 21:10:06 -0700
Subject: [PATCH] Add beginning of part 6 of compiler series

---
 content/blog/06_compiler_semantics.md | 91 +++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 content/blog/06_compiler_semantics.md

diff --git a/content/blog/06_compiler_semantics.md b/content/blog/06_compiler_semantics.md
new file mode 100644
index 0000000..87fa1a6
--- /dev/null
+++ b/content/blog/06_compiler_semantics.md
@@ -0,0 +1,91 @@
+---
+title: Compiling a Functional Language Using C++, Part 6 - Compilation
+date: 2019-08-06T14:26:38-07:00
+draft: true
+tags: ["C and C++", "Functional Languages", "Compilers"]
+---
+In the previous post, we defined a machine for graph reduction,
+called a G-machine. However, this machine is still not particularly
+connected to __our__ language. In this post, we will give
+meanings to programs in our language in the context of
+this G-machine. We will define a __compilation scheme__,
+which will be a set of rules that tell us how to
+translate programs in our language into G-machine instructions.
+To mirror _Implementing Functional Languages: a tutorial_, we'll
+call this compilation scheme \\(\\mathcal{C}\\), and write it
+as \\(\\mathcal{C} ⟦e⟧ = i\\), meaning "the expression \\(e\\)
+compiles to the instructions \\(i\\)".
+
+Following the same route we took with typechecking, let's start
+with compiling expressions that are numbers. It's pretty easy:
+$$
+\\mathcal{C} ⟦n⟧ = [\\text{PushInt} \\; n]
+$$
+
+Here, we compiled a number expression to a list of
+instructions with only one element - PushInt.
+
+Just like when we did typechecking, let's
+move on to compiling function applications.
+As we informally stated in the previous post, since
+the thing we're applying has to be on top,
+we want to compile it last:
+
+$$
+\\mathcal{C} ⟦e\_1 \; e\_2⟧ = \\mathcal{C} ⟦e\_2⟧ ⧺ \\mathcal{C} ⟦e\_1⟧ ⧺ [\\text{MkApp}]
+$$
+
+Here, we used the \\(⧺\\) operator to represent the concatenation of two
+lists. Otherwise, this should be pretty intuitive - we first run the instructions
+to create the parameter, then we run the instructions to create the function,
+and finally, we combine them using MkApp.
+
+It's variables that once again force us to adjust our strategy. If our
+program is well-typed, we know our variable will be on the stack:
+our definition of Unwind makes it so for functions, and we will
+define our case expression compilation scheme to match. However,
+we still need to know __where__ on the stack each variable is,
+and this changes as the stack is modified.
+
+To accommodate this, we define an environment, \\(\\rho\\),
+to be a partial function mapping variable names to their
+offsets on the stack. We write \\(\\rho = [x \\rightarrow n, y \\rightarrow m]\\)
+to say "the environment \\(\\rho\\) maps variable \\(x\\) to stack offset \\(n\\),
+and variable \\(y\\) to stack offset \\(m\\)". We also write \\(\\rho \; x\\) to
+say "look up \\(x\\) in \\(\\rho\\)", since \\(\\rho\\) is a function. Finally,
+to help with the ever-changing stack, we define an augmented environment
+\\(\\rho^{+n}\\), such that \\(\\rho^{+n} \; x = \\rho \; x + n\\). In words,
+this basically means "\\(\\rho^{+n}\\) has all the variables from \\(\\rho\\),
+but their offsets are incremented by \\(n\\)". We now pass \\(\\rho\\)
+in to \\(\\mathcal{C}\\) together with the expression \\(e\\). Let's
+rewrite our first two rules.
+For numbers:
+
+$$
+\\mathcal{C} ⟦n⟧ \; \\rho = [\\text{PushInt} \\; n]
+$$
+
+For function application:
+$$
+\\mathcal{C} ⟦e\_1 \; e\_2⟧ \; \\rho = \\mathcal{C} ⟦e\_2⟧ \; \\rho ⧺ \\mathcal{C} ⟦e\_1⟧ \; \\rho^{+1} ⧺ [\\text{MkApp}]
+$$
+
+Notice how in that last rule, we passed in \\(\\rho^{+1}\\) when compiling the function's expression. This is because
+running the instructions for \\(e\_2\\) will have left the function's parameter on the stack. Whatever
+was at the top of the stack (and thus had offset 0) is now the second element from the top (offset 1). The
+same is true for all other things that were on the stack. So, we increment the environment accordingly.
+
+With the environment, the variable rule is simple:
+$$
+\\mathcal{C} ⟦x⟧ \; \\rho = [\\text{Push} \\; (\\rho \; x)]
+$$
+
+One more thing. If we run across a function name, we want to
+use PushGlobal rather than Push. Defining \\(f\\) to be a name
+of a global function, we capture this using the following rule:
+
+$$
+\\mathcal{C} ⟦f⟧ \; \\rho = [\\text{PushGlobal} \\; f]
+$$
+
+Next up, case expressions. These are a bit more complex: there are several
+branches, each of which will have its own environment.
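Before moving on, here is a rough C++ sketch of the four rules defined so far (numbers, application, variables, and globals). It is only an illustration under a few assumptions: the `Expr` struct, the helpers `num`, `var` and `app`, the `shift` function standing in for \\(\\rho^{+n}\\), and the string rendering of instructions are all hypothetical names made up for this sketch, not the AST or instruction classes from the series. It also assumes that any name missing from the environment is a global function, which the rules above do not spell out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature AST (not the AST classes from the series).
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    enum class Kind { Num, Var, App } kind;
    int value;         // used by Num
    std::string name;  // used by Var
    ExprPtr fn, arg;   // used by App
};

ExprPtr num(int n) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Num, n, "", nullptr, nullptr});
}
ExprPtr var(const std::string& x) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Var, 0, x, nullptr, nullptr});
}
ExprPtr app(ExprPtr f, ExprPtr a) {
    return std::make_shared<Expr>(Expr{Expr::Kind::App, 0, "", f, a});
}

// The environment rho: variable name -> stack offset.
using Env = std::map<std::string, int>;

// rho^{+n}: the same variables, with every offset incremented by n.
Env shift(const Env& rho, int n) {
    Env out;
    for (const auto& entry : rho) out[entry.first] = entry.second + n;
    return out;
}

// Instructions are rendered as strings to keep the sketch short.
using Code = std::vector<std::string>;

Code compile(const Expr& e, const Env& rho) {
    switch (e.kind) {
    case Expr::Kind::Num:
        return { "PushInt " + std::to_string(e.value) };
    case Expr::Kind::Var: {
        auto it = rho.find(e.name);
        if (it != rho.end()) return { "Push " + std::to_string(it->second) };
        // Assumption: a name that is not in rho refers to a global function.
        return { "PushGlobal " + e.name };
    }
    case Expr::Kind::App: {
        assert(e.fn && e.arg);
        Code code = compile(*e.arg, rho);         // argument first, with rho
        Code fn = compile(*e.fn, shift(rho, 1));  // function last, with rho^{+1}
        code.insert(code.end(), fn.begin(), fn.end());
        code.push_back("MkApp");
        return code;
    }
    }
    return {};  // unreachable; keeps compilers from warning
}
```

For instance, with \\(\\rho = [x \\rightarrow 0]\\), calling `compile(*app(var("x"), num(1)), rho)` produces `PushInt 1`, `Push 1`, `MkApp`: the pushed argument moves `x` from offset 0 to offset 1, which is exactly what \\(\\rho^{+1}\\) accounts for.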