Continue work on part 6 of compiler series
parent
77cfeda60d
commit
a69f9f633e

@@ -4,7 +4,7 @@ date: 2019-08-06T14:26:38-07:00


draft: true


tags: ["C and C++", "Functional Languages", "Compilers"]





In the previous post, we defined a machine for graph reduction,


called a G-machine. However, this machine is still not particularly


connected to __our__ language. In this post, we will give


meanings to programs in our language in the context of



@@ -32,7 +32,7 @@ the thing we're applying has to be on top,


we want to compile it last:




$$


\\mathcal{C} ⟦e\_1 \\; e\_2⟧ = \\mathcal{C} ⟦e\_2⟧ ⧺ \\mathcal{C} ⟦e\_1⟧ ⧺ [\\text{MkApp}]


$$




Here, we used the \\(⧺\\) operator to represent the concatenation of two



@@ -51,22 +51,22 @@ To accommodate for this, we define an environment, \\(\\rho\\),


to be a partial function mapping variable names to their


offsets on the stack. We write \\(\\rho = [x \\rightarrow n, y \\rightarrow m]\\)


to say "the environment \\(\\rho\\) maps variable \\(x\\) to stack offset \\(n\\),


and variable \\(y\\) to stack offset \\(m\\)". We also write \\(\\rho \\; x\\) to


say "look up \\(x\\) in \\(\\rho\\)", since \\(\\rho\\) is a function. Finally,


to help with the ever-changing stack, we define an augmented environment


\\(\\rho^{+n}\\), such that \\(\\rho^{+n} \\; x = \\rho \\; x + n\\). In words,


this basically means "\\(\\rho^{+n}\\) has all the variables from \\(\\rho\\),


but their addresses are incremented by \\(n\\)". We now pass \\(\\rho\\)


in to \\(\\mathcal{C}\\) together with the expression \\(e\\). Let's


rewrite our first two rules. For numbers:




$$


\\mathcal{C} ⟦n⟧ \\; \\rho = [\\text{PushInt} \\; n]


$$




For function application:


$$


\\mathcal{C} ⟦e\_1 \\; e\_2⟧ \\; \\rho = \\mathcal{C} ⟦e\_2⟧ \\; \\rho ⧺ \\mathcal{C} ⟦e\_1⟧ \\; \\rho^{+1} ⧺ [\\text{MkApp}]


$$




Notice how in that last rule, we passed in \\(\\rho^{+1}\\) when compiling the function's expression. This is because



@@ -76,7 +76,7 @@ same is true for all other things that were on the stack. So, we increment the e




With the environment, the variable rule is simple:


$$


\\mathcal{C} ⟦x⟧ \\; \\rho = [\\text{Push} \\; (\\rho \\; x)]


$$
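
To make the environment more concrete, here is a minimal C++ sketch of \\(\\rho\\) and \\(\\rho^{+n}\\). The names `Env`, `has`, `lookup` and `shifted` are made up for this illustration and are not necessarily what the series' code uses. Since \\(\\rho\\) is a partial function, looking up an unbound variable is reported with an exception here.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical environment: maps variable names to stack offsets.
struct Env {
    std::map<std::string, int> offsets;
    int shift; // the n in rho^{+n}

    explicit Env(std::map<std::string, int> o, int s = 0)
        : offsets(std::move(o)), shift(s) {}

    // Is the partial function defined for this name?
    bool has(const std::string& name) const {
        return offsets.count(name) != 0;
    }

    // rho x: the (shifted) stack offset of a variable.
    int lookup(const std::string& name) const {
        auto it = offsets.find(name);
        if (it == offsets.end()) {
            throw std::out_of_range("unbound variable: " + name);
        }
        return it->second + shift;
    }

    // rho^{+n}: same variables, every offset incremented by n.
    Env shifted(int n) const {
        return Env(offsets, shift + n);
    }
};
```

With \\(\\rho = [x \\rightarrow 0, y \\rightarrow 1]\\), calling `rho.shifted(1).lookup("y")` gives 2, which is the offset the function side of an application would see.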




One more thing. If we run across a function name, we want to



@@ -84,7 +84,7 @@ use PushGlobal rather than Push. Defining \\(f\\) to be a name


of a global function, we capture this using the following rule:




$$


\\mathcal{C} ⟦f⟧ \\; \\rho = [\\text{PushGlobal} \\; f]


$$




Now it's time for us to compile case expressions, but there's a bit of



@@ -100,5 +100,70 @@ defn weird b = { case b of { b -> { False } } }


```




We only have one branch, but we have two tags that should


lead to it! Not only that, but variable patterns are


location-dependent: if a variable pattern comes


before a constructor pattern, then the constructor


pattern will never be reached. On the other hand,


if a constructor pattern comes before a variable


pattern, it will be tried before the variable pattern,


and thus is reachable.




We will ignore this problem for now; we will define our semantics


as though each case expression branch can match exactly one tag.


In our C++ code, we will write a conversion function that will


figure out which tag goes to which sequence of instructions.


Effectively, we'll be performing [desugaring](https://en.wikipedia.org/wiki/Syntactic_sugar).
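
One possible shape for that conversion, sketched under some assumptions: constructor tags are numbered from 0 to `tag_count - 1`, and `Pattern` and `assign_branches` are hypothetical names introduced only for this example. Branches are tried in order, so a tag goes to the first branch that can match it, and a variable pattern claims every tag that is still unclaimed.

```cpp
#include <map>
#include <vector>

// Hypothetical pattern representation: either a variable pattern
// (matches any tag) or a constructor pattern with one specific tag.
struct Pattern {
    bool is_variable;
    int tag; // ignored when is_variable is true
};

// Maps each constructor tag to the index of the branch that handles it.
// Branches are considered in source order, so the first match wins.
std::map<int, int> assign_branches(const std::vector<Pattern>& patterns,
                                   int tag_count) {
    std::map<int, int> branch_for_tag;
    for (int i = 0; i < static_cast<int>(patterns.size()); ++i) {
        const Pattern& p = patterns[i];
        if (p.is_variable) {
            // A variable pattern takes every tag not already claimed.
            for (int tag = 0; tag < tag_count; ++tag) {
                if (branch_for_tag.count(tag) == 0) {
                    branch_for_tag[tag] = i;
                }
            }
        } else if (branch_for_tag.count(p.tag) == 0) {
            branch_for_tag[p.tag] = i;
        }
    }
    return branch_for_tag;
}
```

For a two-constructor type, a lone variable pattern ends up handling both tags, and a constructor pattern that follows a variable pattern is never assigned any tag, which matches the reachability argument above.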




Now, on to defining the compilation rules for case expressions.


It's helpful to define compiling a single branch of a case expression


separately. For a branch in the form \\(t \\; x\_1 \\; x\_2 \\; ... \\; x\_n \\rightarrow \text{body}\\),


we define a compilation scheme \\(\\mathcal{A}\\) as follows:




$$


\\begin{align}


\\mathcal{A} ⟦t \\; x\_1 \\; ... \\; x\_n \\rightarrow \text{body}⟧ \\; \\rho & =


t \\rightarrow [\\text{Split} \\; n] \\; ⧺ \\; \\mathcal{C}⟦\\text{body}⟧ \\; \\rho' \\; ⧺ \\; [\\text{Slide} \\; n] \\\\\\


\text{where} \\; \\rho' &= \\rho^{+n}[x\_1 \\rightarrow 0, ..., x\_n \\rightarrow n - 1]


\\end{align}


$$




First, we run Split: the node on the top of the stack is a packed constructor,


and we want access to its member variables, since they can be referenced by


the branch's body via \\(x\_i\\). For the same reason, we must make sure to include


\\(x\_1\\) through \\(x\_n\\) in our environment. Furthermore, since the split values now occupy the stack,


we have to offset our environment by \\(n\\) before adding bindings to our new variables.


Doing all these things gives us \\(\\rho'\\), which we use to compile the body, placing


the resulting instructions after Split. This leaves us with the desired graph on top of


the stack; the only thing left to do is to clean up the stack of the unpacked values,


which we do using Slide.




Notice that we didn't just create instructions; we created a mapping from the tag \\(t\\)


to the instructions that correspond to it.
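
The scheme \\(\\mathcal{A}\\) can be sketched in C++ roughly as follows. This is only an illustration under simplifying assumptions: instructions are rendered as plain strings, the environment is a bare map, and `extend_for_branch`, `compile_branch` and the `compile_body` callback (which stands in for \\(\\mathcal{C}\\)) are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Env = std::map<std::string, int>; // variable -> stack offset
using Code = std::vector<std::string>;  // instructions, rendered as text

// rho' = rho^{+n}[x_1 -> 0, ..., x_n -> n - 1]
Env extend_for_branch(const Env& rho, const std::vector<std::string>& params) {
    const int n = static_cast<int>(params.size());
    Env result;
    // Everything already on the stack moves n slots deeper.
    for (const auto& entry : rho) {
        result[entry.first] = entry.second + n;
    }
    // The unpacked fields sit on top; new bindings shadow old ones.
    for (int i = 0; i < n; ++i) {
        result[params[i]] = i;
    }
    return result;
}

// A[[t x_1 ... x_n -> body]] rho
//   = t -> [Split n] ++ C[[body]] rho' ++ [Slide n]
template <typename CompileBody>
std::pair<int, Code> compile_branch(int tag,
                                    const std::vector<std::string>& params,
                                    const Env& rho,
                                    CompileBody compile_body) {
    const std::string n = std::to_string(params.size());
    Code code;
    code.push_back("Split " + n);
    const Code body = compile_body(extend_for_branch(rho, params));
    code.insert(code.end(), body.begin(), body.end());
    code.push_back("Slide " + n);
    return std::make_pair(tag, code);
}
```

The function returns a pair rather than a bare instruction list, mirroring the fact that \\(\\mathcal{A}\\) produces a mapping from a tag to instructions.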




Now, it's time for compiling the whole case expression. We first want


to construct the graph for the expression we want to perform case analysis on.


Next, we want to evaluate it (since we need a packed value, not a graph,


to read the tag). Finally, we perform a jump depending on the tag. This


is captured by the following rule:




$$


\\mathcal{C} ⟦\\text{case} \\; e \\; \\text{of} \\; \\text{alt}\_1 ... \\text{alt}\_n⟧ \\; \\rho =


\\mathcal{C} ⟦e⟧ \\; \\rho \\; ⧺ [\\text{Eval}, \\text{Jump} \\; [\\mathcal{A} ⟦\\text{alt}\_1⟧ \\; \\rho, ..., \\mathcal{A} ⟦\\text{alt}\_n⟧ \\; \\rho]]


$$




This works because \\(\\mathcal{A}\\) creates not only instructions,


but also a tag mapping. We simply populate our Jump instruction with the mappings


resulting from compiling each branch.
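
As a worked example of these rules, assume a hypothetical list type with constructors Nil (no fields) and Cons (two fields), and an environment \\(\\rho = [l \\rightarrow 0]\\). Consider an expression that examines \\(l\\), returning 0 for Nil and the first field \\(x\\) for Cons. The two branches compile to:

$$
\\mathcal{A} ⟦\\text{Nil} \\rightarrow 0⟧ \\; \\rho = \\text{Nil} \\rightarrow [\\text{Split} \\; 0, \\text{PushInt} \\; 0, \\text{Slide} \\; 0]
$$

$$
\\mathcal{A} ⟦\\text{Cons} \\; x \\; xs \\rightarrow x⟧ \\; \\rho = \\text{Cons} \\rightarrow [\\text{Split} \\; 2, \\text{Push} \\; 0, \\text{Slide} \\; 2]
$$

In the second branch, \\(\\rho' = \\rho^{+2}[x \\rightarrow 0, xs \\rightarrow 1]\\), so \\(x\\) compiles to Push 0, while \\(l\\) would now be found at offset 2. Since \\(\\mathcal{C} ⟦l⟧ \\; \\rho = [\\text{Push} \\; 0]\\), the whole case expression becomes:

$$
[\\text{Push} \\; 0, \\text{Eval}, \\text{Jump} \\; [\\text{Nil} \\rightarrow [\\text{Split} \\; 0, \\text{PushInt} \\; 0, \\text{Slide} \\; 0], \\text{Cons} \\rightarrow [\\text{Split} \\; 2, \\text{Push} \\; 0, \\text{Slide} \\; 2]]]
$$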




You may have noticed that we didn't add rules for binary operators. Just like


with type checking, we treat them as function calls. However, rather than constructing


graphs when we have to instantiate those functions, we simply


evaluate the arguments and perform the relevant arithmetic operation using BinOp.


We will do a similar thing for constructors.




With that out of the way, we can get around to writing some code. We can envision


a method on the `ast` struct that takes an environment (just like our compilation


scheme takes the environment \\(\\rho\\)). Rather than returning a vector


of instructions (which involves copying, unless we get some optimization kicking in),


we'll pass to it a reference to a vector. The method will then place the generated


instructions into the vector.
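
A minimal sketch of what such a method could look like is given here. Only the name `ast` comes from the text above; `env`, `instruction`, `compile` and the node types `ast_int`, `ast_var` and `ast_app` are assumptions made for the example, and instructions are plain strings to keep it short.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using instruction = std::string; // placeholder for a real instruction type

// Hypothetical environment with the rho^{+n} operation.
struct env {
    std::map<std::string, int> offsets;
    int shift = 0;

    int lookup(const std::string& name) const {
        return offsets.at(name) + shift; // throws if the name is unbound
    }
    env shifted(int n) const {
        env result = *this;
        result.shift += n;
        return result;
    }
};

struct ast {
    virtual ~ast() = default;
    // Appends this node's instructions to `into` instead of returning them.
    virtual void compile(const env& e, std::vector<instruction>& into) const = 0;
};

struct ast_int : ast {
    int value;
    explicit ast_int(int v) : value(v) {}
    void compile(const env&, std::vector<instruction>& into) const override {
        into.push_back("PushInt " + std::to_string(value));
    }
};

struct ast_var : ast {
    std::string name;
    explicit ast_var(std::string n) : name(std::move(n)) {}
    void compile(const env& e, std::vector<instruction>& into) const override {
        into.push_back("Push " + std::to_string(e.lookup(name)));
    }
};

struct ast_app : ast {
    std::unique_ptr<ast> left, right;
    ast_app(std::unique_ptr<ast> l, std::unique_ptr<ast> r)
        : left(std::move(l)), right(std::move(r)) {}
    void compile(const env& e, std::vector<instruction>& into) const override {
        right->compile(e, into);           // argument first
        left->compile(e.shifted(1), into); // function last, one slot deeper
        into.push_back("MkApp");
    }
};
```

Because every node appends to the same vector, the concatenation written as \\(⧺\\) in the rules turns into a sequence of calls in the right order, with no intermediate vectors to merge.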



