Update "compiler: compilation" to use new math delimiters
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
parent d3fa7336a2
commit bee06b6731
@@ -13,9 +13,9 @@ this G-machine. We will define a __compilation scheme__,
 which will be a set of rules that tell us how to
 translate programs in our language into G-machine instructions.
 To mirror _Implementing Functional Languages: a tutorial_, we'll
-call this compilation scheme \\(\\mathcal{C}\\), and write it
-as \\(\\mathcal{C} ⟦e⟧ = i\\), meaning "the expression \\(e\\)
-compiles to the instructions \\(i\\)".
+call this compilation scheme \(\mathcal{C}\), and write it
+as \(\mathcal{C} ⟦e⟧ = i\), meaning "the expression \(e\)
+compiles to the instructions \(i\)".
 
 To follow our route from the typechecking, let's start
 with compiling expressions that are numbers. It's pretty easy:
@@ -37,7 +37,7 @@ we want to compile it last:
 \mathcal{C} ⟦e_1 \; e_2⟧ = \mathcal{C} ⟦e_2⟧ ⧺ \mathcal{C} ⟦e_1⟧ ⧺ [\text{MkApp}]
 {{< /latex >}}
 
-Here, we used the \\(⧺\\) operator to represent the concatenation of two
+Here, we used the \(⧺\) operator to represent the concatenation of two
 lists. Otherwise, this should be pretty intuitive - we first run the instructions
 to create the parameter, then we run the instructions to create the function,
 and finally, we combine them using MkApp.
@@ -49,17 +49,17 @@ define our case expression compilation scheme to match. However,
 we still need to know __where__ on the stack each variable is,
 and this changes as the stack is modified.
 
-To accommodate for this, we define an environment, \\(\\rho\\),
+To accommodate for this, we define an environment, \(\rho\),
 to be a partial function mapping variable names to their
-offsets on the stack. We write \\(\\rho = [x \\rightarrow n, y \\rightarrow m]\\)
-to say "the environment \\(\\rho\\) maps variable \\(x\\) to stack offset \\(n\\),
-and variable \\(y\\) to stack offset \\(m\\)". We also write \\(\\rho \\; x\\) to
-say "look up \\(x\\) in \\(\\rho\\)", since \\(\\rho\\) is a function. Finally,
+offsets on the stack. We write \(\rho = [x \rightarrow n, y \rightarrow m]\)
+to say "the environment \(\rho\) maps variable \(x\) to stack offset \(n\),
+and variable \(y\) to stack offset \(m\)". We also write \(\rho \; x\) to
+say "look up \(x\) in \(\rho\)", since \(\rho\) is a function. Finally,
 to help with the ever-changing stack, we define an augmented environment
-\\(\\rho^{+n}\\), such that \\(\\rho^{+n} \\; x = \\rho \\; x + n\\). In words,
-this basically means "\\(\\rho^{+n}\\) has all the variables from \\(\\rho\\),
-but their addresses are incremented by \\(n\\)". We now pass \\(\\rho\\)
-in to \\(\\mathcal{C}\\) together with the expression \\(e\\). Let's
+\(\rho^{+n}\), such that \(\rho^{+n} \; x = \rho \; x + n\). In words,
+this basically means "\(\rho^{+n}\) has all the variables from \(\rho\),
+but their addresses are incremented by \(n\)". We now pass \(\rho\)
+in to \(\mathcal{C}\) together with the expression \(e\). Let's
 rewrite our first two rules. For numbers:
 
 {{< latex >}}
@@ -72,8 +72,8 @@ For function application:
 \mathcal{C} ⟦e_1 \; e_2⟧ \; \rho = \mathcal{C} ⟦e_2⟧ \; \rho \; ⧺ \;\mathcal{C} ⟦e_1⟧ \; \rho^{+1} \; ⧺ \; [\text{MkApp}]
 {{< /latex >}}
 
-Notice how in that last rule, we passed in \\(\\rho^{+1}\\) when compiling the function's expression. This is because
-the result of running the instructions for \\(e\_2\\) will have left on the stack the function's parameter. Whatever
+Notice how in that last rule, we passed in \(\rho^{+1}\) when compiling the function's expression. This is because
+the result of running the instructions for \(e_2\) will have left on the stack the function's parameter. Whatever
 was at the top of the stack (and thus, had index 0), is now the second element from the top (address 1). The
 same is true for all other things that were on the stack. So, we increment the environment accordingly.
 
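The offset bookkeeping in the application rule above can be sketched in C++. This is only an illustration under stated assumptions, not the project's code: the `Env` type, its `shifted` method, the tiny `Expr` type, and the string encoding of instructions are all hypothetical, and the variable rule is assumed to emit `Push` with the offset looked up in the environment.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical environment (rho): maps variable names to stack offsets.
struct Env {
    std::map<std::string, int> offsets;

    // rho^{+n}: the same variables, with every offset incremented by n.
    Env shifted(int n) const {
        Env result;
        for (const auto& entry : offsets) {
            result.offsets[entry.first] = entry.second + n;
        }
        return result;
    }
};

// Hypothetical expression type: either a variable or an application.
struct Expr {
    std::string name;               // used when this node is a variable
    std::shared_ptr<Expr> fn, arg;  // both set when this node is an application
};

std::shared_ptr<Expr> var(const std::string& name) {
    auto e = std::make_shared<Expr>();
    e->name = name;
    return e;
}

std::shared_ptr<Expr> app(std::shared_ptr<Expr> fn, std::shared_ptr<Expr> arg) {
    auto e = std::make_shared<Expr>();
    e->fn = fn;
    e->arg = arg;
    return e;
}

// Appends the instructions for `e` to `out`; instructions are plain strings here.
void compile(const Expr& e, const Env& rho, std::vector<std::string>& out) {
    if (e.fn && e.arg) {
        compile(*e.arg, rho, out);            // C[[e_2]] rho
        compile(*e.fn, rho.shifted(1), out);  // C[[e_1]] rho^{+1}
        out.push_back("MkApp");
    } else {
        // Assumed variable rule: push the value found at offset (rho x).
        out.push_back("Push " + std::to_string(rho.offsets.at(e.name)));
    }
}
```

With an environment mapping `x` to 0 and `y` to 1, compiling the application `y x` appends `Push 0`, `Push 2`, `MkApp`: by the time `y` is compiled, the value of `x` sits on top of the stack, so `y` has moved from offset 1 to offset 2.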
@@ -84,7 +84,7 @@ With the environment, the variable rule is simple:
 {{< /latex >}}
 
 One more thing. If we run across a function name, we want to
-use PushGlobal rather than Push. Defining \\(f\\) to be a name
+use PushGlobal rather than Push. Defining \(f\) to be a name
 of a global function, we capture this using the following rule:
 
 {{< latex >}}
@@ -93,8 +93,8 @@ of a global function, we capture this using the following rule:
 
 Now it's time for us to compile case expressions, but there's a bit of
 an issue - our case expression branches don't map one-to-one with
-the \\(t \\rightarrow i\_t\\) format of the Jump instruction.
-This is because we allow for name patterns in the form \\(x\\),
+the \(t \rightarrow i_t\) format of the Jump instruction.
+This is because we allow for name patterns in the form \(x\),
 which can possibly match more than one tag. Consider this
 rather useless example:
 
@@ -120,8 +120,8 @@ Effectively, we'll be performing [desugaring](https://en.wikipedia.org/wiki/Synt
 
 Now, on to defining the compilation rules for case expressions.
 It's helpful to define compiling a single branch of a case expression
-separately. For a branch in the form \\(t \\; x\_1 \\; x\_2 \\; ... \\; x\_n \\rightarrow \text{body}\\),
-we define a compilation scheme \\(\\mathcal{A}\\) as follows:
+separately. For a branch in the form \(t \; x_1 \; x_2 \; ... \; x_n \rightarrow \text{body}\),
+we define a compilation scheme \(\mathcal{A}\) as follows:
 
 {{< latex >}}
 \begin{aligned}
@@ -133,15 +133,15 @@ t \rightarrow [\text{Split} \; n] \; ⧺ \; \mathcal{C}⟦\text{body}⟧ \; \rho
 
 First, we run Split - the node on the top of the stack is a packed constructor,
 and we want access to its member variables, since they can be referenced by
-the branch's body via \\(x\_i\\). For the same reason, we must make sure to include
-\\(x\_1\\) through \\(x\_n\\) in our environment. Furthermore, since the split values now occupy the stack,
-we have to offset our environment by \\(n\\) before adding bindings to our new variables.
-Doing all these things gives us \\(\\rho'\\), which we use to compile the body, placing
+the branch's body via \(x_i\). For the same reason, we must make sure to include
+\(x_1\) through \(x_n\) in our environment. Furthermore, since the split values now occupy the stack,
+we have to offset our environment by \(n\) before adding bindings to our new variables.
+Doing all these things gives us \(\rho'\), which we use to compile the body, placing
 the resulting instructions after Split. This leaves us with the desired graph on top of
 the stack - the only thing left to do is to clean up the stack of the unpacked values,
 which we do using Slide.
 
-Notice that we didn't just create instructions - we created a mapping from the tag \\(t\\)
+Notice that we didn't just create instructions - we created a mapping from the tag \(t\)
 to the instructions that correspond to it.
 
 Now, it's time for compiling the whole case expression. We first want
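The construction of the branch environment described in this hunk can be sketched in C++ as well. The names `Env`, `shifted`, `branch_env`, and `wrap_branch` are hypothetical, and instructions are plain strings. The sketch also assumes two details that the visible part of the rule does not spell out: that after Split the field `x_1` sits at offset 0 (with `x_n` at offset `n - 1`), and that Slide takes `n` as its argument.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical environment (rho): maps variable names to stack offsets.
struct Env {
    std::map<std::string, int> offsets;

    // rho^{+n}: the same variables, with every offset incremented by n.
    Env shifted(int n) const {
        Env result;
        for (const auto& entry : offsets) {
            result.offsets[entry.first] = entry.second + n;
        }
        return result;
    }
};

// Builds rho' for a branch `t x_1 ... x_n -> body`: offset the existing
// bindings by n, then bind the pattern variables to the unpacked fields.
Env branch_env(const Env& rho, const std::vector<std::string>& params) {
    Env result = rho.shifted(static_cast<int>(params.size()));
    for (std::size_t i = 0; i < params.size(); ++i) {
        // Assumed ordering: x_1 at offset 0, ..., x_n at offset n - 1.
        result.offsets[params[i]] = static_cast<int>(i);
    }
    return result;
}

// Surrounds already-compiled body instructions with Split and Slide.
std::vector<std::string> wrap_branch(int n, const std::vector<std::string>& body) {
    std::vector<std::string> result;
    result.push_back("Split " + std::to_string(n));
    result.insert(result.end(), body.begin(), body.end());
    result.push_back("Slide " + std::to_string(n));  // assumed argument: n
    return result;
}
```

Under those assumptions, a branch with two pattern variables compiled in an environment that maps `y` to 0 sees `y` at offset 2 inside its body, since the two unpacked fields now sit above it on the stack.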
@@ -155,7 +155,7 @@ is captured by the following rule:
 \mathcal{C} ⟦e⟧ \; \rho \; ⧺ [\text{Eval}, \text{Jump} \; [\mathcal{A} ⟦\text{alt}_1⟧ \; \rho, ..., \mathcal{A} ⟦\text{alt}_n⟧ \; \rho]]
 {{< /latex >}}
 
-This works because \\(\\mathcal{A}\\) creates not only instructions,
+This works because \(\mathcal{A}\) creates not only instructions,
 but also a tag mapping. We simply populate our Jump instruction with such mappings
 resulting from compiling each branch.
 
@@ -177,7 +177,7 @@ as always, you can look at the full project source code, which is
 freely available for each post in the series.
 
 We can now envision a method on the `ast` struct that takes an environment
-(just like our compilation scheme takes the environment \\(\\rho\\\)),
+(just like our compilation scheme takes the environment \(\rho\)),
 and compiles the `ast`. Rather than returning a vector
 of instructions (which involves copying, unless we get some optimization kicking in),
 we'll pass a reference to a vector to our method. The method will then place the generated
@@ -188,7 +188,7 @@ from a variable? A naive solution would be to take a list or map of
 global functions as a third parameter to our `compile` method.
 But there's an easier way! We know that the program passed type checking.
 This means that every referenced variable exists. From there, the situation is easy -
-if actual variable names are kept in the environment, \\(\\rho\\), then whenever
+if actual variable names are kept in the environment, \(\rho\), then whenever
 we see a variable that __isn't__ in the current environment, it must be a function name.
 
 Having finished contemplating our method, it's time to define a signature:
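The idea from this hunk, telling local variables apart from global functions by environment lookup alone, can be illustrated with a small C++ sketch. It is not the signature the post goes on to define: `Env` and `compile_variable` are hypothetical names, and instructions are plain strings. It follows the text in appending to a vector passed by reference rather than returning a new one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical environment (rho): maps local variable names to stack offsets.
struct Env {
    std::map<std::string, int> offsets;
};

// Compiles a reference to `name`, appending the instruction to `into`.
void compile_variable(const std::string& name, const Env& rho,
                      std::vector<std::string>& into) {
    auto it = rho.offsets.find(name);
    if (it != rho.offsets.end()) {
        // The name is a local variable: push the value at its stack offset.
        into.push_back("Push " + std::to_string(it->second));
    } else {
        // Not in the environment. Because the program passed type checking,
        // the name must refer to a global function.
        into.push_back("PushGlobal " + name);
    }
}
```

No list of global functions is needed as a third parameter: absence from the environment is enough to choose PushGlobal.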