Update "compiler: compilation" to use new math delimiters

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
Danila Fedorin 2024-05-13 18:43:14 -07:00
parent d3fa7336a2
commit bee06b6731

@ -13,9 +13,9 @@ this G-machine. We will define a __compilation scheme__,
which will be a set of rules that tell us how to
translate programs in our language into G-machine instructions.
To mirror _Implementing Functional Languages: a tutorial_, we'll
call this compilation scheme \(\mathcal{C}\), and write it
as \(\mathcal{C} ⟦e⟧ = i\), meaning "the expression \(e\)
compiles to the instructions \(i\)".
To follow our route from the typechecking, let's start
with compiling expressions that are numbers. It's pretty easy:
@ -37,7 +37,7 @@ we want to compile it last:
\mathcal{C} ⟦e_1 \; e_2⟧ = \mathcal{C} ⟦e_2⟧ ⧺ \mathcal{C} ⟦e_1⟧ ⧺ [\text{MkApp}]
{{< /latex >}}
Here, we used the \(⧺\) operator to represent the concatenation of two
lists. Otherwise, this should be pretty intuitive - we first run the instructions
to create the parameter, then we run the instructions to create the function,
and finally, we combine them using MkApp.
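To make these rules concrete, here is a minimal C++ sketch built on a few assumptions. The `Expr` type, the `num`/`global`/`app` helpers and `compile` are hypothetical, not the project's code. Instructions are encoded as plain strings for brevity, and the number rule (elided in this view) is assumed to emit `PushInt n`. The sketch also handles global function names with `PushGlobal` so that the example has something to apply. Following the plan of passing a vector by reference, instructions are appended to `out`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, stripped-down AST: integer literals, global function names,
// and application. There are no local variables yet, so no environment.
struct Expr {
    enum Kind { Num, Global, App } kind;
    int value = 0;                   // used when kind == Num
    std::string name;                // used when kind == Global
    std::shared_ptr<Expr> fun, arg;  // used when kind == App
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr num(int n) { return std::make_shared<Expr>(Expr{Expr::Num, n, "", nullptr, nullptr}); }
ExprPtr global(const std::string& f) { return std::make_shared<Expr>(Expr{Expr::Global, 0, f, nullptr, nullptr}); }
ExprPtr app(ExprPtr f, ExprPtr x) { return std::make_shared<Expr>(Expr{Expr::App, 0, "", f, x}); }

// Appends the instructions for e to out:
//   C[n]     = [PushInt n]      (assumed number rule)
//   C[f]     = [PushGlobal f]
//   C[e1 e2] = C[e2] ++ C[e1] ++ [MkApp]
void compile(const ExprPtr& e, std::vector<std::string>& out) {
    switch (e->kind) {
    case Expr::Num:
        out.push_back("PushInt " + std::to_string(e->value));
        break;
    case Expr::Global:
        out.push_back("PushGlobal " + e->name);
        break;
    case Expr::App:
        assert(e->fun && e->arg);
        compile(e->arg, out);    // first build the parameter...
        compile(e->fun, out);    // ...then the function...
        out.push_back("MkApp");  // ...then combine the two
        break;
    }
}
```

For `plus 1 2`, built as `app(app(global("plus"), num(1)), num(2))`, this produces `PushInt 2, PushInt 1, PushGlobal plus, MkApp, MkApp`: each argument is built before the function it is applied to.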
@ -49,17 +49,17 @@ define our case expression compilation scheme to match. However,
we still need to know __where__ on the stack each variable is,
and this changes as the stack is modified.
To accommodate this, we define an environment, \(\rho\),
to be a partial function mapping variable names to their
offsets on the stack. We write \(\rho = [x \rightarrow n, y \rightarrow m]\)
to say "the environment \(\rho\) maps variable \(x\) to stack offset \(n\),
and variable \(y\) to stack offset \(m\)". We also write \(\rho \; x\) to
say "look up \(x\) in \(\rho\)", since \(\rho\) is a function. Finally,
to help with the ever-changing stack, we define an augmented environment
\(\rho^{+n}\), such that \(\rho^{+n} \; x = \rho \; x + n\). In words,
this basically means "\(\rho^{+n}\) has all the variables from \(\rho\),
but their addresses are incremented by \(n\)". We now pass \(\rho\)
into \(\mathcal{C}\) together with the expression \(e\). Let's
rewrite our first two rules. For numbers:
{{< latex >}}
@ -72,8 +72,8 @@ For function application:
\mathcal{C} ⟦e_1 \; e_2⟧ \; \rho = \mathcal{C} ⟦e_2⟧ \; \rho \; ⧺ \;\mathcal{C} ⟦e_1⟧ \; \rho^{+1} \; ⧺ \; [\text{MkApp}]
{{< /latex >}}
Notice how in that last rule, we passed in \(\rho^{+1}\) when compiling the function's expression. This is because
running the instructions for \(e_2\) leaves the function's parameter on the stack. Whatever
was at the top of the stack (and thus had index 0) is now the second element from the top (address 1). The
same is true for all other things that were on the stack. So, we increment the environment accordingly.
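Here is one way \(\rho\) and \(\rho^{+n}\) might look in C++, extending a hypothetical string-based compiler with local variables. `Env`, `offset`, and the AST helpers are illustrative names, not the project's. The variable case assumes the rule \(\mathcal{C} ⟦x⟧ \; \rho = [\text{Push} \; (\rho \; x)]\), whose exact form is elided in this view.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical environment: variable name -> offset from the top of the stack.
using Env = std::map<std::string, int>;

// rho^{+n}: the same variables, with every offset incremented by n.
Env offset(const Env& rho, int n) {
    Env shifted;
    for (const auto& [name, off] : rho) shifted[name] = off + n;
    return shifted;
}

// Hypothetical AST with integer literals, local variables, and application.
struct Expr {
    enum Kind { Num, Var, App } kind;
    int value = 0;                   // used when kind == Num
    std::string name;                // used when kind == Var
    std::shared_ptr<Expr> fun, arg;  // used when kind == App
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr num(int n) { return std::make_shared<Expr>(Expr{Expr::Num, n, "", nullptr, nullptr}); }
ExprPtr var(const std::string& x) { return std::make_shared<Expr>(Expr{Expr::Var, 0, x, nullptr, nullptr}); }
ExprPtr app(ExprPtr f, ExprPtr x) { return std::make_shared<Expr>(Expr{Expr::App, 0, "", f, x}); }

void compile(const ExprPtr& e, const Env& rho, std::vector<std::string>& out) {
    switch (e->kind) {
    case Expr::Num:
        out.push_back("PushInt " + std::to_string(e->value));
        break;
    case Expr::Var: {
        auto it = rho.find(e->name);
        assert(it != rho.end() && "local variable must be bound in rho");
        out.push_back("Push " + std::to_string(it->second));  // Push (rho x)
        break;
    }
    case Expr::App:
        compile(e->arg, rho, out);             // leaves the parameter on the stack...
        compile(e->fun, offset(rho, 1), out);  // ...so every old offset is one deeper
        out.push_back("MkApp");
        break;
    }
}
```

With \(\rho = [x \rightarrow 0, f \rightarrow 1]\), compiling `f x` emits `Push 0` for `x`. Once that parameter is on the stack, `f` is one slot deeper, so it is pushed with `Push 2` rather than `Push 1`.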
@ -84,7 +84,7 @@ With the environment, the variable rule is simple:
{{< /latex >}}
One more thing. If we run across a function name, we want to
use PushGlobal rather than Push. Defining \(f\) to be a name
of a global function, we capture this using the following rule:
{{< latex >}}
@ -93,8 +93,8 @@ of a global function, we capture this using the following rule:
Now it's time for us to compile case expressions, but there's a bit of
an issue - our case expressions' branches don't map one-to-one with
the \(t \rightarrow i_t\) format of the Jump instruction.
This is because we allow for name patterns in the form \(x\),
which can possibly match more than one tag. Consider this
rather useless example:
@ -120,8 +120,8 @@ Effectively, we'll be performing [desugaring](https://en.wikipedia.org/wiki/Synt
Now, on to defining the compilation rules for case expressions.
It's helpful to define compiling a single branch of a case expression
separately. For a branch in the form \(t \; x_1 \; x_2 \; ... \; x_n \rightarrow \text{body}\),
we define a compilation scheme \(\mathcal{A}\) as follows:
{{< latex >}}
\begin{aligned}
@ -133,15 +133,15 @@ t \rightarrow [\text{Split} \; n] \; ⧺ \; \mathcal{C}⟦\text{body}⟧ \; \rho
First, we run Split - the node on the top of the stack is a packed constructor,
and we want access to its member variables, since they can be referenced by
the branch's body via \(x_i\). For the same reason, we must make sure to include
\(x_1\) through \(x_n\) in our environment. Furthermore, since the split values now occupy the stack,
we have to offset our environment by \(n\) before adding bindings to our new variables.
Doing all these things gives us \(\rho'\), which we use to compile the body, placing
the resulting instructions after Split. This leaves us with the desired graph on top of
the stack - the only thing left to do is to clean up the stack of the unpacked values,
which we do using Slide.
Notice that we didn't just create instructions - we created a mapping from the tag \(t\)
to the instructions that correspond to it.
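The branch scheme \(\mathcal{A}\) can be sketched the same way. In this hypothetical C++ version, the body compiler is passed in as a callback standing in for \(\mathcal{C}\), and the result is a (tag, instructions) pair mirroring the mapping just described. The order of bindings in \(\rho'\) is an assumption: the sketch places \(x_1\) at offset 0 and \(x_n\) at offset \(n - 1\).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Env = std::map<std::string, int>;  // variable -> stack offset
using Code = std::vector<std::string>;   // instructions encoded as strings
// Stand-in for C[body]: compiles the branch body under a given environment.
using BodyCompiler = std::function<void(const Env&, Code&)>;

// rho^{+n}: every existing offset incremented by n.
Env offset(const Env& rho, int n) {
    Env shifted;
    for (const auto& [name, off] : rho) shifted[name] = off + n;
    return shifted;
}

// A[t x_1 ... x_n -> body] rho = t -> [Split n] ++ C[body] rho' ++ [Slide n]
// Assumption: after Split, x_1 is on top (offset 0) and x_n is at offset n - 1.
std::pair<int, Code> compile_branch(int tag, const std::vector<std::string>& vars,
                                    const BodyCompiler& body, const Env& rho) {
    const int n = static_cast<int>(vars.size());
    Env rho_prime = offset(rho, n);  // the n fields now sit above everything else
    for (int i = 0; i < n; i++) rho_prime[vars[i]] = i;  // bind fields, shadowing outer names
    for (int i = 0; i < n; i++)
        assert(rho_prime.at(vars[i]) == i && "pattern variables must be distinct");

    Code code{"Split " + std::to_string(n)};
    body(rho_prime, code);                         // the body's graph ends up on top
    code.push_back("Slide " + std::to_string(n));  // drop the n unpacked fields beneath it
    return {tag, code};
}
```

Compiling a branch with tag 1 and fields `x` and `y` under \(\rho = [z \rightarrow 0]\) gives \(\rho' = [x \rightarrow 0, y \rightarrow 1, z \rightarrow 2]\). If the body is just `y`, the branch compiles to `Split 2, Push 1, Slide 2`.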
Now, it's time for compiling the whole case expression. We first want
@ -155,7 +155,7 @@ is captured by the following rule:
\mathcal{C} ⟦e⟧ \; \rho \; ⧺ [\text{Eval}, \text{Jump} \; [\mathcal{A} ⟦\text{alt}_1⟧ \; \rho, ..., \mathcal{A} ⟦\text{alt}_n⟧ \; \rho]]
{{< /latex >}}
This works because \(\mathcal{A}\) creates not only instructions,
but also a tag mapping. We simply populate our Jump instruction with the mappings
resulting from compiling each branch.
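A sketch of the whole case rule follows, again in hypothetical C++. Since Jump has to carry the code for every branch, instructions here are small structs rather than strings. The scrutinee and branch compilers are callbacks standing in for \(\mathcal{C}\) and \(\mathcal{A}\); none of these names come from the project.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Instr;
// One arm of a Jump: a constructor tag and the code to run for it.
struct Branch {
    int tag;
    std::vector<Instr> code;  // std::vector allows an incomplete element type (C++17)
};
// Hypothetical instruction: an opcode string, plus branches (only used by Jump).
struct Instr {
    std::string op;
    std::vector<Branch> branches;
};
using Code = std::vector<Instr>;
using Env = std::map<std::string, int>;

// Stand-ins for C[e] and A[alt]; both receive the environment rho.
using ExprCompiler = std::function<void(const Env&, Code&)>;
using BranchCompiler = std::function<Branch(const Env&)>;

// C[case e of alt_1 ... alt_n] rho = C[e] rho ++ [Eval, Jump [A[alt_1] rho, ..., A[alt_n] rho]]
void compile_case(const ExprCompiler& scrutinee, const std::vector<BranchCompiler>& alts,
                  const Env& rho, Code& out) {
    assert(!alts.empty() && "a case expression needs at least one branch");
    scrutinee(rho, out);               // build the scrutinee's graph
    out.push_back(Instr{"Eval", {}});  // evaluate it before dispatching on its tag
    Instr jump{"Jump", {}};
    for (const auto& alt : alts) jump.branches.push_back(alt(rho));  // same rho for every branch
    out.push_back(jump);
}
```

Each branch receives the same \(\rho\), exactly as in the rule; any offsetting for the unpacked fields happens inside \(\mathcal{A}\).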
@ -177,7 +177,7 @@ as always, you can look at the full project source code, which is
freely available for each post in the series.
We can now envision a method on the `ast` struct that takes an environment
(just like our compilation scheme takes the environment \(\rho\)),
and compiles the `ast`. Rather than returning a vector
of instructions (which involves copying, unless some optimization kicks in),
we'll pass a reference to a vector to our method. The method will then place the generated
@ -188,7 +188,7 @@ from a variable? A naive solution would be to take a list or map of
global functions as a third parameter to our `compile` method.
But there's an easier way! We know that the program passed type checking.
This means that every referenced variable exists. From there, the situation is easy -
if actual variable names are kept in the environment, \(\rho\), then whenever
we see a variable that __isn't__ in the current environment, it must be a function name.
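This lookup order fits in a few lines. The `compile_name` helper below is hypothetical and again encodes instructions as strings; it relies on the type checker having already rejected unbound names, as argued above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Env = std::map<std::string, int>;  // variable -> stack offset

// Hypothetical helper: compile a bare name. Since the program passed type
// checking, any name missing from rho must refer to a global function.
void compile_name(const std::string& name, const Env& rho, std::vector<std::string>& out) {
    assert(!name.empty());
    auto it = rho.find(name);
    if (it != rho.end()) {
        out.push_back("Push " + std::to_string(it->second));  // local variable: stack offset
    } else {
        out.push_back("PushGlobal " + name);                  // global function
    }
}
```

A side effect of checking \(\rho\) first is that a local variable shadows a global function with the same name.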
Having finished contemplating our method, it's time to define a signature: