Add beginning of part 6 of compiler series
parent 1fdccd6fae, commit 44abf877b2
content/blog/06_compiler_semantics.md

---
title: Compiling a Functional Language Using C++, Part 6 - Compilation
date: 2019-08-06T14:26:38-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"]
---
In the previous post, we defined a machine for graph reduction,
called a G-machine. However, this machine is still not particularly
connected to __our__ language. In this post, we will give
meanings to programs in our language in the context of
this G-machine. We will define a __compilation scheme__:
a set of rules that tells us how to
translate programs in our language into G-machine instructions.
To mirror _Implementing Functional Languages: a tutorial_, we'll
call this compilation scheme \\(\\mathcal{C}\\), and write it
as \\(\\mathcal{C} ⟦e⟧ = i\\), meaning "the expression \\(e\\)
compiles to the instructions \\(i\\)".

To follow the same route we took with typechecking, let's start
by compiling expressions that are numbers. This case is easy:
$$
\\mathcal{C} ⟦n⟧ = [\\text{PushInt} \\; n]
$$

Here, we compiled a number expression to a list containing
exactly one instruction: PushInt.

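To make the notation concrete, here is a minimal C++ sketch of this rule. The `Op` enum, the `Instruction` struct, and `compile_num` are hypothetical names chosen for illustration; they are not necessarily the classes the compiler in this series will use.

```cpp
#include <vector>

// Hypothetical opcode set: only what the number rule needs.
enum class Op { PushInt };

// A hypothetical G-machine instruction: an opcode plus an integer argument.
struct Instruction {
    Op op;
    int arg;
};

// C[[n]] = [PushInt n]: a number compiles to a one-instruction list.
std::vector<Instruction> compile_num(int n) {
    return {Instruction{Op::PushInt, n}};
}
```

For example, `compile_num(42)` returns a vector holding the single instruction `PushInt 42`.
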
Just like when we did typechecking, let's
move on to compiling function applications. As
we informally stated in the previous post, since
the function we're applying has to be on top of the stack,
we want to compile it last:

$$
\\mathcal{C} ⟦e\_1 \\; e\_2⟧ = \\mathcal{C} ⟦e\_2⟧ ⧺ \\mathcal{C} ⟦e\_1⟧ ⧺ [\\text{MkApp}]
$$

Here, we used the \\(⧺\\) operator to represent the concatenation of two
lists. Otherwise, this rule should be fairly intuitive: we first run the instructions
to create the argument, then we run the instructions to create the function,
and finally, we combine the two using MkApp.

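The ordering is easy to get backwards, so it may help to see the rule as code. Below is a small sketch, assuming a hypothetical expression tree (`Expr`) with only numbers and applications, and implementing \\(⧺\\) as appending to a `std::vector`. The sketch does no typechecking, so the example applies numbers to numbers; they simply stand in for arbitrary subexpressions.

```cpp
#include <memory>
#include <vector>

// Hypothetical opcodes for the two rules covered so far.
enum class Op { PushInt, MkApp };

struct Instruction {
    Op op;
    int arg;  // the number for PushInt; unused (0) for MkApp
};

// A hypothetical expression tree: a number, or an application "fun arg".
struct Expr {
    enum class Kind { Num, App };
    Kind kind;
    int value;                       // set when kind == Num
    std::shared_ptr<Expr> fun, arg;  // set when kind == App
};

std::shared_ptr<Expr> num(int n) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Num, n, nullptr, nullptr});
}

std::shared_ptr<Expr> app(std::shared_ptr<Expr> f, std::shared_ptr<Expr> a) {
    return std::make_shared<Expr>(Expr{Expr::Kind::App, 0, f, a});
}

// xs ++ ys, done in place: append ys to the end of xs.
void append(std::vector<Instruction>& xs, const std::vector<Instruction>& ys) {
    xs.insert(xs.end(), ys.begin(), ys.end());
}

std::vector<Instruction> compile(const Expr& e) {
    if (e.kind == Expr::Kind::Num) {
        return {Instruction{Op::PushInt, e.value}};  // C[[n]] = [PushInt n]
    }
    // C[[e1 e2]] = C[[e2]] ++ C[[e1]] ++ [MkApp]:
    // the argument is built first, so the function ends up on top.
    std::vector<Instruction> code = compile(*e.arg);
    append(code, compile(*e.fun));
    code.push_back(Instruction{Op::MkApp, 0});
    return code;
}
```

For instance, compiling `app(app(num(1), num(2)), num(3))`, that is, \\((1 \\; 2) \\; 3\\), produces `PushInt 3, PushInt 2, PushInt 1, MkApp, MkApp`: the outermost argument is pushed first, and the innermost function last.
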
It's variables that once again force us to adjust our strategy. If our
program is well-typed, we know each variable will be on the stack:
our definition of Unwind ensures this for function parameters, and we will
define our case expression compilation scheme to match. However,
we still need to know __where__ on the stack each variable is,
and this changes as the stack is modified.

To accommodate this, we define an environment, \\(\\rho\\),
to be a partial function mapping variable names to their
offsets on the stack. We write \\(\\rho = [x \\rightarrow n, y \\rightarrow m]\\)
to say "the environment \\(\\rho\\) maps variable \\(x\\) to stack offset \\(n\\),
and variable \\(y\\) to stack offset \\(m\\)". We also write \\(\\rho \\; x\\) to
say "look up \\(x\\) in \\(\\rho\\)", since \\(\\rho\\) is a function. Finally,
to help with the ever-changing stack, we define an augmented environment
\\(\\rho^{+n}\\), such that \\(\\rho^{+n} \\; x = \\rho \\; x + n\\). In words,
this means "\\(\\rho^{+n}\\) has all the variables from \\(\\rho\\),
but their offsets are incremented by \\(n\\)". We now pass \\(\\rho\\)
to \\(\\mathcal{C}\\) together with the expression \\(e\\). Let's
rewrite our first two rules. For numbers:

$$
\\mathcal{C} ⟦n⟧ \\; \\rho = [\\text{PushInt} \\; n]
$$

For function application:
$$
\\mathcal{C} ⟦e\_1 \\; e\_2⟧ \\; \\rho = \\mathcal{C} ⟦e\_2⟧ \\; \\rho ⧺ \\mathcal{C} ⟦e\_1⟧ \\; \\rho^{+1} ⧺ [\\text{MkApp}]
$$

Notice how in that last rule, we passed in \\(\\rho^{+1}\\) when compiling the function's expression. This is because
running the instructions for \\(e\_2\\) leaves the function's argument on top of the stack. Whatever
was at the top of the stack before (and thus had offset 0) is now the second element from the top (offset 1). The
same is true for everything else that was on the stack: each offset grows by one. So, we increment the environment accordingly.

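A direct way to model \\(\\rho\\) and \\(\\rho^{+n}\\) in C++ is a map from names to offsets, plus a function that builds a shifted copy. This is a sketch under assumptions: the `Env` alias and the `lookup` and `offset` helpers are hypothetical, and a lookup miss is treated as a compiler bug, since in a well-typed program every variable we look up should be bound.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical environment: variable name -> stack offset (0 is the top).
using Env = std::map<std::string, int>;

// rho x: find the offset of variable x.
int lookup(const Env& rho, const std::string& x) {
    auto it = rho.find(x);
    if (it == rho.end()) {
        // Should not happen for a well-typed program.
        throw std::logic_error("unbound variable: " + x);
    }
    return it->second;
}

// rho^{+n}: the same variables, with every offset increased by n.
Env offset(const Env& rho, int n) {
    Env shifted;
    for (const auto& [name, off] : rho) {
        shifted[name] = off + n;
    }
    return shifted;
}
```

With \\(\\rho = [x \\rightarrow 0, y \\rightarrow 1]\\), `offset(rho, 1)` maps `x` to 1 and `y` to 2, matching the reasoning above. Copying the map for each \\(\\rho^{+n}\\) is the most literal translation of the definition; an implementation could instead store a single map together with a running offset to avoid the copies.
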
With the environment, the variable rule is simple:
$$
\\mathcal{C} ⟦x⟧ \\; \\rho = [\\text{Push} \\; (\\rho \\; x)]
$$

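Putting the three environment-aware rules together gives a small recursive compiler. As before, this is only a sketch: the `Expr`, `Instruction`, and `Env` types and the helper names are hypothetical, and only numbers, variables, and applications are handled; global function names are not treated here.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical opcodes and instruction format.
enum class Op { PushInt, Push, MkApp };

struct Instruction {
    Op op;
    int arg;  // PushInt: the number; Push: a stack offset; MkApp: unused
};

// Hypothetical expression tree: number, variable, or application.
struct Expr {
    enum class Kind { Num, Var, App };
    Kind kind;
    int value;                       // Num
    std::string name;                // Var
    std::shared_ptr<Expr> fun, arg;  // App
};

std::shared_ptr<Expr> num(int n) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Num, n, "", nullptr, nullptr});
}
std::shared_ptr<Expr> var(std::string x) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Var, 0, std::move(x), nullptr, nullptr});
}
std::shared_ptr<Expr> app(std::shared_ptr<Expr> f, std::shared_ptr<Expr> a) {
    return std::make_shared<Expr>(Expr{Expr::Kind::App, 0, "", std::move(f), std::move(a)});
}

// Environment: variable name -> stack offset (0 is the top).
using Env = std::map<std::string, int>;

// rho^{+n}: every offset increased by n.
Env offset(const Env& rho, int n) {
    Env shifted;
    for (const auto& [x, off] : rho) shifted[x] = off + n;
    return shifted;
}

std::vector<Instruction> compile(const Expr& e, const Env& rho) {
    switch (e.kind) {
    case Expr::Kind::Num:  // C[[n]] rho = [PushInt n]
        return {Instruction{Op::PushInt, e.value}};
    case Expr::Kind::Var: {  // C[[x]] rho = [Push (rho x)]
        auto it = rho.find(e.name);
        if (it == rho.end()) throw std::logic_error("unbound variable: " + e.name);
        return {Instruction{Op::Push, it->second}};
    }
    case Expr::Kind::App: {  // C[[e1 e2]] rho = C[[e2]] rho ++ C[[e1]] rho^{+1} ++ [MkApp]
        std::vector<Instruction> code = compile(*e.arg, rho);
        std::vector<Instruction> fun = compile(*e.fun, offset(rho, 1));
        code.insert(code.end(), fun.begin(), fun.end());
        code.push_back(Instruction{Op::MkApp, 0});
        return code;
    }
    }
    throw std::logic_error("unknown expression kind");
}
```

With \\(\\rho = [x \\rightarrow 0]\\), compiling `x x` yields `Push 0, Push 1, MkApp`. The same variable gets two different offsets, because by the time the function position is compiled, the argument has already been pushed on top of it.
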
One more thing. If we run across a function name, we want to
use PushGlobal rather than Push. Letting \\(f\\) be the name
of a global function, we capture this using the following rule:

$$
\\mathcal{C} ⟦f⟧ \\; \\rho = [\\text{PushGlobal} \\; f]
$$

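In code, this means a name has to be resolved before we know which instruction to emit. The sketch below assumes the compiler knows the set of global function names, and it checks the environment first, so a local variable would shadow a global of the same name. Both that shadowing policy and the `compile_name` helper are assumptions made for illustration.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical opcodes for the two ways of pushing a name.
enum class Op { Push, PushGlobal };

struct Instruction {
    Op op;
    int offset;        // used by Push
    std::string name;  // used by PushGlobal
};

// Environment: variable name -> stack offset (0 is the top).
using Env = std::map<std::string, int>;

// Locals (found in rho) compile to Push (rho x); global function
// names compile to PushGlobal f. Locals are checked first (an assumption).
Instruction compile_name(const std::string& x, const Env& rho,
                         const std::set<std::string>& globals) {
    if (auto it = rho.find(x); it != rho.end()) {
        return Instruction{Op::Push, it->second, ""};
    }
    if (globals.count(x) > 0) {
        return Instruction{Op::PushGlobal, 0, x};
    }
    throw std::invalid_argument("unknown name: " + x);
}
```

Combined with the application rule, compiling `plus x 3` (with a hypothetical global `plus` and \\(\\rho = [x \\rightarrow 0]\\)) would give `PushInt 3, Push 1, PushGlobal plus, MkApp, MkApp`.
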
Next up, case expressions. These are a bit more complex: there are several
branches, each of which will have its own environment.
|