Add G-machine graph creation instructions to Part 5
parent 4d8d806706, commit a1244f201a
assets/scss/gmachine.scss (Normal file, +45)
@@ -0,0 +1,45 @@
$basic-border: 1px solid #bfbfbf;

.gmachine-instruction {
    display: flex;
    border: $basic-border;
    border-radius: 2px;
}

.gmachine-instruction-name {
    padding: 10px;
    border-right: $basic-border;
    flex-grow: 1;
    flex-basis: 20%;
    text-align: center;
}

.gmachine-instruction-sem {
    width: 100%;
    flex-grow: 4;
}

.gmachine-inner {
    border-bottom: $basic-border;
    width: 100%;

    &:last-child {
        border-bottom: none;
    }
}

.gmachine-inner-label {
    padding: 10px;
    font-weight: bold;
}

.gmachine-inner-text {
    padding: 10px;
    text-align: right;
    flex-grow: 1;
}

.gmachine-instruction-name, .gmachine-inner-label, .gmachine-inner {
    display: flex;
    align-items: center;
}

@@ -4,6 +4,7 @@ date: 2019-08-06T14:26:38-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"]
---
{{< gmachine_css >}}
We now have trees representing valid programs in our language,
and it's time to think about how to compile them into machine code,
to be executed on hardware. But __how should we execute programs__?

@@ -134,12 +135,159 @@ to apply a function, we'll follow the corresponding recipe for
that function, and end up with a new tree that we continue evaluating.

### G-machine
"Instructions" is a very generic term. Specifically, we will be creating instructions
for a [G-machine](https://link.springer.com/chapter/10.1007/3-540-15975-4_50),
an abstract architecture which we will use to reduce our graphs. The G-machine
is stack-based: its operations push items onto, and pop items off of, a stack. The machine
will also have a "dump", which is a stack of stacks; this will help with
separating function calls.

Besides constructing graphs, the machine will also have operations that will aid
in evaluating graphs.

We will follow the same notation as Simon Peyton Jones in
[his book](https://www.microsoft.com/en-us/research/wp-content/uploads/1992/01/student.pdf),
which was my source of truth when implementing my compiler. The machine
will be executing instructions that we give it, and as such, it must have
an instruction queue, which we will reference as \\(i\\). We will write
\\(x:i\\) to mean "an instruction queue that starts with
an instruction \\(x\\) and continues with the instructions \\(i\\)". A stack machine
obviously needs to have a stack - we will call it \\(s\\), and will
adopt a similar notation to the instruction queue: \\(a\_1, a\_2, a\_3 : s\\)
will mean "a stack with the top values \\(a\_1\\), \\(a\_2\\), and \\(a\_3\\),
followed by the rest of the stack, \\(s\\)".

There's one more thing the G-machine will have that we've not yet discussed at all,
and it's needed because of the following statement from earlier in the post:

> When we evaluate a tree, we can substitute it in-place with what it evaluates to.

How can we substitute a value in place? Surely we won't iterate over the entire
tree and look for an occurrence of the tree we evaluated. Rather, wouldn't it be
nice if we could update all references to a tree to be something else? Indeed,
we can achieve this effect by using __pointers__. I don't mean specifically
C/C++ pointers - I mean the more general concept of "an address in memory".
The G-machine has a __heap__, much like the heap of a C/C++ process. We
can create a tree node on the heap, and then get an __address__ of the node.
We then have trees use these addresses to link their child nodes.
If we want to replace a tree node with its reduced form, we keep
its address the same, but change the value stored at that address.
This way, all trees that reference the changed node see the update
without us having to modify them - their child address remains the same,
but the child has now been updated. We represent the heap
using \\(h\\). We write \\(h[a : v]\\) to say "the address \\(a\\) points
to value \\(v\\) in the heap \\(h\\)". Now you also know why we used
the letter \\(a\\) when describing values on the stack - the stack contains
addresses of (or pointers to) tree nodes.

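To make the idea concrete, here is a minimal C++ sketch of a heap in which an address is simply an index into a `std::vector`. The names `Heap`, `alloc` and `update`, and the two-alternative `Node` type, are hypothetical simplifications for this illustration, not the compiler's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical, simplified node type: an integer leaf, or an application
// whose children are *addresses* (indices into the heap), not copies.
using Address = std::size_t;
struct Int { int value; };
struct App { Address function; Address argument; };
using Node = std::variant<Int, App>;

struct Heap {
    std::vector<Node> cells;

    // Allocate a node and return its (fresh) address.
    Address alloc(Node node) {
        cells.push_back(node);
        return cells.size() - 1;
    }

    // Overwrite the node stored at an existing address. Any node that holds
    // this address as a child now "sees" the new value without being touched.
    void update(Address a, Node node) {
        assert(a < cells.size());
        cells[a] = node;
    }

    const Node& at(Address a) const { return cells.at(a); }
};
```

For example, if `root` is `App{f, arg}` and we call `update(arg, Int{42})`, then `root` still stores the address `arg`, but following that address now yields the integer `42`.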
_Implementing Functional Languages: a tutorial_ also keeps another component
of the G-machine, the __global map__, which maps function names to addresses of nodes
that represent them. We'll stick with this, and call this global map \\(m\\).

Finally, let's talk about what kind of nodes our trees will be made of.
We don't have to include every node that we've defined as a subclass of
`ast` - some nodes we can compile to instructions, without having to build
them. We will also include nodes that we didn't need to represent expressions.
Here's the list of node types we'll have:

* NInt - represents an integer.
* NApp - represents an application (has two children).
* NGlobal - represents a global function (like the `f` in `f x`).
* NInd - an "indirection" node that points to another node. This will help with "replacing" a node.
* NData - a "packed" node that will represent a constructor with all the arguments.

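One possible way to write these node types down in C++ is as a `std::variant`, with children referred to by heap address. This is only a sketch: the field names, the `tag` in `NData`, storing a name in `NGlobal`, and the `follow` helper are assumptions made for illustration, not necessarily how the compiler in this series represents them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// A node refers to other nodes by heap address (an index into the heap).
using Address = std::size_t;

struct NInt    { int value; };                            // an integer
struct NApp    { Address left; Address right; };          // `left` applied to `right`
struct NGlobal { std::string name; };                     // a global function, like `f` in `f x`
struct NInd    { Address target; };                       // indirection: "look at `target` instead"
struct NData   { int tag; std::vector<Address> fields; }; // a constructor with all its arguments

using Node = std::variant<NInt, NApp, NGlobal, NInd, NData>;

// Skip over any chain of indirection nodes and return the address of the
// first node that is not an NInd.
Address follow(const std::vector<Node>& heap, Address a) {
    assert(a < heap.size());
    while (const NInd* ind = std::get_if<NInd>(&heap[a])) {
        a = ind->target;
        assert(a < heap.size());
    }
    return a;
}
```

An `NInd` is what makes "replacing" cheap: overwriting a node with `NInd{b}` redirects everyone who pointed at it to the node at `b`.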
With these nodes in mind, let's try defining some instructions for the G-machine.
We start with instructions we'll use to assemble new versions of function body trees as we discussed above.
First up is __PushInt__:

{{< gmachine "PushInt" >}}
{{< gmachine_inner "Before" >}}
\( \text{PushInt} \; n : i \quad s \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad h[a : \text{NInt} \; n] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Push an integer \(n\) onto the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}

Let's go through this. We start with an instruction queue
with `PushInt n` at the front. We allocate a new `NInt` with the
number `n` on the heap at address \\(a\\). We then push
the address of the `NInt` node on top of the stack. Next,
__PushGlobal__:

{{< gmachine "PushGlobal" >}}
{{< gmachine_inner "Before" >}}
\( \text{PushGlobal} \; f : i \quad s \quad h \quad m[f : a] \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad h \quad m[f : a] \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Push a global function \(f\) onto the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}

We don't allocate anything new on the heap for this one -
we already have a node for the global function, and \\(m\\) tells us its address.
Next up, __Push__:

{{< gmachine "Push" >}}
{{< gmachine_inner "Before" >}}
\( \text{Push} \; n : i \quad a_0, a_1, ..., a_n : s \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a_n, a_0, a_1, ..., a_n : s \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Push a value at offset \(n\) from the top of the stack onto the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}

This instruction is only defined if the stack contains an address
at offset \\(n\\), where offset \\(0\\) is the top of the stack. We take the address at that offset and
push it onto the stack again. This can be helpful for something like
`f x x`, where we use the same tree twice. Speaking of that - let's
define an instruction to combine two nodes into an application:

{{< gmachine "MkApp" >}}
{{< gmachine_inner "Before" >}}
\( \text{MkApp} : i \quad a_0, a_1 : s \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad h[ a : \text{NApp} \; a_0 \; a_1] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Apply the function on top of the stack to the value below it.
{{< /gmachine_inner >}}
{{< /gmachine >}}

We pop two things off the stack: first, the thing we're applying, then
the thing we apply it to. We then create a new node on the heap
that is an `NApp` node, with its two children being the nodes we popped off.
Finally, we push the address of the new node onto the stack.

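With \\(s\\), \\(h\\) and \\(m\\) modeled as C++ containers, each of the four instructions so far becomes a few lines of code. The following sketch assumes that the top of the stack is the back of a `std::vector`, that addresses are heap indices, and uses hypothetical names (`GMachine`, `push_int`, `push_global`, `push`, `mk_app`). The instruction queue is left out: each call stands in for one instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using Address = std::size_t; // a heap address: an index into `heap`

// Only the node kinds these four instructions need.
struct NInt    { int value; };
struct NApp    { Address left; Address right; };
struct NGlobal { std::string name; };
using Node = std::variant<NInt, NApp, NGlobal>;

struct GMachine {
    std::vector<Address> stack;             // s; stack.back() is the top
    std::vector<Node> heap;                 // h
    std::map<std::string, Address> globals; // m

    // Put a node on the heap and return its fresh address.
    Address alloc(Node node) {
        heap.push_back(std::move(node));
        return heap.size() - 1;
    }

    // PushInt n: allocate NInt n at a new address a, then push a.
    void push_int(int n) { stack.push_back(alloc(NInt{n})); }

    // PushGlobal f: push the address m already holds for f (no allocation).
    // std::map::at throws std::out_of_range if f is not in m.
    void push_global(const std::string& f) { stack.push_back(globals.at(f)); }

    // Push n: push a copy of the address at offset n (offset 0 is the top).
    void push(std::size_t n) {
        assert(n < stack.size());
        Address a = stack[stack.size() - 1 - n];
        stack.push_back(a);
    }

    // MkApp: pop the function a0, then its argument a1, and push the
    // address of a new NApp a0 a1.
    void mk_app() {
        assert(stack.size() >= 2);
        Address a0 = stack.back();
        stack.pop_back();
        Address a1 = stack.back();
        stack.pop_back();
        stack.push_back(alloc(NApp{a0, a1}));
    }
};
```

Registering `"double"` in `globals` and then calling `push_int(326)`, `push_global("double")` and `mk_app()` in that order leaves a single address on the stack, pointing at an `NApp` whose left child is the `double` node and whose right child is the `NInt 326` node.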
Let's try using these instructions to get a feel for them. To save some space,
let's assume that \\(m\\) contains \\(\\text{double} : a\_{\\text{double}}\\) and \\(\\text{halve} : a\_{\\text{halve}} \\).
For the same reason, let's also use the following abbreviations:

* \\(\\text{G}\\) for \\(\\text{PushGlobal}\\)
* \\(\\text{I}\\) for \\(\\text{PushInt}\\)
* \\(\\text{P}\\) for \\(\\text{Push}\\)
* \\(\\text{A}\\) for \\(\\text{MkApp}\\)

Let's say we want to construct a graph for the expression `double 326`.
The sequence of instructions \\(\\text{I} \; 326, \\text{G} \; \\text{double},
\\text{A}\\) will do the trick. Let's
step through them:

$$
\\begin{align}
[\\text{I} \; 326, \\text{G} \; \\text{double}, \\text{A}] & \\quad s \\quad & h \\quad & m \\\\\\
[\\text{G} \; \\text{double},\\text{A} ] & \\quad a\_0 : s \\quad & h[a\_0 : \\text{NInt} \; 326] \\quad & m \\\\\\
[\\text{A}] & \\quad a\_{\\text{double}}, a\_0 : s \\quad & h[a\_0 : \\text{NInt} \; 326] \\quad & m \\\\\\
[] & \\quad a\_1: s \\quad & h[\; \\begin{aligned} a\_0 & : \\text{NInt} \; 326 \\\ a\_1 & : \\text{NApp} \; a\_{\\text{double}} \; a\_0 \\end{aligned} ] \\quad & m \\\\\\
\\end{align}
$$

We end up with the address of a new node, \\(a\_1\\), on top of the stack; this node represents the application of `double` to `326`. You can see
how the notation gets unwieldy very quickly, so I'll try to steer clear of more examples like this.
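The same derivation can be checked mechanically. The sketch below models the instruction queue \\(i\\) as a `std::deque` of instructions and executes it until it is empty, supporting only the four instructions introduced so far. The names (`State`, `alloc`, `run`, and the instruction structs) are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using Address = std::size_t;

struct NInt    { int value; };
struct NApp    { Address left; Address right; };
struct NGlobal { std::string name; };
using Node = std::variant<NInt, NApp, NGlobal>;

// The instructions abbreviated above as I, G, P and A.
struct PushInt    { int n; };
struct PushGlobal { std::string f; };
struct Push       { std::size_t n; };
struct MkApp      {};
using Instruction = std::variant<PushInt, PushGlobal, Push, MkApp>;

// The machine state (i, s, h, m); stack.back() is the top of s.
struct State {
    std::deque<Instruction> queue;
    std::vector<Address> stack;
    std::vector<Node> heap;
    std::map<std::string, Address> globals;
};

Address alloc(State& st, Node node) {
    st.heap.push_back(std::move(node));
    return st.heap.size() - 1;
}

// Take instructions from the front of the queue until it is empty.
void run(State& st) {
    while (!st.queue.empty()) {
        Instruction instr = st.queue.front();
        st.queue.pop_front();
        if (auto* i = std::get_if<PushInt>(&instr)) {
            st.stack.push_back(alloc(st, NInt{i->n}));
        } else if (auto* g = std::get_if<PushGlobal>(&instr)) {
            st.stack.push_back(st.globals.at(g->f));
        } else if (auto* p = std::get_if<Push>(&instr)) {
            assert(p->n < st.stack.size());
            Address a = st.stack[st.stack.size() - 1 - p->n];
            st.stack.push_back(a);
        } else { // MkApp: pop function a0, then argument a1
            assert(st.stack.size() >= 2);
            Address a0 = st.stack.back();
            st.stack.pop_back();
            Address a1 = st.stack.back();
            st.stack.pop_back();
            st.stack.push_back(alloc(st, NApp{a0, a1}));
        }
    }
}
```

Running the queue `{PushInt{326}, PushGlobal{"double"}, MkApp{}}` against a state whose `globals` maps `"double"` to an `NGlobal` node leaves one address on the stack, pointing at `NApp a_double a_0`, which matches the last row of the table.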