diff --git a/code/compiler/09/llvm_context.cpp b/code/compiler/09/llvm_context.cpp index 99ab480..45dcb50 100644 --- a/code/compiler/09/llvm_context.cpp +++ b/code/compiler/09/llvm_context.cpp @@ -20,7 +20,9 @@ void llvm_context::create_types() { gmachine_type->setBody( stack_ptr_type, - node_ptr_type + node_ptr_type, + IntegerType::getInt64Ty(ctx), + IntegerType::getInt64Ty(ctx) ); struct_types.at("node_base")->setBody( IntegerType::getInt32Ty(ctx), @@ -212,11 +214,6 @@ Value* llvm_context::create_track(Function* f, Value* v) { return builder.CreateCall(track_f, { f->arg_begin(), v }); } -Value* llvm_context::create_eval(Value* e) { - auto eval_f = functions.at("eval"); - return builder.CreateCall(eval_f, { e }); -} - void llvm_context::create_unwind(Function* f) { auto unwind_f = functions.at("unwind"); builder.CreateCall(unwind_f, { f->args().begin() }); diff --git a/code/compiler/09/llvm_context.hpp b/code/compiler/09/llvm_context.hpp index 89e145e..fbe4cc1 100644 --- a/code/compiler/09/llvm_context.hpp +++ b/code/compiler/09/llvm_context.hpp @@ -55,7 +55,6 @@ struct llvm_context { void create_alloc(llvm::Function*, llvm::Value*); llvm::Value* create_track(llvm::Function*, llvm::Value*); - llvm::Value* create_eval(llvm::Value*); void create_unwind(llvm::Function*); llvm::Value* unwrap_gmachine_stack_ptr(llvm::Value*); diff --git a/code/compiler/09/runtime.c b/code/compiler/09/runtime.c index 52a74c8..7b8a7c3 100644 --- a/code/compiler/09/runtime.c +++ b/code/compiler/09/runtime.c @@ -170,14 +170,16 @@ void gmachine_split(struct gmachine* g, size_t n) { struct node_base* gmachine_track(struct gmachine* g, struct node_base* b) { g->gc_node_count++; - if(g->gc_node_count >= g->gc_node_threshold) { - gc_visit_node(b); - gmachine_gc(g); - g->gc_node_threshold *= 2; - } - b->gc_next = g->gc_nodes; g->gc_nodes = b; + + if(g->gc_node_count >= g->gc_node_threshold) { + uint64_t nodes_before = g->gc_node_count; + gc_visit_node(b); + gmachine_gc(g); + g->gc_node_threshold 
= g->gc_node_count * 2; + } + return b; } diff --git a/content/blog/08_compiler_llvm.md b/content/blog/08_compiler_llvm.md index 0fd6587..e4a1be8 100644 --- a/content/blog/08_compiler_llvm.md +++ b/content/blog/08_compiler_llvm.md @@ -574,5 +574,6 @@ In the next several posts, we will improve our compiler to properly free unused memory usign a __garbage collector__, implement lambda functions using __lambda lifting__, -and use our Alloc instruction to implement `let/in` expressions. See -you there! +and use our Alloc instruction to implement `let/in` expressions. +We get started on the first of these tasks in +[Part 9 - Garbage Collection]({{< relref "09_compiler_garbage_collection.md" >}}). diff --git a/content/blog/09_compiler_garbage_collection.md b/content/blog/09_compiler_garbage_collection.md new file mode 100644 index 0000000..5e996f1 --- /dev/null +++ b/content/blog/09_compiler_garbage_collection.md @@ -0,0 +1,559 @@ +--- +title: Compiling a Functional Language Using C++, Part 9 - Garbage Collection +date: 2020-01-27T20:35:15-08:00 +tags: ["C and C++", "Functional Languages", "Compilers"] +draft: true +--- + +> "When will you learn? When will you learn that __your actions have consequences?__" + +So far, we've entirely ignored the problem of memory management. Every time +that we need a new node for our growing graph, we simply ask for more memory +from the runtime with `malloc`. But selfishly, even when we no longer require +the memory allocated for a particular node, when that node is no longer in use, +we do not `free` it. In fact, our runtime currently has no idea about +which nodes are needed and which ones are ready to be discarded. + +To convince ourselves that this is a problem, let's first assess the extent of the damage. 
+Consider the program from `works3.txt`:
+
+{{< rawblock "compiler/09/examples/works3.txt" >}}
+
+Compiling and running this program through `valgrind`, we get the following output:
+
+```
+==XXXX== LEAK SUMMARY:
+==XXXX==    definitely lost: 288 bytes in 12 blocks
+==XXXX==    indirectly lost: 768 bytes in 34 blocks
+==XXXX==      possibly lost: 0 bytes in 0 blocks
+==XXXX==    still reachable: 0 bytes in 0 blocks
+==XXXX==         suppressed: 0 bytes in 0 blocks
+```
+
+We lost 1056 bytes of memory, just to return the length of a list
+with 3 elements. The problem of leaking memory is very real.
+
+How do we solve this issue? We can't embed memory management into our language;
+we want to keep it pure, and managing memory is typically pretty far from
+that goal. Instead, we will make our runtime do the work of freeing memory.
+Even then, this is a nontrivial goal: our runtime manipulates graphs, each
+of which can be combined with others in arbitrary ways. In general, there
+will not always be a _single_ node that, when freed, will guarantee that
+another node can be freed as well. Instead, it's very possible in our
+graphs that two parent nodes both refer to a third, and only when both
+parents are freed can we free that third node itself. Consider,
+for instance, the function `square` as follows:
+
+```
+defn square x = {
+    x * x
+}
+```
+
+This function will receive, on top of the stack, a single graph representing `x`.
+It will then create two applications of a global `(*)` function, each time
+to the graph of `x`. Thus, it will construct a tree with two `App` nodes, both
+of which
+{{< sidenote "right" "lazy-note" "must keep track of a reference to x.">}}
+We later take advantage of this, by replacing the graph of x with the
+result of evaluating it. Since both App nodes point to the same
+graph, when we evaluate it once, each node observes this update, and is not
+required to evaluate x again. With this, we achieve lazy evaluation. 
+{{< /sidenote >}} The runtime will have to wait until both `App` nodes +are freed before it can free the graph of `x`. + +This seems simple enough! If there are multiple things that may reference a node +in the graph, why don't we just keep track of how many there are? Once we know +that no more things are still referencing a node, we can free it. This is +called [reference counting](https://en.wikipedia.org/wiki/Reference_counting). +Reference counting is a valid technique, but unfortunately, it will not suit us. +The reason for this is that our language may produce +[cyclic graphs](https://en.wikipedia.org/wiki/Cycle_(graph_theory)). Consider, +for example, this definition of an infinite list of the number 1: + +``` +defn ones = { Cons 1 ones } +``` + +Envisioning the graph of the tree, we can see `ones` as an application +of the constructor `Cons` to two arguments, one of which is `ones` again. +{{< sidenote "right" "recursive-note" "It refers to itself!" >}} +Things are actually more complicated than this. In our current language, +recursive definitions are only possible in function definitions (like +ones). In our runtime, each time there is a reference +to a function, this is done through a new node, which +means that functions with recursive definitions are not represented cyclically. +Therefore, reference counting would work. However, in the future, +our language will have more ways of creating circular definitions, +some of which will indeed create cycles in our graphs. So, to +prepare for this, we will avoid the use of reference counting. +{{< /sidenote >}} In this case, when we compute the number of nodes +that require `ones`, we will always find the number to be at least 1: `ones` +needs `ones`, which needs `ones`, and so on. It will not be possible for +us to free `ones`, then, by simply counting the number of references to it. 
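To make the failure concrete, here is a tiny standalone C sketch of reference counting, entirely separate from our runtime: the `cell` struct and the function names are invented for illustration only. A cell that refers to itself, like `ones`, keeps its own count above zero even after the rest of the program lets go of it:

```c
#include <stdlib.h>

/* A toy reference-counted cell; illustrative only, not one of
 * our runtime's node structs. */
struct cell {
    int refcount;
    struct cell* next;
};

/* Drop one reference; free the cell only when nobody refers to it. */
void cell_unref(struct cell* c) {
    if(c != NULL && --c->refcount == 0) {
        if(c->next != c)        /* don't re-visit a cell we're about to free */
            cell_unref(c->next);
        free(c);
    }
}

/* Build a self-referencing cell (like "ones") and drop the root's
 * reference to it. Returns the count left on the cell. */
int stuck_refcount(void) {
    struct cell* ones = malloc(sizeof(*ones));
    ones->next = ones;  /* the cycle: ones refers to ones */
    ones->refcount = 2; /* one reference from the root, one from itself */

    cell_unref(ones);   /* the root lets go... */

    /* ...but the self-reference keeps the count at 1 forever, so the
     * cell is never freed. The leak here is exactly the point. */
    return ones->refcount;
}
```

Since the count never reaches zero, `cell_unref` never frees the cell: precisely the situation `ones` puts us in, and the reason we will reach for mark-and-sweep instead.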
+
+There's a more powerful technique than reference counting for freeing
+unused memory: __mark-and-sweep garbage collection__. This technique
+is conceptually pretty simple to grasp, yet will allow us to handle
+cycles in our graphs. Unsurprisingly, we implement this type
+of garbage collection in two stages:
+
+1. __Mark__: We go through every node that is still needed by
+the runtime, and recursively mark it, its children, and so on as "to keep".
+2. __Sweep__: We go through every node we haven't yet freed, and,
+if it hasn't been marked as "to keep", we free it.
+
+This also seems simple enough. There are two main things for us
+to figure out:
+
+1. For __Mark__, what are the "nodes still needed by the runtime"?
+These are just the nodes on the various G-machine stacks. If
+a node is not on the stack, nor is it a child of a node
+that is on the stack, why should we keep it around?
+2. For __Sweep__, how do we keep track of all the nodes we haven't
+yet freed? In our case, the solution is a global list of allocated
+nodes, which is updated every time that a node is allocated.
+
+Wait a minute, though. Inside of `unwind` in C, we only have
+a reference to the most recent stack. Our execution model allows
+for an arbitrary number of stacks: we can keep using `Eval`,
+placing the current stack on the dump, and starting a new stack
+from scratch to evaluate a node. How can we traverse these stacks
+from inside `unwind`? One solution could be to have each stack
+point to the "parent" stack. To find all the nodes on the
+stack, then, we'd start with the current stack, mark all the
+nodes on it as "required", then move on to the parent stack,
+rinse and repeat. This is plausible and pretty simple, but
+there's another way.
+
+We clean up after ourselves.
+
+### Towards a Cleaner Stack
+Simon Peyton Jones wrote his G-machine semantics in a particular way. 
Every time
+that a function is called, all it leaves behind on the stack is the graph node
+that represents the function's output. Our own internal functions, however, have been less
+careful. Consider, for instance, the "binary operator" function I showed you.
+Its body is given by the following G-machine instructions:
+
+```C++
+instructions.push_back(instruction_ptr(new instruction_push(1)));
+instructions.push_back(instruction_ptr(new instruction_eval()));
+instructions.push_back(instruction_ptr(new instruction_push(1)));
+instructions.push_back(instruction_ptr(new instruction_eval()));
+instructions.push_back(instruction_ptr(new instruction_binop(op)));
+```
+
+When the function is called, there are at least 3 things on the stack:
+
+1. The "outermost" application node, to be replaced with an indirection (to enable laziness).
+2. The second argument to the binary operator.
+3. The first argument to the binary operator.
+
+Then, __Push__ adds another node to the stack, and an __Eval__ forces
+its evaluation (and leaves it on the stack). This happens again with the second argument.
+Finally, we call __BinOp__, popping two values off the stack and combining them
+according to the binary operator. This leaves the stack with 4 things: the 3 I described
+above, and the newly computed value. This is fine as far as `eval` is concerned: its
+implementation only asks for the top value on the stack after `unwind` finishes. But
+for anything more complicated, this is a very bad side effect. We want to leave the
+stack as clean as we found it - with one node and no garbage.
+
+Fortunately, the way we compile functions is a good guide for how we should
+compile internal operators and constructors. 
The idea is captured +by the two instructions we insert at the end of a user-defined +function: + +{{< codelines "C++" "compiler/09/definition.cpp" 56 57 >}} + +Once a result is computed, we turn the node that represented the application +into an indirection, and point it to the computed result (as I said before, +this enables lazy evaluation). We also pop the arguments given to the function +off the stack. Let's add these two things to the `gen_llvm_internal_op` function: + +{{< codelines "C++" "compiler/09/main.cpp" 70 85 >}} + +Notice, in particular, the `instruction_update(2)` and `instruction_pop(2)` +instructions that were recently added. A similar thing has to be done for data +type constructors. The difference, though, is that __Pack__ removes the data +it packs from the stack, and thus, __Pop__ is not needed: + +{{< codelines "C++" "compiler/09/definition.cpp" 102 117 >}} + +With this done, let's run a quick test: let's print the number of things +on the stack at the end of an `eval` call (before the stack is freed, +of course). We can compare the output of runtime without the fix (`old`) +and with the fix (`current`): + +``` + current old + +Current stack size is 0 | Current stack size: 1 +Current stack size is 0 | Current stack size: 1 +Current stack size is 0 | Current stack size: 1 +Current stack size is 0 | Current stack size: 1 +Current stack size is 0 | Current stack size: 0 +Current stack size is 0 | Current stack size: 0 +Current stack size is 0 | Current stack size: 3 +Current stack size is 0 | Current stack size: 0 +Current stack size is 0 | Current stack size: 3 +Current stack size is 0 | Current stack size: 0 +Current stack size is 0 | Current stack size: 3 +Result: 3 | Result: 3 +``` + +The stack is now much cleaner! Every time `eval` is called, it starts +with one node, and ends with one node (which is then popped). + +### One Stack to Rule Them All + +Wait a minute. 
If the stack is really always empty at the end, do we really need to construct +a new stack every time? +{{< sidenote "right" "arity-note" "I think not" >}} +There's some nuance to this. While it is true that for the most +part, we can get rid of the new stacks in favor of a single +one, our runtime will experience a change. The change lies +in the Unwind-Global rule, which requires that the +stack has as many children as the function needs +arguments. Until now, there was no way +for this condition to be accidentally satisfied: the function +we were unwinding was the only thing on the stack. Now, +though, things are different: the function being +unwound may share a stack with something else, +and just checking the stack size will not be sufficient. +I believe that this is not a problem for us, +since the compiler will only emit Eval +instructions for things it knows are data types or numbers, +meaning their type is not a partially applied function +that is missing arguments. However, this is a nontrivial +observation. +{{< /sidenote >}}, and Simon Peyton Jones seems to +agree. In _Implementing Functional Languages: a tutorial_, he mentions +that the dump does not need to be implemented as a real stack of stacks. +So let's try this out: instead of starting a new stack using `eval`, +let's use an existing one, by just calling `unwind` again. To do so, +all we have to do is change our `instruction_eval` instruction. When +the G-machine wants something evaluated now, it should just call +`unwind` directly! + +To make this change, we have to make `unwind` available to the +compiler. 
We thus declare it in the `llvm_context.cpp` file:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 158 163 >}}
+
+And even create a function to construct a call to `unwind`
+with the following signature:
+
+{{< codelines "C++" "compiler/09/llvm_context.hpp" 58 58 >}}
+
+We implement it like so:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 217 220 >}}
+
+Finally, the `instruction_eval::gen_llvm` method simply calls
+`unwind`:
+
+{{< codelines "C++" "compiler/09/instruction.cpp" 157 159 >}}
+
+After this change, we only call `eval` from `main`. Furthermore,
+since `eval` releases all the resources it allocates before
+returning, we won't be able to
+{{< sidenote "right" "retrieve-note" "easily retrieve" >}}
+We were able to do this before, but that's because our
+runtime didn't free the nodes, ever. Now that
+it does, returning a node violates that node's lifetime.
+{{< /sidenote >}} the result of the evaluation from it.
+Thus, we simply merge `eval` with `main` - combining
+the printing and the initialization / freeing
+code.
+
+With this, only one stack will be allocated for the entirety of
+program execution. This doesn't just help us save on memory
+allocations, but also __solves the problem of marking
+valid nodes during garbage collection__! Instead of traversing
+a dump of stacks, we can now simply traverse a single stack;
+all that we need is in one place.
+
+So this takes care, more or less, of the "mark" portion of mark-and-sweep.
+Using the stack, we can recursively mark the nodes that we need. But
+what about "sweeping"? How can we possibly know of every node that
+we've allocated? There's some more bookkeeping for us to do.
+
+### It's All Connected
+There exists a simple technique I've previously seen (and used)
+for keeping track of all the allocated memory. The technique is
+to __turn all the allocated nodes into elements of a linked list__.
+The general process of implementing this proceeds as follows:
+
+1. 
To each node, add a "next" pointer.
+2. Keep a handle to the whole node chain somewhere.
+3. Add each newly allocated node to the front of the whole chain.
+
+This "somewhere" could be a global variable. However,
+since we already pass a stack to almost all of our
+functions, it makes more sense to make the list handle
+a part of some data structure that will also contain the stack,
+and pass that around, instead. This keeps all of the G-machine
+data in one place, and in principle could allow for concurrent
+execution of more than one G-machine in a single program. Let's
+call our new data structure `gmachine`:
+
+{{< codelines "C" "compiler/09/runtime.h" 69 74 >}}
+
+Here, the `stack` field holds the G-machine stack,
+and `gc_nodes` is the handle to the list of all the nodes
+we've allocated and not yet freed. Don't worry about the `gc_node_count`
+and `gc_node_threshold` fields - we'll get to them a little later.
+
+This is going to be a significant change. First of all, since
+the handle won't be global, it can't be accessed from inside the
+`alloc_*` functions. Instead, we have to make sure to add
+nodes allocated through `alloc_*` to a G-machine
+wherever we call the allocators. To make it easier to add nodes to a G-machine
+GC handle, let's make a new function, `track`:
+
+```C
+struct node_base* gmachine_track(struct gmachine*, struct node_base*);
+```
+
+This function will add the given node to the G-machine's handle,
+and return that same node. This way, we can wrap nodes in
+a call to `gmachine_track`. We will talk about this
+function's implementation later in the post.
+
+This doesn't get us all the way to a working runtime, though:
+right now, we still pass around `struct stack*` instead of
+`struct gmachine*` everywhere. However, the whole point
+of adding the `gmachine` struct was to store more data in it!
+Surely we need that new data somewhere, and thus, we need to
+use the `gmachine` struct for _some_ functions. 
What functions
+_do_ need a whole `gmachine*`, and which ones only need
+a `stack*`?
+
+1. {{< sidenote "right" "ownership-note" "Clearly," >}}
+This might not be clear. Maybe pushing onto a stack will
+add a node to our GC handle, and so, we need to have access
+to the handle in stack_push. The underlying
+question is that of ownership: when we allocate
+a node, which part of the program does it "belong" to?
+The "owner" of the node should do the work of managing
+when to free it or keep it. Since we already agreed to
+create a gmachine struct to house the GC
+handle, it makes sense that nodes are owned by the
+G-machine. Thus, the assumption in functions like
+stack_push is that the "owner" of the node
+already took care of allocating and tracking it, and
+stack_push itself shouldn't bother.
+{{< /sidenote >}} `stack_push`, `stack_pop`, and similar functions
+do not require a G-machine.
+2. `stack_alloc` and `stack_pack` __do__ need a G-machine,
+because they must allocate new nodes. Where the nodes
+are allocated, we should add them to the GC handle.
+3. Since they use `stack_alloc` and `stack_pack`,
+generated functions also need a G-machine.
+4. Since `unwind` calls the generated functions,
+it must also receive a G-machine.
+
+As far as stack functions go, we only _need_ to update
+`stack_alloc` and `stack_pack`. Everything else
+doesn't require new node allocations, and thus,
+does not require the GC handle. However, this makes
+our code rather ugly: we have a set of mostly `stack_*`
+functions, followed suddenly by two `gmachine_*` functions.
+In the interest of cleanliness, let's instead do the following:
+
+1. Make all functions associated with G-machine rules (like
+__Alloc__, __Update__, and so on) require a `gmachine*`. This
+way, there's a correspondence between our code and the theory.
+2. Leave the rest of the functions (`stack_push`, `stack_pop`,
+etc.) as is. 
They are not G-machine specific, and don't +require a GC handle, so there's no need to touch them. + +Let's make this change. We end up with the following +functions: + +{{< codelines "C" "compiler/09/runtime.h" 56 84 >}} + +For the majority of the changed functions, the +updates are +{{< sidenote "right" "cosmetic-note" "cosmetic." >}} +We must also update the LLVM/C++ declarations of +the affected functions: many of them now take a +gmachine_ptr_type instead of stack_ptr_type. +This change is not shown explicitly here (it is hard to do with our +growing code base), but it is nonetheless significant. +{{< /sidenote >}} The functions +that require more significant modifications are `gmachine_alloc` +and `gmachine_pack`. In both, we must now make a call to `gmachine_track` +to ensure that a newly allocated node will be garbage collected in the future. +The updated code for `gmachine_alloc` is: + +{{< codelines "C" "compiler/09/runtime.c" 140 145 >}} + +Correspondingly, the updated code for `gmachine_pack` is: + +{{< codelines "C" "compiler/09/runtime.c" 147 162 >}} + +Note that we've secretly made one more change. Instead of +allocating `sizeof(*data) * n` bytes of memory for +the array of packed nodes, we allocate `sizeof(*data) * (n + 1)`, +and set the last element to `NULL`. This will allow other +functions (which we will soon write) to know how many elements are packed inside +a `node_data` (effectively, we've added a `NULL` terminator). + +We must change our compiler to keep it up to date with this change. Importantly, +it must know that a G-machine struct exists. 
To give it
+this information, we add a new
+`llvm::StructType*` called `gmachine_type` to the `llvm_context` class,
+initialize it in the constructor, and set its body as follows:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 21 26 >}}
+
+The compiler must also know that generated functions now use the G-machine
+struct rather than a stack struct:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 19 19 >}}
+
+Since we still use some functions that require a stack and not a G-machine,
+we must have a way to get the stack from a G-machine. To do this,
+we create a new `unwrap` function, which uses LLVM's GEP instruction
+to get a pointer to the G-machine's stack field:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 222 225 >}}
+
+We use this function elsewhere, such as in `llvm_context::create_pop`:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 176 179 >}}
+
+Finally, we want to make sure our generated functions don't allocate
+nodes without tracking them with the G-machine. To do so, we modify
+all the `create_*` methods to require the generated function's G-machine
+argument, and update the functions themselves to call `gmachine_track`. For
+example, here's `llvm_context::create_num`:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 235 239 >}}
+
+Of course, this requires us to add a new `create_track` method
+to the `llvm_context`:
+
+{{< codelines "C++" "compiler/09/llvm_context.cpp" 212 215 >}}
+
+This is good. Let's now implement the actual mark-and-sweep algorithm
+in `gmachine_gc`:
+
+{{< codelines "C" "compiler/09/runtime.c" 186 204 >}}
+
+In the code above, we first iterate through the stack,
+calling `gc_visit_node` on every node that we encounter. The
+assumption is that once `gc_visit_node` is done, every node
+that _can_ be reached has its `gc_reachable` field set to 1,
+and all the others have it set to 0. 
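Before going through the runtime's code in detail, the overall shape of this mark-and-sweep scheme can be seen in a freestanding C sketch, separate from our runtime; the `obj` struct and the function names here are invented for illustration, standing in for `node_base`, `gc_visit_node`, and `gmachine_gc`:

```c
#include <stdlib.h>

/* A toy heap object threaded onto an intrusive list of all allocations,
 * mirroring the gc_next/gc_reachable idea (names are illustrative). */
struct obj {
    int reachable;      /* the mark bit */
    struct obj* child;  /* an edge to another object in the graph */
    struct obj* next;   /* the list of all allocations */
};

static struct obj* all_objs = NULL;

/* Allocate an object and track it, like gmachine_track does for nodes. */
struct obj* obj_new(struct obj* child) {
    struct obj* o = malloc(sizeof(*o));
    o->reachable = 0;
    o->child = child;
    o->next = all_objs;
    all_objs = o;
    return o;
}

/* Mark: flag everything reachable from a root. Already-marked objects
 * are skipped, so cycles do not cause infinite recursion. */
void mark(struct obj* o) {
    if(o == NULL || o->reachable) return;
    o->reachable = 1;
    mark(o->child);
}

/* Sweep: walk the allocation list, freeing unmarked objects and
 * clearing the marks of survivors for the next collection.
 * Returns the number of objects freed. */
size_t sweep(void) {
    size_t freed = 0;
    struct obj** link = &all_objs;
    while(*link != NULL) {
        struct obj* o = *link;
        if(o->reachable) {
            o->reachable = 0;  /* survivor: clear the mark */
            link = &o->next;
        } else {
            *link = o->next;   /* garbage: unlink and free */
            free(o);
            freed++;
        }
    }
    return freed;
}
```

Marking from a root and then sweeping frees a self-referencing object only once no root reaches it, which is exactly what reference counting could not do for `ones`.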
+
+Once we reach the end of the stack, we continue to the "sweep" phase,
+iterating through the linked list of nodes (held in the G-machine
+GC handle `gc_nodes`). For each node, if its `gc_reachable` flag
+is not set, we remove it from the linked list, and call `free_node_direct`
+on it. Otherwise (that is, if the flag __is__ set), we clear it,
+so that the node can potentially be garbage collected in a future
+invocation of `gmachine_gc`.
+
+`gc_visit_node` recursively marks a node and its children as reachable:
+
+{{< codelines "C" "compiler/09/runtime.c" 51 70 >}}
+
+This is possible with the `node_data` nodes because of the change we
+made to the `gmachine_pack` function earlier: now, the last element
+of the "packed" array is `NULL`, telling `gc_visit_node` that it has
+reached the end of the list of children.
+
+`free_node_direct` performs a non-recursive deallocation of all
+the resources held by a particular node. So far, this is only
+needed for `node_data` nodes, since the arrays holding their children
+are dynamically allocated. Thus, the code for the function is
+pretty simple:
+
+{{< codelines "C" "compiler/09/runtime.c" 45 49 >}}
+
+### When to Collect
+When should we run garbage collection? Initially, I tried
+running it after every call to `unwind`. However, this
+quickly proved impractical: the performance of all
+the programs in the language decreased by a spectacular
+amount. Programs like `works1.txt` and `works2.txt`
+would take tens of seconds to complete.
+
+Instead of this madness, let's settle for an approach
+common to many garbage collectors. Let's __perform
+garbage collection every time the amount of
+memory we've allocated doubles__. Tracking when the
+amount of allocated memory doubles is the purpose of
+the `gc_node_count` and `gc_node_threshold` fields in the
+`gmachine` struct. 
The former field tracks how many
+nodes have been tracked by the garbage collector, and the
+latter holds the number of nodes the G-machine must
+reach before triggering garbage collection.
+
+Since the G-machine is made aware of allocations
+by a call to the `gmachine_track` function, this
+is where we will attempt to perform garbage collection.
+We end up with the following code:
+
+{{< codelines "C" "compiler/09/runtime.c" 171 184 >}}
+
+When a node is added to the GC handle, we increment the `gc_node_count`
+field. If the new value of this field reaches the threshold,
+we perform garbage collection. There are cases in which
+this is fairly dangerous: for instance, `gmachine_pack` first
+moves all packed nodes into an array, then allocates a `node_data`
+node. This means that for a brief moment, the nodes stored
+into the new data node are inaccessible from the stack,
+and thus susceptible to garbage collection. To prevent
+situations like this, we run `gc_visit_node` on the node
+being tracked, marking it and its children as "reachable".
+Finally, we set the next "free" threshold to double
+the number of currently allocated nodes.
+
+This is about as much as we need to do. The change in this
+post was a major one, and required updating multiple files.
+As always, you're welcome to check out [the compiler source
+code for this post](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/09).
+To wrap up, let's evaluate our change.
+
+To especially stress the compiler, I came up with a prime number
+generator. Since booleans are not in the standard library, and
+since it isn't possible to pattern match on numbers, my
+only option was to use Peano encoding. This effectively
+means that numbers are represented as linked lists,
+which makes garbage collection all the more
+important. 
The program is quite long, but you can
+[find the entire code here](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/09/examples/primes.txt).
+
+When I ran the `primes` program compiled with the
+previous version of the compiler under `time`, I
+got the following output:
+
+```
+Maximum resident set size (kbytes): 935764
+Minor (reclaiming a frame) page faults: 233642
+```
+
+In contrast, here is the output of `time` when running
+the same program compiled with the new version of
+the compiler:
+
+```
+Maximum resident set size (kbytes): 7448
+Minor (reclaiming a frame) page faults: 1577
+```
+
+We have reduced maximum memory usage by a factor of
+125, and the number of page faults by a factor of 148.
+That seems pretty good!
+
+With this success, we end today's post. As I mentioned
+before, we're not done. The language is still clunky to use,
+and can benefit from `let/in` expressions and __lambda functions__.
+Furthermore, our language is currently monomorphic, and would
+be much better with __polymorphism__. Finally, to make our language
+capable of more-than-trivial work, we may want to implement
+__Input/Output__ and __strings__. I hope to see you in future posts,
+where we will implement these features!