3 changed files with 37 additions and 121 deletions
--- a/content/blog/00_compiler_intro.md
+++ b/content/blog/00_compiler_intro.md
@@ -145,5 +145,4 @@ Here are the posts that I've written so far for this series:
 * [Polymorphism]({{< relref "10_compiler_polymorphism.md" >}})
 * [Polymorphic Data Types]({{< relref "11_compiler_polymorphic_data_types.md" >}})
 * [Let/In and Lambdas]({{< relref "12_compiler_let_in_lambda/index.md" >}})
-* [Cleanup]({{< relref "13_compiler_cleanup/index.md" >}})

--- a/content/blog/12_compiler_let_in_lambda/index.md
+++ b/content/blog/12_compiler_let_in_lambda/index.md
@@ -984,6 +984,5 @@ Before either of those things, though, I think that I want to go through
 the compiler and perform another round of improvements, similarly to
 [part 4]({{< relref "04_compiler_improvements" >}}). It's hard to do a lot
 of refactoring while covering new content, since major changes need to
-be explained and presented for the post to make sense.
-I do this in [part 13]({{< relref "13_compiler_cleanup/index.md" >}}) - cleanup.
-I hope to see you there!
+be explained and presented for the post to make sense. I hope to see
+you in these future posts!
--- a/content/blog/13_compiler_cleanup/index.md
+++ b/content/blog/13_compiler_cleanup/index.md
@@ -1,8 +1,9 @@
 ---
 title: Compiling a Functional Language Using C++, Part 13 - Cleanup
-date: 2020-09-19T16:14:13-07:00
+date: 2020-09-10T18:50:02-07:00
 tags: ["C and C++", "Functional Languages", "Compilers"]
 description: "In this post, we clean up our compiler."
+draft: true
 ---

 In [part 12]({{< relref "12_compiler_let_in_lambda" >}}), we added `let/in`
@@ -69,11 +70,11 @@ than _characters_, it effectively doesn't interact with the source
 text at all, and can't determine from which line or column a token
 originated. The task of determining the locations of input tokens
 is delegated to the tokenizer -- Flex, in our case. Flex, on the
-other hand, doesn't have a built-in mechanism for tracking
+other hand, doesn't doesn't have a built-in mechanism for tracking
 locations. Fortunately, Bison provides a `yy::location` class that
 includes most of the needed functionality.

-A `yy::location` consists of two source positions, `begin` and `end`,
+A `yy::location` consists of `begin` and `end` source position,
 which themselves are represented using lines and columns. It
 also has the following methods:

@@ -84,7 +85,7 @@ then `columns(token_length)` will move `end` to the token's end,
 and thus make the whole `location` contain the token.
 * `yy::location::lines(int)` behaves similarly to `columns`,
 except that it advances `end` by the given number of lines,
-rather than columns. It also resets the columns counter to `1`.
+rather than columns.
 * `yy::location::step()` moves `begin` to where `end` is. This
 is useful for when we've finished processing a token, and want
 to move on to the next one.
@@ -101,20 +102,10 @@ We'll see why we are using `LOC` instead of something like `location` soon;
 for now, you can treat `LOC` as if it were a global variable declared 
 in the tokenizer. Before processing each token, we ensure that
 the `yy::location` has its `begin` and `end` at the same position,
-and then advance `end` by `yyleng` columns. This is
-{{< sidenote "right" "sufficient-note" "sufficient" >}}
-This doesn't hold for all languages. It may be possible for a language
-to have tokens that contain <code>\n</code>, in which case,
-rather than just using <code>yyleng</code>, we'd need to
-add special logic to iterate over the token and detect the line
-breaks.<br>
-<br>
-Also, this requires that the <code>end</code> of the previous token was
-correctly computed.
-{{< /sidenote >}}
+and then advance `end` by `yyleng` columns. This is sufficient
 to make `LOC` represent our token's source position. For
 the moment, don't worry too much about `drv`; this is the
-parsing driver, and we will talk about it shortly.
+parse driver, and we will talk about it shortly.

 So now we have a "global" variable `LOC` that gives
 us the source position of the current token. To get it
@@ -137,7 +128,7 @@ we need to add a `yy::location` argument to each of our `ast` nodes,
 as well as to the `pattern` subclasses, `definition_defn` and
 `definition_data`. To avoid breaking all the code that creates
 AST nodes and definitions outside of the parser, we'll make this
-argument optional. Inside of `ast.hpp`, we define a new field as follows:
+argument optional. Inside of `ast.hpp`, we define it as follows:

 {{< codelines "C++" "compiler/13/ast.hpp" 16 16 >}}

@@ -145,7 +136,7 @@ Then, we add a constructor to `ast` as follows:

 {{< codelines "C++" "compiler/13/ast.hpp" 18 18 >}}

-Note that it's not optional here, since `ast` itself is an
+Note that it's not default here, since `ast` itself is an
 abstract class, and thus will never be constructed directly.
 It is in the subclasses of `ast` that we provide a default
 value. The change is rather mechanical, but here's an example
@@ -164,7 +155,7 @@ detail:
 Here, the `@$` character is used to reference the current
 nonterminal's location data.

-#### Line Offsets, File Input, and the Parsing Driver
+#### Line Offsets, File Input, and the Parse Driver
 There are three more challenges with printing out the line
 of code where an error occurred. First of all, to
 print out a line of code, we need to have that line of code
@@ -206,7 +197,7 @@ to read source code from files, anyway.
 To address the second issue, we can keep a mapping of line numbers
 to their locations in the source buffer. This is rather easy to
 maintain using an array: the first element of the array is 0,
-which is the beginning of the first line in any source file. From there,
+which is the beginning of any line in any source file. From there,
 every time we encounter the character `\n`, we can push
 the current source location to the top, marking it as
 the beginning of another line. Where exactly we store this
@@ -422,7 +413,7 @@ structure containing Flex's state.

 Adding a parameter to Bison doesn't automatically affect
 Flex. To let Flex know that its `yylex` function must now accept
-the state and the parsing driver, we have to define the
+the state and the parse driver, we have to define the
 `YY_DECL` macro. We do this in `parse_driver.hpp`, since
 this forward declaration will be used by both Flex
 and Bison:
@@ -541,8 +532,8 @@ Here's an example from `parsed_type`:

 {{< codelines "C++" "compiler/13/parsed_type.cpp" 16 23 >}}

-In general, this change is also rather mechanical. Before we
-move on, to maintain a balance between exceptions and assertions, here
+In general, this change is also rather mechanical, but, to
+maintain a balance between exceptions and assertions, here
 are a couple more assertions from `type_env`:

 {{< codelines "C++" "compiler/13/type_env.cpp" 81 82 >}}
@@ -590,7 +581,9 @@ while the actual type was:
  Bool
 ```

-Here's an error that was previously a `throw 0` statement in our code:
+The exclamation marks in front of the two types are due to some
+changes from section 2. Here's an error that was previously
+a `throw 0` statement in our code:

 ```
 an error occured while compiling the program: type variable a used twice in data type definition.
@@ -611,21 +604,7 @@ Now that I've had some more time to think about it
 (and now that I've returned to the compiler after
 a brief hiatus), I think that this was not the right call.
 Mangled names make sense when translating to LLVM; we certainly
-don't want to declare two LLVM functions
-{{< sidenote "right" "mangling-note" "with the same name." >}}
-By the way, LLVM has its own name mangling functionality. If you
-declare two functions with the same name, they'll appear as
-<code>function</code> and <code>function.0</code>. Since LLVM
-uses the <code>Function*</code> C++ values to refer to functions,
-as long as we keep them seaprate on <em>our</em> end, things will
-work.<br>
-<br>
-However, in our compiler, name mangling occurs before LLVM is
-introduced, at translation time. We could create LLVM functions
-at that time, too, and associate them with variables. But then,
-our G-machine instructions will be coupled to LLVM, which
-would not be as clean.
-{{< /sidenote >}}
+don't want to declare two LLVM functions with the same name.
 But things are different for local variables. Our local variables
 are graphs on a stack, and are not actually compiled to LLVM
 definitions. It doesn't make sense to mangle their names, since
@@ -633,8 +612,8 @@ their names aren't present anywhere in the final executable.
 It's not even "consistent" to mangle them, since global definitions
 are compiled directly to __PushGlobal__ instructions, while local
 variables are only referenced through the current `env`.
-So, I opted to reverse my decision. We will go back to
-placing variable names directly into `env_var`. Here's
+So, I decided to reverse my decision. We will go back to
+placing variable names directly onto `env_var`. Here's
 an example of this from `global_scope.cpp`:

 {{< codelines "C++" "compiler/13/global_scope.cpp" 6 8 >}}
@@ -651,8 +630,8 @@ that a variable from a __PushGlobal__ instruction
 is referencing the right function. To achieve
 this, we change `get_mangled_name` to stop
 returning the input string if a mangled name was not
-found; doing so makes it impossible to check if a mangled
-name was explicitly defined. Instead,
+found; now that we _must_ have a mangled name, doing
+so is effectively obscuring the error. Instead,
 we add two assertions. First, if an environment scope doesn't
 contain a variable, then it _must_ have a parent. 
 If it does contain variable, that variable _must_ have
@@ -673,19 +652,7 @@ Here's the definition of `type_env::variable_data` now:
 {{< codelines "C++" "compiler/13/type_env.hpp" 16 25 >}}

 Since looking up a mangled name for non-global variable
-{{< sidenote "right" "unrepresentable-note" "will now result in an assertion failure," >}}
-A very wise human at the very dawn of our species once said,
-"make illegal states unrepresentable". Their friends and family were a little
-busy making a fire, and didn't really understand what the heck they meant. Now,
-we kind of do.<br>
-<br>
-It's <em>possible</em> for our <code>type_env</code> to include a
-<code>variable_data</code> entry that is both global and has no mangled
-name. But it doesn't have to be this way. We could define two subclasses
-of <code>variable_data</code>, one global and one local,
-where only the global one has a <code>mangled_name</code>
-field. It would be impossible to reach this assertion failure then.
-{{< /sidenote >}} we have to change
+will now result in an assertion failure, we have to change
 `ast_lid::compile` to only call `get_mangled_name` once
 it ensures that the variable being compiled is, in fact,
 global:
@@ -745,7 +712,7 @@ They're just temporarily allowed access.

 So, what should be the owner of all of these disparate components?
 Thus far, that has been the `main` function, or the utility
-functions that it calls out to. However, this is sloppy:
+functions that it calls out to. However, this is in bad taste:
 we have related data and operations on it, but we don't group
 them into an object. We can group all of the components of our
 compiler into a `compiler` object, and leave `main.cpp` with
@@ -780,11 +747,14 @@ The methods of the compiler are arranged similarly:
 The methods go as follows:

 * `add_default_types` adds the built-in types to the `global_env`.
-At this point, these types only include `Int`. 
+At this point in the post, these types only include `Int`. However,
+in the second section, we'll make `Bool` a built-in type, too.
 * `add_binop_type` adds a single binary operator to the global
 type environment. We saw its implementation earlier: it deals
 with both binding a type, and setting a mangled name.
-* `add_default_types` adds the types for each binary operator.
+* `add_default_types` adds the types for each binary operator,
+and also for the `True` and `False` constructors (which we will
+cover in the second section).
 * `parse`, `typecheck`, `translate` and `compile` all do exactly
 what they say. In this case, compilation refers to creating G-machine
 instructions.
@@ -806,7 +776,7 @@ file with the
 file that we end up with at the end of this post.

 Next, we have the compiler's constructor, and its `operator()`. The
-latter, analogously to our parsing driver, will trigger the compilation
+latter, analogously to our parse driver, will trigger the compilation
 process. Their implementations are straightforward:

 {{< codelines "C++" "compiler/13/compiler.cpp" 131 145 >}}
@@ -823,8 +793,11 @@ pretty printing code:

 {{< codelines "C++" "compiler/13/main.cpp" 11 27 >}}

-With this, we complete our transition to a compiler object.
-All that's left is to clean up the code style.
+That's all for the cleanup! We've added locations and more errors
+the compiler, stopped throwing `0` in favor of proper exceptions
+or assertions, made name mangling more reasonable, fixed a bug with
+accidentally shadowing default functions, and organized our compilation
+process into a `compiler` class.

 ### Keeping Things Private
 Hand-writing or generating hundreds of trivial getters and setters
@@ -907,58 +880,3 @@ name with `f_`, much like `create_custom_function`:
 I think that's enough. If we chose to turn more compiler
 data structures into classes, I think we would've quickly drowned
 in one-line getter and setter methods.
-
-That's all for the cleanup! We've added locations and more errors
-to the compiler, stopped throwing `0` in favor of proper exceptions
-or assertions, made name mangling more reasonable, fixed a bug with
-accidentally shadowing default functions, organized our compilation
-process into a `compiler` class, and made more things into classes.
-In the next post, I hope to tackle __strings__ and __Input/Output__.
-I also think that implementing __modules__ would be a good idea,
-though at the moment I don't know too much on the subject. I hope
-you'll join me in my future writing!
-
-### Appendix: Optimization
-When I started working on the compiler after the previous post,
-I went a little overboard. I started working on optimizing the generated programs,
-but eventually decided I wasn't doing a
-{{< sidenote "right" "good-note" "good enough" >}}
-I think authors should feel a certain degree of responsibility
-for the content they create. If I do something badly, somebody
-else trusts me and learns from it, who knows how much damage I've done.
-I try not to do damage.<br>
-<br>
-If anyone reads what I write, anyway!
-{{< /sidenote >}} job to present it to others,
-and scrapped that part of the compiler altogether. I'm not
-sure if I will try again in the near future. But,
-if you're curious about optimization, here are a few avenues
-I've explored or thought about:
-
-* __Unboxing numbers__. Right now, numbers are allocated and garbage
-collected just like the rest of the graph nodes. This is far from ideal.
-We could use pointers to represent numbers, by tagging their most significant
-bits on 64-bit CPUs. Rather than allocating a node, the runtime will just
-cast a number to a pointer, tag it, and push it on the stack.
-* __Converting enumeration data types to numbers__. If no constructor
-of a data type takes any arguments, then the tag uniquely identifies
-each constructor. Combined with unboxed numbers, this can save unnecessary
-allocations and memory accesses.
-* __Special treatment for global constants__. It makes sense for
-global functions to be converted into LLVM functions, but the
-same is not the case for
-{{< sidenote "right" "constant-note" "constants." >}}
-Yeah, yeah, a constant is just a nullary function. Get
-out of here with your pedantry!
-{{< /sidenote >}} We can find a way to
-initialize global constants once, which would save some work. To
-make more constants suitable for this, we could employ
-[monomorphism restriction](https://wiki.haskell.org/Monomorphism_restriction).
-* __Optimizing stack operations.__ If you read through the LLVM IR
-we produce, you can see a lot of code that peeks at something twice,
-or pops-then-pushes the same value, or does other absurd things. LLVM
-isn't aware of the semantics of our stacks, but perhaps we could write an
-optimization pass to deal with some of the more blatant instances of
-this issue.
-
-If you attempt any of these, let me know how it goes, please!