Update blog post, switching away from two sections.

2020-09-17 22:35:40 -07:00 · 2020-09-17 22:35:40 -07:00 · 98cac103c4
commit 98cac103c4
parent 7226d66f67
1 changed files with 396 additions and 103 deletions
--- a/content/blog/13_compiler_cleanup_optimization/index.md
+++ b/content/blog/13_compiler_cleanup_optimization/index.md
@ -8,28 +8,26 @@ description: "In this post, we clean up our compiler and add some basic optimiza
 In [part 12]({{< relref "12_compiler_let_in_lambda" >}}), we added `let/in`
 and lambda expressions to our compiler. At the end of that post, I mentioned
 that before we move on to bigger and better things, I wanted to take a 
-step back and clean up the compiler.
+step back and clean up the compiler. Now is the time to do that.
-Recently, I got around to doing that. Unfortunately, I also got around to doing
+In particular, I identified three things that could be improved
-a lot more. Furthermore, I managed to make the changes in such a way that I
+or cleaned up:
 can't cleanly separate the 'cleanup' and 'optimization' portions of my work.
 This is partially due to the way in which I organize code, where each post
 is associated with a version of the compiler with the necessary changes.
 Because of all this, instead of making this post about the cleanup, and the
 next post about the optimizations, I have to merge them into one.
-So, this post is split into two major portions: cleanup, which deals mostly
+* __Error handling__. We need to stop using `throw 0` and start
-with touching up exceptions and improving the 'name mangling' logic, and
+using `assert`. We can also make our errors much more descriptive
-optimizations, which deals with adding special treatment to booleans,
+by including source locations in the output.
-unboxing integers, and implementing more binary operators.
+* __Name mangling__. I don't think I got it quite right last
 time. Now is the time to clean it up.
 * __Code organization__. I think we can benefit from a top-level
 class, and a clearer "dependency order" between the various
 classes and structures we've defined.
 * __Code style__. In particular, I've been lazily using `struct`
 in a lot of places. That's not a good idea; it's better
 to use `class`, and only expose _some_ fields and methods
 to the rest of the code.
-### Section 1: Cleanup
+### Error Reporting and Handling
-
+The previous post was rather long, which led me to omit
 The previous post was
 {{< sidenote "right" "long-note" "rather long," >}}
 Probably not as long as this one, though! I really need to get the
 size of my posts under control.
 {{< /sidenote >}} which led me to omit
 a rather important aspect of the compiler: proper error reporting.
 Once again our compiler has instances of `throw 0`, which is a cheap way
 of avoiding properly handling a runtime error. Before we move on,
@ -62,7 +60,7 @@ automatically assemble the "from" and "to" locations of a nonterminal
 from the locations of children, which would be very tedious to write
 by hand. We enable this feature using the following option:
-{{< codelines "C++" "compiler/13/parser.y" 50 50 >}}
+{{< codelines "C++" "compiler/13/parser.y" 46 46 >}}
 There's just one hitch, though. Sure, Bison can compute bigger
 locations from smaller ones, but it must get the smaller ones
@ -97,25 +95,27 @@ to `columns` and `step` to every rule, we can define the
 `YY_USER_ACTION` macro, which is run before each token
 is processed.
-{{< codelines "C++" "compiler/13/scanner.l" 12 12 >}}
+{{< codelines "C++" "compiler/13/scanner.l" 12 14 >}}
-We'll see why we are using `drv` soon; for now, you can treat
+We'll see why we are using `LOC` instead of something like `location` soon;
-`location` as if it were a global variable declared in the
+for now, you can treat `LOC` as if it were a global variable declared 
-tokenizer. Before processing each token, we ensure that
+in the tokenizer. Before processing each token, we ensure that
-`location` has its `begin` and `end` at the same position,
+the `yy::location` has its `begin` and `end` at the same position,
 and then advance `end` by `yyleng` columns. This is sufficient
-to make `location` represent our token's source position.
+to make `LOC` represent our token's source position. For
 the moment, don't worry too much about `drv`; this is the
 parse driver, and we will talk about it shortly.
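The "collapse, then extend" bookkeeping described above can be sketched in isolation. The types below are hypothetical, simplified stand-ins loosely modeled on Bison's location class (`sketch_position`, `sketch_location`, and `advance_over` are names made up for this illustration), so treat this as a rough picture rather than the code from the post:

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for Bison's position and location types.
struct sketch_position {
    int line = 1;
    int column = 1;
};

struct sketch_location {
    sketch_position begin;
    sketch_position end;

    // Collapse the span so that it starts where the previous token ended.
    void step() { begin = end; }
    // Extend the span to the right by `count` columns.
    void columns(int count) { end.column += count; }
};

// What a per-token action might do: reset the span, then cover the token text.
inline void advance_over(sketch_location& loc, const std::string& token_text) {
    loc.step();
    loc.columns(static_cast<int>(token_text.size()));
}
```

After calling `advance_over` for the tokens `defn`, a single space, and `main`, the location would span columns 6 through 10 (with the end exclusive), which is where `main` sits on the line.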
-So now we have a "global" variable `location` that gives
+So now we have a "global" variable `LOC` that gives
 us the source position of the current token. To get it
 to Bison, we have to pass it as an argument to each
 of the `make_TOKEN` calls. Here are a few sample lines
 that should give you the general idea:
-{{< codelines "C++" "compiler/13/scanner.l" 41 44 >}}
+{{< codelines "C++" "compiler/13/scanner.l" 40 43 >}}
 That last line is actually new. Previously, we somehow
-got away without explicitly sending the EOF token to Bison.
+got away without explicitly sending the end-of-file token to Bison.
 I suspect that this was due to some kind of implicit conversion
 of the Flex macro `YY_NULL` into a token; now that we have
 to pass a position to every token constructor, such an implicit
@ -146,10 +146,10 @@ from `ast_binop`:
 Finally, we tell Bison to pass the computed location
 data as an argument when constructing our data structures.
 This too is a mechanical change, and I think the following
-couple of lines demonstrate the general idea in sufficient
+few lines demonstrate the general idea in sufficient
 detail:
-{{< codelines "C++" "compiler/13/parser.y" 107 110 >}}
+{{< codelines "C++" "compiler/13/parser.y" 92 96 >}}
 Here, the `@$` symbol is used to reference the current
 nonterminal's location data.
@ -189,7 +189,9 @@ working with files, why not just work directly with the files
 created by the user? Instead of reading from `stdin`, we may
 as well take in a path to a file via `argv`, and read from there.
 Also, instead of `fseek` and `rewind`, we can just read the file
-into memory, and access it like a normal character buffer.
+into memory, and access it like a normal character buffer. This
 does mean that we can stick with `stdin`, but it's more conventional
 to read source code from files, anyway.
 To address the second issue, we can keep a mapping of line numbers
 to their locations in the source buffer. This is rather easy to
@ -200,7 +202,7 @@ the current source location to the top, marking it as
 the beginning of another line. Where exactly we store this
 array is as yet unclear, since we're trying to avoid global variables.
-Finally, begin addressing the third issue, we can use Flex's `reentrant`
+Finally, to begin addressing the third issue, we can use Flex's `reentrant`
 option, which makes it so that all of the tokenizer's state is stored in an
 opaque `yyscan_t` structure, rather than in global variables. This way,
 we can configure `yyin` without setting a global variable, which is a step
@ -221,50 +223,38 @@ creation of a _parsing driver_.
 The parsing driver is a class (or struct) that holds all the parse-related
 state. We can arrange for this class to be available to our tokenizing
 and parsing functions, which will allow us to use it pretty much like we'd
-use a global variable. We can define it as follows:
+use a global variable. This is the `drv` that we saw in `YY_USER_ACTION`.
 We can define it as follows:
-{{< codelines "C++" "compiler/13/parse_driver.hpp" 14 37 >}}
+{{< codelines "C++" "compiler/13/parse_driver.hpp" 36 54 >}}
-There are quite a few fields here. The `file_name` string represents
+There aren't many fields here. The `file_name` string represents
-the file that we'll be reading code from. the `string_stream` will
+the file that we'll be reading code from. The `location` field
-be used to back up the contents of source file as Flex reads them;
+will be accessed by Flex via `get_current_location`. Bison will
-once Flex is done, the content of the `string_stream` will be
+store the function and data type definitions it reads into `global_defs`
-saved into the `file_content` string.
+via `get_global_defs`. Finally, `file_m` will be used to keep track
 of the content of the file we're reading, as well as the line offsets
 within that file. Notice that a couple of these fields are pointers
 that we take by reference in the constructor. The `parse_driver` doesn't
 _own_ the global definitions, nor the file manager. They exist outside
 of it, and will continue to be used in other ways the `parse_driver`
 does not need to know about. Also, the `LOC` variable in Flex is
 actually a call to `get_current_location`:
-The next three fields deal with tracking source code
+{{< codelines "C++" "compiler/13/scanner.l" 15 15 >}}
 locations. The `location` field will be accessed by Flex
 via `drv.location` (where `drv` is a reference to our driver class).
 The `file_offset` and `line_offsets` fields will be used to
 keep track of where each line begins, as we have discussed above.
 Finally, `global_defs` will be the new home of our top-level
 definitions.
-The methods on `parse_driver` are rather simple, too:
+The methods of `parse_driver` are rather simple. The majority of 
 them deal with giving access to the parser's members: the `yy::location`,
 the `definition_group`, and the `file_mgr`. The only exception
 to this is `operator()`, which we use to actually trigger the parsing process.
 We'll make this method return `true` if parsing succeeded, and `false`
 otherwise (if, say, the file we tried to read doesn't exist). 
 Here's its implementation:
-* `run_parse` handles the initialization of the tokenizer
+{{< codelines "C++" "compiler/13/parse_driver.cpp" 48 60 >}}
 and parser, which includes obtaining the `FILE*` and configuring
 Flex to use it. It also handles invoking the parsing code.
 We'll make this method return `true` if parsing succeeded,
 and `false` otherwise (if, say, the file we tried to read doesn't exist).
 * `write` will be called from Flex, and will allow us to
 record the content of the file we're processing to the `string_stream`.
 We've already seen it used in the `YY_USER_ACTION` macro.
 * `mark_line` will also be called from Flex, and will mark the current
 `file_offset` as the beginning of a line by pushing it into `line_offsets`.
 * `get_index` and `get_line_end` will be used for converting
 `yy::location` instances to offsets within the source code buffer.
 * `print_location` will be used for printing errors.
 It will print the lines spanned by the given location, with the
 location itself colored and underlined if the last argument is `true`.
 This will make our errors easier on the eyes.
 Let's take a look at their implementations. First, `run_parse`:
 {{< codelines "C++" "compiler/13/parse_driver.cpp" 5 18 >}}
 We try to open the user-specified file, and return `false` if we can't.
-We then initialize `line_offsets` as we discussed above. After
+After this, we start doing the setup specific to a reentrant
 this, we start doing the setup specific to a reentrant
 Flex scanner. We declare a `yyscan_t` variable, which
 will contain all of Flex's state. Then, we initialize
 it using `yylex_init`. Finally, since we can no longer
@ -279,24 +269,65 @@ We'll come back to how this works in a moment. With
 the scanner and parser initialized, we invoke `parser::operator()`,
 which actually runs the Flex- and Bison-generated code.
 To clean up, we run `yylex_destroy` and `fclose`. Finally,
-we extract the contents of our file into the `file_contents`
+we call `file_mgr::finalize`, and return. But what
-string, and return.
+_is_ `file_mgr`?
-Next, the `write` method. For the most part, this method
+The `file_mgr` class does two things: it stores the part of the file
-is a proxy for the `write` method of our `string_stream`:
+that has already been read by Flex in memory, and it keeps track of
 where each line in our source file begins within the text. Here is its
 definition:
-{{< codelines "C++" "compiler/13/parse_driver.cpp" 20 23 >}}
+{{< codelines "C++" "compiler/13/parse_driver.hpp" 14 34 >}}
 In this class, the `string_stream` member is used to construct
 an `std::string` from the bits of text that Flex reads,
 processes, and feeds to the `file_mgr` using the `write` method.
 It's more efficient to use a string stream than to concatenate
 strings repeatedly. Once Flex is finished processing the file,
 the final contents of the `string_stream` are transferred into
 the `file_contents` string using the `finalize` method. The `offset`
 and `line_offsets` fields will be used as we described earlier: each time Flex
 encounters the `\n` character, the `offset` variable will be pushed
 on top of the `line_offsets` vector, marking the beginning of
 the corresponding line. The methods of the class are as follows:
 * `write` will be called from Flex, and will allow us to
 record the content of the file we're processing to the `string_stream`.
 We've already seen it used in the `YY_USER_ACTION` macro.
 * `mark_line` will also be called from Flex, and will mark the current
 `file_offset` as the beginning of a line by pushing it into `line_offsets`.
 * `finalize` will be called by the `parse_driver` when the parsing
 finishes. At this time, the `string_stream` should contain all of
 the input file, and this data is transferred to `file_contents`, as
 we mentioned above.
 * `get_index` and `get_line_end` will be used for converting
 `yy::location` instances to offsets within the source code buffer.
 * `print_location` will be used for printing errors.
 It will print the lines spanned by the given location, with the
 location itself colored and underlined if the last argument is `true`.
 This will make our errors easier on the eyes.
 Let's take a look at their implementations. First, `write`.
 For the most part, this method is a proxy for the `write`
 method of our `string_stream`:
 {{< codelines "C++" "compiler/13/parse_driver.cpp" 9 12 >}}
 We do, however, also keep track of the `file_offset` variable
 here, which ensures we have up-to-date information
 regarding our position in the source file. The implementation
 of `mark_line` uses this information:
-{{< codelines "C++" "compiler/13/parse_driver.cpp" 25 27 >}}
+{{< codelines "C++" "compiler/13/parse_driver.cpp" 14 16 >}}
 The `finalize` method is trivial, and requires little additional
 discussion:
 {{< codelines "C++" "compiler/13/parse_driver.cpp" 18 20 >}}
 Once we have the line offsets, `get_index` becomes very simple:
-{{< codelines "C++" "compiler/13/parse_driver.cpp" 29 32 >}}
+{{< codelines "C++" "compiler/13/parse_driver.cpp" 22 25 >}}
 Here, we use an assertion for the first time. Calling
 `get_index` with a negative or zero line doesn't make
@ -313,7 +344,7 @@ beginning of the next line. We stick to the C convention
 of marking 'end' indices exclusive (pointing just past
 the end of the array):
-{{< codelines "C++" "compiler/13/parse_driver.cpp" 34 37 >}}
+{{< codelines "C++" "compiler/13/parse_driver.cpp" 27 30 >}}
 Since `line_offsets` has as many elements as there are lines,
 the last line number would be equal to the vector's size.
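To make the offset arithmetic concrete, here is a small self-contained sketch of a file manager in the spirit of the one described. The class name `sketch_file_mgr`, the `contents` accessor, and the exact assertions are assumptions made for this illustration; the real implementation lives in the referenced `parse_driver.cpp` and may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the post's file manager; names are illustrative.
class sketch_file_mgr {
    private:
        std::ostringstream string_stream;
        std::string file_contents;
        std::size_t file_offset = 0;
        // line_offsets[i] is the offset at which line i+1 begins.
        std::vector<std::size_t> line_offsets{0};

    public:
        // Record a chunk of text that the tokenizer has just consumed.
        void write(const char* buffer, std::size_t len) {
            string_stream.write(buffer, static_cast<std::streamsize>(len));
            file_offset += len;
        }
        // Called after a newline has been written: the next line starts here.
        void mark_line() { line_offsets.push_back(file_offset); }
        // Move the accumulated text into a plain string.
        void finalize() { file_contents = string_stream.str(); }

        // Convert a 1-based line and column into an offset into the buffer.
        std::size_t get_index(int line, int column) const {
            assert(line > 0);
            assert(static_cast<std::size_t>(line) <= line_offsets.size());
            return line_offsets.at(static_cast<std::size_t>(line) - 1)
                + static_cast<std::size_t>(column) - 1;
        }
        // Exclusive end of a line: the start of the next line, or the end of the file.
        std::size_t get_line_end(int line) const {
            if (static_cast<std::size_t>(line) == line_offsets.size()) {
                return file_contents.size();
            }
            return get_index(line + 1, 1);
        }
        const std::string& contents() const { return file_contents; }
};
```

Feeding it `"ab\n"`, marking a line, and then feeding it `"cde"` leaves `line_offsets` as `{0, 3}`, so line 2, column 2 maps to offset 4, and the last line ends at the size of the buffer.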
@ -333,7 +364,7 @@ we sprinkle the ANSI escape codes to enable and disable
 special formatting, respectively. For now, the special
 formatting involves underlining the text and making it red.
-{{< codelines "C++" "compiler/13/parse_driver.cpp" 39 53 >}}
+{{< codelines "C++" "compiler/13/parse_driver.cpp" 32 46 >}}
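For readers unfamiliar with ANSI escape codes, the following sketch shows the general trick on a single string. The function `highlight_span` and its signature are hypothetical and much simpler than the `print_location` method referenced above; the escape sequences themselves (`4` for underline, `31` for red, `0` for reset) are standard SGR codes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Wrap text[begin, end) in ANSI codes for underlined red text when `highlight` is set.
// The function name and signature are hypothetical, not the post's print_location.
inline std::string highlight_span(const std::string& text, std::size_t begin,
                                  std::size_t end, bool highlight) {
    assert(begin <= end && end <= text.size());
    std::string out = text.substr(0, begin);
    if (highlight) out += "\033[4;31m";   // 4 = underline, 31 = red foreground
    out += text.substr(begin, end - begin);
    if (highlight) out += "\033[0m";      // 0 = reset all attributes
    out += text.substr(end);
    return out;
}
```

Calling `highlight_span("3 + False", 4, 9, true)` and printing the result to a terminal that understands ANSI codes should show `False` underlined in red, while passing `false` returns the text unchanged.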
 Finally, to get the forward declarations for the `yy*` functions
 and types, we set the `header-file` option in Flex:
@ -386,12 +417,7 @@ the state and the parse driver, we have to define the
 this forward declaration will be used by both Flex
 and Bison:
-{{< codelines "C++" "compiler/13/parse_driver.hpp" 39 41 >}}
+{{< codelines "C++" "compiler/13/parse_driver.hpp" 56 58 >}}
 Finally, we can change our `main.cpp` file to use the
 `parse_driver`:
 {{< codelines "C++" "compiler/13/main.cpp" 178 186 >}}
 #### Improving Exceptions
 Now, it's time to add location data (and a little bit more) to our
@ -421,7 +447,7 @@ the following two lines to our CMakeLists.txt:
 Now, let's add a new base class for all of our compiler errors,
 unsurprisingly called `compiler_error`:
-{{< codelines "C++" "compiler/13/error.hpp" 8 23 >}}
+{{< codelines "C++" "compiler/13/error.hpp" 10 26 >}}
 We'll put some 'common' exception functionality
 into the `print_location` and `print_about` methods. If the error
@ -467,11 +493,7 @@ first, and is treat like the "correct" type. The
 `right` type, on the other hand, is treated
 like the "wrong" type that should have been
 unifiable with `left`. This will affect the
-calling conventions of our unification code. In
+calling conventions of our unification code.
 `main`, we remove all our old exception printing code
 in favor of calls to `pretty_print`:
 {{< codelines "C++" "compiler/13/main.cpp" 207 213 >}}
 Now, we can go through and find all the places where
 we `throw 0`. One such place was in the data type
@ -513,7 +535,7 @@ In general, this change is also rather mechanical, but, to
 maintain a balance between exceptions and assertions, here
 are a couple more assertions from `type_env`:
-{{< codelines "C++" "compiler/13/type_env.cpp" 77 78 >}}
+{{< codelines "C++" "compiler/13/type_env.cpp" 76 77 >}}
 Once again, it should not be possible for the compiler
 to try to generalize the type of a variable that doesn't
@ -528,35 +550,34 @@ To fix this, we add a new `loc` parameter to `unify`,
 which we make optional to allow for unification without
 a known location. Here's the declaration:
-{{< codelines "C++" "compiler/13/type.hpp" 101 101 >}}
+{{< codelines "C++" "compiler/13/type.hpp" 92 92 >}}
 The change to the implementation is mechanical and repetitive,
 so instead of showing you the whole method, I'll settle for
 a couple of lines:
-{{< codelines "C++" "compiler/13/type.cpp" 119 121 >}}
+{{< codelines "C++" "compiler/13/type.cpp" 121 122 >}}
 We want to make sure that a location provided to the
 top-level call to `unify` is also forwarded to the
 recursive calls, so we have to explicitly add it
 to the call.
-With all of that done, we can finally stand back and
+We'll also have to update the 'main' code to call the
-marvel at the results of our hard work. Here is what a
+`pretty_print` methods, but there's another big change
-basic unification error looks like now:
+that we're going to make before then. However, once that
-
+change is made, our errors will look a lot better.
-{{< figure src="unification_error.png" caption="The result of a unification error." >}}
+Here is what's printed out to the user when a type error
-
+occurs:
 I used an image to show colors, but here is the content of the error in textual form:
 ```
 an error occured while checking the types of the program: failed to unify types
 occuring on line 2:
    3 + False
 the expected type was:
-  !Int
+  Int
 while the actual type was:
-  !Bool
+  Bool
 ```
 The exclamation marks in front of the two types are due to some
@ -572,3 +593,275 @@ data Pair a a = { MkPair a a }
 Now, not only have we eliminated the lazy uses of `throw 0` in our
 code, but we've also improved the presentation of the errors
 to the user!
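As a recap of the overall shape of this error handling, here is a minimal sketch of an exception that optionally carries a location and knows how to print itself. Everything in it is an assumption made for illustration: `sketch_loc`, `sketch_error`, the message wording, and the use of C++17's `std::optional` are not taken from the post's `error.hpp`:

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical, simplified source location: just a line number.
struct sketch_loc {
    int line;
};

// A base class for errors that may or may not know where they happened.
class sketch_error : public std::exception {
    private:
        std::string description;
        std::optional<sketch_loc> loc;

    public:
        sketch_error(std::string d, std::optional<sketch_loc> l = std::nullopt)
            : description(std::move(d)), loc(std::move(l)) {}

        const char* what() const noexcept override { return description.c_str(); }

        // Print the description, plus the line if one was recorded.
        void pretty_print(std::ostream& to) const {
            to << "an error occurred: " << description << '\n';
            if (loc) to << "occurring on line " << loc->line << '\n';
        }
};
```

The point of the optional member is that code which has no location to offer can still throw the same kind of error, and the printing code simply skips the location line in that case.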
 ### Rethinking Name Mangling
 In the previous post, I said the following:
 > One more thing. Let’s adopt the convention of storing mangled names into the compilation environment. This way, rather than looking up mangled names only for global functions, which would be a ‘gotcha’ for anyone working on the compiler, we will always use the mangled names during compilation.
 Now that I've had some more time to think about it
 (and now that I've returned to the compiler after
 a brief hiatus), I think that this was not the right call.
 Mangled names make sense when translating to LLVM; we certainly
 don't want to declare two LLVM functions with the same name.
 But things are different for local variables. Our local variables
 are graphs on a stack, and are not actually compiled to LLVM
 definitions. It doesn't make sense to mangle their names, since
 their names aren't present anywhere in the final executable.
 It's not even "consistent" to mangle them, since global definitions
 are compiled directly to __PushGlobal__ instructions, while local
 variables are only referenced through the current `env`.
 So, I decided to reverse my decision. We will go back to
 placing variable names directly onto `env_var`. Here's
 an example of this from `global_scope.cpp`:
 {{< codelines "C++" "compiler/13/global_scope.cpp" 6 8 >}}
 Now that we've started using assertions, I also think it's worth
 putting our new invariant -- "only global definitions have mangled
 names" -- into code:
 {{< codelines "C++" "compiler/13/type_env.cpp" 35 43 >}}
 Furthermore, we'll _require_ that a global definition
 has a mangled name. This way, we can be more confident
 that a variable from a __PushGlobal__ instruction
 is referencing the right function. To achieve
 this, we change `get_mangled_name` to stop
 returning the input string if a mangled name was not
 found; now that we _must_ have a mangled name, doing
 so is effectively obscuring the error. Instead,
 we add another assertion: if an environment scope doesn't
 contain a mangled name for a variable, then it _must_
 have a parent. We end up with the following:
 {{< codelines "C++" "compiler/13/type_env.cpp" 45 51 >}}
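The two invariants can be seen working together in a pared-down environment. This is only a sketch under stated assumptions: `sketch_env`, its `bind` method, and the choice to treat "has no parent" as "is global" are simplifications invented here, not the post's `type_env`:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, pared-down environment: each scope maps a variable name to
// its mangled name (empty if none was set) and may point to a parent scope.
class sketch_env {
    private:
        std::map<std::string, std::string> mangled;
        const sketch_env* parent;

    public:
        explicit sketch_env(const sketch_env* p = nullptr) : parent(p) {}

        void bind(const std::string& name) { mangled[name] = ""; }

        void set_mangled_name(const std::string& name, const std::string& m) {
            // Simplification: a scope without a parent is the global scope,
            // and only global definitions may receive mangled names.
            assert(parent == nullptr);
            auto it = mangled.find(name);
            assert(it != mangled.end());
            it->second = m;
        }

        const std::string& get_mangled_name(const std::string& name) const {
            auto it = mangled.find(name);
            if (it != mangled.end()) {
                // A variable found without a mangled name is a bug in the caller.
                assert(!it->second.empty());
                return it->second;
            }
            // Not found here, so there must be an enclosing scope to ask.
            assert(parent != nullptr);
            return parent->get_mangled_name(name);
        }
};
```

Looking up a global name from a nested scope walks up to the parent and succeeds, while looking up a local variable such as `x` would trip the assertion instead of silently returning the unmangled name.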
 Since looking up a mangled name for a non-global variable
 will now result in an assertion failure, we have to change
 `ast_lid::compile` to only call `get_mangled_name` once
 it ensures that the variable being compiled is, in fact,
 global:
 {{< codelines "C++" "compiler/13/ast.cpp" 58 63 >}}
 Since all global functions now need to have mangled
 names, we run into a bit of a problem. What are
 the mangled names of `(+)`, `(-)`, and so on? We could
 continue to hardcode them as `plus`, `minus`, etc., but this can
 (and currently does!) lead to errors. Consider the following
 piece of code:
 ```
 defn plus x y = { x + y }
 defn main = { plus 320 6 }
 ```
 We've hardcoded the mangled name of `(+)` to be `plus`. However,
 `global_scope` doesn't know about this, so when the actual
 `plus` function gets translated, it also gets assigned the
 mangled name `plus`. The name is also overwritten in the
 `llvm_context`, which effectively means that `(+)` is
 now compiled to a call of the user-defined `plus` function.
 If we didn't overwrite the name, we would've run into an assertion
 failure in this scenario anyway. In short, this example illustrates
 an important point: mangling information needs to be available
 outside of a `global_scope`. We don't want to do this by having
 every function take in a `global_scope` to access the mangling
 information; instead, we'll store the mangling information in
 a new `mangler` class, which `global_scope` will take as an argument.
 The new class is very simple:
 {{< codelines "C++" "compiler/13/mangler.hpp" 5 11 >}}
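If you don't have the source open, a mangler along these lines can be pictured with the sketch below. The counter-based naming scheme and the `sketch_mangler` name are assumptions for illustration; only the method name `new_mangled_name` comes from the text:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical mangler: hands out a fresh name each time one is requested.
class sketch_mangler {
    private:
        std::map<std::string, int> occurrence_count;

    public:
        std::string new_mangled_name(const std::string& original) {
            // operator[] value-initializes a missing counter to 0.
            int occurrence = occurrence_count[original]++;
            if (occurrence == 0) return original;
            return original + "_" + std::to_string(occurrence);
        }
};
```

With a shared mangler like this, the built-in `(+)` could claim the name `plus` first, and a later user-defined `plus` would receive a distinct name such as `plus_1` rather than overwriting it.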
 As with `parse_driver`, `global_scope` takes `mangler` by reference
 and stores a pointer:
 {{< codelines "C++" "compiler/13/global_scope.hpp" 50 50 >}}
 The implementation of `new_mangled_name` doesn't change, so I'm
 not going to show it here. With this new mangling information
 in hand, we can now correctly set the mangled names of binary
 operators:
 {{< codelines "C++" "compiler/13/compiler.cpp" 22 27 >}}
 Wait a moment, what's a `compiler`? Let's talk about that next.
 ### A Top-Level Class
 Now that we've moved name mangling out of `global_scope`, we have
 to put it somewhere. The same goes for the global definition group
 and the file manager that are given to `parse_driver`. The two
 classes _make use_ of the other data, but they don't _own it_.
 That's why they take it by reference, and store it as a pointer.
 They're just temporarily allowed access.
 So, what should be the owner of all of these disparate components?
 Thus far, that has been the `main` function, or the utility
 functions that it calls out to. However, this is in bad taste:
 we have related data and operations on it, but we don't group
 them into an object. We can group all of the components of our
 compiler into a `compiler` object, and leave `main.cpp` with
 exception printing code.
 The definition of the `compiler` class begins with all of the data
 structures that we use in the process of compilation:
 {{< codelines "C++" "compiler/13/compiler.hpp" 12 20 >}}
 There's a loose ordering to these fields. In C++, class members are
 initialized in the order they are declared; we therefore want to make
 sure that fields that are depended on by other fields are initialized first.
 Otherwise, I tried to keep the order consistent with the conceptual path
 of the code through the compiler.
 * Parsing happens first, so we begin with `parse_driver`, which needs a 
 `file_mgr` (to populate with line information) and a `definition_group`
 (to receive the global definitions from the parser).
 * We then proceed to typechecking, for which we use a global `type_env_ptr`
 (to define the built-in functions and constructors) and a `type_mgr` (to
 manage the assignments of type variables).
 * Once a program is typechecked, we transform it, eliminating local
 function definitions and lambda functions. This is done by storing
 newly-emitted global functions into the `global_scope`, which requires a
 `mangler` to generate new names for the target functions.
 * Finally, to generate LLVM IR, we need our `llvm_context` class.
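The point about declaration order is easy to check with a toy example. The types below (`sketch_file_store`, `sketch_driver`, `sketch_compiler`) are hypothetical and only record when they are constructed; they are not the post's classes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical components; each records its name when it is constructed.
struct sketch_file_store {
    explicit sketch_file_store(std::vector<std::string>& log) {
        log.push_back("file_store");
    }
};

struct sketch_driver {
    sketch_file_store* files;   // used, but not owned
    sketch_driver(sketch_file_store& f, std::vector<std::string>& log) : files(&f) {
        log.push_back("driver");
    }
};

class sketch_compiler {
    private:
        // Declaration order is initialization order: files must precede driver,
        // because driver's constructor takes a reference to files.
        sketch_file_store files;
        sketch_driver driver;

    public:
        explicit sketch_compiler(std::vector<std::string>& log)
            : files(log), driver(files, log) {}

        // True if the driver points at this compiler's own file store.
        bool driver_sees_files() const { return driver.files == &files; }
};
```

Swapping the two member declarations would hand `driver` a reference to a `files` member that has not been constructed yet, which is exactly the kind of problem the ordering is meant to avoid.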
 The methods of the compiler are arranged similarly:
 {{< codelines "C++" "compiler/13/compiler.hpp" 22 31 >}}
 The methods go as follows:
 * `add_default_types` adds the built-in types to the `global_env`.
 At this point in the post, these types only include `Int`. However,
 in the second section, we'll make `Bool` a built-in type, too.
 * `add_binop_type` adds a single binary operator to the global
 type environment. We saw its implementation earlier: it deals
 with both binding a type, and setting a mangled name.
 * `add_default_function_types` adds the types for each binary operator,
 and also for the `True` and `False` constructors (which we will
 cover in the second section).
 * `parse`, `typecheck`, `translate` and `compile` all do exactly
 what they say. In this case, compilation refers to creating G-machine
 instructions.
 * `create_llvm_binop` creates an internal function that forces the
 evaluation of its two arguments, and actually applies the given binary
 operator. Recall that the `(+)` in user code constructs a call to this
 function, but leaves it unevaluated until it's needed.
 * `generate_llvm` converts all the definitions in `global_scope`, which
 are at this point compiled into G-machine `instruction`s, into LLVM IR.
 * `output_llvm` contains all the code to actually generate an object
 file from the LLVM IR.
 These functions are mostly taken from part 12's `main.cpp`, and adjusted
 to use the `compiler`'s members rather than local definitions or arguments.
 You should compare part 12's
 [`main.cpp`](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/12/main.cpp)
 file with the 
 [`compiler.cpp`](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/13/compiler.cpp)
 file that we end up with at the end of this post.
 Next, we have the compiler's constructor, and its `operator()`. The
 latter, analogously to our parse driver, will trigger the compilation
 process. Their implementations are straightforward:
 {{< codelines "C++" "compiler/13/compiler.cpp" 131 145 >}}
 We also add a couple of methods to give external code access to
 some of the compiler's data structures. I omit their (trivial)
 implementations, but they have the following signatures:
 {{< codelines "C++" "compiler/13/compiler.hpp" 35 36 >}}
 With all the compilation code tucked into our new `compiler` class,
 `main` becomes very simple. We also finally get to use our exception
 pretty printing code:
 {{< codelines "C++" "compiler/13/main.cpp" 11 27 >}}
 That's all for the cleanup! We've added locations and more errors
 to the compiler, stopped throwing `0` in favor of proper exceptions
 or assertions, made name mangling more reasonable, fixed a bug with
 accidentally shadowing default functions, and organized our compilation
 process into a `compiler` class.
 ### Keeping Things Private
 Hand-writing or generating hundreds of trivial getters and setters
 for the fields of a data class (which is standard in the world of Java) seems
 absurd to me. So, for most of this project, I stuck with
 `struct`s, rather than classes. But this is not a good policy
 to apply _everywhere_. I still think it makes sense to make
 data structures like `ast` and `type` public-by-default;
 however, I _don't_ think that way about classes like `type_mgr`,
 `llvm_context`, `type_env`, and `env`. All of these have information
 that we should never be accessing directly. Some guard this
 information with assertions. In short, it should be protected.
 For most classes, the changes are mechanical. For instance, we
 can make `type_env` a class simply by changing its declaration,
 and marking all of its functions public. This requires a slight
 refactoring of a line that used its `parent` field. Here's
 what it used to be (in context):
 {{< codelines "C++" "compiler/12/main.cpp" 57 60 >}}
 And here's what it is now:
 {{< codelines "C++" "compiler/13/compiler.cpp" 55 58 >}}
 We always declare the `definition_defn` function in
 the `global_env`. Thus, that's the only environment
 we need to know about to update the mangled name.
 The deal with `env` is about as simple. We just make
 it and its two descendants classes, and mark their
 methods and constructors public. The same
 goes for `global_scope`. To make `type_mgr`
 a class, we have to add a new method: `lookup`.
 Here's its implementation:
 {{< codelines "C++" "compiler/13/type.cpp" 81 85 >}}
 It's used in `type_var::print` as follows:
 {{< codelines "C++" "compiler/13/type.cpp" 28 35 >}}
 We can't use `resolve` here because it takes (and returns)
 a `type_ptr`. If we make it _take_ a `type*`, it won't
 be able to return its argument if it's already resolved. If we
 allow it to _return_ `type*`, we won't have an owning
 reference. We also don't want to duplicate the
 method just for this one call. Notice, though, how similar
 `type_var::print`/`lookup` and `resolve` are in terms of execution.
 The change for `llvm_context` requires a little more work.
 Right now, `ctx.builder` is used a _lot_ in `instruction.cpp`.
 Since we don't want to forward each of the LLVM builder methods,
 and since it feels weird to make `llvm_context` extend `llvm::IRBuilder`,
 we'll just provide a getter for the `builder` field. The
 same goes for `module`:
 {{< codelines "C++" "compiler/13/llvm_context.hpp" 46 47 >}}
 Here's what some of the code from `instruction.cpp` looks like now:
 {{< codelines "C++" "compiler/13/instruction.cpp" 144 145 >}}
 Right now, the `ctx` field of the `llvm_context` (which contains
 the `llvm::LLVMContext`) is only externally used to create
 instances of `llvm::BasicBlock`. We'll add a proxy method
 for this functionality:
 {{< codelines "C++" "compiler/13/llvm_context.cpp" 174 176 >}}
 Finally, `instruction_pushglobal` needs to access the
 `llvm::Function` instances that we create in the process
 of compilation. We add a new `get_custom_function` method
 to support this, which automatically prefixes the function
 name with `f_`, much like `create_custom_function`:
 {{< codelines "C++" "compiler/13/llvm_context.cpp" 292 294 >}}
 I think that's enough. If we chose to turn more compiler
 data structures into classes, I think we would've quickly drowned
 in one-line getter and setter methods.