---
title: Compiling a Functional Language Using C++, Part 13 - More Improvements
date: 2020-09-10T18:50:02-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
description: "In this post, we clean up our compiler and add some basic optimizations."
---

In [part 12]({{< relref "12_compiler_let_in_lambda" >}}), we added `let/in`
and lambda expressions to our compiler. At the end of that post, I mentioned
that before we move on to bigger and better things, I wanted to take a
step back and clean up the compiler.

Recently, I got around to doing that. Unfortunately, I also got around to doing
a lot more. Furthermore, I managed to make the changes in such a way that I
can't cleanly separate the 'cleanup' and 'optimization' portions of my work.
This is partially due to the way in which I organize code, where each post
is associated with a version of the compiler with the necessary changes.
Because of all this, instead of making this post about the cleanup, and the
next post about the optimizations, I have to merge them into one.

So, this post is split into two major portions: cleanup, which deals mostly
with touching up exceptions and improving the 'name mangling' logic, and
optimizations, which deals with adding special treatment to booleans,
unboxing integers, and implementing more binary operators.

### Section 1: Cleanup

The previous post was
{{< sidenote "right" "long-note" "rather long," >}}
Probably not as long as this one, though! I really need to get the
size of my posts under control.
{{< /sidenote >}} which led me to omit
a rather important aspect of the compiler: proper error reporting.
Once again our compiler has instances of `throw 0`, which is a cheap way
of avoiding properly handling a runtime error. Before we move on,
it's best to get rid of such blatantly lazy code.

Our existing exceptions (mostly type errors) can use some work, too.
Even the most descriptive issues our compiler reports -- unification errors --
don't include the crucial information of _where_ the error is. For large
programs, this means having to painstakingly read through the entire file
to try to figure out which subexpression could possibly have an incorrect type.
This is far from the ideal debugging experience.

Addressing all this is a multi-step change in itself. We want to:

* Replace all `throw 0` code with actual exceptions.
* Replace some exceptions that shouldn't be possible for a user to trigger
with assertions.
* Keep track of source locations of each subexpression, so that we may
be able to print it if it causes an error.
* Be able to print out said source locations at will. This isn't
a _necessity_, but virtually all "big" compilers do this. Instead
of reporting that an error occurs on a particular line, we will
actually print the line.

Let's start with gathering the actual location data.

#### Bison's Locations
Bison actually has some rather nice support for location tracking. It can
automatically assemble the "from" and "to" locations of a nonterminal
from the locations of its children, which would be very tedious to write
by hand. We enable this feature using the following option:

{{< codelines "C++" "compiler/13/parser.y" 50 50 >}}

There's just one hitch, though. Sure, Bison can compute bigger
locations from smaller ones, but it must get the smaller ones
from somewhere. Since Bison operates on _tokens_, rather
than _characters_, it effectively doesn't interact with the source
text at all, and can't determine from which line or column a token
originated. The task of determining the locations of input tokens
is delegated to the tokenizer -- Flex, in our case. Flex, on the
other hand, doesn't have a built-in mechanism for tracking
locations. Fortunately, Bison provides a `yy::location` class that
includes most of the needed functionality.

A `yy::location` consists of `begin` and `end` source positions,
which themselves are represented using lines and columns. It
also has the following methods:

* `yy::location::columns(int)` advances the `end` position by
the given number of columns, while `begin` stays the same.
If `begin` and `end` both point to the beginning of a token,
then `columns(token_length)` will move `end` to the token's end,
and thus make the whole `location` contain the token.
* `yy::location::lines(int)` behaves similarly to `columns`,
except that it advances `end` by the given number of lines,
rather than columns.
* `yy::location::step()` moves `begin` to where `end` is. This
is useful for when we've finished processing a token, and want
to move on to the next one.
|
|
|
|
For Flex specifically, `yyleng` has the length of the token
currently being processed. Rather than adding the calls
to `columns` and `step` to every rule, we can define the
`YY_USER_ACTION` macro, which is run before each token
is processed.

{{< codelines "C++" "compiler/13/scanner.l" 12 12 >}}

We'll see why we are using `drv` soon; for now, you can treat
`location` as if it were a global variable declared in the
tokenizer. Before processing each token, we ensure that
`location` has its `begin` and `end` at the same position,
and then advance `end` by `yyleng` columns. This is sufficient
to make `location` represent our token's source position.
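To make the `step`-then-`columns` bookkeeping a little more concrete, here's a rough
sketch of how a location moves across a single line of input. The `position` and
`location` structs below are simplified stand-ins that I wrote for illustration; they
are not Bison's actual classes. The `advance_token` and `locate_last` helpers are
hypothetical, too, and only mimic what our `YY_USER_ACTION` does for each token.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for Bison's position and location classes.
// These are illustrative only; the real yy::location has more to it.
struct position {
    int line = 1;
    int column = 1;
};

struct location {
    position begin;
    position end;

    // Advance the end of the location by some number of columns.
    void columns(int count) { end.column += count; }

    // Advance the end by some number of lines, going back to column 1.
    void lines(int count) {
        if (count != 0) {
            end.line += count;
            end.column = 1;
        }
    }

    // Make the location start where it previously ended.
    void step() { begin = end; }
};

// Hypothetical helper mirroring the per-token bookkeeping:
// start where the last token ended, then cover `length` columns.
void advance_token(location& loc, int length) {
    assert(length >= 0);
    loc.step();
    loc.columns(length);
}

// Walk over the lengths of consecutive tokens on one line,
// and return the location of the last one.
location locate_last(const std::vector<int>& token_lengths) {
    location loc;
    for (int length : token_lengths) {
        advance_token(loc, length);
    }
    return loc;
}
```

For three consecutive tokens of lengths 3, 1, and 5, the last token's location begins
at column 5 and ends (exclusively) at column 10.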
|
|
|
|
So now we have a "global" variable `location` that gives
us the source position of the current token. To get it
to Bison, we have to pass it as an argument to each
of the `make_TOKEN` calls. Here are a few sample lines
that should give you the general idea:

{{< codelines "C++" "compiler/13/scanner.l" 41 44 >}}

That last line is actually new. Previously, we somehow
got away without explicitly sending the EOF token to Bison.
I suspect that this was due to some kind of implicit conversion
of the Flex macro `YY_NULL` into a token; now that we have
to pass a position to every token constructor, such an implicit
conversion is probably impossible.

Now we have Bison computing source locations for each nonterminal.
However, at the moment, we still aren't using them. To change that,
we need to add a `yy::location` argument to each of our `ast` nodes,
as well as to the `pattern` subclasses, `definition_defn` and
`definition_data`. To avoid breaking all the code that creates
AST nodes and definitions outside of the parser, we'll make this
argument optional. Inside of `ast.hpp`, we define it as follows:

{{< codelines "C++" "compiler/13/ast.hpp" 16 16 >}}

Then, we add a constructor to `ast` as follows:

{{< codelines "C++" "compiler/13/ast.hpp" 18 18 >}}

Note that the argument has no default value here, since `ast` itself is an
abstract class, and thus will never be constructed directly.
It is in the subclasses of `ast` that we provide a default
value. The change is rather mechanical, but here's an example
from `ast_binop`:

{{< codelines "C++" "compiler/13/ast.hpp" 98 99 >}}
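If the pattern is hard to picture, here's a minimal sketch of it in isolation. Everything
in it is a cut-down assumption of mine: the `location` struct stands in for `yy::location`,
`default_loc` is a hypothetical name for the fallback value, and the `ast` classes keep
only what's needed to show where the default argument goes.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for yy::location, so that the sketch is self-contained.
struct location {
    int begin_line = 0;
    int end_line = 0;
};

// Hypothetical fallback location for nodes created outside the parser.
static const location default_loc{};

// The abstract base class always requires a location...
struct ast {
    location loc;

    explicit ast(const location& l) : loc(l) {}
    virtual ~ast() = default;
    virtual int node_count() const = 0;
};

// ...while the concrete subclasses make it optional via a default argument.
struct ast_int : ast {
    int value;

    explicit ast_int(int v, const location& l = default_loc)
        : ast(l), value(v) {}
    int node_count() const override { return 1; }
};

struct ast_binop : ast {
    std::unique_ptr<ast> left;
    std::unique_ptr<ast> right;

    ast_binop(std::unique_ptr<ast> l, std::unique_ptr<ast> r,
              const location& lc = default_loc)
        : ast(lc), left(std::move(l)), right(std::move(r)) {}
    int node_count() const override {
        return 1 + left->node_count() + right->node_count();
    }
};
```

Code that doesn't know or care about locations can keep constructing nodes with the
arguments it used before, while the parser can pass a location as the last argument.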
|
|
|
|
Finally, we tell Bison to pass the computed location
data as an argument when constructing our data structures.
This too is a mechanical change, and I think the following
couple of lines demonstrate the general idea in sufficient
detail:

{{< codelines "C++" "compiler/13/parser.y" 107 110 >}}

Here, the `@$` symbol is used to reference the current
nonterminal's location data.

#### Line Offsets, File Input, and the Parse Driver
There are three more challenges with printing out the line
of code where an error occurred. First of all, to
print out a line of code, we need to have that line of code
available to us. We do not currently meet this requirement:
our compiler reads code from `stdin` (as is default for Flex),
and `stdin` doesn't always support rewinding. This, in turn,
means that once Flex has read a character from the input,
it may not be possible to go back and retrieve that character
again.

Second, even if we do have the entire stream or buffer
available to us, to retrieve an offset and length within
that buffer from just a line and column number would be a lot
of work. A naive approach would be to iterate through
the input again, once more keeping track of lines and columns,
and print the desired line once we reach it. However, this
would lead us to redo a lot of work that our tokenizer
is already doing.

Third, Flex's input mechanism, even if it's configured
not to read from `stdin`, uses a global `FILE*` called
`yyin`. However, we're better off minimizing global state (especially
if we want to read, parse, and compile multiple files in
the future). While we're configuring Flex's input mechanism,
we may as well fix this, too.

There are several approaches to fixing the first issue. One possible
way is to store the content of `stdin` into a temporary file. Then,
it's possible to read from the file multiple times by using
the C functions `fseek` and `rewind`. However, since we're
working with files, why not just work directly with the files
created by the user? Instead of reading from `stdin`, we may
as well take in a path to a file via `argv`, and read from there.
Also, instead of `fseek` and `rewind`, we can just read the file
into memory, and access it like a normal character buffer.

To address the second issue, we can keep a mapping of line numbers
to their locations in the source buffer. This is rather easy to
maintain using an array: the first element of the array is 0,
which is the beginning of the first line in any source file. From there,
every time we encounter the character `\n`, we can push
the current source offset onto the end of the array, marking it as
the beginning of another line. Where exactly we store this
array is as yet unclear, since we're trying to avoid global variables.
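As a quick illustration of the idea, here's a sketch that computes such an array for
a buffer that's already fully in memory. The `compute_line_offsets` function is a
hypothetical helper of mine; in the compiler itself, the offsets get recorded
incrementally while the tokenizer runs, rather than in a separate pass like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Compute the offset at which each line of `source` begins.
// The first line always starts at offset 0; every '\n' we see
// means that another line starts right after it.
std::vector<std::size_t> compute_line_offsets(const std::string& source) {
    std::vector<std::size_t> offsets{0};
    for (std::size_t i = 0; i < source.size(); i++) {
        if (source[i] == '\n') {
            offsets.push_back(i + 1);
        }
    }
    return offsets;
}
```

With this array in hand, finding the start of line `n` (counting from 1) is a single
lookup at index `n - 1`, with no need to scan through the source text again.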
|
|
|
|
Finally, to begin addressing the third issue, we can use Flex's `reentrant`
option, which makes it so that all of the tokenizer's state is stored in an
opaque `yyscan_t` structure, rather than in global variables. This way,
we can configure `yyin` without setting a global variable, which is a step
in the right direction. We'll work on this momentarily.

Our tokenizing and parsing stack has more global variables
than just those specific to Flex. Among these variables is `global_defs`,
which receives all the top-level function and data type definitions. We
will also need some way of accessing the `yy::location` instance, and
a way of storing our file input in memory. Fortunately, we're not
the only ones to have ever come across the issue of creating non-global
state: the Bison documentation has a
[section in its C++ guide](https://www.gnu.org/software/bison/manual/html_node/Calc_002b_002b-Parsing-Driver.html)
that describes a technique for manipulating
state -- "parsing context", in their words. This technique involves the
creation of a _parsing driver_.

The parsing driver is a class (or struct) that holds all the parse-related
state. We can arrange for this class to be available to our tokenizing
and parsing functions, which will allow us to use it pretty much like we'd
use a global variable. We can define it as follows:

{{< codelines "C++" "compiler/13/parse_driver.hpp" 14 37 >}}
|
|
|
|
There are quite a few fields here. The `file_name` string represents
the file that we'll be reading code from. The `string_stream` will
be used to back up the contents of the source file as Flex reads them;
once Flex is done, the content of the `string_stream` will be
saved into the `file_content` string.

The next three fields deal with tracking source code
locations. The `location` field will be accessed by Flex
via `drv.location` (where `drv` is a reference to our driver class).
The `file_offset` and `line_offsets` fields will be used to
keep track of where each line begins, as we have discussed above.
Finally, `global_defs` will be the new home of our top-level
definitions.

The methods on `parse_driver` are rather simple, too:

* `run_parse` handles the initialization of the tokenizer
and parser, which includes obtaining the `FILE*` and configuring
Flex to use it. It also handles invoking the parsing code.
We'll make this method return `true` if parsing succeeded,
and `false` otherwise (if, say, the file we tried to read doesn't exist).
* `write` will be called from Flex, and will allow us to
record the content of the file we're processing to the `string_stream`.
We've already seen it used in the `YY_USER_ACTION` macro.
* `mark_line` will also be called from Flex, and will mark the current
`file_offset` as the beginning of a line by pushing it into `line_offsets`.
* `get_index` and `get_line_end` will be used for converting
`yy::location` instances to offsets within the source code buffer.
* `print_location` will be used for printing errors.
It will print the lines spanned by the given location, with the
location itself colored and underlined if the last argument is `true`.
This will make our errors easier on the eyes.

Let's take a look at their implementations. First, `run_parse`:

{{< codelines "C++" "compiler/13/parse_driver.cpp" 5 18 >}}
|
|
|
|
We try to open the user-specified file, and return `false` if we can't.
We then initialize `line_offsets` as we discussed above. After
this, we start doing the setup specific to a reentrant
Flex scanner. We declare a `yyscan_t` variable, which
will contain all of Flex's state. Then, we initialize
it using `yylex_init`. Finally, since we can no longer
touch the `yyin` global variable (it doesn't exist),
we have to resort to using a setter function provided by Flex
to configure the tokenizer's input stream.

Next, we construct our Bison-generated parser. Note that
unlike before, we have to pass in two arguments:
`scanner` and `*this`, the latter being of type `parse_driver&`.
We'll come back to how this works in a moment. With
the scanner and parser initialized, we invoke `parser::operator()`,
which actually runs the Flex- and Bison-generated code.
To clean up, we run `yylex_destroy` and `fclose`. Finally,
we extract the contents of our file into the `file_contents`
string, and return.
|
|
|
|
Next, the `write` method. For the most part, this method
is a proxy for the `write` method of our `string_stream`:

{{< codelines "C++" "compiler/13/parse_driver.cpp" 20 23 >}}

We do, however, also keep track of the `file_offset` variable
here, which ensures we have up-to-date information
regarding our position in the source file. The implementation
of `mark_line` uses this information:

{{< codelines "C++" "compiler/13/parse_driver.cpp" 25 27 >}}

Once we have the line offsets, `get_index` becomes very simple:

{{< codelines "C++" "compiler/13/parse_driver.cpp" 29 32 >}}

Here, we use an assertion for the first time. Calling
`get_index` with a negative or zero line doesn't make
any sense, since Bison starts tracking line numbers
at 1. Similarly, asking for a line for which we don't
have a recorded offset is invalid. Neither
of these nonsensical calls to `get_index` can
be caused by the user under normal circumstances;
both indicate the method's misuse by the author of
the compiler (us!). Thus, we terminate the program.

Finally, the implementation of `get_line_end` just finds the
beginning of the next line. We stick to the C convention
of marking 'end' indices exclusive (pointing just past
the end of the array):

{{< codelines "C++" "compiler/13/parse_driver.cpp" 34 37 >}}

Since `line_offsets` has as many elements as there are lines,
the last line number would be equal to the vector's size.
When looking up the end of the last line, we can't look for
the beginning of the next line, so instead we return the end of the file.
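Here's a small, self-contained sketch of how these two lookups fit together. The
`source_map` struct is a hypothetical stand-in for the relevant parts of the driver,
and it takes plain line and column numbers instead of `yy::location` instances, so
treat it as an approximation of the logic described above rather than the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, cut-down holder of the source text and its line offsets.
struct source_map {
    std::string content;
    std::vector<std::size_t> line_offsets;

    explicit source_map(std::string text) : content(std::move(text)) {
        line_offsets.push_back(0);
        for (std::size_t i = 0; i < content.size(); i++) {
            if (content[i] == '\n') line_offsets.push_back(i + 1);
        }
    }

    // Convert a 1-based line and column into an offset into `content`.
    std::size_t get_index(int line, int column) const {
        // Violating these is a bug in the caller, not a user error.
        assert(line > 0);
        assert(static_cast<std::size_t>(line) <= line_offsets.size());
        assert(column > 0);
        return line_offsets[line - 1] + column - 1;
    }

    // Exclusive end of the given 1-based line: the start of the next
    // line, or the end of the content if this is the last line.
    std::size_t get_line_end(int line) const {
        if (static_cast<std::size_t>(line) == line_offsets.size()) {
            return content.size();
        }
        return get_index(line + 1, 1);
    }
};
```

Note that, because the end is defined as the start of the next line, the range for
every line but the last includes that line's trailing newline.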
|
|
|
|
Next, the `print_location` method prints three sections
of the source file. These are the text "before" the error,
the error itself, and, finally, the text "after" the error.
For example, if an error began on the fifth column of the third
line, and ended on the eighth column of the fourth line, the
"before" section would include the first four columns of the third
line, and the "after" section would be the ninth column onward
on the fourth line. Before and after the error itself,
if the `highlight` argument is true,
we sprinkle the ANSI escape codes to enable and disable
special formatting, respectively. For now, the special
formatting involves underlining the text and making it red.

{{< codelines "C++" "compiler/13/parse_driver.cpp" 39 53 >}}
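To show the "before", "error", and "after" split on its own, here's a rough sketch
that builds the output as a string. The `render_span` function and its offset-based
parameters are assumptions of mine for the sake of a standalone example; the real
method works with a `yy::location` and writes to a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Build the text for a highlighted span. `begin` and `end` are offsets
// into `content` delimiting the error (with `end` exclusive), while
// `line_start` and `line_end` delimit the full lines being printed.
std::string render_span(const std::string& content,
                        std::size_t line_start, std::size_t begin,
                        std::size_t end, std::size_t line_end,
                        bool highlight) {
    assert(line_start <= begin && begin <= end && end <= line_end);
    assert(line_end <= content.size());

    std::ostringstream out;
    // The text "before" the error.
    out << content.substr(line_start, begin - line_start);
    // ANSI codes: 4 turns on underlining, 31 makes the text red.
    if (highlight) out << "\033[4;31m";
    out << content.substr(begin, end - begin);
    // ANSI code 0 resets all formatting.
    if (highlight) out << "\033[0m";
    // The text "after" the error.
    out << content.substr(end, line_end - end);
    return out.str();
}
```

With `highlight` set to `false`, the function simply returns the lines unchanged,
which is what we want for errors that span too much code to color.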
|
|
|
|
Finally, to get the forward declarations for the `yy*` functions
and types, we set the `header-file` option in Flex:

{{< codelines "C++" "compiler/13/scanner.l" 3 3 >}}

We also include this `scanner.hpp` file in our `parse_driver.cpp`:

{{< codelines "C++" "compiler/13/parse_driver.cpp" 2 2 >}}

#### Adding the Driver to Flex and Bison
Bison's C++ language template generates a class called
`yy::parser`. We don't really want to modify this class
in any way: not only is it generated code, but it's
also rather complex. Instead, Bison provides us
with a mechanism to pass more data in to the parser.
This data is made available to all the actions
that the parser runs. Better yet, Bison also attempts
to pass this data on to the tokenizer, which in our
case would mean that whatever data we provide Bison
will also be available to Flex. This is how we'll
allow the two components to access our new `parse_driver`
class. This is also how we'll pass in the `yyscan_t`
that Flex now needs to run its tokenizing code. To
do all this, we use Bison's `%param` option. I'm
going to include a few more lines from `parser.y`,
since they contain the necessary `#include` directives
and a required type definition:

{{< codelines "C++" "compiler/13/parser.y" 1 18 >}}

The `%param` option effectively adds the parameter listed
between the curly braces to the constructor of the generated
`yy::parser`. We've already seen this in the implementation
of our driver, where we passed `scanner` and `*this` as
arguments when creating the parser. The parameters we declare are also passed to the
`yylex` function, which is expected to accept them in the same order.

Since we're adding `parse_driver` as an argument, we have to
declare it. However, we can't include the `parse_driver` header
right away because `parse_driver` itself includes the `parser` header:
we'd end up with a circular dependency. Instead, we resort to
forward-declaring the driver class, as well as the `yyscan_t`
structure containing Flex's state.

Adding a parameter to Bison doesn't automatically affect
Flex. To let Flex know that its `yylex` function must now accept
the state and the parse driver, we have to define the
`YY_DECL` macro. We do this in `parse_driver.hpp`, since
this forward declaration will be used by both Flex
and Bison:

{{< codelines "C++" "compiler/13/parse_driver.hpp" 39 41 >}}

Finally, we can change our `main.cpp` file to use the
`parse_driver`:

{{< codelines "C++" "compiler/13/main.cpp" 178 186 >}}

#### Improving Exceptions
Now, it's time to add location data (and a little bit more) to our
exceptions. We want to make it possible for exceptions to include
data about where the error occurred, and to print this data to the user.
However, it's also possible for us to have exceptions that simply
do not have that location data. Furthermore, we want to know
whether or not an exception has an associated location; we'd
rather not print an invalid or "default" location when an error
occurs.

In the old days of programming, we could represent the absence
of location data with a `nullptr`, or `NULL`. But not only
does this approach expose us to all kinds of `NULL`-safety
bugs, but it also requires heap allocation! This doesn't
make it sound all that appealing; instead, I think we should
opt for using `std::optional`.
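If you haven't used `std::optional` before, here's a tiny sketch of what
"maybe a location" looks like with it. The `location` struct is a stand-in for
`yy::location`, and `describe` is a hypothetical function that exists only for this example.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in for yy::location.
struct location {
    int line = 1;
    int column = 1;
};

// Describe where an error happened, if we know where that is.
std::string describe(const std::optional<location>& loc) {
    // An empty optional converts to false; no null pointers involved.
    if (!loc) return "at an unknown location";
    return "on line " + std::to_string(loc->line);
}
```

The value lives inside the `std::optional` itself, so there's no separate allocation
to manage, and the "no location" case is spelled out as `std::nullopt`.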
|
|
|
|
Though `std::optional` is standard (as may be obvious from its
namespace), it's a rather recent addition to the C++ STL.
In order to gain access to it, we need to ensure that our
project is compiled using C++17. To this end, we add
the following two lines to our CMakeLists.txt:

{{< codelines "CMake" "compiler/13/CMakeLists.txt" 5 6 >}}

Now, let's add a new base class for all of our compiler errors,
unsurprisingly called `compiler_error`:

{{< codelines "C++" "compiler/13/error.hpp" 8 23 >}}

We'll put some 'common' exception functionality
into the `print_location` and `print_about` methods. If the error
has an associated location, the former method will print that
location to the screen. We don't always want to highlight
the part of the code that caused the error: for instance,
an invalid data type definition may span several lines,
and coloring that whole section of text red would be
too much. To address this, we add the `highlight`
boolean argument, which can be used to switch the
colors on and off. The `print_about` method
will simply print the `what()` message of the exception,
in addition to the "specific" error that occurred (stored
in `description`). Here are the implementations of the
functions:

{{< codelines "C++" "compiler/13/error.cpp" 3 16 >}}

We will also add a `pretty_print` method to all of
our exceptions. This method will handle
all the exception-specific printing logic.
For the generic compiler error, this means
simply printing out the error text and the location:

{{< codelines "C++" "compiler/13/error.cpp" 18 21 >}}

For `type_error`, this logic slightly changes,
enabling colors when printing the location:

{{< codelines "C++" "compiler/13/error.cpp" 27 30 >}}

Finally, for `unification_error`, we also include
the code to print out the two types that our
compiler could not unify:

{{< codelines "C++" "compiler/13/error.cpp" 32 41 >}}

There's a subtle change here. Compared to the previous
type-printing code (which we had in `main`), what
we wrote here deals with "expected" and "actual" types.
The `left` type passed to the exception is printed
first, and is treated like the "correct" type. The
`right` type, on the other hand, is treated
like the "wrong" type that should have been
unifiable with `left`. This will affect the
calling conventions of our unification code. In
`main`, we remove all our old exception printing code
in favor of calls to `pretty_print`:

{{< codelines "C++" "compiler/13/main.cpp" 207 213 >}}
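To summarize the overall shape of this design, here's a condensed sketch of such an
exception hierarchy. It's an approximation under a few assumptions of mine: `location`
is a stand-in for `yy::location`, `print_location` only prints a line number (the real
one needs the parse driver to print source lines), the message strings are modeled
on our compiler's output rather than copied from it, and `render` is a hypothetical
helper for capturing the output.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

struct location { int line = 1; };  // stand-in for yy::location

class compiler_error : public std::exception {
    std::string description;
    std::optional<location> loc;

public:
    explicit compiler_error(std::string d,
                            std::optional<location> l = std::nullopt)
        : description(std::move(d)), loc(std::move(l)) {}

    const char* what() const noexcept override {
        return "an error occurred while compiling the program";
    }

    // Print the general message, followed by the specific description.
    void print_about(std::ostream& to) const {
        to << what() << ": " << description << '\n';
    }

    // Print the location only if the error actually has one.
    void print_location(std::ostream& to, bool highlight) const {
        if (!loc) return;
        to << "occurring on line " << loc->line
           << (highlight ? " (highlighted)" : "") << '\n';
    }

    virtual void pretty_print(std::ostream& to) const {
        print_about(to);
        print_location(to, false);
    }
};

class type_error : public compiler_error {
public:
    using compiler_error::compiler_error;

    const char* what() const noexcept override {
        return "an error occurred while checking the types of the program";
    }

    // Same as the generic error, but with highlighting turned on.
    void pretty_print(std::ostream& to) const override {
        print_about(to);
        print_location(to, true);
    }
};

// Hypothetical helper: capture the pretty-printed error as a string.
std::string render(const compiler_error& err) {
    std::ostringstream out;
    err.pretty_print(out);
    return out.str();
}
```

Since `pretty_print` is virtual, a single `catch (compiler_error& err)` block is
enough to print any of these errors in its own preferred style.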
|
|
|
|
Now, we can go through and find all the places where
we `throw 0`. One such place was in the data type
definition code, where declaring the same type parameter
twice is invalid. We replace the `0` with a
`compiler_error`:

{{< codelines "C++" "compiler/13/definition.cpp" 66 69 >}}

Not all `throw 0` statements should become exceptions.
For example, here's code from the previous version of
the compiler:

{{< codelines "C++" "compiler/12/definition.cpp" 123 127 >}}

If a definition `def_defn` has a dependency on a "nearby" (declared
in the same group) definition called `dependency`, and if
`dependency` does not exist within the same definition group,
we throw an exception. But this error is impossible
for a user to trigger: the only reason for a variable to appear
in the `nearby_variables` vector is that it was previously
found in the definition group. Here's the code that proves this
(from the current version of the compiler):

{{< codelines "C++" "compiler/13/definition.cpp" 102 106 >}}

Not being able to find the variable in the definition group
is a compiler bug, and should never occur. So, instead
of throwing an exception, we'll use an assertion:

{{< codelines "C++" "compiler/13/definition.cpp" 128 128 >}}
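The distinction between the two kinds of failure can be sketched with a toy example.
Both functions below are hypothetical and operate on a plain `std::map` instead of our
definition groups; the only point is to contrast an error a user can cause with one
that only we can.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// A user can write a program that mentions an undefined name,
// so that case deserves a real exception.
int lookup_user_name(const std::map<std::string, int>& defs,
                     const std::string& name) {
    auto it = defs.find(name);
    if (it == defs.end()) {
        throw std::runtime_error("unknown name: " + name);
    }
    return it->second;
}

// Here, the caller promises that `name` was already found in `defs`.
// A miss can only mean a bug in the compiler itself, so we assert.
int lookup_known_name(const std::map<std::string, int>& defs,
                      const std::string& name) {
    auto it = defs.find(name);
    assert(it != defs.end());
    return it->second;
}
```

One thing to keep in mind is that `assert` is compiled out when `NDEBUG` is defined,
so it should only ever guard against conditions that a correct compiler can't reach.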
|
|
|
|
For more complicated error messages, we can use a `stringstream`.
Here's an example from `parsed_type`:

{{< codelines "C++" "compiler/13/parsed_type.cpp" 16 23 >}}
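In case the technique is unfamiliar, here's a minimal sketch of building a message
that mixes text and numbers. The `arity_message` function and the wording of its
message are made up for this example, and aren't taken from the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Build an error message that mixes text and numbers.
std::string arity_message(const std::string& type_name,
                          std::size_t expected, std::size_t actual) {
    std::ostringstream os;
    os << "invalid application of type " << type_name
       << " (" << expected << " argument(s) expected, but "
       << actual << " provided)";
    // str() gives us the accumulated text as a std::string,
    // which we could then hand to an exception's constructor.
    return os.str();
}
```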
|
|
|
|
In general, this change is also rather mechanical, but, to
maintain a balance between exceptions and assertions, here
are a couple more assertions from `type_env`:

{{< codelines "C++" "compiler/13/type_env.cpp" 77 78 >}}

Once again, it should not be possible for the compiler
to try to generalize the type of a variable that doesn't
exist, nor should generalization occur twice.

While we're on the topic of types, let's talk about
`type_mgr::unify`. In practice, I suspect that a lot of
errors in our compiler will originate from this method.
However, at present, this method does not in any way
track where a unification error occurred.
To fix this, we add a new `loc` parameter to `unify`,
which we make optional to allow for unification without
a known location. Here's the declaration:

{{< codelines "C++" "compiler/13/type.hpp" 101 101 >}}

The change to the implementation is mechanical and repetitive,
so instead of showing you the whole method, I'll settle for
a couple of lines:

{{< codelines "C++" "compiler/13/type.cpp" 119 121 >}}

We want to make sure that a location provided to the
top-level call to `unify` is also forwarded to the
recursive calls, so we have to explicitly add it
to the call.
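To see why the explicit forwarding matters, here's a toy version of the idea. I should
stress that this is not our `type_mgr::unify`: the sketch has no type variables at
all (so it's really just a structural equality check), and the `type`, `base`, `arrow`,
and `unification_error` definitions below are simplified stand-ins of my own.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

struct location { int line = 1; };  // stand-in for yy::location

// A toy type: either a named base type, or a function `left -> right`.
struct type;
using type_ptr = std::shared_ptr<type>;
struct type {
    std::string name;      // used when this is a base type
    type_ptr left, right;  // both set when this is a function type
};

type_ptr base(std::string n) {
    return std::make_shared<type>(type{std::move(n), nullptr, nullptr});
}
type_ptr arrow(type_ptr l, type_ptr r) {
    return std::make_shared<type>(type{"", std::move(l), std::move(r)});
}

struct unification_error {
    type_ptr left, right;
    std::optional<location> loc;
};

void unify(const type_ptr& l, const type_ptr& r,
           const std::optional<location>& loc = std::nullopt) {
    bool l_fn = l->left != nullptr;
    bool r_fn = r->left != nullptr;
    if (l_fn && r_fn) {
        // Pass `loc` along explicitly; otherwise, the recursive calls
        // would fall back to the default and lose the location.
        unify(l->left, r->left, loc);
        unify(l->right, r->right, loc);
        return;
    }
    if (!l_fn && !r_fn && l->name == r->name) return;
    throw unification_error{l, r, loc};
}
```

Had the recursive calls omitted `loc`, a mismatch found deep inside a function type
would be reported without a location, even though the top-level caller provided one.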
|
|
|
|
With all of that done, we can finally stand back and
marvel at the results of our hard work. Here is what a
basic unification error looks like now:

{{< figure src="unification_error.png" caption="The result of a unification error." >}}

I used an image to show colors, but here is the content of the error in textual form:

```
an error occured while checking the types of the program: failed to unify types
occuring on line 2:
3 + False
the expected type was:
!Int
while the actual type was:
!Bool
```

The exclamation marks in front of the two types are due to some
changes from section 2. Here's an error that was previously
a `throw 0` statement in our code:

```
an error occured while compiling the program: type variable a used twice in data type definition.
occuring on line 1:
data Pair a a = { MkPair a a }
```

Now, not only have we eliminated the lazy uses of `throw 0` in our
code, but we've also improved the presentation of the errors
to the user!