---
title: Compiling a Functional Language Using C++, Part 4 - Small Improvements
date: 2019-08-06T14:26:38-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
---

We've done quite a big push in the previous post. We defined
type rules for our language, implemented unification,
and then used unification to enforce these rules for
our program. The post was pretty long, and even then we
weren't able to fit quite everything into it.

For instance, we threw 0 whenever an error occurred. This
gives us no indication of what actually went wrong. We should
probably define an exception class, one that can contain
information about the error, and report it to the user.

Also, when there's no error, our compiler doesn't
really tell us anything at all about the code besides
the number of definitions. We probably want to see the types
of these definitions, or at least some intermediate information.
At the very least, we want to have the __ability__ to see
this information.

Finally, we have no build system. We are creating more
and more source files, and so far (unless you've taken the
initiative), we've been compiling them by hand. We want
to recompile only the source files that have changed,
and we want a standard definition of how to
build our program.

### Printing Syntax Trees

Let's start by printing the trees we get from our parser.
This is long overdue: since Part 2, we've had no way to verify the structure
of what our parser returns to us. We'll print
the trees top-down, with the children of a node
indented one level further than the node itself. For this,
we'll make a new virtual function with the signature:

```C++
virtual void print(int indent, std::ostream& to) const;
```

We'll add a similar printing function to our
`pattern` struct, too:

```C++
virtual void print(std::ostream& to) const;
```

Let's take a look at the implementation. For `ast_int`,
`ast_lid`, and `ast_uid`:

{{< codelines "C++" "compiler/04/ast.cpp" 19 22 >}}

{{< codelines "C++" "compiler/04/ast.cpp" 28 31 >}}

{{< codelines "C++" "compiler/04/ast.cpp" 37 40 >}}

With `ast_binop`, things get a bit more interesting.
We call `print` recursively on the children of the
`binop` node:

{{< codelines "C++" "compiler/04/ast.cpp" 46 51 >}}
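
To see how the `indent` parameter composes through recursion, here is a small self-contained sketch. The node types (`node`, `int_node`, `binop_node`), the `render` helper, the two-space indentation width, and the output labels are all assumptions made for illustration; they are not the definitions from `ast.hpp`.

```C++
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the compiler's AST base class.
struct node {
    virtual ~node() = default;
    virtual void print(int indent, std::ostream& to) const = 0;
};
using node_ptr = std::unique_ptr<node>;

// Write `indent` levels of indentation (two spaces per level is an assumption).
void print_indent(int indent, std::ostream& to) {
    while (indent-- > 0) to << "  ";
}

// Leaf node: prints itself at the requested depth.
struct int_node : node {
    int value;
    explicit int_node(int v) : value(v) {}
    void print(int indent, std::ostream& to) const override {
        print_indent(indent, to);
        to << "INT: " << value << '\n';
    }
};

// Inner node: prints itself, then both children one level deeper.
struct binop_node : node {
    char op;
    node_ptr left, right;
    binop_node(char o, node_ptr l, node_ptr r)
        : op(o), left(std::move(l)), right(std::move(r)) {
        assert(left && right);  // both operands are required
    }
    void print(int indent, std::ostream& to) const override {
        print_indent(indent, to);
        to << "BINOP: " << op << '\n';
        left->print(indent + 1, to);
        right->print(indent + 1, to);
    }
};

// Hypothetical convenience helper: render a whole tree into a string.
std::string render(const node& root) {
    std::ostringstream out;
    root.print(0, out);
    return out.str();
}
```

Rendering the tree for `1 + (2 * 3)` with this sketch yields `BINOP: +`, with `INT: 1` and `BINOP: *` one level below it, and `INT: 2` and `INT: 3` one level below that. Each node only needs to know its own depth; the recursion takes care of the rest.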

The same idea for `ast_app`:

{{< codelines "C++" "compiler/04/ast.cpp" 67 72 >}}

Finally, just like `ast_case::typecheck` called
`pattern::match`, `ast_case::print` calls `pattern::print`:

{{< codelines "C++" "compiler/04/ast.cpp" 84 93 >}}

We follow the same implementation strategy for patterns,
but we don't need indentation or recursion:

{{< codelines "C++" "compiler/04/ast.cpp" 115 117 >}}

{{< codelines "C++" "compiler/04/ast.cpp" 123 128 >}}

In `main`, let's print the bodies of each function we receive from the parser:

{{< codelines "C++" "compiler/04/main.cpp" 47 56 >}}

### Printing Types

Types are another thing that we want to be able to inspect, so let's
add a similar print method to them:

```C++
virtual void print(const type_mgr& mgr, std::ostream& to) const;
```

We need the type manager so that we can follow substitutions: a type variable
may have been bound to another type during unification, and when printing it
we want to show what it stands for, not just its name.
The implementation is simple enough:

{{< codelines "C++" "compiler/04/type.cpp" 6 24 >}}
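
Here is a minimal sketch of what "following substitutions" means when printing. It uses simplified stand-ins for the type classes and a hypothetical `type_mgr` that stores bindings in a `std::map`; the real classes in `type.hpp` are organized differently. The arrow format mirrors the `Int -> (Int -> (Int))` style of the output shown later in this post.

```C++
#include <cassert>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

struct type_mgr;

// Hypothetical, simplified type hierarchy.
struct type {
    virtual ~type() = default;
    virtual void print(const type_mgr& mgr, std::ostream& to) const = 0;
};
using type_ptr = std::shared_ptr<type>;

// Simplified manager: maps type variable names to the types they are bound to.
struct type_mgr {
    std::map<std::string, type_ptr> types;
};

struct type_var : type {
    std::string name;
    explicit type_var(std::string n) : name(std::move(n)) {}
    void print(const type_mgr& mgr, std::ostream& to) const override {
        auto it = mgr.types.find(name);
        if (it != mgr.types.end()) {
            it->second->print(mgr, to);  // bound: print what the variable stands for
        } else {
            to << name;                  // unbound: print the variable itself
        }
    }
};

struct type_base : type {
    std::string name;
    explicit type_base(std::string n) : name(std::move(n)) {}
    void print(const type_mgr&, std::ostream& to) const override { to << name; }
};

struct type_arr : type {
    type_ptr left, right;
    type_arr(type_ptr l, type_ptr r) : left(std::move(l)), right(std::move(r)) {}
    void print(const type_mgr& mgr, std::ostream& to) const override {
        left->print(mgr, to);
        to << " -> (";
        right->print(mgr, to);
        to << ")";
    }
};
```

With nothing bound, an arrow from `Int` to a variable `a` prints as `Int -> (a)`; once the manager binds `a` to `Bool`, the very same object prints as `Int -> (Bool)`. Without access to the manager, `print` could only ever show `a`.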

Let's also print out the types we infer. We'll make it a separate loop
at the bottom of the `typecheck_program` function, because it's mostly just
for debugging purposes:

{{< codelines "C++" "compiler/04/main.cpp" 34 38 >}}

### Fixing Bugs

We actually discovered not one but two bugs in our implementation, thanks
to the output we get from printing trees and types.
Observe the output for `works3.txt`:

```
length l:
CASE:
Nil
INT: 0
*: Int -> (Int -> (Int))
+: Int -> (Int -> (Int))
-: Int -> (Int -> (Int))
/: Int -> (Int -> (Int))
Cons: List -> (Int -> (List))
Nil: List
length: List -> (Int)
2
```

First, we're missing the `Cons` branch. The culprit is `parser.y`, specifically
this line:

```C++
: branches branch { $$ = std::move($1); $1.push_back(std::move($2)); }
```

Notice that we move our list of branches out of `$1`. However, when we
`push_back`, we use `$1` again. That's wrong! After the move, `$1` is left in a
valid but unspecified state (typically an empty vector), so the new branch is
appended to a vector that is then thrown away, and `$$` never receives it.
We need to `push_back` to `$$` instead:

{{< codelines "C++" "compiler/04/parser.y" 110 110 >}}
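
The same mistake is easy to reproduce outside of Bison. In this sketch, plain `std::vector<std::string>` values stand in for `$$`, `$1`, and `$2`, and the function names `append_buggy` and `append_fixed` are hypothetical.

```C++
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using branch_list = std::vector<std::string>;

// Mirrors `$$ = std::move($1); $1.push_back(std::move($2));` (the bug).
branch_list append_buggy(branch_list list, std::string branch) {
    branch_list result = std::move(list);  // `result` now owns the branches
    list.push_back(std::move(branch));     // appended to the moved-from vector...
    return result;                         // ...which is discarded here
}

// Mirrors the fix: append to `$$`, which now owns the branches.
branch_list append_fixed(branch_list list, std::string branch) {
    branch_list result = std::move(list);
    result.push_back(std::move(branch));
    return result;
}
```

Starting from a one-element list, `append_buggy` still returns a single element while `append_fixed` returns two, which is exactly how the second (`Cons`) branch went missing.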

Next, observe that `Cons` has type `List -> Int -> List`. That's not right,
since `Int` comes first in our definition. The culprit is this fragment of code:

```C++
for(auto& type_name : constructor->types) {
    type_ptr type = type_ptr(new type_base(type_name));
    full_type = type_ptr(new type_arr(type, full_type));
}
```

Each iteration wraps the type built so far in a new arrow, so the parameter we
visit last ends up outermost, that is, first. Remember how we built the function
type backwards in Part 3? We have to do the same here.
We replace the fragment with the proper reverse iteration:

{{< codelines "C++" "compiler/04/definition.cpp" 37 40 >}}
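
Because `->` associates to the right, the last parameter type has to be wrapped around the return type first. The sketch below builds the printed form of a constructor's type from its parameter type names, using strings instead of `type_ptr`s for brevity; the function name `constructor_type` is hypothetical.

```C++
#include <cassert>
#include <string>
#include <vector>

// Builds e.g. "Int -> (List -> (List))" for a constructor taking Int and List.
std::string constructor_type(const std::vector<std::string>& params,
                             const std::string& result) {
    std::string full_type = result;
    // Walk the parameters from last to first, so the first one ends up outermost.
    for (auto it = params.rbegin(); it != params.rend(); ++it) {
        full_type = *it + " -> (" + full_type + ")";
    }
    return full_type;
}
```

Iterating forward instead would produce `List -> (Int -> (List))`, the incorrect type for `Cons` in the output above.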

### Throwing Exceptions

Throwing 0 is never a good idea. Such an exception doesn't contain any information
that we may find useful in debugging, nor any information that would benefit
the users of the compiler. Instead, let's define our own exception classes
and throw those. We'll make two:

{{< codeblock "C++" "compiler/04/error.hpp" >}}

Only one function needs to be implemented, and it's pretty boring:

{{< codeblock "C++" "compiler/04/error.cpp" >}}

It's time to throw these instead of 0. Let's take a look at the places
where we do so.

First, we throw 0 in `type.cpp`, in the `type_mgr::unify` method. This is
where our `unification_error` comes in. The error will
contain the two types that we failed to unify, which we will
later report to the user:

{{< codelines "C++" "compiler/04/type.cpp" 91 91 >}}

Next up, we have a few throws in `ast.cpp`. The first is in `op_string`, but
we will simply replace it with `return "??"`, which will be caught later on
(either way, reaching the end of `op_string` would indicate a compiler bug,
since the user has no way of providing an invalid binary operator). The
first throw we actually need to address is in `ast_binop::typecheck`, in the case
that we don't find a type for a binary operator. We report this
directly:

{{< codelines "C++" "compiler/04/ast.cpp" 57 57 >}}

We will introduce a new exception into `ast_case::typecheck`. Previously,
we simply passed the type of the expression being case-analyzed into
the pattern matching method. However, since we don't want
case analysis on functions, we now ensure that the type of the expression
is a `type_base`. If it isn't, we report this:

{{< codelines "C++" "compiler/04/ast.cpp" 107 110 >}}
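
The check hinges on `dynamic_cast`, which yields a null pointer when the object is not of the requested type. Below is a stripped-down sketch of the idea; the type classes and the `check_case_type` helper are hypothetical simplifications, and `std::runtime_error` stands in for the post's own exception classes. A real implementation would likely also need to look through bound type variables first, which this sketch skips.

```C++
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified type classes.
struct type { virtual ~type() = default; };  // polymorphic, so dynamic_cast works
struct type_base : type {
    std::string name;
    explicit type_base(std::string n) : name(std::move(n)) {}
};
struct type_arr : type {
    std::shared_ptr<type> left, right;
    type_arr(std::shared_ptr<type> l, std::shared_ptr<type> r)
        : left(std::move(l)), right(std::move(r)) {}
};

// Rejects case analysis on anything that is not a base (data) type.
void check_case_type(const std::shared_ptr<type>& scrutinee) {
    if (!dynamic_cast<type_base*>(scrutinee.get())) {
        throw std::runtime_error("attempting case analysis of non-data type");
    }
}
```

Passing a `type_base` such as `List` returns normally, while passing any `type_arr` throws before pattern matching ever begins.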

The next exception is in `pattern_constr::match`. It occurs
when the pattern has a constructor we don't recognize, and
that's exactly what we report:

{{< codelines "C++" "compiler/04/ast.cpp" 132 134 >}}

The next exception occurs in a loop, when we bind
types for each of the constructor pattern's variables.
We throw when we are unable to cast the remaining
constructor type to a `type_arr`. Conceptually,
this means that the pattern wants to apply the
constructor to more parameters than it actually
takes:

{{< codelines "C++" "compiler/04/ast.cpp" 138 138 >}}
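
Conceptually, each pattern variable peels one arrow off the constructor's type; if we run out of arrows while variables remain, the pattern has too many arguments. A sketch of that loop follows, with hypothetical simplified types and a hypothetical `bind_pattern_vars` helper that returns the argument types and hands back the leftover result type. `std::runtime_error` again stands in for the post's own exception classes.

```C++
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified type classes.
struct type { virtual ~type() = default; };
using type_ptr = std::shared_ptr<type>;
struct type_base : type {
    std::string name;
    explicit type_base(std::string n) : name(std::move(n)) {}
};
struct type_arr : type {
    type_ptr left, right;
    type_arr(type_ptr l, type_ptr r) : left(std::move(l)), right(std::move(r)) {}
};

// Peels one argument type off `constructor_type` per pattern variable.
// Returns the argument types in order; `result` receives what is left over.
std::vector<type_ptr> bind_pattern_vars(type_ptr constructor_type,
                                        std::size_t var_count,
                                        type_ptr& result) {
    std::vector<type_ptr> bound;
    for (std::size_t i = 0; i < var_count; ++i) {
        auto* arr = dynamic_cast<type_arr*>(constructor_type.get());
        if (!arr) {
            // No arrow left: the pattern applies the constructor to too many arguments.
            throw std::runtime_error("too many parameters in constructor pattern");
        }
        bound.push_back(arr->left);
        constructor_type = arr->right;
    }
    result = constructor_type;
    return bound;
}
```

For a constructor of type `Int -> (List -> (List))`, binding two variables succeeds and leaves `List` as the result type, while asking for a third variable throws, which is the error case described above.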

We remove the last throw at the bottom of `pattern_constr::match`.
This is because once unification succeeds, we know
that the return type of the pattern is a base type, since
we know the type of the case expression is a base type
(we added that check to `ast_case::typecheck`).

Finally, let's catch and report these exceptions. We could do it
in `typecheck_program`, but I think doing so in `main` is neater.
Since printing types requires a `type_mgr`, we'll move the
declarations of both `type_mgr` and `type_env` to the top of
`main`, and pass them to `typecheck_program` as parameters. Then,
we can surround the call to `typecheck_program` with
try/catch:

{{< codelines "C++" "compiler/04/main.cpp" 57 69 >}}

We use some [ANSI escape codes](https://en.wikipedia.org/wiki/ANSI_escape_code)
to color the types in the case of a unification error.
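
An ANSI color sequence is the escape character (`\033` in a C++ string literal), then `[`, a numeric code, and `m`; code `34` selects a blue foreground and `0` resets all attributes. Here is a minimal sketch of coloring the two types of a unification error; the function name `report_unification`, the message wording, and the choice of blue are assumptions, not taken from `main.cpp`.

```C++
#include <cassert>
#include <string>

// SGR (Select Graphic Rendition) sequences: 34 = blue foreground, 0 = reset.
const std::string blue = "\033[34m";
const std::string reset = "\033[0m";

// Builds a message in which both type names are highlighted.
std::string report_unification(const std::string& left, const std::string& right) {
    return "failed to unify " + blue + left + reset +
           " with " + blue + right + reset;
}
```

Because a reset follows each type, only the type names are colored when the message is written to a terminal that understands ANSI codes; the rest keeps the default style.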

### Setting up CMake

We will set up CMake as our build system. This would be extremely easy
if not for Flex and Bison, but it's not hard either way. We start with the usual:

{{< codelines "CMake" "compiler/04/CMakeLists.txt" 1 2 >}}

Next, we want to set up Flex and Bison. CMake provides two commands for this:

{{< codelines "CMake" "compiler/04/CMakeLists.txt" 4 5 >}}

We now have access to commands that allow us to tell CMake about our parser
and tokenizer (or scanner). We use them as follows:

{{< codelines "CMake" "compiler/04/CMakeLists.txt" 6 12 >}}

We also want CMake to know that the scanner needs the parser's header file
in order to compile. We add this dependency:

{{< codelines "CMake" "compiler/04/CMakeLists.txt" 13 13 >}}

Finally, we add our source code to a CMake target. We use
the `BISON_parser_OUTPUTS` and `FLEX_scanner_OUTPUTS` variables to
pass in the source files generated by Bison and Flex:

{{< codelines "CMake" "compiler/04/CMakeLists.txt" 15 23 >}}

Almost there! `parser.cpp` will be generated in the `build` directory
during an out-of-source build, and so will `parser.hpp`. When building,
`parser.cpp` will try to look for `ast.hpp`, and `main.cpp` will look for
`parser.hpp`. We want them to be able to find each other, so we
add both the source directory and the build (binary) directory to
the list of include directories:

{{< codelines "CMake" "compiler/04/CMakeLists.txt" 24 25 >}}

That's it for CMake! Let's try our build:

```
cmake -S . -B build
cd build && make -j8
```

### Updated Code

We've made a lot of changes to the codebase, and I've only shown snippets of the code
so far. If you'd like to see the whole codebase, you can go to my site's git repository
and check out [the code so far](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/04).

Having taken this little break, it's time for our next push. We will define
how our programs will be evaluated in [Part 5 - Execution]({{< relref "05_compiler_execution.md" >}}).