2019-08-26 18:52:25 -07:00
---
title: Compiling a Functional Language Using C++, Part 4 - Small Improvements
date: 2019-08-06T14:26:38-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"]
---

We did quite a big push in the previous post. We defined
type rules for our language, implemented unification,
and then used unification to enforce these rules for
our program. The post was pretty long, and even then we
weren't able to fit quite everything into it.

For instance, we used `throw 0` whenever an error occurred. This
gives us no indication of what actually went wrong. We should
probably define an exception class, one that can contain
information about the error, and report it to the user.

Also, when there's no error, our compiler doesn't
really tell us anything at all about the code besides
the number of definitions. We probably want to see the types
of these definitions, or at least some intermediate information.
At the very least, we want to have the __ability__ to see
this information.

Finally, we have no build system. We are creating more
and more source files, and so far (unless you've taken
initiative), we've been compiling them by hand. We want
to only compile source files that have changed,
and we want to have a standard definition of how to
build our program.

### Printing Syntax Trees
Let's start by printing the trees we get from our parser.
This is long overdue: we have had no way to verify the structure
of what our parser returns to us since Part 2. We'll print
the trees top-down, with the children of a node
indented one block further than the node itself. For this,
we'll make a new virtual function with the signature:
```
virtual void print(int indent, std::ostream& to) const;
```
We'll add a similar printing function to our
pattern struct, too:
```
virtual void print(std::ostream& to) const;
```

Let's take a look at the implementation. For `ast_int`,
`ast_lid`, and `ast_uid`:
{{< codelines "C++" "compiler/04/ast.cpp" 18 21 >}}
{{< codelines "C++" "compiler/04/ast.cpp" 27 30 >}}
{{< codelines "C++" "compiler/04/ast.cpp" 36 39 >}}

With `ast_binop` things get a bit more interesting.
We call `print` recursively on the children of the
`binop` node:
{{< codelines "C++" "compiler/04/ast.cpp" 45 50 >}}

The same idea applies to `ast_app`:
{{< codelines "C++" "compiler/04/ast.cpp" 66 71 >}}

Finally, just like `ast_case::typecheck` called
`pattern::match`, `ast_case::print` calls `pattern::print`:
{{< codelines "C++" "compiler/04/ast.cpp" 83 92 >}}

We follow the same implementation strategy for patterns,
but we don't need indentation or recursion:
{{< codelines "C++" "compiler/04/ast.cpp" 108 110 >}}
{{< codelines "C++" "compiler/04/ast.cpp" 116 121 >}}

Let's print the body of each function we receive from the parser:
{{< codelines "C++" "compiler/04/main.cpp" 35 50 >}}

### Printing Types
Types are another thing that we want to be able to inspect, so let's
add a similar print method to them:
```
virtual void print(const type_mgr& mgr, std::ostream& to) const;
```
We need the type manager so we can follow substitutions: a type
variable that has been bound to another type should be printed as that type.
The implementation is simple enough:
{{< codelines "C++" "compiler/04/type.cpp" 5 24 >}}

Let's also print out the types we infer. We'll make it a separate loop
in the `typecheck_program` function, because it's mostly just
for debugging purposes.

### Fixing Bugs
We actually discover not one, but two bugs in our implementation thanks
to this output. Observe the output for `works3.txt`:
```
length l:
CASE:
Nil
INT: 0
*: Int -> (Int -> (Int))
+: Int -> (Int -> (Int))
-: Int -> (Int -> (Int))
/: Int -> (Int -> (Int))
Cons: List -> (Int -> (List))
Nil: List
length: List -> (Int)
2
```

First, we're missing the `Cons` branch: only the `Nil` branch shows up
under `CASE`. The culprit is `parser.y`, specifically this line:
```C++
    : branches branch { $$ = std::move($1); $1.push_back(std::move($2)); }
```
Notice that we move our list of branches out of `$1`. However, when we
`push_back`, we use `$1` again, so the new branch is added to the moved-from
list and never reaches the result. That's wrong! We need to `push_back`
to `$$` instead:
{{< codelines "C++" "compiler/04/parser.y" 110 110 >}}

Next, observe that `Cons` has type `List -> Int -> List`. That's not right,
since `Int` comes first in our definition. The culprit is this fragment of code:
```C++
for(auto& type_name : constructor->types) {
    type_ptr type = type_ptr(new type_base(type_name));
    full_type = type_ptr(new type_arr(type, full_type));
}
```
Each iteration wraps the type built so far, so the parameter visited last ends up outermost.
Remember how we built the function type backwards in Part 3? We have to do the same here.
We replace the fragment with the proper reverse iteration:
{{< codelines "C++" "compiler/04/definition.cpp" 37 40 >}}

### Setting up CMake
This would be extremely easy if not for Flex and Bison. We start with the usual:
{{< codelines "CMake" "compiler/04/CMakeLists.txt" 1 2 >}}

Next, we want to set up Flex and Bison. CMake provides two modules for this:
{{< codelines "CMake" "compiler/04/CMakeLists.txt" 4 5 >}}

We now have access to commands that allow us to tell CMake about our parser
and tokenizer (or scanner). We use them as follows:
{{< codelines "CMake" "compiler/04/CMakeLists.txt" 6 12 >}}

We also want CMake to know that the scanner needs the parser's header file
in order to compile. We add this dependency:
{{< codelines "CMake" "compiler/04/CMakeLists.txt" 13 13 >}}

Finally, we add our source code to a CMake target. We use
the `BISON_parser_OUTPUTS` and `FLEX_scanner_OUTPUTS` variables to
pass in the source files generated by Flex and Bison.
{{< codelines "CMake" "compiler/04/CMakeLists.txt" 15 22 >}}

Almost there! `parser.cpp` will be generated in the `build` directory
during an out-of-source build, and so will `parser.hpp`. When building,
`parser.cpp` will try to look for `ast.hpp`, and `main.cpp` will look for
`parser.hpp`. We want them to be able to find each other, so we
add both the source directory and the build (binary) directory to
the list of include directories:

{{< codelines "CMake" "compiler/04/CMakeLists.txt" 23 24 >}}

That's it for CMake! Let's try our build:
```
cmake -S . -B build
cd build && make -j8
```

We get an executable called `compiler`. Excellent! Here's the whole file:
{{< codeblock "CMake" "compiler/04/CMakeLists.txt" >}}