Danila Fedorin 8ba501bd84 Add output and fix two bugs.

2019-08-26 21:05:44 -07:00

Printing Syntax Trees

Let's start by printing the trees we get from our parser. This is long overdue: we've had no way to verify the structure of what our parser returns since Part 2. We'll print the trees top-down, with the children of a node indented one block further than the node itself. For this, we'll make a new virtual function with the signature:

virtual void print(int indent, std::ostream& to) const;

We'll include a similar printing function into our pattern struct, too:

virtual void print(std::ostream& to) const;

Let's take a look at the implementation. For ast_int, ast_lid, and ast_uid: {{< codelines "C++" "compiler/04/ast.cpp" 18 21 >}} {{< codelines "C++" "compiler/04/ast.cpp" 27 30 >}} {{< codelines "C++" "compiler/04/ast.cpp" 36 39 >}}

With ast_binop things get a bit more interesting. We call print recursively on the children of the binop node: {{< codelines "C++" "compiler/04/ast.cpp" 45 50 >}}

The same idea for ast_app: {{< codelines "C++" "compiler/04/ast.cpp" 66 71 >}}
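To make the recursion concrete, here's a minimal, self-contained sketch of the same idea. It is not the code from ast.cpp: `node`, `int_node`, `binop_node`, `print_indent` and `render` are hypothetical stand-ins, and the `BINOP:` label and two-space indent are assumptions (the latter chosen to match the output shown later in this post).

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical, simplified node hierarchy; the real classes live in ast.hpp.
struct node {
    virtual ~node() = default;
    virtual void print(int indent, std::ostream& to) const = 0;
};
using node_ptr = std::unique_ptr<node>;

// Assumed helper: writes two spaces per indentation level.
void print_indent(int indent, std::ostream& to) {
    for (int i = 0; i < indent; i++) to << "  ";
}

// A leaf node prints itself and stops.
struct int_node : node {
    int value;
    explicit int_node(int v) : value(v) {}
    void print(int indent, std::ostream& to) const override {
        print_indent(indent, to);
        to << "INT: " << value << "\n";
    }
};

// An inner node prints itself, then its children one level deeper.
struct binop_node : node {
    char op;
    node_ptr left, right;
    binop_node(char o, node_ptr l, node_ptr r)
        : op(o), left(std::move(l)), right(std::move(r)) {}
    void print(int indent, std::ostream& to) const override {
        print_indent(indent, to);
        to << "BINOP: " << op << "\n";
        left->print(indent + 1, to);
        right->print(indent + 1, to);
    }
};

// Convenience wrapper: collect the printed tree into a string.
std::string render(const node& n) {
    std::ostringstream out;
    n.print(0, out);
    return out.str();
}
```

Calling `render` on a `binop_node` for `1 + 2` gives the operator on the first line and each operand on its own line, indented by one level. Passing the indent down as a parameter means no node needs to know how deep in the tree it sits.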

Finally, just like ast_case::typecheck called pattern::match, ast_case::print calls pattern::print: {{< codelines "C++" "compiler/04/ast.cpp" 83 92 >}}

We follow the same implementation strategy for patterns, but we don't need indentation or recursion: {{< codelines "C++" "compiler/04/ast.cpp" 108 110 >}} {{< codelines "C++" "compiler/04/ast.cpp" 116 121 >}}

Let's print the bodies of each function we receive from the parser: {{< codelines "C++" "compiler/04/main.cpp" 35 50 >}}

Printing Types

Types are another thing that we want to be able to inspect, so let's add a similar print method to them:

virtual void print(const type_mgr& mgr, std::ostream& to) const;

We need the type manager so we can follow substitutions. The implementation is simple enough: {{< codelines "C++" "compiler/04/type.cpp" 5 24 >}}
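Here's a rough sketch of why the manager has to be passed in. Everything in it is a hypothetical stand-in rather than the code from type.cpp: `toy_mgr` is just a map from type variable names to the types they've been bound to, and `toy_base`, `toy_var` and `toy_arr` loosely mirror a base type, a type variable and a function type. The sketch assumes the bindings contain no cycles.

```cpp
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

struct toy_mgr;

struct toy_type {
    virtual ~toy_type() = default;
    virtual void print(const toy_mgr& mgr, std::ostream& to) const = 0;
};
using toy_type_ptr = std::shared_ptr<toy_type>;

// Assumed stand-in for the type manager: variable name -> bound type.
struct toy_mgr {
    std::map<std::string, toy_type_ptr> bindings;
};

struct toy_base : toy_type {
    std::string name;
    explicit toy_base(std::string n) : name(std::move(n)) {}
    void print(const toy_mgr&, std::ostream& to) const override { to << name; }
};

struct toy_var : toy_type {
    std::string name;
    explicit toy_var(std::string n) : name(std::move(n)) {}
    void print(const toy_mgr& mgr, std::ostream& to) const override {
        // Follow the substitution if the variable is bound; chains of
        // bindings are followed by the recursive call.
        auto it = mgr.bindings.find(name);
        if (it != mgr.bindings.end()) it->second->print(mgr, to);
        else to << name;
    }
};

struct toy_arr : toy_type {
    toy_type_ptr left, right;
    toy_arr(toy_type_ptr l, toy_type_ptr r)
        : left(std::move(l)), right(std::move(r)) {}
    void print(const toy_mgr& mgr, std::ostream& to) const override {
        // Parenthesize the right-hand side, as in "Int -> (Int -> (Int))".
        left->print(mgr, to);
        to << " -> (";
        right->print(mgr, to);
        to << ")";
    }
};

// Convenience wrapper: collect the printed type into a string.
std::string show(const toy_type& t, const toy_mgr& mgr) {
    std::ostringstream out;
    t.print(mgr, out);
    return out.str();
}
```

With a variable `a` bound to `List` in the manager, the function type from `a` to `Int` prints as `List -> (Int)`; without the binding it prints as `a -> (Int)`. Without access to the manager, we could only ever print the placeholder name.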

Let's also print out the types we infer. We'll make it a separate loop in the typecheck_program function, because it's mostly just for debugging purposes.

Fixing Bugs

We actually discover not one, but two bugs in our implementation thanks to this output. Observe the output for works3.txt:

length l:
  CASE:
    Nil
      INT: 0
*: Int -> (Int -> (Int))
+: Int -> (Int -> (Int))
-: Int -> (Int -> (Int))
/: Int -> (Int -> (Int))
Cons: List -> (Int -> (List))
Nil: List
length: List -> (Int)
2

First, we're missing the Cons branch. The culprit is parser.y, specifically this line:

    : branches branch { $$ = std::move($1); $1.push_back(std::move($2)); }

Notice that we move our list of branches out of $1. However, when we push_back, we use $1 again. That's wrong! We need to push_back to $$ instead: {{< codelines "C++" "compiler/04/parser.y" 110 110 >}}
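The same mistake is easy to reproduce outside of Bison. The sketch below uses plain `std::vector<std::string>` in place of our branch list; `append_buggy` and `append_fixed` are hypothetical names that play the role of the old and new semantic actions, with `source` standing in for `$1` and `result` for `$$`.

```cpp
#include <string>
#include <utility>
#include <vector>

using branch_list = std::vector<std::string>;

// Mirrors the buggy action: the list is moved into `result`, but the new
// branch is then pushed onto the moved-from `source` and never reaches it.
branch_list append_buggy(branch_list& source, std::string branch) {
    branch_list result;
    result = std::move(source);
    source.push_back(std::move(branch)); // wrong list!
    return result;
}

// Mirrors the fix: push onto the list that now owns the elements.
branch_list append_fixed(branch_list& source, std::string branch) {
    branch_list result;
    result = std::move(source);
    result.push_back(std::move(branch));
    return result;
}
```

Starting from a list holding only `"Nil"` and appending `"Cons"`, the buggy version returns a one-element list while the fixed version returns both branches. That matches what we saw in the output: the first branch survives, and every later one is dropped.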

Next, observe that Cons has type List -> Int -> List. That's not right, since Int comes first in our definition. The culprit is this fragment of code:

        for(auto& type_name : constructor->types) {
            type_ptr type = type_ptr(new type_base(type_name));
            full_type = type_ptr(new type_arr(type, full_type));
        }

Remember how we built the function type backwards in Part 3? We have to do the same here. We replace the fragment with the proper reverse iteration: {{< codelines "C++" "compiler/04/definition.cpp" 37 40 >}}
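To see why the direction matters, here's a small sketch that builds the type as a string instead of a `type_ptr`. The helpers `arrow`, `build_forward` and `build_reverse` are hypothetical and only imitate the shape of the loop; the real fix is the one in definition.cpp.

```cpp
#include <string>
#include <vector>

// Formats a function type the same way our type printer does.
std::string arrow(const std::string& from, const std::string& to) {
    return from + " -> (" + to + ")";
}

// Forward iteration: each parameter wraps everything built so far, so the
// last parameter ends up outermost and the order comes out reversed.
std::string build_forward(const std::vector<std::string>& params,
                          const std::string& result) {
    std::string full = result;
    for (const auto& param : params) full = arrow(param, full);
    return full;
}

// Reverse iteration: the last parameter is wrapped first, so the first
// parameter ends up outermost, as intended.
std::string build_reverse(const std::vector<std::string>& params,
                          const std::string& result) {
    std::string full = result;
    for (auto it = params.rbegin(); it != params.rend(); ++it)
        full = arrow(*it, full);
    return full;
}
```

For a constructor with parameters `Int` and `List` and result type `List`, `build_forward` produces `List -> (Int -> (List))`, which is exactly the wrong type from our output, while `build_reverse` produces `Int -> (List -> (List))`.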

Setting up CMake

This would be extremely easy if not for Flex and Bison. We start with the usual: {{< codelines "CMake" "compiler/04/CMakeLists.txt" 1 2 >}}

Next, we want to set up Flex and Bison. CMake provides two commands for this: {{< codelines "CMake" "compiler/04/CMakeLists.txt" 4 5 >}}

We now have access to commands that allow us to tell CMake about our parser and tokenizer (or scanner). We use them as follows: {{< codelines "CMake" "compiler/04/CMakeLists.txt" 6 12 >}}

We also want CMake to know that the scanner needs the parser's header file in order to compile. We add this dependency: {{< codelines "CMake" "compiler/04/CMakeLists.txt" 13 13 >}}

Finally, we add our source code to a CMake target. We use the BISON_parser_OUTPUTS and FLEX_scanner_OUTPUTS variables to pass in the source files generated by Flex and Bison. {{< codelines "CMake" "compiler/04/CMakeLists.txt" 15 22 >}}

Almost there! During an out-of-source build, parser.cpp and parser.hpp are generated in the build directory. When building, parser.cpp will look for ast.hpp, and main.cpp will look for parser.hpp. We want them to be able to find each other, so we add both the source directory and the build (binary) directory to the list of include directories.

That's it for CMake! Let's try our build:

cmake -S . -B build
cd build && make -j8

We get an executable called compiler. Excellent! Here's the whole file: {{< codeblock "CMake" "compiler/04/CMakeLists.txt" >}}
