Finish the draft of the parsing post

This commit is contained in:
Danila Fedorin 2019-08-05 00:39:54 -07:00
parent f6c6a2be28
commit 43a72533f5

@@ -233,3 +233,73 @@ another expression.
Finally, we get to writing our Bison file, `parser.y`. Here's what I came up with:
{{< rawblock "compiler_parser.y" >}}
There are a few things to note here. First of all, the __parser__ is the "source of truth" regarding what tokens exist in our language.
We have a list of `%token` declarations, each of which corresponds to a regular expression in our scanner.
Next, observe that there's
a certain symmetry between our parser and our scanner. In our scanner, we mixed the theoretical idea of a regular expression
with an __action__, a C++ code snippet to be executed when a regular expression is matched. This same idea is present
in the parser, too. Each rule can produce a value, which we call a __semantic value__. For type safety, we allow
each nonterminal and terminal to produce only one type of semantic value. For instance, all rules for \\(A_{add}\\) must
produce an expression. We specify the type of each nonterminal using `%type` directives. The types of terminals
are specified when they're declared.
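To make the "one type per symbol" idea concrete, here is a minimal sketch of the kind of value an addition rule could produce. The names `ast`, `ast_int`, `ast_binop`, and `reduce_add` are hypothetical and are not taken from the parser file above. `reduce_add` only stands in for what a rule's action would do with the semantic values of its operands.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST base class: every expression rule yields one of these.
struct ast {
    virtual ~ast() = default;
    virtual std::string show() const = 0;
};

using ast_ptr = std::unique_ptr<ast>;

// Leaf node: the kind of value an integer-literal rule might produce.
struct ast_int : ast {
    int value;
    explicit ast_int(int v) : value(v) {}
    std::string show() const override { return std::to_string(value); }
};

// Interior node: the kind of value a rule for addition might produce.
struct ast_binop : ast {
    char op;
    ast_ptr left, right;
    ast_binop(char o, ast_ptr l, ast_ptr r)
        : op(o), left(std::move(l)), right(std::move(r)) {}
    std::string show() const override {
        return "(" + left->show() + " " + op + " " + right->show() + ")";
    }
};

// Stand-in for a parser action: combine the semantic values of the two
// operands into the single semantic value of the whole rule.
ast_ptr reduce_add(ast_ptr left, ast_ptr right) {
    assert(left && right);  // both operands must already have been built
    return ast_ptr(new ast_binop('+', std::move(left), std::move(right)));
}
```

Because every alternative of the rule returns an `ast_ptr`, whatever consumes the result never has to ask which alternative matched.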
Next, we must recognize that Bison was originally made for C, rather than C++. In order to allow the parser
to store and operate on semantic values of various types, the canonical solution back then was to
use a C `union`. Unions are great, but for C++, they're more trouble than they're worth: a union member with a
non-trivial constructor or destructor has to be constructed and destroyed by hand. This means that types like
`std::unique_ptr` and `std::string` are effectively off limits as semantic values. But we'd much rather use them! The solution is to:
1. Specify the language to be C++, rather than C.
2. Enable the `variant` API feature, which uses a lightweight `std::variant` alternative in place of a union.
3. Enable the creation of token constructors, which we will use in Flex.
In order to be able to use the variant-based API, we also need to change the Flex `yylex` function
to return `yy::parser::symbol_type`. You can see it in our forward declaration of `yylex`.
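The difference between a union and a variant can be sketched in plain C++. This example uses C++17's `std::variant` as a stand-in. Bison generates its own lightweight variant class, so treat this as an illustration of the idea rather than of Bison's actual implementation. The names `node`, `semantic_value`, `make_name`, `make_node`, and `describe` are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// With a plain union, a std::string member makes the union's default
// constructor and destructor implicitly deleted, so we would have to
// construct and destroy the active member by hand:
//   union value { int number; std::string name; };
// A variant tracks the active member and runs the right destructor for us.

struct node { int payload; };  // hypothetical stand-in for an AST node

using semantic_value = std::variant<int, std::string, std::unique_ptr<node>>;

// Store a string as the active alternative.
semantic_value make_name(std::string text) {
    return semantic_value(std::move(text));
}

// Store an owning pointer as the active alternative.
semantic_value make_node(int payload) {
    return semantic_value(std::unique_ptr<node>(new node{payload}));
}

// Report which alternative is currently stored.
std::string describe(const semantic_value& v) {
    if (std::holds_alternative<int>(v)) return "int";
    if (std::holds_alternative<std::string>(v)) return "string";
    return "node";
}
```

Note that `semantic_value` is move-only here, because one of its alternatives is a `std::unique_ptr`.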
Now that we've made these changes, it's time to hook up Flex to all this. Here's a new version
of the Flex scanner, with all necessary modifications:
{{< rawblock "compiler_scanner_bison.l" >}}
The two key ideas are that we overrode the default signature of `yylex` by redefining the
`YY_DECL` preprocessor macro, and used the `yy::parser::make_<TOKEN>` functions
to return a `symbol_type` rather than an `int`.
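The idea behind token constructors can be sketched without Flex or Bison at all. Everything in this example is a simplified, hypothetical stand-in: the `symbol` struct is not the generated `yy::parser::symbol_type`, and the token names `INT`, `LID`, `PLUS`, and `END` are assumed for illustration. What it shows is that each factory function accepts exactly the value its token carries, so a scanner action can't attach the wrong kind of data to a token.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the symbol type a parser exposes.
enum class token_kind { INT, LID, PLUS, END };

struct symbol {
    token_kind kind;
    int int_value;          // meaningful only when kind == INT
    std::string str_value;  // meaningful only when kind == LID
};

// One factory per token; each accepts exactly the value that token carries.
symbol make_INT(int value) { return symbol{token_kind::INT, value, ""}; }
symbol make_LID(std::string name) { return symbol{token_kind::LID, 0, std::move(name)}; }
symbol make_PLUS() { return symbol{token_kind::PLUS, 0, ""}; }
symbol make_END() { return symbol{token_kind::END, 0, ""}; }

// Stand-in for scanner actions: classify one lexeme that has already been
// split out of the input, and return a full symbol rather than a bare int.
symbol scan_lexeme(const std::string& lexeme) {
    if (lexeme.empty()) return make_END();
    if (lexeme == "+") return make_PLUS();
    if (lexeme.find_first_not_of("0123456789") == std::string::npos)
        return make_INT(std::stoi(lexeme));
    return make_LID(lexeme);
}
```

Returning a whole symbol bundles the token's kind with its value, which is why the scanner no longer needs a separate channel for passing values to the parser.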
Finally, let's write a `main` function so that we can at least check for segmentation faults
and other obvious mistakes:
{{< codeblock "C++" "compiler_main.cpp" >}}
Now, we can compile and run the code:
```
flex -o compiler_scanner.cpp compiler_scanner_bison.l
bison -o compiler_parser.cpp -d compiler_parser.y
g++ -c -o scanner.o compiler_scanner.cpp
g++ -c -o parser.o compiler_parser.cpp
g++ compiler_main.cpp parser.o scanner.o
```
We used the `-d` option for Bison to generate the `compiler_parser.hpp` header file,
which exports our token declarations and token creation functions, allowing
us to use them in Flex.
At last, we can feed some code to the parser from `stdin`. Let's try it:
```
./a.out
defn main = { add 320 6 }
defn add x y = { x + y }
```
The program prints `2`, indicating two declarations were made. Let's try something obviously
wrong:
```
./a.out
}{
```
We are told an error occurred. Excellent!
There are still a number of flaws with our parser:
1. We're missing the data declaration, from both our C++ source and from the Bison grammar.
2. We don't print errors properly.
3. We also have no way of verifying our tree was built correctly.
This post is getting a little long, so we will revisit the parser in the next one. See you then!