Finish the draft of the parsing post
parent f6c6a2be28
commit 43a72533f5
@ -233,3 +233,73 @@ another expression.
Finally, we get to writing our Bison file, `parser.y`. Here's what I came up with:
{{< rawblock "compiler_parser.y" >}}

There are a few things to note here. First of all, the __parser__ is the "source of truth" regarding what tokens exist in our language.
We have a list of `%token` declarations, each of which corresponds to a regular expression in our scanner.

Next, observe that there's
a certain symmetry between our parser and our scanner. In our scanner, we mixed the theoretical idea of a regular expression
with an __action__, a C++ code snippet to be executed when a regular expression is matched. This same idea is present
in the parser, too. Each rule can produce a value, which we call a __semantic value__. For type safety, we allow
each nonterminal and terminal to produce only one type of semantic value. For instance, all rules for \\(A_{add}\\) must
produce an expression. We specify the type of each nonterminal using `%type` directives. The types of terminals
are specified when they're declared.

Next, we must recognize that Bison was originally made for C, rather than C++. In order to allow the parser
to store and operate on semantic values of various types, the canonical solution back in those times was to
use a C `union`. Unions are great, but for C++, they're more trouble than they're worth: a union won't
construct or destroy members with non-trivial constructors and destructors for us! This means that stuff like `std::unique_ptr` and `std::string` is effectively off limits as
a semantic value. But we'd much rather use them! The solution is to:

1. Specify the language to be C++, rather than C.
2. Enable the `variant` API feature, which uses a lightweight `std::variant` alternative in place of a union.
3. Enable the creation of token constructors, which we will use in Flex.

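To make the union problem a bit more concrete, here's a small standalone sketch. It is not part of our compiler, and it uses C++17's `std::variant` purely as a familiar analogue: Bison's variant is its own lightweight class, not `std::variant`. The names `ast_node`, `raw_value`, `semantic_value`, `manual_union_roundtrip` and `describe` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <variant>

// Hypothetical stand-in for a tree node our parser might build.
struct ast_node { int value; };
using node_ptr = std::unique_ptr<ast_node>;

// A union may name a std::string member, but the compiler then refuses to
// generate its constructor and destructor, so we have to write them ourselves.
union raw_value {
    int number;
    std::string text;
    raw_value() : number(0) {} // we pick the initially active member
    ~raw_value() {}            // does NOT destroy text; that is left to the user
};

// Every non-trivial member must be created and destroyed by hand.
std::size_t manual_union_roundtrip() {
    raw_value v;                       // active member: number
    new (&v.text) std::string("defn"); // placement new begins text's lifetime
    std::size_t length = v.text.size();
    std::destroy_at(&v.text);          // forget this, and the string is never destroyed
    return length;
}

// A variant remembers which alternative is active and cleans it up for us.
using semantic_value = std::variant<int, std::string, node_ptr>;

// Report which alternative a value currently holds.
std::string describe(const semantic_value& v) {
    if (std::holds_alternative<int>(v)) return "int";
    if (std::holds_alternative<std::string>(v)) return "string";
    return "node";
}
```

With the variant, assigning `std::string("add")` and later a `std::make_unique<ast_node>(...)` to the same `semantic_value` just works, and whatever was stored before is destroyed automatically. That bookkeeping is what the union version forces us to do by hand.
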
To use the variant-based API, we also need to change the Flex `yylex` function
to return `yy::parser::symbol_type`. You can see this in our forward declaration of `yylex`.

Now that we've made these changes, it's time to hook up Flex to all this. Here's a new version
of the Flex scanner, with all necessary modifications:
{{< rawblock "compiler_scanner_bison.l" >}}

The two key ideas are that we overrode the default signature of `yylex` by redefining the
`YY_DECL` preprocessor macro, and used the `yy::parser::make_<TOKEN>` functions
to return a `symbol_type` rather than an `int`.

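If the `make_<TOKEN>` idea feels abstract, here's a toy model of it in plain C++. To be clear, this is not the code Bison generates: `mock::symbol_type`, `mock::token_kind`, the `mock::make_*` functions and `toy_lexer` are all hypothetical, and the real generated functions may have different signatures (for instance, if locations are enabled). The point is only that each token gets its own constructor function, so a token's kind and the type of its semantic value can't get out of sync.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Hypothetical model of a parser symbol; NOT Bison's generated code.
namespace mock {

enum class token_kind { END, PLUS, INT, LID };

struct symbol_type {
    token_kind kind;
    std::variant<std::monostate, int, std::string> value; // monostate: no payload
};

// One constructor function per token: the payload type is fixed by the name.
inline symbol_type make_END() { return {token_kind::END, {}}; }
inline symbol_type make_PLUS() { return {token_kind::PLUS, {}}; }
inline symbol_type make_INT(int v) { return {token_kind::INT, v}; }
inline symbol_type make_LID(std::string v) { return {token_kind::LID, std::move(v)}; }

} // namespace mock

// A hand-written stand-in for yylex: returns one whole symbol per call,
// rather than an int token code plus a value stashed in a global.
class toy_lexer {
public:
    explicit toy_lexer(std::string input) : input_(std::move(input)) {}

    mock::symbol_type next() {
        while (pos_ < input_.size() && input_[pos_] == ' ') ++pos_; // skip spaces
        if (pos_ >= input_.size()) return mock::make_END();
        if (input_[pos_] == '+') { ++pos_; return mock::make_PLUS(); }
        std::size_t start = pos_;
        if (is_digit(input_[pos_])) {
            while (pos_ < input_.size() && is_digit(input_[pos_])) ++pos_;
            return mock::make_INT(std::stoi(input_.substr(start, pos_ - start)));
        }
        // Anything else up to the next space or '+' is treated as an identifier.
        while (pos_ < input_.size() && input_[pos_] != ' ' && input_[pos_] != '+') ++pos_;
        return mock::make_LID(input_.substr(start, pos_ - start));
    }

private:
    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    std::string input_;
    std::size_t pos_ = 0;
};
```

Feeding `"x + 320"` to `toy_lexer` and calling `next()` repeatedly yields an `LID` carrying `"x"`, a `PLUS` with no payload, an `INT` carrying `320`, and finally `END`.
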
Finally, let's get a main function so that we can at least check for segmentation faults
and other obvious mistakes:
{{< codeblock "C++" "compiler_main.cpp" >}}

Now, we can compile and run the code:
```
flex -o compiler_scanner.cpp compiler_scanner_bison.l
bison -o compiler_parser.cpp -d compiler_parser.y
g++ -c -o scanner.o compiler_scanner.cpp
g++ -c -o parser.o compiler_parser.cpp
g++ compiler_main.cpp parser.o scanner.o
```
We used the `-d` option for Bison to generate the `compiler_parser.hpp` header file,
which exports our token declarations and token creation functions, allowing
us to use them in Flex.

At last, we can feed some code to the parser from `stdin`. Let's try it:
```
./a.out
defn main = { add 320 6 }
defn add x y = { x + y }
```
The program prints `2`, indicating two declarations were made. Let's try something obviously
wrong:
```
./a.out
}{
```
We are told an error occurred. Excellent!

There are still a number of flaws with our parser:

1. We're missing the data declaration, from both our C++ source and the Bison grammar.
2. We don't print errors properly.
3. We also have no way of verifying our tree was built correctly.

This post is getting a little long, so we will revisit the parser in the next one. See you then!