Finish the draft of the parsing post
parent
f6c6a2be28
commit
43a72533f5
@ -233,3 +233,73 @@ another expression.
Finally, we get to writing our Bison file, `parser.y`. Here's what I came up with:
{{< rawblock "compiler_parser.y" >}}

There are a few things to note here. First of all, the __parser__ is the "source of truth" regarding what tokens exist in our language.
We have a list of `%token` declarations, each of which corresponds to a regular expression in our scanner.

Next, observe that there's
a certain symmetry between our parser and our scanner. In our scanner, we mixed the theoretical idea of a regular expression
with an __action__, a C++ code snippet to be executed when a regular expression is matched. This same idea is present
in the parser, too. Each rule can produce a value, which we call a __semantic value__. For type safety, we allow
each nonterminal and terminal to produce only one type of semantic value. For instance, all rules for \\(A_{add}\\) must
produce an expression. We specify the type of each nonterminal using `%type` directives. The types of terminals
are specified when they're declared.

Next, we must recognize that Bison was originally made for C, rather than C++. In order to allow the parser
to store and operate on semantic values of various types, the canonical solution back in those times was to
use a C `union`. Unions are great, but for C++, they're more trouble than they're worth: a union won't
construct or destroy members with non-trivial constructors and destructors for us! This means that stuff like `std::unique_ptr` and `std::string` is effectively off limits as
a semantic value. But we'd much rather use them! The solution is to:

1. Specify the language to be C++, rather than C.
2. Enable the `variant` API feature, which uses a lightweight `std::variant` alternative in place of a union.
3. Enable the creation of token constructors, which we will use in Flex.

In order to be able to use the variant-based API, we also need to change the Flex `yylex` function
to return `yy::parser::symbol_type`. You can see it in our forward declaration of `yylex`.

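To make the union problem concrete, here is a minimal standalone sketch (it is not taken from the generated parser). It assumes C++17 and uses `std::variant` as a stand-in for Bison's own variant type, which is a separate implementation. The names `ast_node`, `c_style_value`, `semantic_value`, and `describe` are made up for illustration:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for an AST node; not the node type used in this series.
struct ast_node {
    int value;
};

// A C-style semantic value. Because std::string has a non-trivial
// constructor and destructor, the union's own default constructor and
// destructor are implicitly deleted.
union c_style_value {
    int number;
    std::string name;
};

static_assert(!std::is_default_constructible_v<c_style_value>,
              "the union cannot be default-constructed");
static_assert(!std::is_destructible_v<c_style_value>,
              "the union cannot be destroyed without a hand-written destructor");

// The same idea with a variant: construction and destruction of the
// active alternative are handled automatically.
using semantic_value =
    std::variant<int, std::string, std::unique_ptr<ast_node>>;

static_assert(std::is_default_constructible_v<semantic_value>);
static_assert(std::is_move_constructible_v<semantic_value>);

// Returns a short name for whichever alternative is currently stored.
std::string describe(const semantic_value& v) {
    if (std::holds_alternative<int>(v)) return "int";
    if (std::holds_alternative<std::string>(v)) return "string";
    return "node";
}
```

For example, assigning `std::string("add")` to a `semantic_value` and calling `describe` on it yields `"string"`, with no manual constructor or destructor calls involved.
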
Now that we've made these changes, it's time to hook up Flex to all this. Here's a new version
of the Flex scanner, with all necessary modifications:
{{< rawblock "compiler_scanner_bison.l" >}}

The two key ideas are that we overrode the default signature of `yylex` by changing the
`YY_DECL` preprocessor macro, and used the `yy::parser::make_<TOKEN>` functions
to return a `symbol_type` rather than an `int`.

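The shape of this approach can be sketched without Flex or Bison at all. The following is a hand-written mock, assuming a tiny token set: `sketch::symbol_type`, the `make_*` helpers, and `next_token` are hypothetical stand-ins for `yy::parser::symbol_type`, the generated `yy::parser::make_<TOKEN>` functions, and the `yylex` function whose signature `YY_DECL` controls. The real generated code looks different:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

namespace sketch {

// Hypothetical token set, far smaller than the one in the real grammar.
enum class token_kind { INT, NAME, PLUS, END };

// Plays the role of yy::parser::symbol_type: a token kind plus a typed value.
struct symbol_type {
    token_kind kind;
    std::variant<std::monostate, int, std::string> value;
};

// Play the role of the generated make_<TOKEN> functions: each one pairs
// a token kind with a value of the matching type.
inline symbol_type make_INT(int v) { return {token_kind::INT, v}; }
inline symbol_type make_NAME(std::string v) {
    return {token_kind::NAME, std::move(v)};
}
inline symbol_type make_PLUS() { return {token_kind::PLUS, std::monostate{}}; }
inline symbol_type make_END() { return {token_kind::END, std::monostate{}}; }

// Hand-written stand-in for the scanner function: reads one token from
// `input` starting at `pos`, and advances `pos` past it.
inline symbol_type next_token(const std::string& input, std::size_t& pos) {
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(input[i]);
    };
    while (pos < input.size() && std::isspace(at(pos))) ++pos;
    if (pos >= input.size()) return make_END();

    if (std::isdigit(at(pos))) {
        int n = 0;
        while (pos < input.size() && std::isdigit(at(pos))) {
            n = n * 10 + (input[pos] - '0');
            ++pos;
        }
        return make_INT(n);
    }
    if (std::isalpha(at(pos))) {
        std::size_t start = pos;
        while (pos < input.size() && std::isalnum(at(pos))) ++pos;
        return make_NAME(input.substr(start, pos - start));
    }
    if (input[pos] == '+') {
        ++pos;
        return make_PLUS();
    }
    throw std::runtime_error("unexpected character in input");
}

}  // namespace sketch
```

Calling `sketch::next_token` repeatedly on `"x + 320"` yields a `NAME` holding `"x"`, then `PLUS`, then an `INT` holding `320`, then `END`. The point of the design is that the value type is tied to the token at the call site, so a scanner action cannot return an `INT` token carrying a string.
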
Finally, let's get a main function so that we can at least check for segmentation faults
and other obvious mistakes:
{{< codeblock "C++" "compiler_main.cpp" >}}

Now, we can compile and run the code:
```
flex -o compiler_scanner.cpp compiler_scanner_bison.l
bison -o compiler_parser.cpp -d compiler_parser.y
g++ -c -o scanner.o compiler_scanner.cpp
g++ -c -o parser.o compiler_parser.cpp
g++ compiler_main.cpp parser.o scanner.o
```
We used the `-d` option for Bison to generate the `compiler_parser.hpp` header file,
which exports our token declarations and token creation functions, allowing
us to use them in Flex.

At last, we can feed some code to the parser from `stdin`. Let's try it:
```
./a.out
defn main = { add 320 6 }
defn add x y = { x + y }
```
The program prints `2`, indicating two declarations were made. Let's try something obviously
wrong:
```
./a.out
}{
```
We are told an error occurred. Excellent!

There are still a number of flaws with our parser:

1. We're missing the data declaration, from both our C++ source and the Bison grammar.
2. We don't print errors properly.
3. We also have no way of verifying our tree was built correctly.

This post is getting a little long, so we will revisit the parser in the next one. See you then!