Add more work on part 11 of compiler series
Some checks failed
continuous-integration/drone/push Build is failing
This commit is contained in:
parent
b4c91d2dd4
commit
2255543d94
@@ -104,3 +104,117 @@ In effect, they take zero arguments and produce types (themselves).

Polytypes (type schemes) in our system can be all of the above, but may also include a "forall"
quantifier at the front, generalizing the type (like \\(\\forall a \\; . \\; \\text{List} \\; a \\rightarrow \\text{Int}\\)).

Let's start implementing all of this. Why don't we begin with the change to the syntax of our language?
We have complicated the situation quite a bit. Let's take a look at the _old_ grammar
for data type declarations (this is going back as far as [part 2]({{< relref "02_compiler_parsing.md" >}})).
Here, \\(L\_D\\) is the nonterminal for the things that go between the curly braces of a data type
declaration, \\(D\\) is the nonterminal representing a single constructor definition,
and \\(L\_U\\) is a list of zero or more uppercase variable names:

{{< latex >}}
\begin{aligned}
L_D & \rightarrow D \; , \; L_D \\
L_D & \rightarrow D \\
D & \rightarrow \text{upperVar} \; L_U \\
L_U & \rightarrow \text{upperVar} \; L_U \\
L_U & \rightarrow \epsilon
\end{aligned}
{{< /latex >}}

This grammar was actually too simple even for our monomorphically typed language!
Since functions are not represented using a single uppercase variable, it wasn't possible for us
to define constructors that accept as arguments anything other than integers and user-defined
data types. Now, we also need to modify this grammar to allow for type constructor applications (which can be nested!).
To do so, we will define a new nonterminal, \\(Y\\), for types:

{{< latex >}}
\begin{aligned}
Y & \rightarrow N \; ``\rightarrow" \; Y \\
Y & \rightarrow N
\end{aligned}
{{< /latex >}}

We make it right-recursive (because the \\(\\rightarrow\\) operator is right-associative), so that
\\(a \\rightarrow b \\rightarrow c\\) is read as \\(a \\rightarrow (b \\rightarrow c)\\). Next, we define
a nonterminal for all types _except_ those constructed with the arrow, \\(N\\):

{{< latex >}}
\begin{aligned}
N & \rightarrow \text{upperVar} \; L_Y \\
N & \rightarrow \text{lowerVar} \\
N & \rightarrow ( Y )
\end{aligned}
{{< /latex >}}

The first of the above rules allows a type to be a constructor applied to zero or more arguments
(generated by \\(L\_Y\\)). The second rule allows a type to be a placeholder type variable. Finally,
the third rule allows for any type (including functions, again) to occur between parentheses.
This is so that higher-order functions, like \\((a \\rightarrow b) \\rightarrow a \\rightarrow a\\),
can be represented.

Unfortunately, the definition of \\(L\_Y\\) is not as straightforward as we might imagine. We could define
it as just a list of \\(Y\\) nonterminals, but this would make the grammar ambiguous: something
like `List Maybe Int` could be interpreted either as "`List`, applied to types `Maybe` and `Int`", or as
"`List`, applied to type `Maybe Int`". To avoid this, we define a "type list element" \\(Y'\\),
which does not take arguments:

{{< latex >}}
\begin{aligned}
Y' & \rightarrow \text{upperVar} \\
Y' & \rightarrow \text{lowerVar} \\
Y' & \rightarrow ( Y )
\end{aligned}
{{< /latex >}}

We then make \\(L\_Y\\) a list of \\(Y'\\):

{{< latex >}}
\begin{aligned}
L_Y & \rightarrow Y' \; L_Y \\
L_Y & \rightarrow \epsilon
\end{aligned}
{{< /latex >}}
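
To see how the rules for \\(Y\\), \\(N\\), \\(Y'\\) and \\(L\_Y\\) fit together, here is a small hand-written
recursive descent sketch of the same grammar. This is only an illustration and not the Bison grammar the
series actually uses: the names `tokenize`, `type_parser` and `parse_type` are made up for this example,
and the parser returns a fully parenthesized string instead of building a real syntax tree.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tokenizer: identifiers, "(", ")" and "->" are the only tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (c == '(' || c == ')') {
            out.push_back(std::string(1, src[i]));
            ++i;
        } else if (src.compare(i, 2, "->") == 0) {
            out.push_back("->");
            i += 2;
        } else if (std::isalpha(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back(src.substr(start, i - start));
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return out;
}

// Hypothetical parser: one member function per nonterminal.
struct type_parser {
    std::vector<std::string> toks;
    std::size_t pos = 0;

    explicit type_parser(std::vector<std::string> t) : toks(std::move(t)) {}

    bool at_end() const { return pos >= toks.size(); }
    bool peek_is(const std::string& s) const { return !at_end() && toks[pos] == s; }
    bool peek_upper() const {
        return !at_end() && std::isupper(static_cast<unsigned char>(toks[pos][0]));
    }
    bool peek_lower() const {
        return !at_end() && std::islower(static_cast<unsigned char>(toks[pos][0]));
    }
    std::string next() { assert(!at_end()); return toks[pos++]; }
    void expect(const std::string& s) {
        if (!peek_is(s)) throw std::runtime_error("expected " + s);
        ++pos;
    }

    // Y -> N "->" Y | N   (right-recursive, so the arrow associates to the right)
    std::string parse_Y() {
        std::string left = parse_N();
        if (!peek_is("->")) return left;
        ++pos;
        return "(" + left + " -> " + parse_Y() + ")";
    }
    // N -> upperVar L_Y | lowerVar | ( Y )
    std::string parse_N() {
        if (!peek_upper()) return parse_var_or_paren();
        std::string name = next();
        std::string args = parse_LY();
        return args.empty() ? name : "(" + name + args + ")";
    }
    // Y' -> upperVar | lowerVar | ( Y )   (an uppercase name takes no arguments here)
    std::string parse_Yprime() {
        if (peek_upper()) return next();
        return parse_var_or_paren();
    }
    // Shared alternatives of N and Y': lowerVar | ( Y )
    std::string parse_var_or_paren() {
        if (peek_lower()) return next();
        expect("(");
        std::string inner = parse_Y();
        expect(")");
        return inner;
    }
    // L_Y -> Y' L_Y | epsilon
    std::string parse_LY() {
        std::string out;
        while (peek_upper() || peek_lower() || peek_is("(")) out += " " + parse_Yprime();
        return out;
    }
};

std::string parse_type(const std::string& src) {
    type_parser p(tokenize(src));
    std::string result = p.parse_Y();
    if (!p.at_end()) throw std::runtime_error("trailing input");
    return result;
}
```

With this sketch, `parse_type("List Maybe Int")` yields `(List Maybe Int)`, that is, `List` applied to two
arguments, while `parse_type("List (Maybe Int)")` yields `(List (Maybe Int))`. Because list elements are
\\(Y'\\) rather than \\(Y\\), the only way to pass an applied constructor as an argument is to parenthesize it.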

Finally, we update the rules for the data type declaration (which now carries a list of
type parameters, \\(L\_T\\)), as well as for a single constructor:

{{< latex >}}
\begin{aligned}
T & \rightarrow \text{data} \; \text{upperVar} \; L_T = \{ L_D \} \\
D & \rightarrow \text{upperVar} \; L_Y
\end{aligned}
{{< /latex >}}

Now that we have a grammar for all these things, we have to implement
the corresponding data structures. We define a new family of structs,
extending `parsed_type`, which represent types as they are
received from the parser. These differ from regular types in that they
do not require the types they represent to be valid; validating
types requires two passes, which is a luxury we do not have when
parsing. We can define them as follows:

{{< codeblock "C++" "compiler/11/parsed_type.hpp" >}}

We define the conversion function `to_type`, which requires
a set of type variables quantified in the given type, and
the environment in which to look up the arities of various
type constructors. The implementation is as follows:

{{< codeblock "C++" "compiler/11/parsed_type.cpp" >}}
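
The file above is the real implementation. As a rough, standalone sketch of the validation idea
(checking that type variables are in scope and that every type constructor receives as many arguments
as its arity), something like the following would work. Everything here is an assumption made for
illustration: the struct names, the `var`, `app` and `arr` helpers, the use of a plain
`std::map` in place of the compiler's type environment, and the choice to return a printable
string rather than a real type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct parsed_type;
using parsed_type_ptr = std::shared_ptr<const parsed_type>;
using var_set = std::set<std::string>;                 // quantified type variables
using arity_env = std::map<std::string, std::size_t>;  // stand-in for the type environment

struct parsed_type {
    virtual ~parsed_type() = default;
    // Validates the parsed type and returns a printable form; throws if it is invalid.
    virtual std::string to_type(const var_set& vars, const arity_env& env) const = 0;
};

// A type variable, like `a`. It is valid only if it has been quantified.
struct parsed_type_var : parsed_type {
    std::string name;
    explicit parsed_type_var(std::string n) : name(std::move(n)) {}
    std::string to_type(const var_set& vars, const arity_env&) const override {
        if (vars.count(name) == 0) throw std::runtime_error("unbound type variable " + name);
        return name;
    }
};

// A type constructor applied to zero or more arguments, like `List a` or `Int`.
struct parsed_type_app : parsed_type {
    std::string name;
    std::vector<parsed_type_ptr> args;
    parsed_type_app(std::string n, std::vector<parsed_type_ptr> a)
        : name(std::move(n)), args(std::move(a)) {}
    std::string to_type(const var_set& vars, const arity_env& env) const override {
        auto it = env.find(name);
        if (it == env.end()) throw std::runtime_error("unknown type " + name);
        if (it->second != args.size())
            throw std::runtime_error("wrong number of arguments to " + name);
        if (args.empty()) return name;
        std::string out = "(" + name;
        for (const auto& arg : args) out += " " + arg->to_type(vars, env);
        return out + ")";
    }
};

// A function type, like `a -> Int`. Both sides are validated recursively.
struct parsed_type_arr : parsed_type {
    parsed_type_ptr left, right;
    parsed_type_arr(parsed_type_ptr l, parsed_type_ptr r)
        : left(std::move(l)), right(std::move(r)) { assert(left && right); }
    std::string to_type(const var_set& vars, const arity_env& env) const override {
        return "(" + left->to_type(vars, env) + " -> " + right->to_type(vars, env) + ")";
    }
};

// Hypothetical helpers to keep construction short.
parsed_type_ptr var(std::string n) {
    return std::make_shared<parsed_type_var>(std::move(n));
}
parsed_type_ptr app(std::string n, std::vector<parsed_type_ptr> a = {}) {
    return std::make_shared<parsed_type_app>(std::move(n), std::move(a));
}
parsed_type_ptr arr(parsed_type_ptr l, parsed_type_ptr r) {
    return std::make_shared<parsed_type_arr>(std::move(l), std::move(r));
}
```

For example, with `List` registered at arity 1 and `a` quantified, `app("List", {var("a")})` converts
successfully, while `app("List", {app("Int"), app("Int")})` is rejected because `List` was given two
arguments. This kind of check cannot happen during parsing, since a data type may be used before the
parser has seen its declaration.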

With this definition in hand, we can now update the grammar in our Bison file.
First things first, we'll add the type parameters to the data type definition:

{{< codelines "plaintext" "compiler/11/parser.y" 127 130 >}}

Next, we add the new grammar rules we came up with:

{{< codelines "plaintext" "compiler/11/parser.y" 138 163 >}}

Finally, we define the types for these new rules at the top of the file:

{{< codelines "plaintext" "compiler/11/parser.y" 43 44 >}}

{{< todo >}}
Nullary is not the right word.
{{< /todo >}}