From 2255543d94bc3ddf388ca6365ca315bce78e2f56 Mon Sep 17 00:00:00 2001
From: Danila Fedorin
Date: Tue, 14 Apr 2020 00:15:32 -0700
Subject: [PATCH] Add more work on part 11 of compiler series

---
 .../11_compiler_polymorphic_data_types.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/content/blog/11_compiler_polymorphic_data_types.md b/content/blog/11_compiler_polymorphic_data_types.md
index 8962489..e6e203d 100644
--- a/content/blog/11_compiler_polymorphic_data_types.md
+++ b/content/blog/11_compiler_polymorphic_data_types.md
@@ -104,3 +104,117 @@ In effect, they take zero arguments and produce types (themselves).
 Polytypes (type schemes) in our system can be all of the above, but may also include a "forall" quantifier at the front, generalizing the type (like \\(\\forall a \\; . \\; \\text{List} \\; a \\rightarrow \\text{Int}\\)).
+
+Let's start implementing all of this. Why don't we start with the changes to the syntax of our language?
+We have complicated the situation quite a bit. Let's take a look at the _old_ grammar
+for data type declarations (this is going back as far as [part 2]({{< relref "02_compiler_parsing.md" >}})).
+Here, \\(L\_D\\) is the nonterminal for the things that go between the curly braces of a data type
+declaration, \\(D\\) is the nonterminal representing a single constructor definition,
+and \\(L\_U\\) is a list of zero or more uppercase variable names:
+
+{{< latex >}}
+\begin{aligned}
+L_D & \rightarrow D \; , \; L_D \\
+L_D & \rightarrow D \\
+D & \rightarrow \text{upperVar} \; L_U \\
+L_U & \rightarrow \text{upperVar} \; L_U \\
+L_U & \rightarrow \epsilon
+\end{aligned}
+{{< /latex >}}
+
+This grammar was actually too simple even for our monomorphically typed language!
+Since a function type cannot be written as a single uppercase name, it wasn't possible for us
+to define constructors that accept as arguments anything other than integers and user-defined
+data types.
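The sketch below makes this limitation concrete. It is a minimal C++ tree-shaped type representation that can hold what the old grammar could not: constructor applications such as `List a`, type variables, and function types such as `Int -> Int`. The names (`type_tree`, `make_app`, `make_var`, `make_arrow`, `show`) are hypothetical. They are not the series' `parsed_type` classes. The printer also shows where parentheses become necessary. Because the arrow is right-associative, only an arrow on the left-hand side of another arrow needs them. A constructor argument needs them unless it is a bare name.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree-shaped type representation (not the series' parsed_type
// classes): a constructor applied to arguments, a type variable, or an arrow.
struct type_tree {
    enum class kind { app, var, arrow };
    kind k;
    std::string name;                              // constructor or variable name
    std::vector<std::shared_ptr<type_tree>> args;  // app: arguments; arrow: {from, to}
};
using type_ptr = std::shared_ptr<type_tree>;

type_ptr make_app(std::string name, std::vector<type_ptr> args = {}) {
    return std::make_shared<type_tree>(
        type_tree{type_tree::kind::app, std::move(name), std::move(args)});
}

type_ptr make_var(std::string name) {
    return std::make_shared<type_tree>(type_tree{type_tree::kind::var, std::move(name), {}});
}

type_ptr make_arrow(type_ptr from, type_ptr to) {
    return std::make_shared<type_tree>(
        type_tree{type_tree::kind::arrow, "", {std::move(from), std::move(to)}});
}

std::string show(const type_ptr& t);

// A constructor argument needs parentheses unless it is a bare name.
std::string show_arg(const type_ptr& t) {
    bool bare = t->k == type_tree::kind::var ||
                (t->k == type_tree::kind::app && t->args.empty());
    return bare ? show(t) : "(" + show(t) + ")";
}

std::string show(const type_ptr& t) {
    switch (t->k) {
    case type_tree::kind::var:
        return t->name;
    case type_tree::kind::app: {
        std::string result = t->name;
        for (const auto& arg : t->args) result += " " + show_arg(arg);
        return result;
    }
    case type_tree::kind::arrow: {
        // The arrow is right-associative, so only an arrow on its left side
        // needs parentheses; the right side is printed as-is.
        const type_ptr& from = t->args[0];
        std::string left = from->k == type_tree::kind::arrow ? "(" + show(from) + ")" : show(from);
        return left + " -> " + show(t->args[1]);
    }
    }
    return "";  // unreachable for the three kinds above
}
```

As an example, the function type \\((a \\rightarrow b) \\rightarrow a \\rightarrow a\\) prints as `(a -> b) -> a -> a`. A hypothetical constructor `Box` holding a function prints as `Box (Int -> Int)`. Neither can be written with \\(L\_U\\), which only allows bare uppercase names.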
+Now, we also need to modify this grammar to allow for constructor applications (which can be nested!).
+To do so, we will define a new nonterminal, \\(Y\\), for types:
+
+{{< latex >}}
+\begin{aligned}
+Y & \rightarrow N \; ``\rightarrow" \; Y \\
+Y & \rightarrow N
+\end{aligned}
+{{< /latex >}}
+
+We make it right-recursive because the \\(\\rightarrow\\) operator is right-associative: a type like
+\\(a \\rightarrow b \\rightarrow c\\) should be read as \\(a \\rightarrow (b \\rightarrow c)\\). Next, we define
+\\(N\\), a nonterminal for all types _except_ those constructed with the arrow:
+
+{{< latex >}}
+\begin{aligned}
+N & \rightarrow \text{upperVar} \; L_Y \\
+N & \rightarrow \text{lowerVar} \\
+N & \rightarrow ( Y )
+\end{aligned}
+{{< /latex >}}
+
+The first of the above rules allows a type to be a constructor applied to zero or more arguments
+(generated by \\(L\_Y\\)). The second rule allows a type to be a placeholder type variable, written
+as a lowercase name. Finally, the third rule allows any type (including functions, again) to occur
+between parentheses. This is what lets us write the types of higher-order functions,
+like \\((a \\rightarrow b) \\rightarrow a \\rightarrow a\\).
+
+Unfortunately, the definition of \\(L\_Y\\) is not as straightforward as we might imagine. We could define
+it as just a list of \\(Y\\) nonterminals, but this would make the grammar ambiguous: something
+like `List Maybe Int` could be interpreted either as "`List`, applied to types `Maybe` and `Int`", or as
+"`List`, applied to type `Maybe Int`".
+To avoid this, we define a "type list element" \\(Y'\\), which is either a bare name or
+a parenthesized type. In particular, a constructor appearing as a list element cannot take
+arguments of its own unless it is wrapped in parentheses:
+
+{{< latex >}}
+\begin{aligned}
+Y' & \rightarrow \text{upperVar} \\
+Y' & \rightarrow \text{lowerVar} \\
+Y' & \rightarrow ( Y )
+\end{aligned}
+{{< /latex >}}
+
+We then make \\(L\_Y\\) a list of \\(Y'\\):
+
+{{< latex >}}
+\begin{aligned}
+L_Y & \rightarrow Y' \; L_Y \\
+L_Y & \rightarrow \epsilon
+\end{aligned}
+{{< /latex >}}
+
+With these rules, `List Maybe Int` has only one parse: `List` applied to the two types `Maybe`
+and `Int`. Assuming `List` takes a single type parameter, this type is invalid, and it is up to a later
+check of constructor arities to reject it. The other reading has to be written as `List (Maybe Int)`.
+
+Finally, we update the rules for the data type declaration, as well as for a single
+constructor. Here, \\(L\_T\\) is a list of zero or more lowercase variable names, which
+are the type parameters of the data type:
+
+{{< latex >}}
+\begin{aligned}
+T & \rightarrow \text{data} \; \text{upperVar} \; L_T = \{ L_D \} \\
+L_T & \rightarrow \text{lowerVar} \; L_T \\
+L_T & \rightarrow \epsilon \\
+D & \rightarrow \text{upperVar} \; L_Y
+\end{aligned}
+{{< /latex >}}
+
+Now that we have a grammar for all these things, we have to implement
+the corresponding data structures. We define a new family of structs,
+extending `parsed_type`, which represent types as they are
+received from the parser. These differ from regular types in that they
+do not require that the types they represent are valid. Validating
+types requires two passes, which is a luxury we do not have when
+parsing. We can define them as follows:
+
+{{< codeblock "C++" "compiler/11/parsed_type.hpp" >}}
+
+We define the conversion function `to_type`, which takes
+the set of type variables quantified in the given type, as well as
+the environment in which to look up the arities of
+type constructors. The implementation is as follows:
+
+{{< codeblock "C++" "compiler/11/parsed_type.cpp" >}}
+
+With this definition in hand, we can now update the grammar in our Bison file.
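Before turning to the Bison file, here is a minimal sketch of the kind of checking that `to_type`'s two inputs make possible. It enforces two rules:

* Every type variable must be among the quantified ones.
* Every constructor must be known and applied to exactly as many arguments as its arity.

This is not the code from `parsed_type.cpp`. `parsed_node`, `validate`, and `is_valid` are hypothetical names. The constructor environment is assumed to be a plain `std::map` from names to arities. Errors are reported by throwing instead of producing a type.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an unvalidated parsed type (not the series'
// parsed_type hierarchy): a type variable, a constructor application, or an arrow.
enum class node_kind { var, app, arrow };

struct parsed_node {
    node_kind kind;
    std::string name;                   // variable or constructor name (empty for arrows)
    std::vector<parsed_node> children;  // app: arguments; arrow: {from, to}
};

parsed_node var_node(std::string name) { return {node_kind::var, std::move(name), {}}; }

parsed_node app_node(std::string name, std::vector<parsed_node> args = {}) {
    return {node_kind::app, std::move(name), std::move(args)};
}

parsed_node arrow_node(parsed_node from, parsed_node to) {
    return {node_kind::arrow, "", {std::move(from), std::move(to)}};
}

// Throws std::runtime_error if `t` uses a type variable outside `vars`, an unknown
// constructor, or a constructor applied to the wrong number of arguments.
void validate(const parsed_node& t, const std::set<std::string>& vars,
              const std::map<std::string, int>& arities) {
    if (t.kind == node_kind::var && vars.count(t.name) == 0) {
        throw std::runtime_error("unbound type variable " + t.name);
    }
    if (t.kind == node_kind::app) {
        auto it = arities.find(t.name);
        if (it == arities.end()) {
            throw std::runtime_error("unknown type constructor " + t.name);
        }
        if (it->second != static_cast<int>(t.children.size())) {
            throw std::runtime_error("wrong number of arguments to " + t.name);
        }
    }
    // Arrows need no check of their own; every subterm is validated recursively.
    for (const parsed_node& child : t.children) validate(child, vars, arities);
}

// Convenience wrapper: true if validate() accepts the type.
bool is_valid(const parsed_node& t, const std::set<std::string>& vars,
              const std::map<std::string, int>& arities) {
    try {
        validate(t, vars, arities);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Suppose `a` is quantified and `List` is given arity 1. Then `List a` passes. `List Maybe Int` is rejected because it has two arguments, and `List b` is rejected because `b` is unbound.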
+First things first, we'll add the type parameters to the data type definition:
+
+{{< codelines "plaintext" "compiler/11/parser.y" 127 130 >}}
+
+Next, we add the new grammar rules we came up with:
+
+{{< codelines "plaintext" "compiler/11/parser.y" 138 163 >}}
+
+Finally, we define the types for these new rules at the top of the file:
+
+{{< codelines "plaintext" "compiler/11/parser.y" 43 44 >}}
+
+{{< todo >}}
+Nullary is not the right word.
+{{< /todo >}}
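By default, Bison turns these rules into an LALR(1) parser, so no hand-written parser is needed. Still, a hand-written version shows how the \\(Y\\), \\(N\\), \\(L\_Y\\) and \\(Y'\\) rules resolve the `List Maybe Int` example. What follows is a recursive-descent sketch of them in C++. It illustrates the grammar under stated assumptions and is not part of the compiler:

* `tokenize`, `type_parser`, and `parse_type` are hypothetical names.
* Names starting with a lowercase letter are treated as type variables.
* It does not build `parsed_type` objects. Instead, it returns a string that writes applications as `Name<arg,...>` and wraps every arrow in parentheses, so the shape of the parse is visible.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Splits a type such as "List (Maybe a) -> b" into names, parentheses and "->".
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (c == '(' || c == ')') {
            tokens.push_back(std::string(1, static_cast<char>(c)));
            ++i;
        } else if (src.compare(i, 2, "->") == 0) {
            tokens.push_back("->");
            i += 2;
        } else if (std::isalnum(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back(src.substr(start, i - start));
        } else {
            throw std::runtime_error("unexpected character in type");
        }
    }
    return tokens;
}

// Recursive-descent recognizer for Y, N, L_Y and Y'.
class type_parser {
public:
    explicit type_parser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    std::string parse() {
        std::string result = parse_y();
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected trailing tokens");
        return result;
    }

private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;

    bool at(const std::string& s) const { return pos_ < tokens_.size() && tokens_[pos_] == s; }
    bool at_upper() const {
        return pos_ < tokens_.size() && std::isupper(static_cast<unsigned char>(tokens_[pos_][0]));
    }
    bool at_lower() const {
        return pos_ < tokens_.size() && std::islower(static_cast<unsigned char>(tokens_[pos_][0]));
    }
    void expect(const std::string& s) {
        if (!at(s)) throw std::runtime_error("expected " + s);
        ++pos_;
    }

    // Y -> N "->" Y | N, with the shared N prefix factored out.
    std::string parse_y() {
        std::string left = parse_n();
        if (!at("->")) return left;
        ++pos_;
        return "(" + left + " -> " + parse_y() + ")";
    }

    // N -> upperVar L_Y | lowerVar | ( Y )
    std::string parse_n() {
        if (!at_upper()) return parse_var_or_parens();
        std::string result = tokens_[pos_++];
        // L_Y -> Y' L_Y | epsilon: keep reading elements while one can start here.
        std::string args;
        while (at_upper() || at_lower() || at("(")) {
            args += (args.empty() ? "" : ",") + parse_y_prime();
        }
        return args.empty() ? result : result + "<" + args + ">";
    }

    // Y' -> upperVar | lowerVar | ( Y ): a bare constructor here takes no arguments.
    std::string parse_y_prime() {
        if (at_upper()) return tokens_[pos_++];
        return parse_var_or_parens();
    }

    // The lowerVar and ( Y ) alternatives shared by N and Y'.
    std::string parse_var_or_parens() {
        if (at_lower()) return tokens_[pos_++];
        expect("(");
        std::string inner = parse_y();
        expect(")");
        return inner;
    }
};

std::string parse_type(const std::string& src) {
    type_parser parser(tokenize(src));
    return parser.parse();
}
```

With this sketch, `List Maybe Int` yields `List<Maybe,Int>`: two arguments, left for the arity check to reject. `List (Maybe Int)` yields `List<Maybe<Int>>`. `Int -> Int -> Int` yields `(Int -> (Int -> Int))`, which reflects the right recursion of \\(Y\\).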