Update "compiler: polymorphic data types" to new math delimiters

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
Danila Fedorin 2024-05-13 18:50:05 -07:00
parent 2fde7e5cf8
commit 6f20b17948

View File

@ -42,11 +42,11 @@ empty.
Let's talk about `List` itself, now. I suggest that we ponder the following table: Let's talk about `List` itself, now. I suggest that we ponder the following table:
\\(\\text{List}\\)|\\(\\text{Cons}\\) \(\text{List}\)|\(\text{Cons}\)
----|---- ----|----
\\(\\text{List}\\) is not a type; it must be followed up with arguments, like \\(\\text{List} \\; \\text{Int}\\).|\\(\\text{Cons}\\) is not a list; it must be followed up with arguments, like \\(\\text{Cons} \\; 3 \\; \\text{Nil}\\). \(\text{List}\) is not a type; it must be followed up with arguments, like \(\text{List} \; \text{Int}\).|\(\text{Cons}\) is not a list; it must be followed up with arguments, like \(\text{Cons} \; 3 \; \text{Nil}\).
\\(\\text{List} \\; \\text{Int}\\) is in its simplest form.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is in its simplest form. \(\text{List} \; \text{Int}\) is in its simplest form.|\(\text{Cons} \; 3 \; \text{Nil}\) is in its simplest form.
\\(\\text{List} \\; \\text{Int}\\) is a type.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is a value of type \\(\\text{List} \\; \\text{Int}\\). \(\text{List} \; \text{Int}\) is a type.|\(\text{Cons} \; 3 \; \text{Nil}\) is a value of type \(\text{List} \; \text{Int}\).
I hope that the similarities are quite striking. I claim that I hope that the similarities are quite striking. I claim that
`List` is quite similar to a constructor `Cons`, except that it occurs `List` is quite similar to a constructor `Cons`, except that it occurs
@ -74,18 +74,18 @@ for functional programming) or <a href="https://coq.inria.fr/">Coq</a> (to see h
propositions and proofs can be encoded in a dependently typed language). propositions and proofs can be encoded in a dependently typed language).
{{< /sidenote >}} {{< /sidenote >}}
our type constructors will only take zero or more types as input, and produce our type constructors will only take zero or more types as input, and produce
a type as output. In this case, writing \\(\\text{Type}\\) becomes quite repetitive, a type as output. In this case, writing \(\text{Type}\) becomes quite repetitive,
and we will adopt the convention of writing \\(*\\) instead. The types of such and we will adopt the convention of writing \(*\) instead. The types of such
constructors are called [kinds](https://en.wikipedia.org/wiki/Kind_(type_theory)). constructors are called [kinds](https://en.wikipedia.org/wiki/Kind_(type_theory)).
Let's look at a few examples, just to make sure we're on the same page: Let's look at a few examples, just to make sure we're on the same page:
* The kind of \\(\\text{Bool}\\) is \\(*\\): it does not accept any * The kind of \(\text{Bool}\) is \(*\): it does not accept any
type arguments, and is a type in its own right. type arguments, and is a type in its own right.
* The kind of \\(\\text{List}\\) is \\(*\\rightarrow *\\): it takes * The kind of \(\text{List}\) is \(*\rightarrow *\): it takes
one argument (the type of the things inside the list), and creates one argument (the type of the things inside the list), and creates
a type from it. a type from it.
* If we define a pair as `data Pair a b = { MkPair a b }`, then its * If we define a pair as `data Pair a b = { MkPair a b }`, then its
kind is \\(* \\rightarrow * \\rightarrow *\\), because it requires kind is \(* \rightarrow * \rightarrow *\), because it requires
two parameters. two parameters.
As one final observation, we note that effectively, all we're doing is As one final observation, we note that effectively, all we're doing is
@ -94,24 +94,24 @@ type.
Let's now enumerate all the possible forms that (mono)types can take in our system: Let's now enumerate all the possible forms that (mono)types can take in our system:
1. A type can be a placeholder, like \\(a\\), \\(b\\), etc. 1. A type can be a placeholder, like \(a\), \(b\), etc.
2. A type can be a type constructor, applied to 2. A type can be a type constructor, applied to
{{< sidenote "right" "zero-more-note" "zero ore more arguments," >}} {{< sidenote "right" "zero-more-note" "zero ore more arguments," >}}
It is convenient to treat regular types (like \(\text{Bool}\)) as It is convenient to treat regular types (like \(\text{Bool}\)) as
type constructors of arity 0 (that is, type constructors with kind \(*\)). type constructors of arity 0 (that is, type constructors with kind \(*\)).
In effect, they take zero arguments and produce types (themselves). In effect, they take zero arguments and produce types (themselves).
{{< /sidenote >}} such as \\(\\text{List} \\; \\text{Int}\\) or \\(\\text{Bool}\\). {{< /sidenote >}} such as \(\text{List} \; \text{Int}\) or \(\text{Bool}\).
3. A function from one type to another, like \\(\\text{List} \\; a \\rightarrow \\text{Int}\\). 3. A function from one type to another, like \(\text{List} \; a \rightarrow \text{Int}\).
Polytypes (type schemes) in our system can be all of the above, but may also include a "forall" Polytypes (type schemes) in our system can be all of the above, but may also include a "forall"
quantifier at the front, generalizing the type (like \\(\\forall a \\; . \\; \\text{List} \\; a \\rightarrow \\text{Int}\\)). quantifier at the front, generalizing the type (like \(\forall a \; . \; \text{List} \; a \rightarrow \text{Int}\)).
Let's start implementing all of this. Why don't we start with the change to the syntax of our language? Let's start implementing all of this. Why don't we start with the change to the syntax of our language?
We have complicated the situation quite a bit. Let's take a look at the _old_ grammar We have complicated the situation quite a bit. Let's take a look at the _old_ grammar
for data type declarations (this is going back as far as [part 2]({{< relref "02_compiler_parsing.md" >}})). for data type declarations (this is going back as far as [part 2]({{< relref "02_compiler_parsing.md" >}})).
Here, \\(L\_D\\) is the nonterminal for the things that go between the curly braces of a data type Here, \(L_D\) is the nonterminal for the things that go between the curly braces of a data type
declaration, \\(D\\) is the nonterminal representing a single constructor definition, declaration, \(D\) is the nonterminal representing a single constructor definition,
and \\(L\_U\\) is a list of zero or more uppercase variable names: and \(L_U\) is a list of zero or more uppercase variable names:
{{< latex >}} {{< latex >}}
\begin{aligned} \begin{aligned}
@ -127,7 +127,7 @@ This grammar was actually too simple even for our monomorphically typed language
Since functions are not represented using a single uppercase variable, it wasn't possible for us Since functions are not represented using a single uppercase variable, it wasn't possible for us
to define constructors that accept as arguments anything other than integers and user-defined to define constructors that accept as arguments anything other than integers and user-defined
data types. Now, we also need to modify this grammar to allow for constructor applications (which can be nested). data types. Now, we also need to modify this grammar to allow for constructor applications (which can be nested).
To do all of these things, we will define a new nonterminal, \\(Y\\), for types: To do all of these things, we will define a new nonterminal, \(Y\), for types:
{{< latex >}} {{< latex >}}
\begin{aligned} \begin{aligned}
@ -136,8 +136,8 @@ Y & \rightarrow N
\end{aligned} \end{aligned}
{{< /latex >}} {{< /latex >}}
We make it right-recursive (because the \\(\\rightarrow\\) operator is right-associative). Next, we define We make it right-recursive (because the \(\rightarrow\) operator is right-associative). Next, we define
a nonterminal for all types _except_ those constructed with the arrow, \\(N\\). a nonterminal for all types _except_ those constructed with the arrow, \(N\).
{{< latex >}} {{< latex >}}
\begin{aligned} \begin{aligned}
@ -148,15 +148,15 @@ N & \rightarrow ( Y )
{{< /latex >}} {{< /latex >}}
The first of the above rules allows a type to be a constructor applied to zero or more arguments The first of the above rules allows a type to be a constructor applied to zero or more arguments
(generated by \\(L\_Y\\)). The second rule allows a type to be a placeholder type variable. Finally, (generated by \(L_Y\)). The second rule allows a type to be a placeholder type variable. Finally,
the third rule allows for any type (including functions, again) to occur between parentheses. the third rule allows for any type (including functions, again) to occur between parentheses.
This is so that higher-order functions, like \\((a \rightarrow b) \rightarrow a \rightarrow a \\), This is so that higher-order functions, like \((a \rightarrow b) \rightarrow a \rightarrow a \),
can be represented. can be represented.
Unfortunately, the definition of \\(L\_Y\\) is not as straightforward as we imagine. We could define Unfortunately, the definition of \(L_Y\) is not as straightforward as we imagine. We could define
it as just a list of \\(Y\\) nonterminals, but this would make the grammar ambigous: something it as just a list of \(Y\) nonterminals, but this would make the grammar ambigous: something
like `List Maybe Int` could be interpreted as "`List`, applied to types `Maybe` and `Int`", and like `List Maybe Int` could be interpreted as "`List`, applied to types `Maybe` and `Int`", and
"`List`, applied to type `Maybe Int`". To avoid this, we define a "type list element" \\(Y'\\), "`List`, applied to type `Maybe Int`". To avoid this, we define a "type list element" \(Y'\),
which does not take arguments: which does not take arguments:
{{< latex >}} {{< latex >}}
@ -167,7 +167,7 @@ Y' & \rightarrow ( Y )
\end{aligned} \end{aligned}
{{< /latex >}} {{< /latex >}}
We then make \\(L\_Y\\) a list of \\(Y'\\): We then make \(L_Y\) a list of \(Y'\):
{{< latex >}} {{< latex >}}
\begin{aligned} \begin{aligned}
@ -177,7 +177,7 @@ L_Y & \rightarrow \epsilon
{{< /latex >}} {{< /latex >}}
Finally, we update the rules for the data type declaration, as well as for a single Finally, we update the rules for the data type declaration, as well as for a single
constructor. In these new rules, we use \\(L\_T\\) to mean a list of type variables. constructor. In these new rules, we use \(L_T\) to mean a list of type variables.
The rules are as follows: The rules are as follows:
{{< latex >}} {{< latex >}}
@ -336,7 +336,7 @@ it will be once the type manager generates its first type variable, and things w
wanted type constructors to be monomorphic (but generic, with type variables) we'd need to internally wanted type constructors to be monomorphic (but generic, with type variables) we'd need to internally
instnatiate fresh type variables for every user-defined type variable, and substitute them appropriately. instnatiate fresh type variables for every user-defined type variable, and substitute them appropriately.
{{< /sidenote >}} {{< /sidenote >}}
as we have discussed above with \\(\\text{Nil}\\) and \\(\\text{Cons}\\). as we have discussed above with \(\text{Nil}\) and \(\text{Cons}\).
To accomodate for this, we also add all type variables to the "forall" quantifier To accomodate for this, we also add all type variables to the "forall" quantifier
of a new type scheme, whose monotype is our newly assembled function type. This of a new type scheme, whose monotype is our newly assembled function type. This
type scheme is what we store as the type of the constructor. type scheme is what we store as the type of the constructor.