blog-static/content/blog/11_compiler_polymorphic_dat...

107 lines
5.6 KiB
Markdown

---
title: Compiling a Functional Language Using C++, Part 11 - Polymorphic Data Types
date: 2020-03-28T20:10:35-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"]
---
[In part 10]({{< relref "10_compiler_polymorphism.md" >}}), we managed to get our
compiler to accept functions that were polymorphically typed. However, a piece
of the puzzle is still missing: while our _functions_ can handle values
of different types, the same cannot be said for our _data types_. This means
that we cannot construct data structures that can contain arbitrary types.
While we can define and use a list of integers, if we want to also have a
list of booleans, we must copy all of our constructors and define a new data type.
Worse, not only do we have to duplicate the constructors, but also all the functions
that operate on the list. As far as our compiler is concerned, a list of
integers and a list of booleans are entirely different beasts, and cannot
be operated on by the same code.
To make polymorphic data types possible, we must extend our language (and type
system) a little. We will now allow for something like this:
```
data List a = { Nil, Cons a List }
```
In the above snippet, we are no longer declaring a single type, but a collection
of related types, __parameterized__ by a type `a`. Any type can take the place
of `a` to get a list containing that type of element.
Then, `List Int` is a type,
as is `List Bool` and `List (List Int)`. The constructors in the snippet also
get polymorphic types:
{{< latex >}}
\text{Nil} : \forall a \; . \; \text{List} \; a \\
\text{Cons} : \forall a \; . \; a \rightarrow \text{List} \; a \rightarrow \text{List} \; a
{{< /latex >}}
When you call `Cons`, the type of the resulting list varies with the type of element
you pass in. The empty list `Nil` is a valid list of any type, since, well, it's
empty.
Let's talk about `List` itself, now. I suggest that we ponder the following table:
\\(\\text{List}\\)|\\(\\text{Cons}\\)
----|----
\\(\\text{List}\\) is not a type; it must be followed up with arguments, like \\(\\text{List} \\; \\text{Int}\\).|\\(\\text{Cons}\\) is not a list; it must be followed up with arguments, like \\(\\text{Cons} \\; 3 \\; \\text{Nil}\\).
\\(\\text{List} \\; \\text{Int}\\) is in its simplest form.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is in its simplest form.
\\(\\text{List} \\; \\text{Int}\\) is a type.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is a value of type \\(\\text{List} \\; \\text{Int}\\).
I hope that the similarities are quite striking. I claim that
`List` is quite similar to a constructor `Cons`, except that it occurs
in a different context: whereas `Cons` is a way to create values,
`List` is a way to create types. Indeed, while we call `Cons` a constructor,
it's typical to call `List` a __type constructor__.
We know that `Cons` is a function which
assigns to values (like `3` and `Nil`) other values (like `Cons 3 Nil`, or `[3]` for
short). In a similar manner, `List` can be thought of as a function
that assigns to types (like `Int`) other types (like `List Int`). We can
even claim that it has a type:
{{< latex >}}
\text{List} : \text{Type} \rightarrow \text{Type}
{{< /latex >}}
{{< sidenote "right" "dependent-types-note" "Unless we get really wacky," >}}
When your type constructors take as input not only other types but also values
such as <code>3</code>, you've ventured into the territory of
<a href="https://en.wikipedia.org/wiki/Dependent_type">dependent types</a>.
This is a significant step up in complexity from what we'll be doing in this
series. If you're interested, check out
<a href="https://www.idris-lang.org/">Idris</a> (if you want to know about dependent types
for functional programming) or <a href="https://coq.inria.fr/">Coq</a> (to see how
propositions and proofs can be encoded in a dependently typed language).
{{< /sidenote >}}
our type constructors will only take zero or more types as input, and produce
a type as output. In this case, writing \\(\\text{Type}\\) becomes quite repetitive,
and we will adopt the convention of writing \\(*\\) instead. The types of such
constructors are called [kinds](https://en.wikipedia.org/wiki/Kind_(type_theory)).
Let's look at a few examples, just to make sure we're on the same page:
* The kind of \\(\\text{Bool}\\) is \\(*\\): it does not accept any
type arguments, and is a type in its own right.
* The kind of \\(\\text{List}\\) is \\(*\\rightarrow *\\): it takes
one argument (the type of the things inside the list), and creates
a type from it.
* If we define a pair as `data Pair a b = { MkPair a b }`, then its
kind is \\(* \\rightarrow * \\rightarrow *\\), because it requires
two parameters.
As one final observation, we note that effectively, all we're doing is
tracking the [arity](https://en.wikipedia.org/wiki/Arity) of the constructor
type.
Let's now enumerate all the possible forms that (mono)types can take in our system:
1. A type can be a placeholder, like \\(a\\), \\(b\\), etc.
2. A type can be a type constructor, applied to
{{< sidenote "right" "zero-more-note" "zero ore more arguments," >}}
It is convenient to treat regular types (like \(\text{Bool}\)) as
type constructors of arity 0 (that is, type constructors with kind \(*\)).
In effect, they take zero arguments and produce types (themselves).
{{< /sidenote >}} such as \\(\\text{List} \; \\text{Int}\\) or \\(\\text{Bool}\\).
3. A function from one type to another, like \\(\\text{List} \\; a \\rightarrow \\text{Int}\\).
Polytypes (type schemes) in our system can be all of the above, but may also include a "forall"
quantifier at the front, generalizing the type (like \\(\\forall a \\; . \\; \\text{List} \\; a \\rightarrow \\text{Int}\\)).