--- title: Compiling a Functional Language Using C++, Part 11 - Polymorphic Data Types date: 2020-03-28T20:10:35-07:00 draft: true tags: ["C and C++", "Functional Languages", "Compilers"] --- [In part 10]({{< relref "10_compiler_polymorphism.md" >}}), we managed to get our compiler to accept functions that were polymorphically typed. However, a piece of the puzzle is still missing: while our _functions_ can handle values of different types, the same cannot be said for our _data types_. This means that we cannot construct data structures that can contain arbitrary types. While we can define and use a list of integers, if we want to also have a list of booleans, we must copy all of our constructors and define a new data type. Worse, not only do we have to duplicate the constructors, but also all the functions that operate on the list. As far as our compiler is concerned, a list of integers and a list of booleans are entirely different beasts, and cannot be operated on by the same code. To make polymorphic data types possible, we must extend our language (and type system) a little. We will now allow for something like this: ``` data List a = { Nil, Cons a List } ``` In the above snippet, we are no longer declaring a single type, but a collection of related types, __parameterized__ by a type `a`. Any type can take the place of `a` to get a list containing that type of element. Then, `List Int` is a type, as is `List Bool` and `List (List Int)`. The constructors in the snippet also get polymorphic types: {{< latex >}} \text{Nil} : \forall a \; . \; \text{List} \; a \\ \text{Cons} : \forall a \; . \; a \rightarrow \text{List} \; a \rightarrow \text{List} \; a {{< /latex >}} When you call `Cons`, the type of the resulting list varies with the type of element you pass in. The empty list `Nil` is a valid list of any type, since, well, it's empty. Let's talk about `List` itself, now. I suggest that we ponder the following table: \\(\\text{List}\\)|\\(\\text{Cons}\\) ----|---- \\(\\text{List}\\) is not a type; it must be followed up with arguments, like \\(\\text{List} \\; \\text{Int}\\).|\\(\\text{Cons}\\) is not a list; it must be followed up with arguments, like \\(\\text{Cons} \\; 3 \\; \\text{Nil}\\). \\(\\text{List} \\; \\text{Int}\\) is in its simplest form.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is in its simplest form. \\(\\text{List} \\; \\text{Int}\\) is a type.|\\(\\text{Cons} \\; 3 \\; \\text{Nil}\\) is a value of type \\(\\text{List} \\; \\text{Int}\\). I hope that the similarities are quite striking. I claim that `List` is quite similar to a constructor `Cons`, except that it occurs in a different context: whereas `Cons` is a way to create values, `List` is a way to create types. Indeed, while we call `Cons` a constructor, it's typical to call `List` a __type constructor__. We know that `Cons` is a function which assigns to values (like `3` and `Nil`) other values (like `Cons 3 Nil`, or `[3]` for short). In a similar manner, `List` can be thought of as a function that assigns to types (like `Int`) other types (like `List Int`). We can even claim that it has a type: {{< latex >}} \text{List} : \text{Type} \rightarrow \text{Type} {{< /latex >}} {{< sidenote "right" "dependent-types-note" "Unless we get really wacky," >}} When your type constructors take as input not only other types but also values such as 3, you've ventured into the territory of dependent types. This is a significant step up in complexity from what we'll be doing in this series. If you're interested, check out Idris (if you want to know about dependent types for functional programming) or Coq (to see how propositions and proofs can be encoded in a dependently typed language). {{< /sidenote >}} our type constructors will only take zero or more types as input, and produce a type as output. In this case, writing \\(\\text{Type}\\) becomes quite repetitive, and we will adopt the convention of writing \\(*\\) instead. The types of such constructors are called [kinds](https://en.wikipedia.org/wiki/Kind_(type_theory)). Let's look at a few examples, just to make sure we're on the same page: * The kind of \\(\\text{Bool}\\) is \\(*\\): it does not accept any type arguments, and is a type in its own right. * The kind of \\(\\text{List}\\) is \\(*\\rightarrow *\\): it takes one argument (the type of the things inside the list), and creates a type from it. * If we define a pair as `data Pair a b = { MkPair a b }`, then its kind is \\(* \\rightarrow * \\rightarrow *\\), because it requires two parameters. As one final observation, we note that effectively, all we're doing is tracking the [arity](https://en.wikipedia.org/wiki/Arity) of the constructor type. Let's now enumerate all the possible forms that (mono)types can take in our system: 1. A type can be a placeholder, like \\(a\\), \\(b\\), etc. 2. A type can be a type constructor, applied to {{< sidenote "right" "zero-more-note" "zero ore more arguments," >}} It is convenient to treat regular types (like \(\text{Bool}\)) as type constructors of arity 0 (that is, type constructors with kind \(*\)). In effect, they take zero arguments and produce types (themselves). {{< /sidenote >}} such as \\(\\text{List} \; \\text{Int}\\) or \\(\\text{Bool}\\). 3. A function from one type to another, like \\(\\text{List} \\; a \\rightarrow \\text{Int}\\). Polytypes (type schemes) in our system can be all of the above, but may also include a "forall" quantifier at the front, generalizing the type (like \\(\\forall a \\; . \\; \\text{List} \\; a \\rightarrow \\text{Int}\\)).