blog-static/content/blog/10_compiler_polymorphism.md

6.1 KiB

title date tags draft
Compiling a Functional Language Using C++, Part 10 - Polymorphism 2020-02-29T20:09:37-08:00
C and C++
Functional Languages
Compilers
true

[In part 8]({{< relref "08_compiler_llvm.md" >}}), we wrote some pretty interesting programs in our little language. We successfully expressed arithmetic and recursion. But there's one thing that we cannot express in our language without further changes: an if statement.

Suppose we didn't want to add a special if/else expression into our language. Thanks to lazy evaluation, we can express it using a function:

defn if c t e = {
    case c of {
        True -> { t }
        False -> { e }
    }
}

But an issue still remains: so far, our compiler remains monomorphic. That is, a particular function can only have one possible type for each one of its arguments. With our current setup, something like this {{< sidenote "right" "if-note" "would not work:" >}} In a polymorphically typed language, the inner if would just evaluate to False, and the whole expression to 3. {{< /sidenote >}}

if (if True False True) 11 3

This is because, for this to work, both of the following would need to hold (borrowing some of our notation from the [typechecking]({{< relref "03_compiler_typechecking.md" >}}) post):

{{< latex >}} \text{if} : \text{Bool} \rightarrow \text{Int} \rightarrow \text{Int} \rightarrow \text{Int} {{< /latex >}}

{{< latex >}} \text{if} : \text{Bool} \rightarrow \text{Bool} \rightarrow \text{Bool} \rightarrow \text{Bool} {{< /latex >}}

But using our rules so far, such a thing is impossible, since there is no way for \(\text{Int}\) to be unified with \(\text{Bool}\). We need a more powerful set of rules to describe our program's types. One such set of rules is the Hindley-Milner type system, which we have previously alluded to. In fact, the rules we came up with were already very close to Hindley-Milner, with the exception of two: generalization and instantiation. It's been quite a while since the last time we worked on typechecking, so I'm going to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick summary of each of these rules.

Rule Name and Description
{{< latex >}}
\frac
{x:\sigma \in \Gamma}
{\Gamma \vdash x:\sigma}
{{< /latex >}} Var: If the variable \(x\) is known to have some polymorphic type \(\sigma\) then an expression consisting only of that variable is of that type.
{{< latex >}}
\frac
{\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \quad \Gamma \vdash e_2 : \tau_1}
{\Gamma \vdash e_1 ; e_2 : \tau_2}
{{< /latex >}} App: If an expression \(e_1\), which is a function from monomorphic type \(\tau_1\) to another monomorphic type \(\tau_2\), is applied to an argument \(e_2\) of type \(\tau_1\), then the result is of type \(\tau_2\).
{{< latex >}}
\frac
{\Gamma, x:\tau \vdash e : \tau'}
{\Gamma \vdash \lambda x.e : \tau \rightarrow \tau'}
{{< /latex >}} Abs: If the body \(e\) of a lambda abstraction \(\lambda x.e\) is of type \(\tau'\) when \(x\) is of type \(\tau\) then the whole lambda abstraction is of type \(\tau \rightarrow \tau'\).
{{< latex >}}
\frac
{\Gamma \vdash e : \tau \quad \text{matcht}(\tau, p_i) = b_i
\quad \Gamma,b_i \vdash e_i : \tau_c}
{\Gamma \vdash \text{case} ; e ; \text{of} ;
{ (p_1,e_1) \ldots (p_n, e_n) } : \tau_c }
{{< /latex >}} Case: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \(\tau\) and each branch \((p_i, e_i)\) is of the same type \(\tau_c\) when the pattern \(p_i\) works with type \(\tau\) producing extra bindings \(b_i\), the whole case expression is of type \(\tau_c\).
{{< latex >}}
\frac{\Gamma \vdash e : \sigma \quad \sigma' \sqsubseteq \sigma}
{\Gamma \vdash e : \sigma'}
{{< /latex >}} Inst (New): If type \(\sigma'\) is an instantiation of type \(\sigma\) then an expression of type \(\sigma\) is also an expression of type \(\sigma'\).
{{< latex >}}
\frac
{\Gamma \vdash e : \sigma \quad \alpha \not \in \text{free}(\Gamma)}
{\Gamma \vdash e : \forall \alpha . \sigma}
{{< /latex >}} Gen (New): If an expression has a type with free variables, this rule allows us generalize it to allow all possible types to be used for these free variables.

Here, there is a distinction between different forms of types. First, there are monomorphic types, or monotypes, \(\tau\), which are types such as \(\text{Int}\), \(\text{Int} \rightarrow \text{Bool}\), \(a \rightarrow b\) and so on. These types are what we've been working with so far. Each of them represents one (hence, "mono-"), concrete type. This is obvious in the case of \(\text{Int}\) and \(\text{Int} \rightarrow \text{Bool}\), but for \(a \rightarrow b\) things are slightly less clear. Does it really represent a single type, if we can put an arbtirary thing for \(a\)? The answer is "yes"! Although \(a\) is not currently known, it stands in place of another monotype, which is yet to be determined.

So, how do we represent polymorphic types, like that of \(\text{if}\)? We borrow some more notation from mathematics, and use the "forall" quantifier:

{{< latex >}} \text{if} : \forall a ; . ; \text{Bool} \rightarrow a \rightarrow a \rightarrow a {{< /latex >}}

This basically says, "the type of \(\text{if}\) is \(\text{Bool} \rightarrow a \rightarrow a \rightarrow a\) for all possible \(a\)". This new expression using "forall" is what we call a type scheme, or a polytype \(\sigma\). For simplicity, we only allow "forall" to be at the front of a polytype. That is, expressions like \(a \rightarrow \forall b \; . \; b \rightarrow b\) are not valid polytypes as far as we're concerned.

It's key to observe that the majority of the typing rules in the above table use monotypes (\(\tau\)). Only two rules seem to mention \(\sigma\): Inst and Gen.