blog-static/content/blog/10_compiler_polymorphism.md

5.2 KiB

title date tags draft
Compiling a Functional Language Using C++, Part 10 - Polymorphism 2020-02-29T20:09:37-08:00
C and C++
Functional Languages
Compilers
true

[In part 8]({{< relref "08_compiler_llvm.md" >}}), we wrote some pretty interesting programs in our little language. We successfully expressed arithmetic and recursion. But there's one thing that we cannot express in our language without further changes: an if statement.

Suppose we didn't want to add a special if/else expression into our language. Thanks to lazy evaluation, we can express it using a function:

defn if c t e = {
    case c of {
        True -> { t }
        False -> { e }
    }
}

But an issue still remains: so far, our compiler remains monomorphic. That is, a particular function can only have one possible type for each one of its arguments. With our current setup, something like this {{< sidenote "right" "if-note" "would not work:" >}} In a polymorphically typed language, the inner if would just evaluate to False, and the whole expression to 3. {{< /sidenote >}}

if (if True False True) 11 3

This is because, for this to work, both of the following would need to hold (borrowing some of our notation from the [typechecking]({{< relref "03_compiler_typechecking.md" >}}) post):

{{< latex >}} \text{if} : \text{Int} \rightarrow \text{Int} {{< /latex >}}

{{< latex >}} \text{if} : \text{Bool} \rightarrow \text{Bool} {{< /latex >}}

But using our rules so far, such a thing is impossible, since there is no way for \(\text{Int}\) to be unified with \(\text{Bool}\). We need a more powerful set of rules to describe our program's types. One such set of rules is the Hindley-Milner type system, which we have previously alluded to. In fact, the rules we came up with were already very close to Hindley-Milner, with the exception of two: generalization and instantiation. It's been quite a while since the last time we worked on typechecking, so I'm going to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick summary of each of these rules.

Rule Name and Description
{{< latex >}}
\frac
{x:\sigma \in \Gamma}
{\Gamma \vdash x:\sigma}
{{< /latex >}} Var: If the variable \(x\) is known to have some polymorphic type \(\sigma\) then an expression consisting only of that variable is of that type.
{{< latex >}}
\frac
{\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \quad \Gamma \vdash e_2 : \tau_1}
{\Gamma \vdash e_1 ; e_2 : \tau_2}
{{< /latex >}} App: If an expression \(e_1\), which is a function from monomorphic type \(\tau_1\) to another monomorphic type \(\tau_2\), is applied to an argument \(e_2\) of type \(\tau_1\), then the result is of type \(\tau_2\).
{{< latex >}}
\frac
{\Gamma, x:\tau \vdash e : \tau'}
{\Gamma \vdash \lambda x.e : \tau \rightarrow \tau'}
{{< /latex >}} Abs: If the body \(e\) of a lambda abstraction \(\lambda x.e\) is of type \(\tau'\) when \(x\) is of type \(\tau\) then the whole lambda abstraction is of type \(\tau \rightarrow \tau'\).
{{< latex >}}
\frac
{\Gamma \vdash e : \tau \quad \text{matcht}(\tau, p_i) = b_i
\quad \Gamma,b_i \vdash e_i : \tau_c}
{\Gamma \vdash \text{case} ; e ; \text{of} ;
{ (p_1,e_1) \ldots (p_n, e_n) } : \tau_c }
{{< /latex >}} Case: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \(\tau\) and each branch \((p_i, e_i)\) is of the same type \(\tau_c\) when the pattern \(p_i\) works with type \(\tau\) producing extra bindings \(b_i\), the whole case expression is of type \(\tau_c\).
{{< latex >}}
\frac{\Gamma \vdash e : \sigma \quad \sigma' \sqsubseteq \sigma}
{\Gamma \vdash e : \sigma'}
{{< /latex >}} Inst (New): If type \(\sigma'\) is an instantiation of type \(\sigma\) then an expression of type \(\sigma\) is also an expression of type \(\sigma'\).
{{< latex >}}
\frac
{\Gamma \vdash e : \sigma \quad \alpha \not \in \text{free}(\Gamma)}
{\Gamma \vdash e : \forall \alpha . \sigma}
{{< /latex >}} Gen (New): If an expression has a type with free variables, this rule allows us generalize it to allow all possible types to be used for these free variables.

Here, there is a distinction between different forms of types. First, there are monomorphic types, or monotypes, \(\tau\), which are types such as \(\text{Int}\), \(\text{Int} \rightarrow \text{Bool}\), \(a \rightarrow b\) and so on. These types are what we've been working with so far. Each of them represents one (hence, "mono-"), concrete type. This is obvious in the case of \(\text{Int}\) and \(\text{Int} \rightarrow \text{Bool}\), but for \(a \rightarrow b\) things are slightly less clear. Does it really represent a single type, if we can put an arbtirary thing for \(a\)? The answer is "no". The way it is right now, \(a\) is still an unknown, yet concrete thing. Once we find \(a\) and put it in, that's it. If only there was a way to say, "this is the type for any \(a\)"...