112 lines
5.9 KiB
Markdown
112 lines
5.9 KiB
Markdown
---
|
|
title: Compiling a Functional Language Using C++, Part 10 - Polymorphism
|
|
date: 2020-02-29T20:09:37-08:00
|
|
tags: ["C and C++", "Functional Languages", "Compilers"]
|
|
draft: true
|
|
---
|
|
|
|
[In part 8]({{< relref "08_compiler_llvm.md" >}}), we wrote some pretty interesting programs in our little language.
|
|
We successfully expressed arithmetic and recursion. But there's one thing
|
|
that we cannot express in our language without further changes: an `if` statement.
|
|
|
|
Suppose we didn't want to add a special `if/else` expression into our language.
|
|
Thanks to lazy evaluation, we can express it using a function:
|
|
|
|
```
|
|
defn if c t e = {
|
|
case c of {
|
|
True -> { t }
|
|
False -> { e }
|
|
}
|
|
}
|
|
```
|
|
|
|
But an issue still remains: so far, our compiler remains __monomorphic__. That
|
|
is, a particular function can only have one possible type for each one of its
|
|
arguments. With our current setup, something like this
|
|
{{< sidenote "right" "if-note" "would not work:" >}}
|
|
In a polymorphically typed language, the inner <code>if</code> would just evaluate to
|
|
<code>False</code>, and the whole expression to 3.
|
|
{{< /sidenote >}}
|
|
|
|
```
|
|
if (if True False True) 11 3
|
|
```
|
|
|
|
This is because, for this to work, both of the following would need to hold (borrowing
|
|
some of our notation from the [typechecking]({{< relref "03_compiler_typechecking.md" >}}) post):
|
|
|
|
{{< latex >}}
|
|
\text{if} : \text{Bool} \rightarrow \text{Int} \rightarrow \text{Int} \rightarrow \text{Int}
|
|
{{< /latex >}}
|
|
|
|
{{< latex >}}
|
|
\text{if} : \text{Bool} \rightarrow \text{Bool} \rightarrow \text{Bool} \rightarrow \text{Bool}
|
|
{{< /latex >}}
|
|
|
|
But using our rules so far, such a thing is impossible, since there is no way for
|
|
\\(\text{Int}\\) to be unified with \\(\text{Bool}\\). We need a more powerful
|
|
set of rules to describe our program's types. One such set of rules is
|
|
the [Hindley-Milner type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system),
|
|
which we have previously alluded to. In fact, the rules we came up
|
|
with were already very close to Hindley-Milner, with the exception of two:
|
|
__generalization__ and __instantiation__. It's been quite a while since the last time we worked on typechecking, so I'm going
|
|
to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick
|
|
summary of each of these rules.
|
|
|
|
Rule|Name and Description
|
|
-----|-------
|
|
{{< latex >}}
|
|
\frac
|
|
{x:\sigma \in \Gamma}
|
|
{\Gamma \vdash x:\sigma}
|
|
{{< /latex >}}| __Var__: If the variable \\(x\\) is known to have some polymorphic type \\(\\sigma\\) then an expression consisting only of that variable is of that type.
|
|
{{< latex >}}
|
|
\frac
|
|
{\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \quad \Gamma \vdash e_2 : \tau_1}
|
|
{\Gamma \vdash e_1 \; e_2 : \tau_2}
|
|
{{< /latex >}}| __App__: If an expression \\(e\_1\\), which is a function from monomorphic type \\(\\tau\_1\\) to another monomorphic type \\(\\tau\_2\\), is applied to an argument \\(e\_2\\) of type \\(\\tau\_1\\), then the result is of type \\(\\tau\_2\\).
|
|
{{< latex >}}
|
|
\frac
|
|
{\Gamma \vdash e : \tau \quad \text{matcht}(\tau, p_i) = b_i
|
|
\quad \Gamma,b_i \vdash e_i : \tau_c}
|
|
{\Gamma \vdash \text{case} \; e \; \text{of} \;
|
|
\{ (p_1,e_1) \ldots (p_n, e_n) \} : \tau_c }
|
|
{{< /latex >}}| __Case__: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \\(\\tau\\) and each branch \\((p\_i, e\_i)\\) is of the same type \\(\\tau\_c\\) when the pattern \\(p\_i\\) works with type \\(\\tau\\) producing extra bindings \\(b\_i\\), the whole case expression is of type \\(\\tau\_c\\).
|
|
{{< latex >}}
|
|
\frac{\Gamma \vdash e : \sigma \quad \sigma' \sqsubseteq \sigma}
|
|
{\Gamma \vdash e : \sigma'}
|
|
{{< /latex >}}| __Inst (New)__: If type \\(\\sigma\'\\) is an instantiation of type \\(\\sigma\\) then an expression of type \\(\\sigma\\) is also an expression of type \\(\\sigma\'\\).
|
|
{{< latex >}}
|
|
\frac
|
|
{\Gamma \vdash e : \sigma \quad \alpha \not \in \text{free}(\Gamma)}
|
|
{\Gamma \vdash e : \forall \alpha . \sigma}
|
|
{{< /latex >}}| __Gen (New)__: If an expression has a type with free variables, this rule allows us generalize it to allow all possible types to be used for these free variables.
|
|
|
|
Here, there is a distinction between different forms of types. First, there are
|
|
monomorphic types, or __monotypes__, \\(\\tau\\), which are types such as \\(\\text{Int}\\),
|
|
\\(\\text{Int} \\rightarrow \\text{Bool}\\), \\(a \\rightarrow b\\)
|
|
and so on. These types are what we've been working with so far. Each of them
|
|
represents one (hence, "mono-"), concrete type. This is obvious in the case
|
|
of \\(\\text{Int}\\) and \\(\\text{Int} \\rightarrow \\text{Bool}\\), but
|
|
for \\(a \\rightarrow b\\) things are slightly less clear. Does it really
|
|
represent a single type, if we can put an arbtirary thing for \\(a\\)?
|
|
The answer is "yes"! Although \\(a\\) is not currently known, it stands
|
|
in place of another monotype, which is yet to be determined.
|
|
|
|
So, how do we represent polymorphic types, like that of \\(\\text{if}\\)?
|
|
We borrow some more notation from mathematics, and use the "forall" quantifier:
|
|
|
|
{{< latex >}}
|
|
\text{if} : \forall a \; . \; \text{Bool} \rightarrow a \rightarrow a \rightarrow a
|
|
{{< /latex >}}
|
|
|
|
This basically says, "the type of \\(\\text{if}\\) is \\(\\text{Bool} \\rightarrow a \\rightarrow a \\rightarrow a\\)
|
|
for all possible \\(a\\)". This new expression using "forall" is what we call a type scheme, or a polytype \\(\\sigma\\).
|
|
For simplicity, we only allow "forall" to be at the front of a polytype. That is, expressions like
|
|
\\(a \\rightarrow \\forall b \\; . \\; b \\rightarrow b\\) are not valid polytypes as far as we're concerned.
|
|
|
|
It's key to observe that only some of the typing rules in the above table use monotypes (\\(\\tau\\)). Whereas expressions
|
|
consisting of a single variable can be polymorphically typed, this is not true for function applications and case expressions.
|
|
In fact, according to our rules, there is no way to introduce a polytype anywhere into our system!
|