blog-static/content/blog/10_compiler_polymorphism.md

116 lines
6.1 KiB
Markdown
Raw Normal View History

2019-12-27 23:13:23 -08:00
---
title: Compiling a Functional Language Using C++, Part 10 - Polymorphism
2020-02-29 20:15:37 -08:00
date: 2020-02-29T20:09:37-08:00
2019-12-27 23:13:23 -08:00
tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
---
2020-02-29 20:15:37 -08:00
[In part 8]({{< relref "08_compiler_llvm.md" >}}), we wrote some pretty interesting programs in our little language.
2019-12-27 23:13:23 -08:00
We successfully expressed arithmetic and recursion. But there's one thing
that we cannot express in our language without further changes: an `if` statement.
Suppose we didn't want to add a special `if/else` expression into our language.
Thanks to lazy evaluation, we can express it using a function:
```
defn if c t e = {
case c of {
True -> { t }
False -> { e }
}
}
```
But an issue still remains: so far, our compiler remains __monomorphic__. That
is, a particular function can only have one possible type for each one of its
arguments. With our current setup, something like this
{{< sidenote "right" "if-note" "would not work:" >}}
In a polymorphically typed language, the inner <code>if</code> would just evaluate to
<code>False</code>, and the whole expression to 3.
{{< /sidenote >}}
```
if (if True False True) 11 3
```
This is because, for this to work, both of the following would need to hold (borrowing
some of our notation from the [typechecking]({{< relref "03_compiler_typechecking.md" >}}) post):
{{< latex >}}
\text{if} : \text{Bool} \rightarrow \text{Int} \rightarrow \text{Int} \rightarrow \text{Int}
{{< /latex >}}
{{< latex >}}
\text{if} : \text{Bool} \rightarrow \text{Bool} \rightarrow \text{Bool} \rightarrow \text{Bool}
{{< /latex >}}
2019-12-27 23:13:23 -08:00
But using our rules so far, such a thing is impossible, since there is no way for
\\(\text{Int}\\) to be unified with \\(\text{Bool}\\). We need a more powerful
set of rules to describe our program's types. One such set of rules is
the [Hindley-Milner type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system),
which we have previously alluded to. In fact, the rules we came up
with were already very close to Hindley-Milner, with the exception of two:
2020-02-29 20:15:37 -08:00
__generalization__ and __instantiation__. It's been quite a while since the last time we worked on typechecking, so I'm going
to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick
summary of each of these rules.
Rule|Name and Description
-----|-------
{{< latex >}}
\frac
{x:\sigma \in \Gamma}
{\Gamma \vdash x:\sigma}
{{< /latex >}}| __Var__: If the variable \\(x\\) is known to have some polymorphic type \\(\\sigma\\) then an expression consisting only of that variable is of that type.
{{< latex >}}
\frac
{\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \quad \Gamma \vdash e_2 : \tau_1}
{\Gamma \vdash e_1 \; e_2 : \tau_2}
{{< /latex >}}| __App__: If an expression \\(e\_1\\), which is a function from monomorphic type \\(\\tau\_1\\) to another monomorphic type \\(\\tau\_2\\), is applied to an argument \\(e\_2\\) of type \\(\\tau\_1\\), then the result is of type \\(\\tau\_2\\).
{{< latex >}}
\frac
{\Gamma, x:\tau \vdash e : \tau'}
{\Gamma \vdash \lambda x.e : \tau \rightarrow \tau'}
{{< /latex >}}| __Abs__: If the body \\(e\\) of a lambda abstraction \\(\\lambda x.e\\) is of type \\(\\tau\'\\) when \\(x\\) is of type \\(\\tau\\) then the whole lambda abstraction is of type \\(\\tau \\rightarrow \\tau\'\\).
{{< latex >}}
\frac
{\Gamma \vdash e : \tau \quad \text{matcht}(\tau, p_i) = b_i
\quad \Gamma,b_i \vdash e_i : \tau_c}
{\Gamma \vdash \text{case} \; e \; \text{of} \;
\{ (p_1,e_1) \ldots (p_n, e_n) \} : \tau_c }
{{< /latex >}}| __Case__: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \\(\\tau\\) and each branch \\((p\_i, e\_i)\\) is of the same type \\(\\tau\_c\\) when the pattern \\(p\_i\\) works with type \\(\\tau\\) producing extra bindings \\(b\_i\\), the whole case expression is of type \\(\\tau\_c\\).
{{< latex >}}
\frac{\Gamma \vdash e : \sigma \quad \sigma' \sqsubseteq \sigma}
{\Gamma \vdash e : \sigma'}
{{< /latex >}}| __Inst (New)__: If type \\(\\sigma\'\\) is an instantiation of type \\(\\sigma\\) then an expression of type \\(\\sigma\\) is also an expression of type \\(\\sigma\'\\).
{{< latex >}}
\frac
{\Gamma \vdash e : \sigma \quad \alpha \not \in \text{free}(\Gamma)}
{\Gamma \vdash e : \forall \alpha . \sigma}
{{< /latex >}}| __Gen (New)__: If an expression has a type with free variables, this rule allows us generalize it to allow all possible types to be used for these free variables.
Here, there is a distinction between different forms of types. First, there are
monomorphic types, or __monotypes__, \\(\\tau\\), which are types such as \\(\\text{Int}\\),
\\(\\text{Int} \\rightarrow \\text{Bool}\\), \\(a \\rightarrow b\\)
and so on. These types are what we've been working with so far. Each of them
represents one (hence, "mono-"), concrete type. This is obvious in the case
of \\(\\text{Int}\\) and \\(\\text{Int} \\rightarrow \\text{Bool}\\), but
for \\(a \\rightarrow b\\) things are slightly less clear. Does it really
represent a single type, if we can put an arbtirary thing for \\(a\\)?
The answer is "yes"! Although \\(a\\) is not currently known, it stands
in place of another monotype, which is yet to be determined.
So, how do we represent polymorphic types, like that of \\(\\text{if}\\)?
We borrow some more notation from mathematics, and use the "forall" quantifier:
{{< latex >}}
\text{if} : \forall a \; . \; \text{Bool} \rightarrow a \rightarrow a \rightarrow a
{{< /latex >}}
This basically says, "the type of \\(\\text{if}\\) is \\(\\text{Bool} \\rightarrow a \\rightarrow a \\rightarrow a\\)
for all possible \\(a\\)". This new expression using "forall" is what we call a type scheme, or a polytype \\(\\sigma\\).
For simplicity, we only allow "forall" to be at the front of a polytype. That is, expressions like
\\(a \\rightarrow \\forall b \\; . \\; b \\rightarrow b\\) are not valid polytypes as far as we're concerned.
It's key to observe that the majority of the typing rules in the above table use monotypes (\\(\\tau\\)). Only
two rules seem to mention \\(\\sigma\\): __Inst__ and __Gen__.