diff --git a/content/blog/10_compiler_polymorphism.md b/content/blog/10_compiler_polymorphism.md
index 9c2ac66..aa8f9dc 100644
--- a/content/blog/10_compiler_polymorphism.md
+++ b/content/blog/10_compiler_polymorphism.md
@@ -46,8 +46,11 @@ some of our notation from the [typechecking]({{< relref "03_compiler_typecheckin
 But using our rules so far, such a thing is impossible, since there is no
 way for \\(\\text{Int}\\) to be unified with \\(\\text{Bool}\\). We need a more powerful
-set of rules to describe our program's types. One such set of rules is
-the [Hindley-Milner type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system),
+set of rules to describe our program's types.
+
+
+### Hindley-Milner Type System
+One such powerful set of rules is the [Hindley-Milner type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system),
 which we have previously alluded to. In fact, the rules we came up with
 were already very close to Hindley-Milner, with the exception of two:
 __generalization__ and __instantiation__. It's been quite a while since
 the last time we worked on typechecking, so I'm going
@@ -74,9 +77,9 @@ Rule|Name and Description
 \{ (p_1,e_1) \ldots (p_n, e_n) \} : \tau_c
 }
 {{< /latex >}}| __Case__: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \\(\\tau\\) and each branch \\((p\_i, e\_i)\\) is of the same type \\(\\tau\_c\\) when the pattern \\(p\_i\\) works with type \\(\\tau\\) producing extra bindings \\(b\_i\\), the whole case expression is of type \\(\\tau\_c\\).
 {{< latex >}}
-\frac{\Gamma \vdash e : \sigma \quad \sigma' \sqsubseteq \sigma}
-    {\Gamma \vdash e : \sigma'}
-{{< /latex >}}| __Inst (New)__: If type \\(\\sigma\'\\) is an instantiation of type \\(\\sigma\\) then an expression of type \\(\\sigma\\) is also an expression of type \\(\\sigma\'\\).
+\frac{\Gamma \vdash e : \sigma' \quad \sigma' \sqsubseteq \sigma}
+    {\Gamma \vdash e : \sigma}
+{{< /latex >}}| __Inst (New)__: If type \\(\\sigma\\) is an instantiation (or specialization) of type \\(\\sigma\'\\), then an expression of type \\(\\sigma\'\\) is also an expression of type \\(\\sigma\\).
 {{< latex >}}
 \frac
     {\Gamma \vdash e : \sigma \quad \alpha \not \in \text{free}(\Gamma)}
@@ -106,6 +109,155 @@ for all possible \\(a\\)". This new expression using "forall" is what we call a
 For simplicity, we only allow "forall" to be at the front of a polytype. That is, expressions like
 \\(a \\rightarrow \\forall b \\; . \\; b \\rightarrow b\\) are not valid polytypes as far as we're concerned.
 
-It's key to observe that only some of the typing rules in the above table use monotypes (\\(\\tau\\)). Whereas expressions
+It's key to observe that only some of the typing rules in the above table use polytypes (\\(\\sigma\\)). Whereas expressions
 consisting of a single variable can be polymorphically typed, this is not true for function applications
 and case expressions. In fact, according to our rules, there is no way to
 introduce a polytype anywhere into our system!
+
+The reason for this is that we only allow polymorphism at certain locations. In the Hindley-Milner type system,
+this is called __Let-Polymorphism__, which means that only in `let`/`in` expressions can variables or expressions
+be given a polymorphic type. We, on the other hand, do not (yet) have `let`/`in` expressions, so our polymorphism
+is further limited: only functions (and data type constructors) can be polymorphically typed.
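+
+As a concrete example of the kind of typing we're after, consider a hypothetical identity function
+`id` (not one of our earlier definitions). We would like it to receive the polytype
+
+{{< latex >}}
+\text{id} : \forall a \; . \; a \rightarrow a
+{{< /latex >}}
+
+so that applications like `id 3` and `id True` can both typecheck, with \\(a\\) standing for a
+different type in each. The two new rules are exactly the machinery for this: __Gen__ builds such
+a polytype, and __Inst__ specializes it at each use site.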
+
+Let's talk about what __Inst__ does, and what "\\(\\sqsubseteq\\)" means.
+First, why don't we go ahead and write the formal inference rule for \\(\\sqsubseteq\\):
+
+{{< latex >}}
+\frac
+    {\tau'=\{\alpha_i \mapsto \tau_i \}\tau \quad \beta_i \not \in \text{free}(\forall \alpha_1...\forall \alpha_n \; . \; \tau)}
+    {\forall \alpha_1 ... \forall \alpha_n \; . \; \tau \sqsubseteq \forall \beta_1 ... \forall \beta_m \; . \; \tau'}
+{{< /latex >}}
+
+In my opinion, this is one of the more confusing inference rules we have to deal with in Hindley-Milner.
+In principle, though, it's not too hard to understand. \\(\\sigma\' \\sqsubseteq \\sigma\\) says "\\(\\sigma\'\\)
+is more general than \\(\\sigma\\)". Alternatively, we can interpret it as "\\(\\sigma\\) is an instance of \\(\\sigma\'\\)".
+
+What does it mean for one polytype to be more general than another? Intuitively, we might say that
+\\(\\forall a \\; . \\; a \\rightarrow a\\) is more general than \\(\\text{Int} \\rightarrow \\text{Int}\\),
+because the former type can represent the latter, and more. Formally,
+we define this in terms of __substitutions__. A substitution \\(\\{\\alpha \\mapsto \\tau \\}\\) replaces a variable
+\\(\\alpha\\) with a monotype \\(\\tau\\). If we can use a substitution to convert one type into another, then the first
+type (the one on which the substitution was performed) is more general than the resulting type. In our previous example,
+since we can apply the substitution \\(\\{a \\mapsto \\text{Int}\\}\\) to get \\(\\text{Int} \\rightarrow \\text{Int}\\)
+from \\(\\forall a \\; . \\; a \\rightarrow a\\), the latter type is more general; using our notation,
+\\(\\forall a \\; . \\; a \\rightarrow a \\sqsubseteq \\text{Int} \\rightarrow \\text{Int}\\).
+
+That's pretty much all that the rule says, really. But what about the whole thing with \\(\\beta\\) and \\(\\text{free}\\)? The reason
+for that part of the rule is that, in principle, we can substitute polytypes into polytypes. However, we can't
+do this using \\(\\{ \\alpha \\mapsto \\sigma \\}\\). Consider, for example:
+
+{{< latex >}}
+\{ a \mapsto \forall b \; . \; b \rightarrow b \} \; \text{Bool} \rightarrow a \rightarrow a \stackrel{?}{=}
+\text{Bool} \rightarrow (\forall b \; . \; b \rightarrow b) \rightarrow \forall b \; . \; b \rightarrow b
+{{< /latex >}}
+
+Hmm, this is not good. Didn't we agree that we only want quantifiers at the front of our types? Instead, to make that substitution,
+we only substitute the monotype \\(b \\rightarrow b\\), and then add the \\(\\forall b\\) at the front. But
+to do this, we must make sure that \\(b\\) doesn't occur anywhere in the original type
+\\(\\forall a \\; . \\; \\text{Bool} \\rightarrow a \\rightarrow a\\) (otherwise we could accidentally generalize
+too much). So then, our concrete inference rule is as follows:
+
+{{< latex >}}
+\frac
+    {
+        \begin{gathered}
+        \text{Bool} \rightarrow (b \rightarrow b) \rightarrow b \rightarrow b =\{a \mapsto (b \rightarrow b) \} \; \text{Bool} \rightarrow a \rightarrow a \\
+        b \not \in \text{free}(\forall a \; . \; \text{Bool} \rightarrow a \rightarrow a) = \varnothing
+        \end{gathered}
+    }
+    {\forall a \; . \; \text{Bool} \rightarrow a \rightarrow a \sqsubseteq \forall b \; . \; \text{Bool} \rightarrow (b \rightarrow b) \rightarrow b \rightarrow b}
+{{< /latex >}}
+
+In the above rule we:
+1. Replaced \\(a\\) with \\(b \\rightarrow b\\), getting rid of \\(a\\) in the quantifier.
+2. Observed that \\(b\\) is not a free variable in the original type, and thus can be generalized.
+3. Added the generalization for \\(b\\) to the front of the resulting type.
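+
+To see why the \\(\\beta_i \\not \\in \\text{free}(...)\\) condition matters, here's a small counterexample
+(using a hypothetical type, not one from our program). Take \\(\\forall a \\; . \\; a \\rightarrow b\\), in which
+\\(b\\) is free, perhaps because the environment elsewhere fixes what \\(b\\) is. Without the freeness condition,
+we could apply the substitution \\(\\{a \\mapsto b\\}\\) and then quantify over \\(b\\):
+
+{{< latex >}}
+\forall a \; . \; a \rightarrow b \; \stackrel{?}{\sqsubseteq} \; \forall b \; . \; b \rightarrow b
+{{< /latex >}}
+
+This would be wrong: the type on the left always returns one particular type \\(b\\), while the type on the
+right claims to work for any \\(b\\) whatsoever, so a value of the left type cannot safely be used at the right
+type. Since \\(b \\in \\text{free}(\\forall a \\; . \\; a \\rightarrow b)\\), the condition correctly rejects
+this accidental capture.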
+
+Now, __Inst__ just allows us to perform specialization / substitution as many times
+as we need to get to the type we want.
+
+### A New Typechecking Algorithm
+
+Alright, now we have all these rules. How does this change our typechecking algorithm?
+How about the following:
+
+1. To every declared function, assign the type \\(a \\rightarrow ... \\rightarrow y \\rightarrow z\\),
+where
+{{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function" >}}
+Of course, there can be more or fewer than 25 arguments to any function. This is just a generalization;
+we use as many input types as are needed.
+{{< /sidenote >}}, and \\(z\\) is the function's
+return type.
+2. We typecheck each declared function, using the __Var__, __Case__, __App__, and __Inst__ rules.
+3. Whatever type variables we don't fill in, we assume can be filled in with any type,
+and use the __Gen__ rule to sprinkle polymorphism where it is needed.
+
+Maybe this is enough. Let's go through an example. Suppose we have three functions:
+
+```
+defn if c t e = {
+  case c of {
+    True -> { t }
+    False -> { e }
+  }
+}
+defn testOne = { if True False True }
+defn testTwo = { if True 0 1 }
+```
+
+If we go through and type check them top-to-bottom, the following happens:
+
+1. We start by assuming \\(\\text{if} : a \\rightarrow b \\rightarrow c \\rightarrow d\\),
+\\(\\text{testOne} : e\\) and \\(\\text{testTwo} : f\\).
+2. We look at `if`. We infer the type of `c` to be \\(\\text{Bool}\\), and thus, \\(a = \\text{Bool}\\).
+We also infer that \\(b = c\\), since `t` and `e` occur in two branches of the same case expression.
+Finally, we infer that \\(c = d\\), since whatever the case expression returns becomes the return
+value of the function. Thus, we come out knowing that \\(\\text{if} : \\text{Bool} \\rightarrow b
+\\rightarrow b \\rightarrow b\\).
+3. Now, since we never figured out \\(b\\), we use __Gen__: \\(\\text{if} : \\forall b \\; . \\;
+\\text{Bool} \\rightarrow b \\rightarrow b \\rightarrow b\\). Like we'd want, `if` works with
+all types, as long as its two branches are of the same type.
+4. When we typecheck the body of `testOne`, we use __Inst__ to turn the above type for `if`
+into a single, monomorphic instance. Then, type inference proceeds as before, and all is well.
+5. When we typecheck the body of `testTwo`, we use __Inst__ again, instantiating a new monotype,
+and all is well again.
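+
+It's worth pausing on what __Inst__ looks like operationally. In code, instantiation typically
+amounts to walking a type scheme and replacing each quantified variable with a fresh, unbound
+type variable. Here's a minimal C++ sketch of that idea; the names here (`type_scheme`,
+`instantiate`, and so on) are hypothetical, and not necessarily what our compiler will end up using:
+
+```
+#include <map>
+#include <memory>
+#include <string>
+#include <vector>
+
+struct type;
+using type_ptr = std::shared_ptr<type>;
+
+struct type {
+    virtual ~type() = default;
+    virtual type_ptr substitute(const std::map<std::string, type_ptr>& subst) const = 0;
+};
+
+// A type variable, like the b in "forall b . Bool -> b -> b -> b".
+struct type_var : type {
+    std::string name;
+    explicit type_var(std::string n) : name(std::move(n)) {}
+    type_ptr substitute(const std::map<std::string, type_ptr>& subst) const override {
+        auto it = subst.find(name);
+        return it != subst.end() ? it->second : std::make_shared<type_var>(name);
+    }
+};
+
+// A function type, left -> right.
+struct type_arr : type {
+    type_ptr left, right;
+    type_arr(type_ptr l, type_ptr r) : left(std::move(l)), right(std::move(r)) {}
+    type_ptr substitute(const std::map<std::string, type_ptr>& subst) const override {
+        return std::make_shared<type_arr>(left->substitute(subst), right->substitute(subst));
+    }
+};
+
+// A polytype: forall a1 ... an . monotype
+struct type_scheme {
+    std::vector<std::string> forall;
+    type_ptr monotype;
+
+    // The Inst rule: every use of a polymorphic function receives its own
+    // fresh copies of the quantified variables, so unification at one use
+    // site cannot constrain another.
+    type_ptr instantiate(int& fresh_counter) const {
+        std::map<std::string, type_ptr> replacements;
+        for (const auto& var : forall)
+            replacements[var] = std::make_shared<type_var>("t" + std::to_string(fresh_counter++));
+        return monotype->substitute(replacements);
+    }
+};
+```
+
+Because `testOne` and `testTwo` each get a fresh instance of `if`'s type, unifying one instance's
+copy of \\(b\\) with \\(\\text{Bool}\\) can't interfere with the other instance's copy being
+unified with \\(\\text{Int}\\).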
+
+So far, so good. But what if we started from the bottom, and went to the top?
+
+1. We start by assuming \\(\\text{if} : a \\rightarrow b \\rightarrow c \\rightarrow d\\),
+\\(\\text{testOne} : e\\) and \\(\\text{testTwo} : f\\).
+2. We look at `testTwo`. We infer that \\(a = \\text{Bool}\\) (since
+we pass in `True` to `if`). We also infer that \\(b = \\text{Int}\\), and that \\(c = \\text{Int}\\).
+Since we're not yet sure of the return type of `if`, this is where we stop. We are left with
+the knowledge that \\(f = d\\) (because the return type of `if` is the return type of `testTwo`),
+but that's about it. Since \\(f\\) is no longer free, we don't generalize, and conclude that \\(\\text{testTwo} : f\\).
+3. We look at `testOne`. We infer that \\(a = \\text{Bool}\\) (already known). We also infer
+that \\(b = \\text{Bool}\\), and that \\(c = \\text{Bool}\\). But wait a minute! We already
+concluded that \\(b = \\text{Int}\\). This is not right.
+
+We are back to where we started, with a unification error!
+
+What went wrong? I claim that we typechecked the functions that _used_ `if` before we typechecked `if` itself,
+which led us to infer a less-than-general type for `if`. This less-than-general type was insufficient to
+correctly check the whole program.
+
+To address this, we enforce a particular order of type inference on our declarations, guided by dependencies
+between functions. Haskell, which has to deal with a similar issue, has
+[a section in the 2010 report on this](https://www.haskell.org/onlinereport/haskell2010/haskellch4.html#x10-880004.5).
+In short:
+
+1. We find the
+{{< sidenote "right" "transitive-closure-note" "transitive closure" >}}
+A transitive closure of a vertex in a graph is the list of all vertices reachable
+from that original vertex. Check out the
+<a href="https://en.wikipedia.org/wiki/Transitive_closure">Wikipedia page</a> on this.
+{{< /sidenote >}}
+of the function dependencies. We define a function \\(f\\) to be dependent on another function \\(g\\)
+if \\(f\\) calls \\(g\\). The transitive closure will help us find functions that are related indirectly.
+For instance, if \\(g\\) also depends on \\(h\\), then the transitive closure of \\(f\\) will
+include \\(h\\), even if \\(f\\) doesn't directly use \\(h\\).
+2. We isolate groups of mutually dependent functions. If \\(f\\) depends on \\(g\\) and \\(g\\) depends on \\(f\\),
+they are placed in one group. We then construct a dependency graph __of these groups__.
+3. We compute a topological order of the group graph. This helps us typecheck the dependencies
+of functions before checking the functions themselves. In our specific case, this would ensure
+we check `if` first, and only then move on to `testOne` and `testTwo`. The order of typechecking
+within a group does not matter.
+4. We typecheck the function groups, and functions within them, following the above topological order.
+
+To find the transitive closure of a graph, we can use [Warshall's Algorithm](https://cs.winona.edu/lin/cs440/ch08-2.pdf), as sketched below.
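+
+As a quick sketch of how that might look in code, here's Warshall's algorithm in C++, assuming we've
+numbered our functions and recorded their direct calls in an adjacency matrix (the representation is a
+hypothetical choice for illustration; our compiler may well store dependencies differently):
+
+```
+#include <cstddef>
+#include <vector>
+
+// On entry, reach[i][j] is true iff function i directly calls function j.
+// On exit, reach[i][j] is true iff function j is reachable from function i;
+// that is, the matrix then encodes the transitive closure.
+void transitive_closure(std::vector<std::vector<bool>>& reach) {
+    std::size_t n = reach.size();
+    for (std::size_t k = 0; k < n; k++)         // allow paths that pass through vertex k
+        for (std::size_t i = 0; i < n; i++)
+            for (std::size_t j = 0; j < n; j++)
+                if (reach[i][k] && reach[k][j])
+                    reach[i][j] = true;
+}
+```
+
+With the closure in hand, grouping becomes easy: functions \\(f\\) and \\(g\\) are mutually dependent
+(and thus belong in the same group) exactly when both \\(\\text{reach}[f][g]\\) and \\(\\text{reach}[g][f]\\) hold.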