blog-static/10_compiler_polymorphism.md at e3834ed6ea6ef32e537321093265505093a7e078

23 KiB

Raw Blame History

title

date

Hindley-Milner Type System

One such powerful set of rules is the Hindley-Milner type system, which we have previously alluded to. In fact, the rules we came up with were already very close to Hindley-Milner, with the exception of two: generalization and instantiation. It's been quite a while since the last time we worked on typechecking, so I'm going to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick summary of each of these rules.

Rule	Name and Description
{{< latex >}}
\frac
{x:\sigma \in \Gamma}
{\Gamma \vdash x:\sigma}
{{< /latex >}}	Var: If the variable \(x\) is known to have some polymorphic type \(\sigma\) then an expression consisting only of that variable is of that type.
{{< latex >}}
\frac
{\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \quad \Gamma \vdash e_2 : \tau_1}
{\Gamma \vdash e_1 ; e_2 : \tau_2}
{{< /latex >}}	App: If an expression \(e_1\), which is a function from monomorphic type \(\tau_1\) to another monomorphic type \(\tau_2\), is applied to an argument \(e_2\) of type \(\tau_1\), then the result is of type \(\tau_2\).
{{< latex >}}
\frac
{\Gamma \vdash e : \tau \quad \text{matcht}(\tau, p_i) = b_i
\quad \Gamma,b_i \vdash e_i : \tau_c}
{\Gamma \vdash \text{case} ; e ; \text{of} ;
{ (p_1,e_1) \ldots (p_n, e_n) } : \tau_c }
{{< /latex >}}	Case: This rule is not part of Hindley-Milner, and is specific to our language. If the expression being case-analyzed is of type \(\tau\) and each branch \((p_i, e_i)\) is of the same type \(\tau_c\) when the pattern \(p_i\) works with type \(\tau\) producing extra bindings \(b_i\), the whole case expression is of type \(\tau_c\).
{{< latex >}}
\frac{\Gamma \vdash e : \sigma' \quad \sigma' \sqsubseteq \sigma}
{\Gamma \vdash e : \sigma}
{{< /latex >}}	Inst (New): If type \(\sigma\) is an instantiation (or specialization) of type \(\sigma'\) then an expression of type \(\sigma'\) is also an expression of type \(\sigma\).
{{< latex >}}
\frac
{\Gamma \vdash e : \sigma \quad \alpha \not \in \text{free}(\Gamma)}
{\Gamma \vdash e : \forall \alpha . \sigma}
{{< /latex >}}	Gen (New): If an expression has a type with free variables, this rule allows us generalize it to allow all possible types to be used for these free variables.

Here, there is a distinction between different forms of types. First, there are monomorphic types, or monotypes, \(\tau\), which are types such as \(\text{Int}\), \(\text{Int} \rightarrow \text{Bool}\), \(a \rightarrow b\) and so on. These types are what we've been working with so far. Each of them represents one (hence, "mono-"), concrete type. This is obvious in the case of \(\text{Int}\) and \(\text{Int} \rightarrow \text{Bool}\), but for \(a \rightarrow b\) things are slightly less clear. Does it really represent a single type, if we can put an arbtirary thing for \(a\)? The answer is "yes"! Although \(a\) is not currently known, it stands in place of another monotype, which is yet to be determined.

So, how do we represent polymorphic types, like that of \(\text{if}\)? We borrow some more notation from mathematics, and use the "forall" quantifier:

{{< latex >}} \text{if} : \forall a ; . ; \text{Bool} \rightarrow a \rightarrow a \rightarrow a {{< /latex >}}

This basically says, "the type of \(\text{if}\) is \(\text{Bool} \rightarrow a \rightarrow a \rightarrow a\) for all possible \(a\)". This new expression using "forall" is what we call a type scheme, or a polytype \(\sigma\). For simplicity, we only allow "forall" to be at the front of a polytype. That is, expressions like \(a \rightarrow \forall b \; . \; b \rightarrow b\) are not valid polytypes as far as we're concerned.

It's key to observe that only some of the typing rules in the above table use polytypes (\(\sigma\)). Whereas expressions consisting of a single variable can be polymorphically typed, this is not true for function applications and case expressions. In fact, according to our rules, there is no way to introduce a polytype anywhere into our system!

The reason for this is that we only allow polymorphism at certain locations. In the Hindley-Milner type system, this is called Let-Polymorphism, which means that only in let/in expressions can variables or expressions be given a polymorphic type. We, on the other hand, do not (yet) have let/in expressions, so our polymorphism is further limited: only functions (and data type constructors) can be polymorphically typed.

Let's talk about what Inst does, and what "\(\sqsubseteq\)" means. First, why don't we go ahead and write the formal inference rule for \(\sqsubseteq\):

{{< latex >}} \frac {\tau'={\alpha_i \mapsto \tau_i }\tau \quad \beta_i \not \in \text{free}(\forall \alpha_1...\forall \alpha_n ; . ; \tau)} {\forall \alpha_1 ... \forall \alpha_n ; . ; \tau \sqsubseteq \forall \beta_1 ... \forall \beta_m ; . ; \tau'} {{< /latex >}}

In my opinion, this is one of the more confusing inference rules we have to deal with in Hindley-Milner. In principle, though, it's not too hard to understand. \(\sigma' \sqsubseteq \sigma\) says "\(\sigma'\) is more general than \(\sigma\)". Alternatively, we can interpret it as "\(\sigma\) is an instance of \(\sigma'\)".

What does it mean for one polytype to be more general than another? Intuitively, we might say that \(\forall a \; . \; a \rightarrow a\) is more general than \(\text{Int} \rightarrow \text{Int}\), because the former type can represent the latter, and more. Formally, we define this in terms of substitutions. A substitution \(\{\alpha \mapsto \tau \}\) replaces a variable \(\alpha\) with a monotype \(\tau\). If we can use a substitution to convert one type into another, then the first type (the one on which the substitution was performed) is more general than the resulting type. In our previous example, since we can apply the substitution \(\{a \mapsto \text{Int}\}\) to get \(\text{Int} \rightarrow \text{Int}\) from \(\forall a \; . \; a \rightarrow a\), the latter type is more general; using our notation, \(\forall a \; . \; a \rightarrow a \sqsubseteq \text{Int} \rightarrow \text{Int}\).

That's pretty much all that the rule says, really. But what about the whole thing with \(\beta\) and \(\text{free}\)? The reason for that part of the rule is that, in principle, we can substitute polytypes into polytypes. However, we can't do this using \(\{ \alpha \mapsto \sigma \}\). Consider, for example:

{{< latex >}} { a \mapsto \forall b ; . ; b \rightarrow b } ; \text{Bool} \rightarrow a \rightarrow a \stackrel{?}{=} \text{Bool} \rightarrow (\forall b ; . ; b \rightarrow b) \rightarrow \forall b ; . ; b \rightarrow b {{< /latex >}}

Hmm, this is not good. Didn't we agree that we only want quantifiers at the front of our types? Instead, to make that substitution, we only substitute the monotype \(b \rightarrow b\), and then add the \(\forall b\) at the front. But to do this, we must make sure that \(b\) doesn't occur anywhere in the original type \(\forall a \; . \; \text{Bool} \rightarrow a \rightarrow a\) (otherwise we can accidentally generalize too much). So then, our concrete inference rule is as follows:

{{< latex >}} \frac { \begin{gathered} \text{Bool} \rightarrow (b \rightarrow b) \rightarrow b \rightarrow b ={a \mapsto (b \rightarrow b) } ; \text{Bool} \rightarrow a \rightarrow a \ b \not \in \text{free}(\forall a ; . ; \text{Bool} \rightarrow a \rightarrow a) = \varnothing \end{gathered} } {\forall a ; . ; \text{Bool} \rightarrow a \rightarrow a \sqsubseteq \forall b ; . ; \text{Bool} \rightarrow (b \rightarrow b) \rightarrow b \rightarrow b} {{< /latex >}}

In the above rule we:

Replaced \(a\) with \(b \rightarrow b\), getting rid of \(a\) in the quantifier.
Observed that \(b\) is not a free variable in the original type, and thus can be generalized.
Added the generalization for \(b\) to the front of the resulting type.

Now, Inst just allows us to perform specialization / substitution as many times as we need to get to the type we want.

A New Typechecking Algorithm

Alright, now we have all these rules. How does this change our typechecking algorithm? How about the following:

To every declared function, assign the type \(a \rightarrow ... \rightarrow y \rightarrow z\), where {{< sidenote "right" "arguments-note" "a through y are the types of the arguments to the function" >}} Of course, there can be more or less than 25 arguments to any function. This is just a generalization; we use as many input types as are needed. {{< /sidenote >}}, and \(z\) is the function's return type.
We typecheck each declared function, using the Var, Case, App, and Inst rules.
Whatever type variables we don't fill in, we assume can be filled in with any type, and use the Gen rule to sprinkle polymorphism where it is needed.

Maybe this is enough. Let's go through an example. Suppose we have three functions:

defn if c t e = {
    case c of {
        True -> { t }
        False -> { e }
    }
}
defn testOne = { if True False True }
defn testTwo = { if True 0 1 }

If we go through and typecheck them top-to-bottom, the following happens:

We start by assuming \(\text{if} : a \rightarrow b \rightarrow c \rightarrow d\), \(\text{testOne} : e\) and \(\text{testTwo} : f\).
We look at if. We infer the type of c to be \(\text{Bool}\), and thus, \(a = \text{Bool}\). We also infer that \(b = c\), since they occur in two branches of the same case expression. Finally, we infer that \(c = d\), since whatever the case expression returns becomes the return value of the function. Thus, we come out knowing that \(\text{if} : \text{Bool} \rightarrow b \rightarrow b \rightarrow b\).
Now, since we never figured out \(b\), we use Gen: \(\text{if} : \forall b \; . \; \text{Bool} \rightarrow b \rightarrow b \rightarrow b\). Like we'd want, if works with all types, as long as both its inputs are of the same type.
When we typecheck the body of testOne, we use Inst to turn the above type for if into a single, monomorphic instance. Then, type inference proceeds as before, and all is well.
When we typecheck the body of testTwo, we use Inst again, instantiating a new monotype, and all is well again.

So far, so good. But what if we started from the bottom, and went to the top?

We start by assuming \(\text{if} : a \rightarrow b \rightarrow c \rightarrow d\), \(\text{testOne} : e\) and \(\text{testTwo} : f\).
We look at testTwo. We infer that \(a = \text{Bool}\) (since we pass in True to if). We also infer that \(b = \text{Int}\), and that \(c = \text{Int}\). Not yet sure of the return type of if, this is where we stop. We are left with the knowledge that \(f = d\) (because the return type of if is the return type of testTwo), but that's about it. Since \(f\) is no longer free, we don't generalize, and conclude that \(\text{testTwo} : f\).
We look at testOne. We infer that \(a = \text{Bool}\) (already known). We also infer that \(b = \text{Bool}\), and that \(c = \text{Bool}\). But wait a minute! This is not right. We are back to where we started, with a unification error!

What went wrong? I claim that we typechecked the functions that used if before we typechecked if itself, which led us to infer a less-than-general type for if. This less-than-general type was insufficient to correctly check the whole program.

To address this, we enforce a particular order of type inference on our declaration, guided by dependencies between functions. Haskell, which has to deal with a similar issue, has a section in the 2010 report on this. In short:

We find the {{< sidenote "right" "transitive-closure-note" "transitive closure" >}} A transitive closure of a vertex in a graph is the list of all vertices reachable from that original vertex. Check out the Wikipedia page on this. {{< /sidenote >}} of the function dependencies. We define a function \(f\) to be dependent on another function \(g\) if \(f\) calls \(g\). The transitive closure will help us find functions that are related indirectly. For instance, if \(g\) also depends on \(h\), then the transitive closure of \(f\) will include \(h\), even if \(f\) directly doesn't use \(h\).
We isolate groups of mutually dependent functions. If \(f\) depends on \(g\) and \(g\) depends \(f\), they are placed in one group. We then construct a dependency graph of these groups.
We compute a topological order of the group graph. This helps us typecheck the dependencies of functions before checking the functions themselves. In our specific case, this would ensure we check if first, and only then move on to testOne and testTwo. The order of typechecking within a group does not matter.
We typecheck the function groups, and functions within them, following the above topological order.

To find the transitive closure of a graph, we can use Warshall's Algorithm. This algorithm, with complexity \(O(|V|^3)\), goes as follows: {{< latex >}} \begin{aligned} & A, R^{(i)} \in \mathbb{B}^{n \times n} \ & \ & R^{(0)} \leftarrow A \ & \textbf{for} ; k \leftarrow 1 ; \textbf{to} ; n ; \textbf{do} \ & \quad \textbf{for} ; i \leftarrow 1 ; \textbf{to} ; n ; \textbf{do} \ & \quad \quad \textbf{for} ; j \leftarrow 1 ; \textbf{to} ; n ; \textbf{do} \ & \quad \quad \quad R^{(k)}[i,j] \leftarrow R^{(k-1)}[i,j] ; \textbf{or} ; R^{(k-1)}[i,k] ; \textbf{and} ; R^{(k-1)}[k,j] \ & \textbf{return} ; R^{(n)} \end{aligned} {{< /latex >}}

In the above notation, \(R^{(i)}\) is the \(i\)th matrix \(R\), and \(A\) is the adjacency matrix of the graph in question. All matrices in the algorithm are from \(\mathbb{B}^{n \times n}\), the set of \(n\) by \(n\) boolean matrices. Once this algorithm is complete, we get as output a transitive closure adjacency matrix \(R^{(n)}\). Mutually dependent functions will be pretty easy to isolate from this matrix. If \(R^{(n)}[i,j]\) and \(R^{(n)}[j,i]\), then the functions represented by vertices \(i\) and \(j\) depend on each other.

Once we've identified the groups, and {{< sidenote "right" "group-graph-note" "constructed a group graph," >}} This might seem like a "draw the rest of the owl" situation, but it really isn't. We'll follow a naive algorithm for findings groups, and for translating function dependencies into group dependencies. This algorithm, in C++, will be presented later on. {{< /sidenote >}} it is time to compute the topological order. For this, we will use Kahn's Algorithm. The algorithm goes as follows:

{{< latex >}} \begin{aligned} & L \leftarrow \text{empty list} \ & S \leftarrow \text{set of all nodes with no incoming edges} \ & \ & \textbf{while} ; S ; \text{is non-empty} ; \textbf{do} \ & \quad \text{remove a node} ; n ; \text{from} ; S \ & \quad \text{add} ; n ; \text{to the end of} ; L \ & \quad \textbf{for each} ; \text{node} ; m ; \text{with edge} ; e ; \text{from} ; n ; \text{to} ; m ; \textbf{do} \ & \quad \quad \text{remove edge} ; e ; \text{from the graph} \ & \quad \quad \textbf{if} ; m ; \text{has no other incoming edges} ; \textbf{then} \ & \quad \quad \quad \text{insert} ; m ; \text{into} ; S \ & \ & \textbf{if} ; \text{the graph has edges} ; \textbf{then} \ & \quad \textbf{return} ; \text{error} \quad \textit{(graph has at least once cycle)} \ & \textbf{else} \ & \quad \textbf{return} ; L \quad \textit{(a topologically sorted order)} \end{aligned} {{< /latex >}}

Note that since we've already isolated all mutually dependent functions into groups, our graph will never have cycles, and this algorithm will always succeed. Also note that since we start with nodes with no incoming edges, our list will begin with the groups that should be checked last. This is because a node with no incoming edges might (and probably does) still have outgoing edges, and thus depends on other functions / groups. Like in our successful example, we want to typecheck functions that are depended on first.

Implementation

Let's start working on a C++ implementation of all of this now. First, I think that we should create a C++ class that will represent our function dependency graph. Let's call it function_graph. I propose the following definition:

There's a lot to unpack here. First of all, we create a type alias function that represents the label of a function in our graph. It is probably most convenient to work with std::string instances, so we settle for that. Next, we define a struct that will represent a single group of mutually dependent functions. Passing this struct by value seems wrong, so we'll settle for a C++ unique_pt to help carry instances around.

Finally, we arrive at the definition of function_graph. Inside this class, we define a helper struct, group_data, which holds information about an individual group as it is being constructed. This information includes the group's adjacency list and indegree (both used for Kahn's topological sorting algorithm), as well as the set of functions in the group (which we will eventually return).

The adjacency_lists and edges fields are the meat of the graph representation. Both of the variables provide a different view of the same graph: adjacency_lists associates with every function a list of functions it depends on, while edges holds a set of tuples describing edges in the graph. Having more than one representation makes it more convenient for us to perform different operations on our graphs.

Next up are some internal methods that perform the various steps we described above:

compute_transitive_edges applies Warshall's algorithm to find the graph's transitive closure.
create_groups creates two mappings, one from functions to their respective groups' IDs, and one from group IDs to information about the corresponding groups. This step is largely used to determine which functions belong to the same group, and as such, uses the set of transitive edges generated by compute_transitive_edges.
create_edges creates edges between groups. During this step, the indegrees of each group are computed, as well as their adjacency lists.
generate_order uses the indegrees and adjacency lists produced in the prior step to establish a topological order.

Finally, the add_edge method is used to add a new dependency between two functions, while the compute_order method uses the internal methods described above to convert the function dependency graph into a properly ordered list of groups.

Let's start by looking at how to implement the internal methods. compute_transitive_edges is a very straightforward implementation of Warshall's:

Next is create_groups, for each function, we iterate over all other functions. If the other function is mutually dependent with the first function, we add it to the same group. In the outer loop, we skip over functions that have already been added to the group. This is because {{< sidenote "right" "equivalence-note" "mutual dependence" >}} There is actually a slight difference between "mutual dependence" the way we defined it and "being in the same group", and it lies in the symmetric property of an equivalence relation. We defined a function to depend on another function if it calls that other function. Then, a recursive function depends on itself, but a non-recursive function does not, and therefore does not satisfy the symmetric property. However, as far as we're concerned, a function should be in a group with itself even if it's not recursive. Thus, the real equivalence relation we use is "in the same group as", and consists of "mutual dependence" extended with symmetry. {{< /sidenote >}} is an equivalence relation, which means that if we already added a function to a group, all its group members were also already visited and added.

Once groups have been created, we use their functions' edges to create edges for the groups themselves, using create_edges. We avoid creating edges from a group to itself, to avoid unnecessary cycles. While constructing the edges, we also increment the relevant indegree counter.

Finally, we apply Kahn's algorithm to create a topological order in generate_order:

These four steps are used in compute_order:

Finally, add_edge straightforwardly adds an edge to the graph:

With this, we can now properly order our typechecking. However, there are a few pieces of the puzzle missing. First of all, we need to actually insert function dependencies into the graph. Second, we need to think about how our existing language features and implementation will interact with polymorphism. Third, we have to come up with an implementation of polymorphic data types.

23 KiB Raw Blame History

Hindley-Milner Type System

A New Typechecking Algorithm

Implementation

23 KiB

Raw Blame History