diff --git a/content/blog/10_compiler_polymorphism.md b/content/blog/10_compiler_polymorphism.md
index 744346c..53d9070 100644
--- a/content/blog/10_compiler_polymorphism.md
+++ b/content/blog/10_compiler_polymorphism.md
@@ -1,8 +1,7 @@
 ---
 title: Compiling a Functional Language Using C++, Part 10 - Polymorphism
-date: 2020-02-29T20:09:37-08:00
+date: 2020-03-25T17:14:20-07:00
 tags: ["C and C++", "Functional Languages", "Compilers"]
-draft: true
 ---
 [In part 8]({{< relref "08_compiler_llvm.md" >}}), we wrote some pretty interesting programs in our little language.
@@ -54,7 +53,13 @@ One such powerful set of rules is the [Hindley-Milner type system](https://en.wi
 which we have previously alluded to. In fact, the rules we came up with were already very close to Hindley-Milner, with the exception of two: __generalization__ and __instantiation__. It's been quite a while since the last time we worked on typechecking, so I'm going
-to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick
+to present a table with these new rules, as well as all of the ones that we
+{{< sidenote "right" "rules-note" "previously used." >}}
+The rules aren't quite the same as the ones we used earlier;
+note that \(\sigma\) is used in place of \(\tau\) in the first rule,
+for instance. These changes are slight, and we'll talk about how the
+rules work together below.
+{{< /sidenote >}} I will also give a quick
 summary of each of these rules.
 Rule|Name and Description
@@ -181,10 +186,10 @@ How about the following:
 1. To every declared function, assign the type \\(a \\rightarrow ... \\rightarrow y \\rightarrow z\\), where
-{{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function" >}}
+{{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function," >}}
 Of course, there can be more or less than 25 arguments to any function.
 This is just a generalization; we use as many input types as are needed.
-{{< /sidenote >}}, and \\(z\\) is the function's
+{{< /sidenote >}} and \\(z\\) is the function's
 return type.
 2. We typecheck each declared function, using the __Var__, __Case__, __App__, and __Inst__ rules.
 3. Whatever type variables we don't fill in, we assume can be filled in with any type,
@@ -367,8 +372,8 @@ to establish a topological order.
 Following these, we have three public function definitions:
 * `add_function` adds a vertex to the graph. Sometimes, a function does not reference any other functions, and would not appear in the list of edges.
-We will call this function to make sure that the function graph is aware
-of such functions. For convenience, this function returns the adjacency list
+We will call `add_function` to make sure that the function graph is aware
+of such independent functions. For convenience, `add_function` returns the adjacency list
 of the added function.
 * `add_edge` adds a new dependency between two functions.
 * `compute_order` method uses the internal methods described above to convert
@@ -403,7 +408,7 @@ group members were also already visited and added.
 Once groups have been created, we use their functions' edges to create edges for the groups themselves, using `create_edges`.
-We avoid creating edges from a group to itself, to avoid
+We avoid creating edges from a group to itself, to prevent
 unnecessary cycles. While constructing the edges, we also increment the relevant indegree counter.
@@ -540,7 +545,7 @@ tree node has a `type_env_ptr`. Furthermore, `typecheck` should no longer call
 Don't worry about `instantiate` for now; that's coming up.
 Similarly to `ast_lid`, `ast_case::typecheck` will no longer introduce new bindings,
-and unify instead:
+but unify existing types via the `pattern`:
 {{< codelines "C++" "compiler/10/ast.cpp" 152 169 >}}
@@ -572,7 +577,16 @@ steps:
 it refers to "known" types. Add valid constructors to the global environment as functions.
 We don't currently verify that types are "known"; A user could declare a list of `Floobs`,
-and never say what a `Floob` is. This isn't too big of an issue (good luck constructing
+and never say what a `Floob` is.
+{{< sidenote "right" "known-type-note" "This isn't too big of an issue" >}}
+Curiously, this flaw did lead to some valid programs being rejected. Since
+we had no notion of a "known" type, whenever data type constructors
+were created, every argument type was marked a "base" type;
+see
+this line if you're curious.
+This would cause pattern matching to fail on the tail of a list, with
+the "attempt to pattern match on non-data argument" error.
+{{< /sidenote >}}(good luck constructing
 a value of a non-existent type), but a mature compiler should prevent this from happening. On the other hand, here are the steps for function definitions:
@@ -663,7 +677,7 @@ The separation of data and function definitions must be reconciled with code
 going back as far as the parser. While previously, we populated a single, global vector of definitions called `program`, we can no longer do that. Instead, we'll split our program into two maps, one for data types and one for functions. We
-use maps for convenience: since the groups generated by our function graph refer
+use maps for convenience: the groups generated by our function graph refer
 to functions by name, and it would be nice to quickly look up the data the names refer to. Rather than returning such maps, we change our semantic actions to simply insert new data into one of two global maps. Below
@@ -729,9 +743,9 @@ possibility that the variable has a polymorphic type, which needs to be speciali
 (potentially differently in every occurrence of the variable). When talking about our new typechecking algorithm, we mentioned using __Gen__ to sprinkle
-polymorphism wherever possible. Whenever possible, __Gen__ will add free variables
+polymorphism into our program. If it can, __Gen__ will add free variables
 in a type to the "forall" quantifier at the front, making that type polymorphic.
-We implement this using a new `generalize` added to the `type_env`, which (as per
+We implement this using a new `generalize` method added to the `type_env`, which (as per
 convention) generalizes the type of a given variable as much as possible:
 {{< codelines "C++" "compiler/10/type_env.cpp" 31 41 >}}
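
To make the idea behind __Gen__ concrete, here is a minimal, self-contained sketch of generalization. It is not the code from `type_env.cpp`: the `type`, `type_var`, `type_base`, `type_arr`, and `type_scheme` classes and the `find_free` helper are simplified stand-ins made up for illustration. The sketch also assumes that every type variable still present in the type is free, so it ignores substitutions and any variables bound by an enclosing environment.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types; these are NOT the compiler's real classes.
struct type {
    virtual ~type() = default;
};
using type_ptr = std::shared_ptr<type>;

// A type variable such as a or b.
struct type_var : type {
    std::string name;
    explicit type_var(std::string n) : name(std::move(n)) {}
};

// A concrete type such as Int.
struct type_base : type {
    std::string name;
    explicit type_base(std::string n) : name(std::move(n)) {}
};

// A function type: left -> right.
struct type_arr : type {
    type_ptr left, right;
    type_arr(type_ptr l, type_ptr r) : left(std::move(l)), right(std::move(r)) {}
};

// A monotype together with the variables bound by its "forall".
struct type_scheme {
    std::vector<std::string> forall;
    type_ptr monotype;
};

// Collect the names of all type variables that occur in t.
void find_free(const type_ptr& t, std::set<std::string>& into) {
    if (auto var = std::dynamic_pointer_cast<type_var>(t)) {
        into.insert(var->name);
    } else if (auto arr = std::dynamic_pointer_cast<type_arr>(t)) {
        find_free(arr->left, into);
        find_free(arr->right, into);
    }
    // Base types contain no variables, so there is nothing to collect.
}

// Quantify over every variable in t (assumes none is bound elsewhere).
type_scheme generalize(const type_ptr& t) {
    assert(t != nullptr);
    std::set<std::string> free_variables;
    find_free(t, free_variables);
    return type_scheme{
        std::vector<std::string>(free_variables.begin(), free_variables.end()), t};
}
```

Generalizing a type like `a -> Int -> b` with this sketch gives a scheme whose `forall` list holds `a` and `b`, in sorted order because the variables are gathered in a `std::set`. The monotype itself is left untouched; only the quantifier changes.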