Finish and publish part 10 of compiler series
continuous-integration/drone/push: Build is passing

Danila Fedorin 2020-03-25 17:15:53 -07:00
parent 5cccb97ede
commit c53a8ba68e
1 changed file with 27 additions and 13 deletions


```diff
@@ -1,8 +1,7 @@
 ---
 title: Compiling a Functional Language Using C++, Part 10 - Polymorphism
-date: 2020-02-29T20:09:37-08:00
+date: 2020-03-25T17:14:20-07:00
 tags: ["C and C++", "Functional Languages", "Compilers"]
-draft: true
 ---
 [In part 8]({{< relref "08_compiler_llvm.md" >}}), we wrote some pretty interesting programs in our little language.
```
```diff
@@ -54,7 +53,13 @@ One such powerful set of rules is the [Hindley-Milner type system](https://en.wi
 which we have previously alluded to. In fact, the rules we came up
 with were already very close to Hindley-Milner, with the exception of two:
 __generalization__ and __instantiation__. It's been quite a while since the last time we worked on typechecking, so I'm going
-to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick
+to present a table with these new rules, as well as all of the ones that we
+{{< sidenote "right" "rules-note" "previously used." >}}
+The rules aren't quite the same as the ones we used earlier;
+note that \(\sigma\) is used in place of \(\tau\) in the first rule,
+for instance. These changes are slight, and we'll talk about how the
+rules work together below.
+{{< /sidenote >}} I will also give a quick
 summary of each of these rules.
 Rule|Name and Description
```
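For reference, here are the textbook declarative forms of the __Var__, __Inst__, and __Gen__ rules of Hindley-Milner. The table in the post may write them with different notation. Here σ' ⊑ σ means that σ is an instance of the more general scheme σ', and free(Γ) is the set of type variables that occur free in the environment Γ. Note that __Var__ assigns a scheme σ rather than a plain type τ.

```latex
\frac{x : \sigma \in \Gamma}{\Gamma \vdash x : \sigma}\ \textbf{Var}
\qquad
\frac{\Gamma \vdash e : \sigma' \quad \sigma' \sqsubseteq \sigma}{\Gamma \vdash e : \sigma}\ \textbf{Inst}
\qquad
\frac{\Gamma \vdash e : \sigma \quad \alpha \notin \text{free}(\Gamma)}{\Gamma \vdash e : \forall \alpha . \sigma}\ \textbf{Gen}
```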
```diff
@@ -181,10 +186,10 @@ How about the following:
 1. To every declared function, assign the type \\(a \\rightarrow ... \\rightarrow y \\rightarrow z\\),
 where
-{{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function" >}}
+{{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function," >}}
 Of course, there can be more or less than 25 arguments to any function. This is just a generalization;
 we use as many input types as are needed.
-{{< /sidenote >}}, and \\(z\\) is the function's
+{{< /sidenote >}} and \\(z\\) is the function's
 return type.
 2. We typecheck each declared function, using the __Var__, __Case__, __App__, and __Inst__ rules.
 3. Whatever type variables we don't fill in, we assume can be filled in with any type,
```
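Here is a rough C++ sketch of step 1. It builds the type a -> ... -> y -> z for a function with a given number of parameters, using one fresh type variable per argument and one more for the return type. The `type`, `name_supply`, `arrow`, `initial_function_type`, and `show` names are hypothetical and are not the classes used in the series. The fresh variables are simply named `t0`, `t1`, and so on.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal type representation: either a type variable
// (non-empty `var`) or an arrow type (non-null `left` and `right`).
struct type {
    std::string var;
    std::shared_ptr<type> left, right;
};
using type_ptr = std::shared_ptr<type>;

// Hands out type variables that have never been used before.
struct name_supply {
    int next = 0;
    type_ptr fresh() {
        return std::make_shared<type>(type{"t" + std::to_string(next++), nullptr, nullptr});
    }
};

type_ptr arrow(type_ptr from, type_ptr to) {
    return std::make_shared<type>(type{"", std::move(from), std::move(to)});
}

// Step 1: a function with `arity` parameters gets a -> ... -> y -> z,
// where every letter is a distinct fresh type variable.
type_ptr initial_function_type(std::size_t arity, name_supply& names) {
    std::vector<type_ptr> params;
    for (std::size_t i = 0; i < arity; i++) params.push_back(names.fresh());
    type_ptr result = names.fresh(); // z, the return type
    // Arrows associate to the right, so build the chain from the end.
    for (auto it = params.rbegin(); it != params.rend(); ++it)
        result = arrow(*it, result);
    return result;
}

// Prints a type, parenthesizing arrows that appear to the left of another arrow.
std::string show(const type_ptr& t) {
    if (!t->left) return t->var;
    std::string l = show(t->left);
    if (t->left->left) l = "(" + l + ")";
    return l + " -> " + show(t->right);
}
```

For a two-argument function, this yields `t0 -> t1 -> t2`. Typechecking the body (step 2) would then constrain these variables through unification.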
```diff
@@ -367,8 +372,8 @@ to establish a topological order.
 Following these, we have three public function definitions:
 * `add_function` adds a vertex to the graph. Sometimes, a function does not
 reference any other functions, and would not appear in the list of edges.
-We will call this function to make sure that the function graph is aware
-of such functions. For convenience, this function returns the adjacency list
+We will call `add_function` to make sure that the function graph is aware
+of such independent functions. For convenience, `add_function` returns the adjacency list
 of the added function.
 * `add_edge` adds a new dependency between two functions.
 * `compute_order` method uses the internal methods described above to convert
```
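A pared-down sketch of the first two public methods might look like the following. It assumes functions are identified by name and that each adjacency list is a `std::set`. The `edges` accessor exists only for inspection. Having `add_edge` also register the target function is an assumption of this sketch, not something stated above.

```cpp
#include <map>
#include <set>
#include <string>

using function = std::string;

// Hypothetical, pared-down function graph: each function maps to the set
// of functions it references.
class function_graph {
public:
    // Registers `f` even if it references nothing, returning its adjacency
    // list. std::map::operator[] inserts an empty set when `f` is new.
    std::set<function>& add_function(const function& f) {
        return adjacency_lists[f];
    }

    // Records that `from` references `to`; both become vertices.
    void add_edge(const function& from, const function& to) {
        add_function(to);
        add_function(from).insert(to);
    }

    // Read-only view of the graph, used here only for inspection.
    const std::map<function, std::set<function>>& edges() const {
        return adjacency_lists;
    }

private:
    std::map<function, std::set<function>> adjacency_lists;
};
```

Because `std::map` never invalidates references on insertion, the reference returned by `add_function` stays usable even after other functions are added.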
```diff
@@ -403,7 +408,7 @@ group members were also already visited and added.
 Once groups have been created, we use their functions' edges
 to create edges for the groups themselves, using `create_edges`.
-We avoid creating edges from a group to itself, to avoid
+We avoid creating edges from a group to itself, to prevent
 unnecessary cycles. While constructing the edges, we also
 increment the relevant indegree counter.
```
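To make the group-edge step and the indegree counter concrete, here is a sketch that assumes the strongly connected components have already been computed and are given as a vector of `group` records. Edges between functions in the same group are skipped, and each distinct group edge bumps its target's indegree once. The groups are then ordered with Kahn's algorithm. The `group`, `create_group_edges`, and `dependency_order` names are hypothetical. Reversing the result so that dependencies come first is a choice made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

using function = std::string;

// Hypothetical group record: its member functions, the groups it points to,
// and how many distinct groups point to it.
struct group {
    std::set<function> members;
    std::set<std::size_t> out;
    std::size_t indegree = 0;
};

// Lifts function-level edges (f references g) to group-level edges.
void create_group_edges(std::vector<group>& groups,
                        const std::map<function, std::set<function>>& edges) {
    std::map<function, std::size_t> group_of;
    for (std::size_t i = 0; i < groups.size(); i++)
        for (const auto& f : groups[i].members) group_of[f] = i;

    for (const auto& entry : edges) {
        std::size_t from_group = group_of.at(entry.first);
        for (const auto& to : entry.second) {
            std::size_t to_group = group_of.at(to);
            // No edge from a group to itself: it would only add a cycle.
            if (from_group == to_group) continue;
            // insert(...).second is true only for a new edge, so the
            // indegree counts each group edge once.
            if (groups[from_group].out.insert(to_group).second)
                groups[to_group].indegree++;
        }
    }
}

// Kahn's algorithm: emit groups that no remaining group points to. Those
// are the referencing groups, so the result is reversed to put every
// group after the groups it depends on.
std::vector<std::size_t> dependency_order(std::vector<group> groups) {
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < groups.size(); i++)
        if (groups[i].indegree == 0) ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        std::size_t current = ready.front();
        ready.pop();
        order.push_back(current);
        for (std::size_t next : groups[current].out)
            if (--groups[next].indegree == 0) ready.push(next);
    }
    std::reverse(order.begin(), order.end());
    return order;
}
```

With `main` calling `even`, and the mutually recursive `even` and `odd` both calling `not`, the group `{even, odd}` gets a single incoming edge from `main`'s group. The order produced places `not` first and `main` last.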
```diff
@@ -540,7 +545,7 @@ tree node has a `type_env_ptr`. Furthermore, `typecheck` should no longer call
 Don't worry about `instantiate` for now; that's coming up. Similarly to
 `ast_lid`, `ast_case::typecheck` will no longer introduce new bindings,
-and unify instead:
+but unify existing types via the `pattern`:
 {{< codelines "C++" "compiler/10/ast.cpp" 152 169 >}}
```
```diff
@@ -572,7 +577,16 @@ steps:
 it refers to "known" types. Add valid constructors to the global environment as functions.
 We don't currently verify that types are "known"; A user could declare a list of `Floobs`,
-and never say what a `Floob` is. This isn't too big of an issue (good luck constructing
+and never say what a `Floob` is.
+{{< sidenote "right" "known-type-note" "This isn't too big of an issue" >}}
+Curiously, this flaw did lead to some valid programs being rejected. Since
+we had no notion of a "known" type, whenever data type constructors
+were created, every argument type was marked a "base" type;
+see <a href="https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/09/definition.cpp#L82">
+this line</a> if you're curious.
+This would cause pattern matching to fail on the tail of a list, with
+the "attempt to pattern match on non-data argument" error.
+{{< /sidenote >}}(good luck constructing
 a value of a non-existent type), but a mature compiler should prevent this from happening.
 On the other hand, here are the steps for function definitions:
```
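The "known type" check mentioned as missing could be sketched roughly as follows. The sketch assumes every constructor argument is named by a single type name, with no type parameters. The `constructor` struct and `unknown_types` function are hypothetical. The set of known names is assumed to contain the built-in `Int` plus every declared data type. That set is collected before any constructor is checked, so a recursive type such as a list can refer to itself.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical constructor: its name and the names of its argument types.
struct constructor {
    std::string name;
    std::vector<std::string> argument_types;
};

// Returns every argument type name that is not in `known`, such as a
// `Floob` used by a constructor when no `Floob` type was ever declared.
std::vector<std::string> unknown_types(const std::vector<constructor>& constructors,
                                       const std::set<std::string>& known) {
    std::vector<std::string> missing;
    for (const auto& c : constructors)
        for (const auto& t : c.argument_types)
            if (known.count(t) == 0) missing.push_back(t);
    return missing;
}
```

A non-empty result would then be reported as an error instead of letting the undeclared type slip through.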
```diff
@@ -663,7 +677,7 @@ The separation of data and function definitions must be reconciled with code
 going back as far as the parser. While previously, we populated a single, global
 vector of definitions called `program`, we can no longer do that. Instead, we'll
 split our program into two maps, one for data types and one for functions. We
-use maps for convenience: since the groups generated by our function graph refer
+use maps for convenience: the groups generated by our function graph refer
 to functions by name, and it would be nice to quickly look up the data
 the names refer to. Rather than returning such maps, we change our semantic
 actions to simply insert new data into one of two global maps. Below
```
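The benefit of keying definitions by name shows up in a small sketch. Semantic actions insert into two global maps. A group produced by the function graph carries only names, so it is turned back into definitions with plain lookups. The `definition_data`, `definition_defn`, `defs_data`, `defs_defn`, `record_defn`, and `definitions_in` names are placeholders for illustration and are not necessarily the ones used in the series.

```cpp
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Placeholder definitions; real ones would carry constructors, bodies, etc.
struct definition_data { std::string name; };
struct definition_defn {
    std::string name;
    std::vector<std::string> params;
};

// Two global maps, one per kind of definition, keyed by name.
std::map<std::string, std::unique_ptr<definition_data>> defs_data;
std::map<std::string, std::unique_ptr<definition_defn>> defs_defn;

// Roughly what a semantic action might do once a function is parsed.
void record_defn(std::unique_ptr<definition_defn> defn) {
    std::string name = defn->name;
    defs_defn[name] = std::move(defn);
}

// A group only knows function names; the map leads back to the definitions.
std::vector<definition_defn*> definitions_in(const std::set<std::string>& group) {
    std::vector<definition_defn*> result;
    for (const auto& name : group) result.push_back(defs_defn.at(name).get());
    return result;
}
```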
```diff
@@ -729,9 +743,9 @@ possibility that the variable has a polymorphic type, which needs to be speciali
 (potentially differently in every occurrence of the variable).
 When talking about our new typechecking algorithm, we mentioned using __Gen__ to sprinkle
-polymorphism wherever possible. Whenever possible, __Gen__ will add free variables
+polymorphism into our program. If it can, __Gen__ will add free variables
 in a type to the "forall" quantifier at the front, making that type polymorphic.
-We implement this using a new `generalize` added to the `type_env`, which (as per
+We implement this using a new `generalize` method added to the `type_env`, which (as per
 convention) generalizes the type of a given variable as much as possible:
 {{< codelines "C++" "compiler/10/type_env.cpp" 31 41 >}}
```
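Below is a self-contained sketch of __Gen__ and its counterpart __Inst__ on a toy type representation. It assumes the type has already been fully resolved, with no pending substitutions. Unlike the full __Gen__ rule, it does not exclude variables that are free in the surrounding environment. The `type`, `type_scheme`, `generalize`, `instantiate`, and `show` definitions are hypothetical standalone functions, not the `type_env::generalize` method referenced above.

```cpp
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy types: a type variable, a base type such as Int, or an arrow.
struct type;
using type_ptr = std::shared_ptr<type>;
struct type {
    enum kind { var, base, arrow } k;
    std::string name;     // set for variables and base types
    type_ptr left, right; // set for arrows only
};

type_ptr tvar(const std::string& n) {
    return std::make_shared<type>(type{type::var, n, nullptr, nullptr});
}
type_ptr tbase(const std::string& n) {
    return std::make_shared<type>(type{type::base, n, nullptr, nullptr});
}
type_ptr tarrow(type_ptr l, type_ptr r) {
    return std::make_shared<type>(type{type::arrow, "", l, r});
}

// A type scheme such as "forall a. a -> a".
struct type_scheme {
    std::vector<std::string> forall;
    type_ptr monotype;
};

// Collects the names of all type variables occurring in `t`.
void find_free(const type_ptr& t, std::set<std::string>& out) {
    if (t->k == type::var) {
        out.insert(t->name);
    } else if (t->k == type::arrow) {
        find_free(t->left, out);
        find_free(t->right, out);
    }
}

// Gen (simplified): quantify every type variable in `t`.
type_scheme generalize(const type_ptr& t) {
    std::set<std::string> free_vars;
    find_free(t, free_vars);
    return type_scheme{std::vector<std::string>(free_vars.begin(), free_vars.end()), t};
}

// Replaces variables according to `subst`, leaving the others untouched.
type_ptr substitute(const type_ptr& t, const std::map<std::string, type_ptr>& subst) {
    if (t->k == type::var) {
        auto it = subst.find(t->name);
        return it == subst.end() ? t : it->second;
    }
    if (t->k == type::base) return t;
    return tarrow(substitute(t->left, subst), substitute(t->right, subst));
}

// Inst: give each quantified variable a fresh name, so every use of a
// polymorphic function can be specialized independently.
type_ptr instantiate(const type_scheme& s, int& counter) {
    std::map<std::string, type_ptr> subst;
    for (const auto& v : s.forall) subst[v] = tvar("t" + std::to_string(counter++));
    return substitute(s.monotype, subst);
}

// Prints a type, parenthesizing arrows on the left of another arrow.
std::string show(const type_ptr& t) {
    if (t->k != type::arrow) return t->name;
    std::string l = show(t->left);
    if (t->left->k == type::arrow) l = "(" + l + ")";
    return l + " -> " + show(t->right);
}
```

Because `instantiate` draws fresh variables on every call, two uses of the same polymorphic function (say, an identity function generalized to `forall a. a -> a`) receive `t0 -> t0` and `t1 -> t1`. Each use can then be unified with a different type.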