Finish and publish part 10 of compiler series

Danila Fedorin 2020-03-25 17:15:53 -07:00
parent 5cccb97ede
commit c53a8ba68e


@@ -1,8 +1,7 @@
---
title: Compiling a Functional Language Using C++, Part 10 - Polymorphism
date: 2020-02-29T20:09:37-08:00
date: 2020-03-25T17:14:20-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
---
[In part 8]({{< relref "08_compiler_llvm.md" >}}), we wrote some pretty interesting programs in our little language.
@@ -54,7 +53,13 @@ One such powerful set of rules is the [Hindley-Milner type system](https://en.wi
which we have previously alluded to. In fact, the rules we came up
with were already very close to Hindley-Milner, with the exception of two:
__generalization__ and __instantiation__. It's been quite a while since the last time we worked on typechecking, so I'm going
to present a table with these new rules, as well as all of the ones that we previously used. I will also give a quick
to present a table with these new rules, as well as all of the ones that we
{{< sidenote "right" "rules-note" "previously used." >}}
The rules aren't quite the same as the ones we used earlier;
note that \(\sigma\) is used in place of \(\tau\) in the first rule,
for instance. These changes are slight, and we'll talk about how the
rules work together below.
{{< /sidenote >}} I will also give a quick
summary of each of these rules.
Rule|Name and Description
@@ -181,10 +186,10 @@ How about the following:
1. To every declared function, assign the type \\(a \\rightarrow ... \\rightarrow y \\rightarrow z\\),
where
{{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function" >}}
{{< sidenote "right" "arguments-note" "\(a\) through \(y\) are the types of the arguments to the function," >}}
Of course, there can be more or fewer than 25 arguments to any function. This is just a generalization;
we use as many input types as are needed.
{{< /sidenote >}}, and \\(z\\) is the function's
{{< /sidenote >}} and \\(z\\) is the function's
return type.
2. We typecheck each declared function, using the __Var__, __Case__, __App__, and __Inst__ rules.
3. Whatever type variables we don't fill in, we assume can be filled in with any type,
@@ -367,8 +372,8 @@ to establish a topological order.
Following these, we have three public function definitions:
* `add_function` adds a vertex to the graph. Sometimes, a function does not
reference any other functions, and would not appear in the list of edges.
We will call this function to make sure that the function graph is aware
of such functions. For convenience, this function returns the adjacency list
We will call `add_function` to make sure that the function graph is aware
of such independent functions. For convenience, `add_function` returns the adjacency list
of the added function.
* `add_edge` adds a new dependency between two functions.
* `compute_order` uses the internal methods described above to convert
@@ -403,7 +408,7 @@ group members were also already visited and added.
Once groups have been created, we use their functions' edges
to create edges for the groups themselves, using `create_edges`.
We avoid creating edges from a group to itself, to avoid
We avoid creating edges from a group to itself, to prevent
unnecessary cycles. While constructing the edges, we also
increment the relevant indegree counter.
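
To make the edge construction concrete, here is a minimal sketch of the idea rather than the code from the compiler itself. The names `group_sketch` and `create_group_edges`, and the two maps passed in, are hypothetical and exist only for this illustration. The sketch also assumes that the "relevant" counter is the indegree of the group an edge points to, and that we only count each group-to-group edge once.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a group of mutually recursive functions.
struct group_sketch {
    std::set<std::size_t> adjacency; // indices of groups this group has edges to
    std::size_t indegree = 0;        // number of distinct groups with edges to this one
};

// function_edges maps each function to the functions it has edges to.
// group_of maps each function name to the index of its group.
// Both are assumptions made for this sketch.
void create_group_edges(
        const std::map<std::string, std::set<std::string>>& function_edges,
        const std::map<std::string, std::size_t>& group_of,
        std::vector<group_sketch>& groups) {
    for (const auto& entry : function_edges) {
        std::size_t from_group = group_of.at(entry.first);
        for (const auto& target : entry.second) {
            std::size_t to_group = group_of.at(target);
            // Skip edges inside a group: they would only add a cycle.
            if (from_group == to_group) continue;
            // insert(...).second is true only when the edge is new,
            // so the indegree is bumped once per distinct edge.
            if (groups[from_group].adjacency.insert(to_group).second) {
                groups[to_group].indegree++;
            }
        }
    }
}
```

If `f` and `g` call each other and both call `h`, and `f` and `g` share a group, this produces a single edge from their group to `h`'s group, and no edge from the group to itself.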
@@ -540,7 +545,7 @@ tree node has a `type_env_ptr`. Furthermore, `typecheck` should no longer call
Don't worry about `instantiate` for now; that's coming up. Similarly to
`ast_lid`, `ast_case::typecheck` will no longer introduce new bindings,
and unify instead:
but unify existing types via the `pattern`:
{{< codelines "C++" "compiler/10/ast.cpp" 152 169 >}}
@@ -572,7 +577,16 @@ steps:
it refers to "known" types. Add valid constructors to the global environment as functions.
We don't currently verify that types are "known"; a user could declare a list of `Floobs`,
and never say what a `Floob` is. This isn't too big of an issue (good luck constructing
and never say what a `Floob` is.
{{< sidenote "right" "known-type-note" "This isn't too big of an issue" >}}
Curiously, this flaw did lead to some valid programs being rejected. Since
we had no notion of a "known" type, whenever data type constructors
were created, every argument type was marked as a "base" type;
see <a href="https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/09/definition.cpp#L82">
this line</a> if you're curious.
This would cause pattern matching to fail on the tail of a list, with
the "attempt to pattern match on non-data argument" error.
{{< /sidenote >}} (good luck constructing
a value of a non-existent type), but a mature compiler should prevent this from happening.
On the other hand, here are the steps for function definitions:
@@ -663,7 +677,7 @@ The separation of data and function definitions must be reconciled with code
going back as far as the parser. While previously, we populated a single, global
vector of definitions called `program`, we can no longer do that. Instead, we'll
split our program into two maps, one for data types and one for functions. We
use maps for convenience: since the groups generated by our function graph refer
use maps for convenience: the groups generated by our function graph refer
to functions by name, and it would be nice to quickly look up the data
the names refer to. Rather than returning such maps, we change our semantic
actions to simply insert new data into one of two global maps. Below
@@ -729,9 +743,9 @@ possibility that the variable has a polymorphic type, which needs to be speciali
(potentially differently in every occurrence of the variable).
When talking about our new typechecking algorithm, we mentioned using __Gen__ to sprinkle
polymorphism wherever possible. Whenever possible, __Gen__ will add free variables
polymorphism into our program. If it can, __Gen__ will add free variables
in a type to the "forall" quantifier at the front, making that type polymorphic.
We implement this using a new `generalize` added to the `type_env`, which (as per
We implement this using a new `generalize` method added to the `type_env`, which (as per
convention) generalizes the type of a given variable as much as possible:
{{< codelines "C++" "compiler/10/type_env.cpp" 31 41 >}}
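
For readers who want the idea in isolation, here is a simplified, self-contained sketch of generalization. It is not the `type_env` code referenced above: the types `type_sketch` and `type_scheme_sketch` and the functions `find_free` and `generalize_sketch` are hypothetical names invented for this example. It also assumes that every type variable left in the type is safe to quantify, and ignores variables that might still be free in the surrounding environment.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

struct type_sketch;
using type_sketch_ptr = std::shared_ptr<type_sketch>;

// Hypothetical, simplified type: either a type variable, or a named
// type constructor (such as "->" or "List") applied to child types.
struct type_sketch {
    bool is_variable;
    std::string name;
    std::vector<type_sketch_ptr> children;
};

// A polymorphic type: the "forall" variables, plus the type they range over.
struct type_scheme_sketch {
    std::vector<std::string> forall;
    type_sketch_ptr monotype;
};

// Collect the names of all type variables that occur in t.
void find_free(const type_sketch_ptr& t, std::set<std::string>& into) {
    if (t->is_variable) {
        into.insert(t->name);
        return;
    }
    for (const auto& child : t->children) find_free(child, into);
}

// Move every variable found in t into the quantifier at the front.
type_scheme_sketch generalize_sketch(const type_sketch_ptr& t) {
    std::set<std::string> free_variables;
    find_free(t, free_variables);
    std::vector<std::string> quantified(free_variables.begin(), free_variables.end());
    return type_scheme_sketch{quantified, t};
}
```

Under these assumptions, generalizing \\(\\text{List} \\; a \\rightarrow a\\) yields \\(\\forall a . \\text{List} \\; a \\rightarrow a\\): the variable \\(a\\) occurs twice, but is quantified only once.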