Write up type code

2019-08-25 16:42:23 -07:00 · 2019-08-25 16:42:23 -07:00 · 1820a05fcc
commit 1820a05fcc
parent ac589a8b0a
3 changed files with 187 additions and 12 deletions
--- a/code/compiler/03/type.cpp
+++ b/code/compiler/03/type.cpp
@ -3,18 +3,17 @@
 #include <algorithm>
 std::string type_mgr::new_type_name() {
    std::ostringstream oss;
    int temp = last_id++;
    std::string str = "";
-    do {
+    while(temp != -1) {
-        oss << (char) ('a' + (temp % 26));
+        str += (char) ('a' + (temp % 26));
-        temp /= 26;
+        temp = temp / 26 - 1;
-    } while(temp);
+    }
    std::string str = oss.str();
    std::reverse(str.begin(), str.end());
    return str;
-};
+}
 type_ptr type_mgr::new_type() {
    return type_ptr(new type_var(new_type_name()));
@ -23,3 +22,57 @@ type_ptr type_mgr::new_type() {
 type_ptr type_mgr::new_arrow_type() {
    return type_ptr(new type_arr(new_type(), new_type()));
 }
 type_ptr type_mgr::resolve(type_ptr t, type_var*& var) {
    type_var* cast;
    var = nullptr;
    while((cast = dynamic_cast<type_var*>(t.get()))) {
        auto it = types.find(cast->name);
        if(it == types.end()) {
            var = cast;
            break;
        }
        t = it->second;
    }
    return t;
 }
 void type_mgr::unify(type_ptr l, type_ptr r) {
    type_var* lvar;
    type_var* rvar;
    type_arr* larr;
    type_arr* rarr;
    type_base* lid;
    type_base* rid;
    l = resolve(l, lvar);
    r = resolve(r, rvar);
    if(lvar) {
        bind(lvar->name, r);
        return;
    } else if(rvar) {
        bind(rvar->name, l);
        return;
    } else if((larr = dynamic_cast<type_arr*>(l.get())) &&
            (rarr = dynamic_cast<type_arr*>(r.get()))) {
        unify(larr->left, rarr->left);
        unify(larr->right, rarr->right);
        return;
    } else if((lid = dynamic_cast<type_base*>(l.get())) &&
            (rid = dynamic_cast<type_base*>(r.get()))) {
        if(lid->name == rid->name) return;
    }
    throw 0;
 }
 void type_mgr::bind(std::string s, type_ptr t) {
    type_var* other = dynamic_cast<type_var*>(t.get());
    if(other && other->name == s) return;
    types[s] = t;
 }
--- a/code/compiler/03/type.hpp
+++ b/code/compiler/03/type.hpp
@ -15,11 +15,11 @@ struct type_var : public type {
        : name(std::move(n)) {}
 };
-struct type_id : public type {
+struct type_base : public type {
-    int id;
+    std::string name;
-    type_id(int i) 
+    type_base(std::string n) 
-        : id(i) {}
+        : name(std::move(n)) {}
 };
 struct type_arr : public type {
@ -32,8 +32,13 @@ struct type_arr : public type {
 struct type_mgr {
    int last_id = 0;
    std::map<std::string, type_ptr> types;
    std::string new_type_name();
    type_ptr new_type();
    type_ptr new_arrow_type();
    void unify(type_ptr l, type_ptr r);
    type_ptr resolve(type_ptr t, type_var*& var);
    void bind(std::string s, type_ptr t);
 };
--- a/content/blog/03_compiler_trees.md
+++ b/content/blog/03_compiler_trees.md
@ -295,4 +295,121 @@ through \\(\\tau\_n\\), then the each variable will have a corresponding type.
 We didn't include lambda expressions in our syntax, and thus we won't need typing rules for them,
 so it actually seems like we're done with the first draft of our type rules.
-{{< todo >}}Cite 006_hindley_milner on implementation of bind. {{< /todo >}}.
+#### Implementation
 Let's work towards some code. Before we write anything down though, let's
 get a definition of what a "type" is, in the context of our type checker.
 Let's say a type is one of 3 things:
 1. A "base type", like `Int`, `Bool`, or `List`.
 2. A type that's a function from one type to another.
 3. A placeholder / type variable (like the kind we used for type inference).
 We represent a plceholder type with a unique string, such as "a", or "b",
 and this makes our placeholder type class very similar to the base type class.
 {{< codeblock "C++" "compiler/03/type.hpp" >}}
 As you can see, we also declared a `type_mgr`, or type manager class.
 This class will keep the state used for generating more placeholder
 type names, as well as the information about which
 placeholder type is mapped to what. We gave it 3 methods:
 * `unify`, to perform unification. It will take two types and
 find values for placeholder variables such that they can
 equal.
 * `resolve`, to get to the "bottom" of a chain of equations.
 For instance, we have placeholder `a` be mapped to a placeholder
 `b`, an finally, the placeholder `b` to be mapped to `Int`.
 `resolve` will return for us `Int`, and, if the "bottom"
 of the chain is a placeholder, it will set `var` to be a pointer
 to that placeholder.
 * `bind`, inspired by [this post](http://dev.stephendiehl.com/fun/006_hindley_milner.html),
 will map a type variable of some name to a type. This function will also check if
 the thing we're binding to is the same type variable and not do anything in that case,
 since `a = a` is not a very useful equation to have.
 To fit its original purpose, we also give the manager class the methods
 `new_type_name`, and two convenience methods to create placeholder types,
 `new_type` (in the form `a`) and `new_arrow_type` (in the form `a->b`).
 Let's take a look at the implementation now:
 {{< codeblock "C++" "compiler/03/type.cpp" >}}
 Here, `new_type_name` is actually pretty boring. My goal
 was to generate type names like `a`, then `b`, eventually getting
 to `z`, and then move on to `aa`. This provides is with an
 endless stream of placeholder types.
 Time for the interesting functions. `resolve` keeps
 trying `dynamic_cast` to a type variable, and if that succeeds,
 then either:
 1. It's a type variable that's already been set
 to something, in which case we try resolve the thing it was
 set to (`t = it->second`)
 2. It's a type variable that hasn't been set to something.
 We set `var` to it (the caller will use this info),
 and stop our resolution loop (`break`).
 In `unify`, we start by calling `resolve` - we don't want
 to accidentally compare `a` and `b` (and try to bind `a` to
 `b`) when `a` is already bound to something else (like `Int`).
 From resolve, we will have `lvar` and `rvar` set to
 something not NULL if `l` or `r` were type variables
 that haven't been yet mapped (we defined `resolve` to behave this way).
 So, if one of the variables is not NULL, we try to bind it.
 Next, `unify` checks if both types are either base types or
 arrow types. If they're base types, it compares their names,
 and if they're arrow types, it recursively unifies their children.
 We return in all cases when unification succeeds, and then throw
 an exception (currently 0) if all the cases fell thorugh, and thus,
 unification failed.
 Finally, `bind` places the type we're binding to into
 the `types` map, but not before it checks that the type
 we're binding is the same as the string we're binding it to
 (since, again, `a=a` is not a useful equation).
 We now have a unification algorithm, but we still
 need to implement our rules. Our rules
 usually include three things: an environment
 \\(\\Gamma\\), an expression \\(e\\),
 and a type \\(\\tau\\). We will
 represent this as a method on `ast`, which is
 our representation of an expression \\(e\\). This
 method will take an environment and return
 a type.
 But first, how should we implement our environment?
 Conceptually, an environment maps a name string
 to a type. So naively, we can implement this simply
 using an `std::map`. But observe
 that we only extend the environment in one case so far:
 a case expression. In a case expression, we have the base
 envrionment \\(\\Gamma\\), and for each branch,
 we extend it with the bindings produced by
 the pattern match. Each branch receives a modified
 copy of the original environment, one that
 doesn't see the effects of the other branches.
 Using our naive approach, we'd create a new `std::map` for
 each branch that's a copy of the original environment,
 and place into it the new pairs. But this means we'll
 need to copy a map for each branch of the pattern!
 There's a better way. We structure our environment like
 a linked list. Each node in the linked list
 contains an `std::map`. When we encounter a new
 scope (such as in a case branch), we create a new such node, and add all
 variables introduced in this scope to that node's map. We make
 it point to our current environment. Then, we pass around the new node
 as the environment.
 When we look up a variable name, we first look in this node we created. 
 If we don't find the variable we're looking for, we move on to the next
 node. The benefit of this is that we won't be re-creating a map
 for each branch, and just creating a node with the changes.
 Let's implement exactly that: