Do the first round of revisions on part 3
This commit is contained in:
parent
30c881ce3f
commit
619c346897
|
@ -28,28 +28,22 @@ type_ptr ast_binop::typecheck(type_mgr& mgr, const type_env& env) const {
|
||||||
type_ptr ftype = env.lookup(op_name(op));
|
type_ptr ftype = env.lookup(op_name(op));
|
||||||
if(!ftype) throw 0;
|
if(!ftype) throw 0;
|
||||||
|
|
||||||
type_ptr place_a = mgr.new_type();
|
type_ptr return_type = mgr.new_type();
|
||||||
type_ptr place_b = mgr.new_type();
|
type_ptr arrow_one = type_ptr(new type_arr(rtype, return_type));
|
||||||
type_ptr place_c = mgr.new_type();
|
type_ptr arrow_two = type_ptr(new type_arr(ltype, arrow_one));
|
||||||
type_ptr arrow_one = type_ptr(new type_arr(place_b, place_c));
|
|
||||||
type_ptr arrow_two = type_ptr(new type_arr(place_a, arrow_one));
|
|
||||||
|
|
||||||
mgr.unify(arrow_two, ftype);
|
mgr.unify(arrow_two, ftype);
|
||||||
mgr.unify(place_a, ltype);
|
return return_type;
|
||||||
mgr.unify(place_b, rtype);
|
|
||||||
return place_c;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
type_ptr ast_app::typecheck(type_mgr& mgr, const type_env& env) const {
|
type_ptr ast_app::typecheck(type_mgr& mgr, const type_env& env) const {
|
||||||
type_ptr ltype = left->typecheck(mgr, env);
|
type_ptr ltype = left->typecheck(mgr, env);
|
||||||
type_ptr rtype = right->typecheck(mgr, env);
|
type_ptr rtype = right->typecheck(mgr, env);
|
||||||
|
|
||||||
type_ptr place_a = mgr.new_type();
|
type_ptr return_type = mgr.new_type();
|
||||||
type_ptr place_b = mgr.new_type();
|
type_ptr arrow = type_ptr(new type_arr(rtype, return_type));
|
||||||
type_ptr arrow = type_ptr(new type_arr(place_a, place_b));
|
|
||||||
mgr.unify(arrow, ltype);
|
mgr.unify(arrow, ltype);
|
||||||
mgr.unify(place_a, rtype);
|
return return_type;
|
||||||
return place_b;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const {
|
type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const {
|
||||||
|
|
|
@ -36,8 +36,8 @@ defn main = { 1 2 3 4 5 }
|
||||||
```
|
```
|
||||||
|
|
||||||
What is this? It's a sequence of applications, starting with `1 2`. Numbers
|
What is this? It's a sequence of applications, starting with `1 2`. Numbers
|
||||||
are not functions. Their type is `Int`, not `Int -> a`. I can't even think of a type `1`
|
are not functions. Their type is `Int`, not `Int -> a`. I can't even think of a type numbers
|
||||||
would need to have for this program to be valid.
|
would need to have for this program to be valid (though perhaps one could come up with one).
|
||||||
|
|
||||||
Before we give meaning to programs in our language, we'll need to toss away the ones
|
Before we give meaning to programs in our language, we'll need to toss away the ones
|
||||||
that don't make sense. To do so, we will use type checking. During the process of type
|
that don't make sense. To do so, we will use type checking. During the process of type
|
||||||
|
@ -47,20 +47,18 @@ to throw away blatantly incorrect programs.
|
||||||
|
|
||||||
### Basic Type Checking
|
### Basic Type Checking
|
||||||
You may have noticed in the very first post that I have chosen to avoid polymorphism.
|
You may have noticed in the very first post that I have chosen to avoid polymorphism.
|
||||||
This will significantly simplify our type checking algorithm. If a more robust
|
This will simplify our type checking algorithm. If a more robust
|
||||||
algorithm is desired, take a look at the
|
algorithm is desired, take a look at the
|
||||||
[Hindley-Milner type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system).
|
[Hindley-Milner type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system).
|
||||||
Personally, I enjoyed [this](http://dev.stephendiehl.com/fun/006_hindley_milner.html) section
|
Personally, I enjoyed [this](http://dev.stephendiehl.com/fun/006_hindley_milner.html) section
|
||||||
of _Write You a Haskell_.
|
of _Write You a Haskell_, which I used to sanity check this very post.
|
||||||
|
|
||||||
Let's start with the types of constants - those are pretty obvious. The constant `3` is an integer,
|
Let's start with the types of constants - those are pretty obvious. The constant `3` is an integer,
|
||||||
and we shall mark it as such: `3 :: Int`. Variables seem like the natural
|
and we shall mark it as such: `3 :: Int`. Variables seem like the natural
|
||||||
next step, but they're fairly different. Without outside knowledge, all we can do is say
|
next step, but they're fairly different. Without outside knowledge, we can't
|
||||||
that a variable has __some__ type. If we stick with Haskell's notation (used in polymorphic types),
|
tell what type a variable `x` is. If we know more information, like the fact that `x`
|
||||||
we can say a variable has a type `a`, where `a` can be replaced with any other type. If we
|
was declared to be an integer, we can instead say that. This tells us that throughout
|
||||||
know more information, like the fact that `x` was declared to be an integer, we can instead
|
type checking we'll have to keep some kind of record of names and their associated types.
|
||||||
say that. This tells us that throughout type checking we'll have to keep some kind
|
|
||||||
of record of names and their associated types.
|
|
||||||
|
|
||||||
Next, let's take a look at functions, which are admittedly more interesting
|
Next, let's take a look at functions, which are admittedly more interesting
|
||||||
than the previous two examples. I'm not talking about the case of seeing something
|
than the previous two examples. I'm not talking about the case of seeing something
|
||||||
|
@ -68,13 +66,16 @@ like a function name `f`. This is the same as the variable case - we don't even
|
||||||
it's a function unless there is context, and if there __is__ context, then that
|
it's a function unless there is context, and if there __is__ context, then that
|
||||||
context is probably the most useful information we have. I'm talking about
|
context is probably the most useful information we have. I'm talking about
|
||||||
something like the application of a function to another value, in the form
|
something like the application of a function to another value, in the form
|
||||||
`f x`. In this case, we know that `f :: a -> b`, a function from something
|
`f x`. In this case, we know that for the program to be valid,
|
||||||
to something. However, we know even more. For this program to be correct,
|
`f` must have the type `a -> b`, a function from something
|
||||||
the `a` in `f :: a -> b`, and the type of `x` (let's call it `c`), must
|
to something. Furthermore, since `f` is being applied to `x`,
|
||||||
be compatible. In order to do that, we will use what can be considered
|
the type of `x` (let's call it `c`) must be the same as the type `a`.
|
||||||
simplified [unification](https://en.wikipedia.org/wiki/Unification_(computer_science)).
|
|
||||||
Conceptually, what this means is that we will attempt to perform substitutions
|
Our rules are getting more complicated. In order to check that they hold, we will use
|
||||||
in various equations in search of a solution.
|
[unification](https://en.wikipedia.org/wiki/Unification_(computer_science)).
|
||||||
|
This means that we'll be creating various equations (such as "the type of `f` is equal to `a -> b`"),
|
||||||
|
and finding substitutions to solve these equations. If we're unable to
|
||||||
|
find a solution to our equations, the program is invalid and we throw it away.
|
||||||
|
|
||||||
#### Basic Examples
|
#### Basic Examples
|
||||||
Let's try an example. We'll try to determine the type of the following
|
Let's try an example. We'll try to determine the type of the following
|
||||||
|
@ -87,7 +88,7 @@ foo 320 6
|
||||||
|
|
||||||
In out parse tree, this will be represented as `(foo 320) 6`, meaning
|
In out parse tree, this will be represented as `(foo 320) 6`, meaning
|
||||||
that the outermost application will be at the top. Let's assume
|
that the outermost application will be at the top. Let's assume
|
||||||
we know __nothing__ about `foo`.
|
we know only that `foo` is defined.
|
||||||
|
|
||||||
To figure out the type of the application, we will need to know the types
|
To figure out the type of the application, we will need to know the types
|
||||||
of the thing being applied, and the thing that serves as the argument.
|
of the thing being applied, and the thing that serves as the argument.
|
||||||
|
@ -99,12 +100,13 @@ an argument, it has to be of type `a -> b`.
|
||||||
Let's proceed to the left child. It's another application, this time
|
Let's proceed to the left child. It's another application, this time
|
||||||
of `foo` to `320`. Again, the right child is simple: the type of
|
of `foo` to `320`. Again, the right child is simple: the type of
|
||||||
`320` is `Int`. Again, we know that `foo` has to have a type
|
`320` is `Int`. Again, we know that `foo` has to have a type
|
||||||
`c -> d` (we're using different variable names to avoid ambiguity).
|
`c -> d` (we're using different variable names since the types
|
||||||
|
of `foo` and `foo 320` are not the same).
|
||||||
|
|
||||||
Now, we need to combine the pieces of information that we have. Since
|
Now, we need to combine the pieces of information that we have. Since
|
||||||
`foo :: c -> d`, we know that its first parameter __must__ be of
|
`foo :: c -> d`, we know that its first parameter __must__ be of
|
||||||
type `c`. We also know that its first parameter is of type `Int`.
|
type `c`. We also know that its first parameter is of type `Int`.
|
||||||
The only way for both of these to be tree is for `c = Int`.
|
The only way for both of these to be true is for `c = Int`.
|
||||||
This also tells us that `foo :: Int -> d`. Finally,
|
This also tells us that `foo :: Int -> d`. Finally,
|
||||||
since `foo` has now been applied to its first argument,
|
since `foo` has now been applied to its first argument,
|
||||||
we know that the `foo 320 :: d`.
|
we know that the `foo 320 :: d`.
|
||||||
|
@ -116,7 +118,7 @@ is for `d = a -> b`. So, `foo 320` is a function from `a` to `b`.
|
||||||
Again, we can conclude the first parameter has to be of type
|
Again, we can conclude the first parameter has to be of type
|
||||||
`a`. We also know that the first parameter is of type `Int`. Clearly,
|
`a`. We also know that the first parameter is of type `Int`. Clearly,
|
||||||
this means that `a = Int`. After the application, we know
|
this means that `a = Int`. After the application, we know
|
||||||
that the whole expression has some type `d`.
|
that the whole expression has some type `b`.
|
||||||
|
|
||||||
Let's revisit what we know about `foo`. Last time we checked in on it,
|
Let's revisit what we know about `foo`. Last time we checked in on it,
|
||||||
`foo` was of type `Int -> d`. But since we know that `d = a -> b`,
|
`foo` was of type `Int -> d`. But since we know that `d = a -> b`,
|
||||||
|
@ -156,7 +158,7 @@ $$
|
||||||
\\frac{A\_1 \\ldots A\_n} {B\_1 \\ldots B\_m}
|
\\frac{A\_1 \\ldots A\_n} {B\_1 \\ldots B\_m}
|
||||||
$$
|
$$
|
||||||
This reads, "given that the premises \\(A\_1\\) through \\(A\_n\\) are true,
|
This reads, "given that the premises \\(A\_1\\) through \\(A\_n\\) are true,
|
||||||
it holds that the conclusions \\(B\_1\\) through \\(B\_n\\) are true".
|
it holds that the conclusions \\(B\_1\\) through \\(B\_m\\) are true".
|
||||||
|
|
||||||
For example, we can have the following inference rule:
|
For example, we can have the following inference rule:
|
||||||
$$
|
$$
|
||||||
|
@ -197,18 +199,27 @@ $$
|
||||||
{e_1 \\; e_2 : \\tau\_2}
|
{e_1 \\; e_2 : \\tau\_2}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
|
This rule includes everything we've said before:
|
||||||
|
the thing being applied has to have a function type
|
||||||
|
(\\(\\tau\_1 \\rightarrow \\tau\_2\\)), and
|
||||||
|
the expression the function is applied to
|
||||||
|
has to have the same type \\(\\tau\_1\\) as the
|
||||||
|
left type of the function.
|
||||||
|
|
||||||
It's the variable rule that forces us to adjust our notation.
|
It's the variable rule that forces us to adjust our notation.
|
||||||
Our rules don't take into account the context that we've
|
Our rules don't take into account the context that we've
|
||||||
already discussed. Let's fix that! It's convention
|
already discussed, and thus, we can't bring
|
||||||
|
in any outside information. Let's fix that! It's convention
|
||||||
to use the symbol \\(\\Gamma\\) for the context. We then
|
to use the symbol \\(\\Gamma\\) for the context. We then
|
||||||
add notation to say, "using the context \\(\\Gamma\\),
|
add notation to say, "using the context \\(\\Gamma\\),
|
||||||
we can deduce that \\(e\\) has type \\(\\tau\\)". We will
|
we can deduce that \\(e\\) has type \\(\\tau\\)". We will
|
||||||
write this as \\(\\Gamma \\vdash e : \\tau\\).
|
write this as \\(\\Gamma \\vdash e : \\tau\\).
|
||||||
|
|
||||||
But what __is__ our context? It's just a set of pairs
|
But what __is__ our context? We can think of it
|
||||||
|
as a mapping from variable names to their known types. We
|
||||||
|
can represent such a mapping using a set of pairs
|
||||||
in the form \\(x : \\tau\\), where \\(x\\) represents
|
in the form \\(x : \\tau\\), where \\(x\\) represents
|
||||||
a variable name. Each pair tells us that the variable
|
a variable name.
|
||||||
\\(x\\) is known to have a type \\(\\tau\\).
|
|
||||||
|
|
||||||
Since \\(\\Gamma\\) is just a regular set, we can
|
Since \\(\\Gamma\\) is just a regular set, we can
|
||||||
write \\(x : \\tau \\in \\Gamma\\), meaning that
|
write \\(x : \\tau \\in \\Gamma\\), meaning that
|
||||||
|
@ -236,7 +247,7 @@ $$
|
||||||
|
|
||||||
In these three expressions, we've captured all of the rules
|
In these three expressions, we've captured all of the rules
|
||||||
that we've seen so far. It's important to know
|
that we've seen so far. It's important to know
|
||||||
that These rules leave out the process
|
that these rules leave out the process
|
||||||
of unification altogether: we use unification to find
|
of unification altogether: we use unification to find
|
||||||
types that satisfy the rules.
|
types that satisfy the rules.
|
||||||
|
|
||||||
|
@ -271,7 +282,7 @@ such as `Cons x xs` will introduce new type information, namely \\(\text{x} : \t
|
||||||
4. Finally, the conclusion is that the case expression, if all the premises are met, is of type \\(\\tau\_c\\).
|
4. Finally, the conclusion is that the case expression, if all the premises are met, is of type \\(\\tau\_c\\).
|
||||||
|
|
||||||
For completeness, let's add rules for \\(\\text{matcht}(\\tau, p\_i) = b\_i\\). We'll need two: one for
|
For completeness, let's add rules for \\(\\text{matcht}(\\tau, p\_i) = b\_i\\). We'll need two: one for
|
||||||
the "basic" pattern, which always matches the value and binds it to the variable, and one
|
the "basic" pattern, which always matches the value and binds a variable to it, and one
|
||||||
for a constructor pattern, that matches a constructor and its parameters.
|
for a constructor pattern, that matches a constructor and its parameters.
|
||||||
|
|
||||||
Let's define \\(v\\) to be a variable name in the context of a pattern. For the basic pattern:
|
Let's define \\(v\\) to be a variable name in the context of a pattern. For the basic pattern:
|
||||||
|
@ -325,7 +336,7 @@ of the chain is a placeholder, it will set `var` to be a pointer
|
||||||
to that placeholder.
|
to that placeholder.
|
||||||
* `bind`, inspired by [this post](http://dev.stephendiehl.com/fun/006_hindley_milner.html),
|
* `bind`, inspired by [this post](http://dev.stephendiehl.com/fun/006_hindley_milner.html),
|
||||||
will map a type variable of some name to a type. This function will also check if
|
will map a type variable of some name to a type. This function will also check if
|
||||||
the thing we're binding to is the same type variable and not do anything in that case,
|
we're binding a type variable to itself, and do nothing in that case,
|
||||||
since `a = a` is not a very useful equation to have.
|
since `a = a` is not a very useful equation to have.
|
||||||
|
|
||||||
To fit its original purpose, we also give the manager class the methods
|
To fit its original purpose, we also give the manager class the methods
|
||||||
|
@ -358,7 +369,7 @@ to accidentally compare `a` and `b` (and try to bind `a` to
|
||||||
|
|
||||||
From resolve, we will have `lvar` and `rvar` set to
|
From resolve, we will have `lvar` and `rvar` set to
|
||||||
something not NULL if `l` or `r` were type variables
|
something not NULL if `l` or `r` were type variables
|
||||||
that haven't been yet mapped (we defined `resolve` to behave this way).
|
that haven't yet been bound (we defined `resolve` to behave this way).
|
||||||
So, if one of the variables is not NULL, we try to bind it.
|
So, if one of the variables is not NULL, we try to bind it.
|
||||||
|
|
||||||
Next, `unify` checks if both types are either base types or
|
Next, `unify` checks if both types are either base types or
|
||||||
|
@ -378,12 +389,13 @@ need to implement our rules. Our rules
|
||||||
usually include three things: an environment
|
usually include three things: an environment
|
||||||
\\(\\Gamma\\), an expression \\(e\\),
|
\\(\\Gamma\\), an expression \\(e\\),
|
||||||
and a type \\(\\tau\\). We will
|
and a type \\(\\tau\\). We will
|
||||||
represent this as a method on `ast`, which is
|
represent this as a method on `ast`, our struct
|
||||||
our representation of an expression \\(e\\). This
|
for an expression tree. This
|
||||||
method will take an environment and return
|
method will take an environment and return
|
||||||
a type.
|
a type.
|
||||||
|
|
||||||
But first, how should we implement our environment?
|
##### Environment
|
||||||
|
How should we implement our environment?
|
||||||
Conceptually, an environment maps a name string
|
Conceptually, an environment maps a name string
|
||||||
to a type. So naively, we can implement this simply
|
to a type. So naively, we can implement this simply
|
||||||
using an `std::map`. But observe
|
using an `std::map`. But observe
|
||||||
|
@ -429,7 +441,8 @@ creating new environments on the stack, and we'll
|
||||||
take care not to let a parent go out of scope
|
take care not to let a parent go out of scope
|
||||||
before the child.
|
before the child.
|
||||||
|
|
||||||
At least, it's time to declare a new type checking method.
|
##### Typechecking Expressions
|
||||||
|
At last, it's time to declare a new type checking method.
|
||||||
We start with with a signature inside `ast`:
|
We start with with a signature inside `ast`:
|
||||||
```
|
```
|
||||||
virtual type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
|
virtual type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
|
||||||
|
@ -445,11 +458,68 @@ at the implementation now:
|
||||||
|
|
||||||
{{< codeblock "C++" "compiler/03/ast.cpp" >}}
|
{{< codeblock "C++" "compiler/03/ast.cpp" >}}
|
||||||
|
|
||||||
{{< todo >}}Explain implementation, at least a little.{{< /todo >}}
|
The `typecheck` implementation for `ast_int`, `ast_lid`, and `ast_uid`
|
||||||
|
is quite intuitive. The type of a number is always `Int`, and varible
|
||||||
|
names we simply look up in the environment.
|
||||||
|
|
||||||
|
For `ast_binop` and `ast_app`, we check the types of the children first,
|
||||||
|
and store their resulting types. In the case of `binop`, we assume
|
||||||
|
that we have the type of the binary operator (such as `+`) in the
|
||||||
|
environment, and throw if it isn't. Then, we know that
|
||||||
|
the operator has to be a function of at least two arguments,
|
||||||
|
with the types of the left and right children of the application.
|
||||||
|
We also know its actual type, as it's in the environment. We unify
|
||||||
|
these two types, and then return.
|
||||||
|
|
||||||
|
In the case of `app`, we know that the left side (the thing
|
||||||
|
being applied), has to be a function from the type of the right
|
||||||
|
child. We also know its computed type. Once again, we unify the two,
|
||||||
|
and then return.
|
||||||
|
|
||||||
|
The type checking for `case` offloads most of the work onto the
|
||||||
|
`match` method on patterns. We start by computing the type
|
||||||
|
of the expression we're matching. Then, for each branch,
|
||||||
|
we create a new scope, and populate it with the new bindings
|
||||||
|
created by the pattern match (`match` does this). Once
|
||||||
|
we have a new environment, we typecheck the body
|
||||||
|
of the branch. Finally, we unify the type of the branch's
|
||||||
|
body with the previous body types (since branches of the
|
||||||
|
case expression have to have the same type).
|
||||||
|
|
||||||
|
Let's take a look at `match` now. The `match` method
|
||||||
|
for a variable pattern is very simple: it always
|
||||||
|
introduces a new variable bindings, and stops there.
|
||||||
|
The `match` method for a constructor pattern is more
|
||||||
|
interesting. First, it looks up the constructor
|
||||||
|
that the pattern is trying to match. This needs to exist,
|
||||||
|
so if we don't find a type for it, we throw. Next,
|
||||||
|
for each variable name in the pattern, we know
|
||||||
|
that there has to be a corresponding parameter in the
|
||||||
|
constructor. Because of this, we cast the constructor type
|
||||||
|
to an function type (throwing if it isn't one),
|
||||||
|
and create a new mapping from the variable name to the
|
||||||
|
left side (parameter) of the function type. We then
|
||||||
|
move on to examine the right side of the function
|
||||||
|
type, and the remaining variables of the pattern.
|
||||||
|
|
||||||
|
Once we have gone through the parameters,
|
||||||
|
what remains __should__ be the type of the thing
|
||||||
|
the constructor creates. Not only that, but
|
||||||
|
the remaining type should match the type of the value
|
||||||
|
the pattern is being matched against. To ensure
|
||||||
|
this, we unify the two types.
|
||||||
|
|
||||||
|
There's only one thing that can still go wrong.
|
||||||
|
The value __and__ the pattern can __both__ be
|
||||||
|
a partial application. For ease of implementation,
|
||||||
|
we will not allow this case. Thus, we make sure
|
||||||
|
the final type is a base type, and throw otherwise.
|
||||||
|
|
||||||
|
##### Typechecking Definitions
|
||||||
This looks good, but we're not done yet. We can type
|
This looks good, but we're not done yet. We can type
|
||||||
check expressions, but our program ins't
|
check expressions, but our program ins't
|
||||||
made up of expressions. Rather, it's made up of
|
made up of expressions. Rather, it's made up of
|
||||||
declarations. Further, we can't look at the declarations
|
definitions. Further, we can't look at the definitions
|
||||||
in isolation. Consider these two functions:
|
in isolation. Consider these two functions:
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -503,11 +573,11 @@ virtual void typecheck_second(type_mgr& mgr, const type_env& env) const;
|
||||||
Furthermore, in the `definition_defn`, we will keep an
|
Furthermore, in the `definition_defn`, we will keep an
|
||||||
`std::vector` of `type_ptr`, in which the first element is the
|
`std::vector` of `type_ptr`, in which the first element is the
|
||||||
type of the __last__ parameter, and so on. We switch around
|
type of the __last__ parameter, and so on. We switch around
|
||||||
the order of arguments because we build up the `a -> b -> ... -> x`
|
the order of arguments because we build up the `a -> b -> ...`
|
||||||
type signature from the right (`->` is right associative), and
|
type signature from the right (`->` is right associative), and
|
||||||
thus we'll be creating the types right-to-left, too. We also
|
thus we'll be creating the types right-to-left, too. We also
|
||||||
add a `type_ptr` field which holds the type for the function's
|
add a `type_ptr` field which holds the function's return type.
|
||||||
return value. We keep these two things in the `definition_defn` so
|
We keep these two things in the `definition_defn` so
|
||||||
that they persist between the two typechecking stages: we want to use
|
that they persist between the two typechecking stages: we want to use
|
||||||
the types from the first stage to aid in checking the body in the second stage.
|
the types from the first stage to aid in checking the body in the second stage.
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue
Block a user