Drafts of code and markdown.
@@ -1,5 +1,5 @@
|
||||
---
|
||||
title: Compiling a Functional Language Using C++, Part 3 - Operations On Trees
|
||||
title: Compiling a Functional Language Using C++, Part 3 - Type Checking
|
||||
date: 2019-08-06T14:26:38-07:00
|
||||
draft: true
|
||||
tags: ["C and C++", "Functional Languages", "Compilers"]
|
||||
@@ -24,7 +24,7 @@ programs we get from the parser valid? See for yourself:
|
||||
|
||||
```
|
||||
data Bool = { True, False }
|
||||
defn main { 3 + True }
|
||||
defn main = { 3 + True }
|
||||
```
|
||||
|
||||
Obviously, that's not right. The parser accepts it - this matches our grammar.
|
||||
@@ -32,7 +32,7 @@ But giving meaning to this program is not easy, since we have no clear
|
||||
way of adding 3 and some data type. Similarly:
|
||||
|
||||
```
|
||||
defn main { 1 2 3 4 5 }
|
||||
defn main = { 1 2 3 4 5 }
|
||||
```
|
||||
|
||||
What is this? It's a sequence of applications, starting with `1 2`. Numbers
|
||||
@@ -412,4 +412,125 @@ When we look up a variable name, we first look in this node we created.
|
||||
If we don't find the variable we're looking for, we move on to the next
|
||||
node. The benefit of this is that we won't be re-creating a map
|
||||
for each branch, and just creating a node with the changes.
|
||||
Let's implement exactly that:
|
||||
Let's implement exactly that. First, the header:
|
||||
|
||||
{{< codeblock "C++" "compiler/03/env.hpp" >}}
|
||||
|
||||
And the source file:
|
||||
|
||||
{{< codeblock "C++" "compiler/03/env.cpp" >}}
|
||||
|
||||
Nothing should seem too surprising. Of note is the fact
|
||||
that we're not using smart pointers for `scope`,
|
||||
and that the child we create during the call
|
||||
would be invalid if the parent goes out of scope
|
||||
or is released. We're gearing this towards
|
||||
creating new environments on the stack, and we'll
|
||||
take care not to let a parent go out of scope
|
||||
before the child.
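
To make the linked-environment idea concrete, here is a minimal sketch. It is not the contents of `env.hpp`: the names `env_sketch`, `bind`, and `lookup` are made up for illustration, and types are stood in for by plain strings rather than `type_ptr`. Only the name `scope` comes from the post itself.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a type environment. Types are plain strings
// here (an assumption made to keep the sketch self-contained).
struct env_sketch {
    std::map<std::string, std::string> names; // only the bindings added at this level
    const env_sketch* parent = nullptr;       // non-owning; must outlive this node

    env_sketch() = default;
    explicit env_sketch(const env_sketch* p) : parent(p) {}

    // Look in this node first, then walk up the chain of parents.
    // Returns nullptr when the name is not bound anywhere.
    const std::string* lookup(const std::string& name) const {
        auto it = names.find(name);
        if (it != names.end()) return &it->second;
        return parent ? parent->lookup(name) : nullptr;
    }

    // Add (or shadow) a binding in this node only.
    void bind(const std::string& name, std::string type) {
        names[name] = std::move(type);
    }

    // Create a child that records only its own changes.
    // The child is invalid once the parent is destroyed.
    env_sketch scope() const { return env_sketch(this); }
};
```

A child created with `scope()` sees everything its parent has bound, while bindings added to the child never leak back into the parent. That is all a case branch needs.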
|
||||
|
||||
At last, it's time to declare a new type checking method.
|
||||
We start with a signature inside `ast`:
|
||||
```
|
||||
virtual type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
|
||||
```
|
||||
We also implement the \\(\\text{matchp}\\) function
|
||||
as a method `match` on `pattern` with the following signature:
|
||||
```
|
||||
virtual void match(type_ptr t, type_mgr& mgr, type_env& env) const;
|
||||
```
|
||||
|
||||
We declare this in every subclass of `ast`. Let's take a look
|
||||
at the implementation now:
|
||||
|
||||
{{< codeblock "C++" "compiler/03/ast.cpp" >}}
|
||||
|
||||
This looks good, but we're not done yet. We can type
|
||||
check expressions, but our program isn't
|
||||
made up of expressions. Rather, it's made up of
|
||||
declarations. Further, we can't look at the declarations
|
||||
in isolation. Consider these two functions:
|
||||
|
||||
```
|
||||
defn double x = { x + x }
|
||||
defn quadruple x = { double (double x) }
|
||||
```
|
||||
|
||||
Assuming we have an environment containing `x` when we typecheck the body
|
||||
of `double`, our algorithm will work out fine. But what about
|
||||
`quadruple`? It needs to know what `double` is, or at least that it exists.
|
||||
|
||||
We could also envision two mutually recursive functions. Let's
|
||||
assume we have the functions `eq` and `if` in global scope. We can write
|
||||
two functions, `even` and `odd`:
|
||||
|
||||
```
|
||||
defn even x = { if (eq x 0) True (odd (x-1)) }
|
||||
defn odd x = { if (eq x 0) False (even (x-1)) }
|
||||
```
|
||||
|
||||
`odd` needs to know about `even`, and `even` needs
|
||||
to know about `odd`. Thus, before we do any checking,
|
||||
we need to populate a global environment with __some__
|
||||
type for each function we declare. We will
|
||||
use what we know about the function for our
|
||||
initial declaration: if the function
|
||||
takes two parameters, its type will be `a -> b -> c`.
|
||||
If it takes one parameter, its type will be `a -> b`.
|
||||
What's more, we need to make sure
|
||||
that the function's parameters are passed in the environment
|
||||
when checking its body, and that these parameters' types
|
||||
are the same as the placeholder types in the function's
|
||||
"declaration".
|
||||
|
||||
We'll typecheck the program in two passes. First,
|
||||
we'll go through each definition, and add any
|
||||
functions it declares to the global scope. Then,
|
||||
we will go through each definition again, and,
|
||||
if it's a function, typecheck its body using
|
||||
the previously fleshed out global scope.
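
The shape of the two passes can be sketched in isolation. The code below is a deliberately simplified illustration, not the compiler's code: `defn_sketch`, `first_pass`, and `second_pass` are hypothetical names, and the "check" only verifies that every name used in a body is known, rather than inferring any types.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, flattened view of a function definition.
struct defn_sketch {
    std::string name;
    std::vector<std::string> params;
    std::vector<std::string> uses; // names referenced in the body
};

// First pass: register every function (here, just its arity)
// before looking at any body.
std::map<std::string, std::size_t> first_pass(const std::vector<defn_sketch>& prog) {
    std::map<std::string, std::size_t> globals;
    for (const auto& d : prog) globals[d.name] = d.params.size();
    return globals;
}

// Second pass: check each body against its own parameters
// plus the fully populated global scope.
bool second_pass(const std::vector<defn_sketch>& prog,
                 const std::map<std::string, std::size_t>& globals) {
    for (const auto& d : prog) {
        std::set<std::string> locals(d.params.begin(), d.params.end());
        for (const auto& u : d.uses) {
            if (locals.count(u) == 0 && globals.count(u) == 0) return false;
        }
    }
    return true;
}
```

Because `even` and `odd` are both registered before either body is examined, the order in which they appear in the file stops mattering.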
|
||||
|
||||
We'll add two functions, `typecheck_first`
|
||||
and `typecheck_second` corresponding to
|
||||
these two stages. Their signatures:
|
||||
|
||||
```
|
||||
virtual void typecheck_first(type_mgr& mgr, type_env& env);
|
||||
virtual void typecheck_second(type_mgr& mgr, const type_env& env) const;
|
||||
```
|
||||
|
||||
Furthermore, in `definition_defn`, we will keep an
|
||||
`std::vector` of `type_ptr`, in which the first element is the
|
||||
type of the __last__ parameter, and so on. We switch around
|
||||
the order of arguments because we build up the `a -> b -> ... -> x`
|
||||
type signature from the right (`->` is right associative), and
|
||||
thus we'll be creating the types right-to-left, too. We also
|
||||
add a `type_ptr` field which holds the type for the function's
|
||||
return value. We keep these two things in `definition_defn` so
|
||||
that they persist between the two typechecking stages: we want to use
|
||||
the types from the first stage to aid in checking the body in the second stage.
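
Here is a small sketch of the right-to-left construction, assuming a toy type representation. `type_sketch`, `signature_sketch`, `build_signature`, and `show` are hypothetical names invented for this example, and the placeholder names are single letters, so it only handles fewer than 26 parameters.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct type_sketch;
using type_sketch_ptr = std::shared_ptr<type_sketch>;

// Toy type: either a named placeholder, or an arrow `left -> right`.
struct type_sketch {
    std::string name;
    type_sketch_ptr left, right; // both set only for arrow types
};

type_sketch_ptr placeholder(const std::string& name) {
    auto t = std::make_shared<type_sketch>();
    t->name = name;
    return t;
}

type_sketch_ptr arrow(type_sketch_ptr l, type_sketch_ptr r) {
    auto t = std::make_shared<type_sketch>();
    t->left = std::move(l);
    t->right = std::move(r);
    return t;
}

// What a definition would remember between the two passes.
struct signature_sketch {
    std::vector<type_sketch_ptr> param_types; // element 0 is the LAST parameter
    type_sketch_ptr return_type;
    type_sketch_ptr full_type;
};

// Start from the return type and wrap it in arrows, last parameter first.
signature_sketch build_signature(std::size_t param_count) {
    signature_sketch sig;
    sig.return_type = placeholder(std::string(1, static_cast<char>('a' + param_count)));
    sig.full_type = sig.return_type;
    for (std::size_t i = param_count; i > 0; --i) {
        auto p = placeholder(std::string(1, static_cast<char>('a' + (i - 1))));
        sig.param_types.push_back(p);
        sig.full_type = arrow(p, sig.full_type);
    }
    return sig;
}

// Print with `->` as right associative, so only left-nested arrows get parentheses.
std::string show(const type_sketch_ptr& t) {
    if (!t->left) return t->name;
    std::string l = show(t->left);
    if (t->left->left) l = "(" + l + ")";
    return l + " -> " + show(t->right);
}
```

For a two-parameter function, `show(build_signature(2).full_type)` gives `a -> b -> c`, and the first entry of `param_types` is `b`, the type of the last parameter.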
|
||||
|
||||
Here's the code for the implementation:
|
||||
{{< codeblock "C++" "compiler/03/definition.cpp" >}}
|
||||
|
||||
And finally, our updated main:
|
||||
{{< codeblock "C++" "compiler/03/main.cpp" >}}
|
||||
|
||||
Notice that we manually add the types for our binary operators to the environment.
|
||||
|
||||
Let's run our project on a few examples. On our two "bad" examples, we get
|
||||
the very eloquent error:
|
||||
```
|
||||
terminate called after throwing an instance of 'int'
|
||||
[2] 9776 abort (core dumped) ./a.out < bad2.txt
|
||||
```
|
||||
That's what we get for throwing 0.
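
For comparison, here is a hedged sketch of what a more descriptive failure could look like. None of this is in the compiler as written: `type_error_sketch`, `expect_same_sketch`, and `run_check_sketch` are hypothetical names, and types are again plain strings.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type carrying a message, instead of a bare 0.
struct type_error_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical check standing in for a failed unification.
void expect_same_sketch(const std::string& expected, const std::string& actual) {
    if (expected != actual) {
        throw type_error_sketch("expected " + expected + ", got " + actual);
    }
}

// Catch the error at the top level and turn it into a report
// rather than letting the program terminate.
std::string run_check_sketch(const std::string& expected, const std::string& actual) {
    try {
        expect_same_sketch(expected, actual);
        return "ok";
    } catch (const type_error_sketch& e) {
        return e.what();
    }
}
```

With something like this, `3 + True` could be reported as `expected Int, got Bool` instead of aborting with a core dump.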
|
||||
|
||||
So far, our program has thrown in 100% of cases. Let's verify it actually
|
||||
accepts valid programs! We'll try our very first example from today,
|
||||
as well as these two:
|
||||
{{< rawblock "compiler/03/works2.txt" >}}
|
||||
{{< rawblock "compiler/03/works3.txt" >}}
|
||||
|
||||
All of our examples print the number of declarations in the program,
|
||||
which means they don't throw 0. And so, we have typechecking!
|
||||
|
||||