Finish implementation description in part 12.

2020-06-20 20:46:54 -07:00 · 2020-06-20 20:46:54 -07:00 · c496be1031
commit c496be1031
parent 21851e3a9c
1 changed files with 242 additions and 54 deletions
--- a/content/blog/12_compiler_let_in_lambda/index.md
+++ b/content/blog/12_compiler_let_in_lambda/index.md
@@ -574,69 +574,257 @@ The
 observant reader will have noticed that we have a new method: `translate`.
 This is a new method for all `ast` descendants, and will implement the
 steps of moving definitions to the global scope and transforming the 
-program. Before we get to it, though, let's quickly see the parsing
-rules for `ast_let` and `ast_lambda`:
+program. Before we get to it, though, let's look at the other relevant
+pieces of code for `ast_let` and `ast_lambda`. First, their grammar
+rules in `parser.y`:

 {{< codelines "text" "compiler/12/parser.y" 107 115 >}}

 This is pretty similar to the rest of the grammar, so I will give this no
-further explanation.
+further explanation. Next, their `find_free` and `typecheck` code.
+We can start with `ast_let`:

-{{< todo >}}
-Explain typechecking for lambda functions and let/in expressions.
-{{< /todo >}}
+{{< codelines "C++" "compiler/12/ast.cpp" 275 289 >}}

-{{< todo >}}
-Explain free variable detection for lambda functions and let/in expressions.
-{{< /todo >}}
+As you can see, `ast_let::find_free` works in a similar manner to `ast_case::find_free`.
+It finds the free variables in the `in` node as well as in each of the definitions
+(taking advantage of the fact that `definition_group::find_free` populates the
+given set with "far away" free variables). It then filters out any variables bound in
+the `let` from the set of free variables in `in`, and returns the result.
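+
+To make the filtering step concrete, here is a small standalone sketch. It
+does not use the compiler's real classes: `toy_definition` and `let_free_vars`
+are hypothetical stand-ins, and each definition is reduced to its name and the
+free variables of its body.
+
+```C++
+#include <cassert>
+#include <set>
+#include <string>
+#include <vector>
+
+// Hypothetical stand-in for a definition: its name and the free
+// variables of its body.
+struct toy_definition {
+    std::string name;
+    std::set<std::string> body_free;
+};
+
+// Free variables of `let { defs } in { body }`: everything free in the
+// definitions or in the body, minus the names the `let` itself binds.
+std::set<std::string> let_free_vars(const std::vector<toy_definition>& defs,
+                                    const std::set<std::string>& in_free) {
+    std::set<std::string> result = in_free;
+    for (const auto& def : defs) {
+        result.insert(def.body_free.begin(), def.body_free.end());
+    }
+    for (const auto& def : defs) {
+        result.erase(def.name);
+    }
+    return result;
+}
+```
+
+For example, a `let` that defines `square` with body `x * x` and uses it as
+`square * square` has `x` as its only free variable. Since every name bound by
+the `let` is removed, mutually recursive definitions don't leak each other's
+names either.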
+
+Typechecking in `ast_let` relies on `definition_group::typecheck`, which holds
+all of the required functionality for checking the new definitions.
+Once the definitions are typechecked, we use their type information to
+typecheck the `in` part of the expression (passing `definitions.env` to the
+call to `typecheck` to make the new definitions visible).
+
+Next, we look at `ast_lambda`:
+
+{{< codelines "C++" "compiler/12/ast.cpp" 344 366 >}}
+
+Again, `ast_lambda::find_free` works similarly to `definition_defn::find_free`, stripping
+the variables expected by the function from the body's list of free variables.
+Also like `definition_defn`, this new node remembers the free variables in
+its body, which we will later use for lifting.
+
+Typechecking in this node also proceeds similarly to `definition_defn`. We create
+new type variables for each parameter and for the return value, and build up
+a function type called `full_type`. We then typecheck the body using the
+new environment (which now includes the variables), and return the function type we came up with.
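+
+The shape of `full_type` can be sketched without the rest of the typechecker.
+In the sketch below, `toy_type`, `lambda_full_type`, and the `t0`, `t1`, ...
+naming are hypothetical. It only shows how one fresh type variable per
+parameter, plus one for the result, fold into a curried function type; it
+leaves out extending the environment and typechecking the body.
+
+```C++
+#include <cassert>
+#include <cstddef>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical toy type: a type variable (from == nullptr) or an arrow.
+struct toy_type {
+    std::string var;
+    std::shared_ptr<toy_type> from, to;
+};
+using toy_type_ptr = std::shared_ptr<toy_type>;
+
+toy_type_ptr make_var(const std::string& name) {
+    return std::make_shared<toy_type>(toy_type{name, nullptr, nullptr});
+}
+
+toy_type_ptr make_arrow(toy_type_ptr from, toy_type_ptr to) {
+    return std::make_shared<toy_type>(toy_type{"", std::move(from), std::move(to)});
+}
+
+std::string show(const toy_type_ptr& t) {
+    if (!t->from) return t->var;
+    // Arrows associate to the right, so only a function-typed
+    // argument needs parentheses.
+    std::string left = show(t->from);
+    if (t->from->from) left = "(" + left + ")";
+    return left + " -> " + show(t->to);
+}
+
+// One fresh variable per parameter, one for the return value, folded
+// from right to left into `p0 -> p1 -> ... -> r`.
+toy_type_ptr lambda_full_type(std::size_t param_count, int& next_var) {
+    std::vector<toy_type_ptr> params;
+    for (std::size_t i = 0; i < param_count; i++) {
+        params.push_back(make_var("t" + std::to_string(next_var++)));
+    }
+    toy_type_ptr full_type = make_var("t" + std::to_string(next_var++));
+    for (auto it = params.rbegin(); it != params.rend(); ++it) {
+        full_type = make_arrow(*it, full_type);
+    }
+    return full_type;
+}
+```
+
+Calling it with two parameters and a counter starting at zero yields
+`t0 -> t1 -> t2`.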

 #### Translation
-While collecting all of the definitions into a global list, we can
-also do some program transformations. Let's return to our earlier example:
+Recalling the transformations we described earlier, we can identify two
+major steps:

-```Haskell {linenos=table}
-fourthPower x = square * square
-    where
-        square = x * x
-```
-
-We said it should be translated into something like:
-
-```Haskell {linenos=table}
-fourthPower x = square * square
-    where square = square' x
-square' x = x * x
-```
-
-In our language, the original program above would be:
-
-```text {linenos=table}
-defn fourthPower x = { 
-    let {
-        defn square = { x * x }
-    } in {
-        square * square
-    }
-}
-```
-
-And the translated version would be:
-
-```text {linenos=table}
-defn fourthPower x = { 
-    let {
-        defn square = { square' x }
-    } in {
-        square * square
-    }
-}
-defn square' x = { x * x }
-```
-
-Setting aside for the moment the naming of `square'` and `square`, we observe
-that we want to perform the following operations:
-
-1. Move the body of the original definition of `square` into its own
+1. Move the body of the original definition into its own
 global definition, adding all the captured variables as arguments.
 2. Replace the right hand side of the `let/in` expression with an application
 of the global definition to the variables it requires.
+
+We will implement these in a new `translate` method, with the following
+signature:
+
+```C++
+void ast::translate(global_scope& scope);
+```
+
+The `scope` parameter and its `add_function` and `add_constructor` methods will
+be used to add definitions to the global scope. Each AST node will also
+use this method to implement the second step. Currently, only
+`ast_let` and `ast_lambda` will need to modify themselves; all other
+nodes will simply recursively call this method on their children. Let's jump
+straight into implementing this method for `ast_let`:
+
+{{< codelines "C++" "compiler/12/ast.cpp" 291 316 >}}
+
+Since data type definitions don't really depend on anything else, we process
+them first. This amounts to simply calling the `definition_data::into_globals`
+method, which itself simply calls `global_scope::add_constructor`:
+
+{{< codelines "C++" "compiler/12/definition.cpp" 86 92 >}}
+
+Note how `into_globals` updates the mangled name of its constructor
+via `set_mangled_name`. This will help us decide which global
+function to call during code generation. More on that later.
+
+Starting at line 295, we process the function definitions
+in the `let/in` expression. We remember how many arguments were
+explicitly added to the function definition, and then call the
+definition's `into_global` method. This method is implemented
+as follows:
+
+{{< codelines "C++" "compiler/12/definition.cpp" 40 49 >}}
+
+First, this method collects all the non-global free variables in
+its body, which will need to be passed to the global definition
+as arguments. It then combines this list with the arguments
+the user explicitly added to it, recursively translates
+its body, and creates a new global definition using `add_function`.
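+
+The parameter list built by this method can be sketched in isolation. Here,
+`lifted_params` is a hypothetical helper, and the check for globals is modeled
+as membership in a plain set of names rather than a lookup in the real
+`type_env`:
+
+```C++
+#include <cassert>
+#include <set>
+#include <string>
+#include <vector>
+
+// Hypothetical helper: the parameter list of a lifted definition.
+// Non-global free variables (the captured ones) come first, followed by
+// the parameters the user wrote explicitly.
+std::vector<std::string> lifted_params(const std::set<std::string>& free_vars,
+                                       const std::set<std::string>& globals,
+                                       const std::vector<std::string>& explicit_params) {
+    std::vector<std::string> all_params;
+    for (const auto& var : free_vars) {
+        // Globals can be referenced directly, so they are not captured.
+        if (globals.count(var) > 0) continue;
+        all_params.push_back(var);
+    }
+    all_params.insert(all_params.end(), explicit_params.begin(), explicit_params.end());
+    return all_params;
+}
+```
+
+Since the captured variables always come first, their count is simply the
+total number of parameters minus the number of explicit ones.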
+
+We return to `ast_let::translate` at line 299. Here,
+we determine how many variables ended up being captured, by
+subtracting the number of explicit parameters from the total
+number of parameters the new global definition has. This number,
+combined with the fact that we added all the 'implicit' arguments
+to the beginning of the function's parameter list, will let us
+iterate over all implicit arguments, creating a chain of partial
+function applications.
+
+But how do we build the application? We could use the mangled name
+of the function, but this seems inelegant, especially since we
+already keep track of mangling information in `type_env`. Instead,
+we create a new, local environment, in which we place an updated
+binding for the function, marking it global, and setting
+its mangled name to one generated by `global_scope`. This work is done
+on lines 301-303. We create a reference to the global function
+using the new environment on lines 305 and 306, and apply it to
+all the implicit arguments on lines 307-313. Finally, we
+add the new 'basic' equation into `translated_definitions`. 
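+
+Putting the two paragraphs above together, the chain of applications can be
+sketched with a tiny expression tree. Everything here is hypothetical:
+`toy_expr`, `apply_captured`, and the name `square_lifted` (standing in for
+whatever name `global_scope` generates) are not part of the compiler.
+
+```C++
+#include <cassert>
+#include <cstddef>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical toy expression: a reference to a name (left == nullptr)
+// or an application of `left` to `right`.
+struct toy_expr {
+    std::string name;
+    std::shared_ptr<toy_expr> left, right;
+};
+using toy_expr_ptr = std::shared_ptr<toy_expr>;
+
+toy_expr_ptr make_ref(const std::string& name) {
+    return std::make_shared<toy_expr>(toy_expr{name, nullptr, nullptr});
+}
+
+toy_expr_ptr make_app(toy_expr_ptr f, toy_expr_ptr arg) {
+    return std::make_shared<toy_expr>(toy_expr{"", std::move(f), std::move(arg)});
+}
+
+std::string show(const toy_expr_ptr& e) {
+    if (!e->left) return e->name;
+    return "(" + show(e->left) + " " + show(e->right) + ")";
+}
+
+// The captured parameters sit at the front of `all_params`, so applying
+// the global function to the first (size - explicit_count) of them gives
+// a partial application still waiting for the explicit arguments.
+toy_expr_ptr apply_captured(const std::string& global_name,
+                            const std::vector<std::string>& all_params,
+                            std::size_t explicit_count) {
+    std::size_t captured = all_params.size() - explicit_count;
+    toy_expr_ptr result = make_ref(global_name);
+    for (std::size_t i = 0; i < captured; i++) {
+        result = make_app(result, make_ref(all_params[i]));
+    }
+    return result;
+}
+```
+
+With parameters `x`, `y`, and `z`, of which only `z` is explicit, the result
+is `((f x) y)`: a partial application still waiting for its one explicit
+argument.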
+
+Let's take a look at translating `ast_lambda` next:
+
+{{< codelines "C++" "compiler/12/ast.cpp" 368 392 >}}
+
+Once again, on lines 369-375 we find all the arguments to the
+global definition. On lines 377-382 we create a new global
+function and a mangled environment, and start creating the
+chain of function applications. On lines 384-390, we actually
+create the arguments and apply the function to them. Finally,
+on line 391, we store this new chain of applications in the
+`translated` field.
+
+#### Compilation
+There's still another piece of the puzzle missing, and
+that's how we're going to compile `let/in` expressions into
+G-machine instructions. We have allowed these expressions
+to be recursive, and maybe even mutually recursive. This
+worked fine with global definitions; instead of specifying
+where on the stack we can find the reference to a global
+function, we just created a new global node, and called
+it good. Things are different now, though, because the definitions
+we're referencing aren't _just_ global functions; they are partial
+applications of a global function. And to reference themselves,
+or their neighbors, they have to have a handle on their own nodes. We do this
+using an instruction that we foreshadowed in part 5, but didn't use
+until just now: __Alloc__.
+
+__Alloc__ creates placeholder nodes on the stack. These nodes
+are indirections, the same kind that we use for lazy evaluation
+and sharing elsewhere. We create an indirection node for every
+definition that we then build; when an expression needs access
+to a definition, we give it the indirection node. After
+building the partial application graph for an expression,
+we use __Update__, making the corresponding indirection
+point to this new graph. This way, the 'handle' to a 
+definition is always accessible, and once the definition's expression
+is built, the handle correctly points to it. Here's the implementation:
+
+{{< codelines "C++" "compiler/12/ast.cpp" 319 332 >}}
+
+First, we create the __Alloc__ instruction. Then, we update
+our environment to map each definition name to a location
+within the newly allocated batch of nodes. Since we iterate
+the definitions in order, 'pushing' them into our environment,
+we end up with the convention of having the later definitions
+closer to the top of the G-machine stack. Thus, when we
+iterate the definitions again, this time to compile their
+bodies, we have to do so starting with the highest offset,
+and working our way down to __Update__-ing the top of the stack.
+Once the definitions have been compiled, we proceed to compiling
+the `in` part of the expression as normal, using our updated
+environment. Finally, we use __Slide__ to get rid of the definition
+graphs, cleaning up the stack.
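+
+The ordering can be sketched by emitting instructions as plain strings. This
+is a hypothetical model, not the compiler's `instruction` classes: compiled
+bodies are passed in as opaque strings, and __Update__ `o` is assumed to pop
+the top of the stack and then update the node at offset `o` from the new top.
+
+```C++
+#include <cassert>
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// Emits the instruction sequence for `let { d_0; ...; d_{n-1} } in { e }`.
+std::vector<std::string> let_instructions(const std::vector<std::string>& def_bodies,
+                                          const std::string& in_body) {
+    std::size_t n = def_bodies.size();
+    std::vector<std::string> out;
+    // One placeholder per definition; d_{n-1}'s ends up on top.
+    out.push_back("Alloc " + std::to_string(n));
+    for (std::size_t i = 0; i < n; i++) {
+        // Building d_i's graph pushes one node. Once Update pops it,
+        // d_i's placeholder is at offset n - 1 - i, so the first
+        // definition gets the highest offset and the last gets 0.
+        out.push_back(def_bodies[i]);
+        out.push_back("Update " + std::to_string(n - 1 - i));
+    }
+    out.push_back(in_body);
+    // Drop the definition nodes, keeping only the result of `in`.
+    out.push_back("Slide " + std::to_string(n));
+    return out;
+}
+```
+
+For two definitions, this produces `Alloc 2`, the first body, `Update 1`, the
+second body, `Update 0`, the `in` body, and finally `Slide 2`.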
+
+Compiling the `ast_lambda` is far more straightforward. We just
+compile the resulting partial application as we normally would have:
+
+{{< codelines "C++" "compiler/12/ast.cpp" 393 395 >}}
+
+One more thing. Let's adopt the convention of storing __mangled__
+names into the environment. This way, rather than looking up
+mangled names only for global functions, which would be a 'gotcha'
+for anyone working on the compiler, we will always use the mangled
+names during compilation. To make this change, we make sure that
+`ast_case` also uses `mangled_name`:
+
+{{< codelines "C++" "compiler/12/ast.cpp" 228 228 >}}
+
+We also update the logic for `ast_lid::compile` to use the mangled
+name information:
+
+{{< codelines "C++" "compiler/12/ast.cpp" 52 58 >}}
+
+#### Fixing Type Generalization
+This is a rather serious bug that has been in the codebase
+since part 10. Recall that we can only generalize type variables
+that are _not_ free in the environment. Thus far, we haven't checked for that,
+and we really should: I ran into incorrectly inferred types
+in my first test of the `let/in` language feature.
+
+We need to make our code capable of finding free variables in the
+type environment. This requires the `type_mgr`, which maps type
+variables to the real types they represent, if any. We
+thus create methods with signatures as follows:
+
+```C++
+void type_env::find_free(const type_mgr& mgr, std::set<std::string>& into) const;
+void type_env::find_free_except(const type_mgr& mgr, const std::string& avoid,
+        std::set<std::string>& into) const;
+```
+
+Why `find_free_except`? When generalizing a variable whose type was already
+stored in the environment, all the type variables we could generalize would
+appear free in the environment, since they occur in that variable's own type.
+If they only occur in the type we're generalizing, though,
+we shouldn't let that stop us! Thus, when finding free type variables, we will
+avoid looking at the particular variable whose type is being generalized. The
+implementations of the two methods are straightforward:
+
+{{< codelines "C++" "compiler/12/type_env.cpp" 4 18 >}}
+
+Note that `find_free_except` calls `find_free` in its recursive call. This
+is not a bug: we _do_ want to include free type variables from bindings
+that have the same name as the variable we're generalizing, but aren't found
+in the same scope. As far as we're concerned, they're different variables!
+The two methods use another `find_free` method which we add to `type_mgr`:
+
+{{< codelines "C++" "compiler/12/type.cpp" 206 213 >}}
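+
+To see why the type manager is needed here, consider a sketch in which type
+variables can be bound to other types during unification. The names
+`toy_type`, `toy_type_mgr`, `tvar`, and `tcon` are hypothetical, and the real
+`type_mgr` need not store its bindings this way; the point is only that a
+variable already resolved to a concrete type must not be reported as free.
+
+```C++
+#include <cassert>
+#include <map>
+#include <memory>
+#include <set>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical toy type: a type variable, or a named constructor
+// applied to arguments (e.g. "->" with two arguments, "Int" with none).
+struct toy_type;
+using toy_type_ptr = std::shared_ptr<toy_type>;
+struct toy_type {
+    bool is_var;
+    std::string name;
+    std::vector<toy_type_ptr> args;
+};
+
+toy_type_ptr tvar(const std::string& name) {
+    return std::make_shared<toy_type>(toy_type{true, name, {}});
+}
+
+toy_type_ptr tcon(const std::string& name, std::vector<toy_type_ptr> args = {}) {
+    return std::make_shared<toy_type>(toy_type{false, name, std::move(args)});
+}
+
+// Stand-in for the type manager: type variables bound by unification.
+// Assumes the bindings are acyclic, which an occurs check guarantees.
+struct toy_type_mgr {
+    std::map<std::string, toy_type_ptr> bindings;
+
+    // Adds every still-unbound type variable in `t` to `into`,
+    // looking through bound variables to the types they stand for.
+    void find_free(const toy_type_ptr& t, std::set<std::string>& into) const {
+        if (t->is_var) {
+            auto it = bindings.find(t->name);
+            if (it != bindings.end()) {
+                find_free(it->second, into);
+            } else {
+                into.insert(t->name);
+            }
+            return;
+        }
+        for (const auto& arg : t->args) {
+            find_free(arg, into);
+        }
+    }
+};
+```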
+
+Finally, `generalize` makes sure not to generalize type variables that it finds free in the environment:
+
+{{< codelines "C++" "compiler/12/type_env.cpp" 68 81 >}}
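+
+The interplay between `find_free_except` and `generalize` can be sketched with
+an environment in which every binding has already been reduced to the set of
+unbound type variables in its type. The `toy_env` structure and its
+`generalizable` method are hypothetical, and the sketch only computes which
+type variables could be quantified.
+
+```C++
+#include <cassert>
+#include <map>
+#include <memory>
+#include <set>
+#include <string>
+
+// Hypothetical scope: each binding maps to the unbound type variables
+// in its type; `parent` is the enclosing scope, if any.
+struct toy_env {
+    std::map<std::string, std::set<std::string>> names;
+    std::shared_ptr<toy_env> parent;
+
+    void find_free(std::set<std::string>& into) const {
+        for (const auto& entry : names) {
+            into.insert(entry.second.begin(), entry.second.end());
+        }
+        if (parent) parent->find_free(into);
+    }
+
+    // Like find_free, but skips the binding for `avoid` in this scope
+    // only. A same-named binding in an outer scope is a different
+    // variable, so the parent is searched with plain find_free.
+    void find_free_except(const std::string& avoid, std::set<std::string>& into) const {
+        for (const auto& entry : names) {
+            if (entry.first == avoid) continue;
+            into.insert(entry.second.begin(), entry.second.end());
+        }
+        if (parent) parent->find_free(into);
+    }
+
+    // The type variables of `name` that are not free anywhere else in
+    // the environment, and so may be quantified.
+    std::set<std::string> generalizable(const std::string& name) const {
+        std::set<std::string> env_free;
+        find_free_except(name, env_free);
+        std::set<std::string> result;
+        for (const auto& var : names.at(name)) {
+            if (env_free.count(var) == 0) result.insert(var);
+        }
+        return result;
+    }
+};
+```
+
+If an enclosing scope binds a parameter `x` of type `a`, then a local `f` of
+type `a -> b` can only be generalized over `b`: `a` is tied to `x`.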
+
+#### Putting It All Together
+All that's left is to tie the parts we've created into one coherent whole
+in `main.cpp`. First of all, since we moved all of the LLVM-related
+code into `global_scope`, we can safely replace that functionality
+in `main.cpp` with a method call:
+
+{{< codelines "C++" "compiler/12/main.cpp" 121 132 >}}
+
+On the other hand, we need top-level logic to handle `definition_group`s.
+This is pretty straightforward, and the main trick is to remember to
+update the function's mangled name. Right now, depending on the choice
+of mangling algorithm, it's possible even for top-level functions to
+have their names changed, and we must account for that. The whole code is:
+
+{{< codelines "C++" "compiler/12/main.cpp" 52 62 >}}
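+
+Why would a top-level name change at all? Here is one hypothetical mangling
+scheme, not necessarily the one `global_scope` uses, under which the first
+function registered with a given name keeps it and later ones get a numeric
+suffix. Under this scheme, if a lifted local `square` is registered before a
+top-level `square`, the top-level one is the one that gets renamed. The `$`
+separator assumes that `$` can never appear in a source-level name, so mangled
+names can't collide with user-written ones.
+
+```C++
+#include <cassert>
+#include <map>
+#include <string>
+
+// Hypothetical name mangler: the first function registered under a name
+// keeps it; later ones are suffixed with "$1", "$2", and so on.
+class toy_mangler {
+    std::map<std::string, int> occurrences;
+
+public:
+    std::string mangle(const std::string& name) {
+        int seen = occurrences[name]++;
+        if (seen == 0) return name;
+        return name + "$" + std::to_string(seen);
+    }
+};
+```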
+
+Finally, we call `global_scope`'s methods in `main()`:
+
+{{< codelines "C++" "compiler/12/main.cpp" 148 151 >}}
+
+That's it! Please note that I've only mentioned or hinted at some of the minor changes to the
+codebase. Detailing every single change this late into the project is
+needlessly time-consuming and verbose; Gitea reports that I've made 677
+insertions into and 215 deletions from the code. As always, I provide
+the [source code for the compiler](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/12), and you can also take a look at the
+[Gitea-generated diff](https://dev.danilafe.com/Web-Projects/blog-static/compare/1905601aaa96d11c771eae9c56bb9fc105050cda...21851e3a9c552383ee8c4bc878ea06e7d28c333e)
+at the time of writing. If you want to follow along, feel free to check
+them out!
+
+### Running Our Programs
+It's important to test all the language features that we just added. This
+includes recursive definitions, nested function dependency cycles, and
+uses of lambda functions. Some of the following examples will be rather
+silly, but they should do a good job of checking that everything works
+as we expect.