Add more content to part 12.

2020-06-19 02:22:08 -07:00
parent 600d5b91ea
commit 21851e3a9c
1 changed files with 172 additions and 0 deletions
--- a/content/blog/12_compiler_let_in_lambda/index.md
+++ b/content/blog/12_compiler_let_in_lambda/index.md
@@ -418,6 +418,178 @@ Recall that in this case, we need not have two methods for declaring
 and generating LLVM, since constructors don't reference other constructors,
 and are always generated before any function definitions.

+#### Visibility
+Should we really be turning _all_ free variables in a function definition
+into arguments? Consider the following piece of Haskell code:
+
+```Haskell {linenos=table}
+add x y = x + y
+mul x y = x * y
+something = mul (add 1 3) 3
+```
+
+In the definition of `something`, `mul` and `add` occur free.
+A very naive lifting algorithm might be tempted to rewrite such a program
+as follows:
+
+```Haskell {linenos=table}
+add x y = x + y
+mul x y = x * y
+something' add mul = mul (add 1 3) 3
+something = something' add mul
+```
+
+But that's absurd! Not only are `add` and `mul` available globally,
+but such a rewrite generates another definition with free variables,
+which means we didn't really improve our program in any way. From this
+example, we can see that we don't want to turn references to global
+variables into function parameters. But how can we tell if a variable
+we're trying to operate on is global or not? I propose a flag in our
+`type_env`, which we'll augment to be used as a symbol table. To do
+this, we update the implementation of `type_env` to map variables to
+values of a struct `variable_data`:
+
+{{< codelines "C++" "compiler/12/type_env.hpp" 13 22 >}}
+
+The `visibility` enum is defined as follows:
+
+{{< codelines "C++" "compiler/12/type_env.hpp" 10 10 >}}
+
+As you can see from the above snippet, we also added a `mangled_name` field
+to the new `variable_data` struct. We will be using this field shortly. We
+also add a few methods to our `type_env`, and end up with the following:
+
+{{< codelines "C++" "compiler/12/type_env.hpp" 31 44 >}}
+
+We will come back to `find_free` and `find_free_except`, as well as
+`set_mangled_name` and `get_mangled_name`. For now, we just adjust `bind` to
+take a visibility parameter that defaults to `local`, and implement
+`is_global`:
+
+{{< codelines "C++" "compiler/12/type_env.cpp" 27 32 >}}
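+
+To make these semantics concrete, here is a minimal sketch of a scoped
+symbol table that records a visibility for each name. It is not the
+compiler's actual `type_env`. The `scope_sketch` class, its members, and
+the decision to drop types entirely are assumptions made for illustration.
+The important detail is that `is_global` consults the innermost scope that
+binds a name, so a local binding can shadow a global one.
+
+```cpp
+#include <cassert>
+#include <map>
+#include <memory>
+#include <string>
+#include <utility>
+
+// Hypothetical stand-in for the compiler's `visibility` enum.
+enum class visibility { global, local };
+
+// Hypothetical, simplified per-variable record; types are omitted.
+struct variable_data {
+    visibility vis;
+    std::string mangled_name; // empty string means "no mangled name"
+};
+
+// A minimal scoped symbol table in which each scope points to its parent.
+class scope_sketch {
+public:
+    explicit scope_sketch(std::shared_ptr<scope_sketch> parent = nullptr)
+        : parent(std::move(parent)) {}
+
+    // As with the adjusted `bind`, visibility defaults to local.
+    void bind(const std::string& name, visibility vis = visibility::local) {
+        names[name] = variable_data{vis, ""};
+    }
+
+    // A name is global if the innermost scope that binds it marked it
+    // as global. Unbound names are reported as not global.
+    bool is_global(const std::string& name) const {
+        auto it = names.find(name);
+        if (it != names.end()) return it->second.vis == visibility::global;
+        return parent ? parent->is_global(name) : false;
+    }
+
+private:
+    std::shared_ptr<scope_sketch> parent;
+    std::map<std::string, variable_data> names;
+};
+```
+
+Suppose `add` and `mul` are bound as global in the outermost scope. A nested
+scope asked about them answers `true`, while its own local `x` answers
+`false`. A lifting pass could use this check to leave `add` and `mul` alone
+in the `something` example.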
+
+Remember the `visibility::global` in `parser.y`? This is where that comes in.
+Recall that `definition_defn::insert_types` is responsible
+for placing function types into the environment, making them accessible
+during typechecking later. At that point, we already need to know whether
+the definitions are global or local (so that we can create the binding).
+Thus, we add `visibility` as a parameter to `insert_types`:
+
+{{< codelines "C++" "compiler/12/definition.hpp" 44 44 >}}
+
+Since we are now moving from manually wrangling definitions towards using
+`definition_group`, we make it so that the group itself provides this
+argument. To do this, we add the `visibility` field from before to it,
+and set it in the parser. One more thing: since constructors never
+capture variables, we can always move them straight to the global
+scope, and thus, we'll always mark them with `visibility::global`.
+
+#### Managing Mangled Names
+Just mangling names is not enough. Consider the following program:
+
+```text {linenos=table}
+defn packOne x = {
+    let {
+        data Packed a = { Pack a }
+    } in {
+        Pack x
+    }
+}
+defn packTwo x = {
+    let {
+        data Packed a = { Pack a }
+    } in {
+        Pack x
+    }
+}
+```
+
+{{< sidenote "right" "lifting-types-note" "Lifting the data type declarations" >}}
+We are actually not <em>quite</em> doing something like the following snippet.
+The reason for this is that we don't mangle the names for types. I pointed
+out this potential issue in a sidenote in the previous post. Since the size
+of this post is already ballooning, I will not deal with this issue here.
+Even at the end of this post, our compiler will not be able to distinguish
+between the two <code>Packed</code> types. We will hopefully get to it later.
+{{< /sidenote >}} and their constructors into the global
+scope gives us something like:
+
+```text {linenos=table}
+data Packed a = { Pack a }
+data Packed_1 a = { Pack_1 a }
+defn packOne x = { Pack x }
+defn packTwo x = { Pack_1 x }
+```
+
+Notice that we had to rename one of the calls to `Pack` to be a call to
+`Pack_1`. To actually change our AST to reference `Pack_1`, we'd have
+to traverse the whole tree, and make sure to keep track of definitions
+that could shadow `Pack` further down. This is cumbersome. Instead, we
+can mark a variable as referring to a mangled version of itself, and
+access this information when needed. To do this, we add the `mangled_name`
+field to the `variable_data` struct as we've seen above, and implement
+the `set_mangled_name` and `get_mangled_name` methods. The former:
+
+{{< codelines "C++" "compiler/12/type_env.cpp" 34 37 >}}
+
+And the latter:
+
+{{< codelines "C++" "compiler/12/type_env.cpp" 39 45 >}}
+
+We don't allow `set_mangled_name` to affect variables that are declared
+above the receiving `type_env`, and we use the empty string as a 'none' value.
+Now, when lifting data type constructors, we'll be able to use
+`set_mangled_name` to make sure constructor calls are made correctly. We
+will also be able to use this in other cases, like the translation
+of local function definitions.
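+
+Putting these two methods together with a way of generating fresh names,
+the intended behavior can be sketched as follows. Everything here is
+hypothetical. `mangling_scope` tracks only mangled names. `name_counter`
+is one simple way to produce names like `Pack_1`; it is an assumption, not
+code taken from the compiler. As described above, `set_mangled_name`
+refuses to touch names bound in an enclosing scope, and the empty string
+stands for 'no mangled name'. Returning the original name in that case is
+an assumption of this sketch.
+
+```cpp
+#include <cassert>
+#include <map>
+#include <memory>
+#include <string>
+#include <utility>
+
+// Hypothetical, simplified scope that only tracks mangled names.
+class mangling_scope {
+public:
+    explicit mangling_scope(std::shared_ptr<mangling_scope> parent = nullptr)
+        : parent(std::move(parent)) {}
+
+    // Bind a name with no mangled name yet (empty string means "none").
+    void bind(const std::string& name) { mangled[name] = ""; }
+
+    // Only affects names bound in *this* scope; names declared in an
+    // enclosing scope are left untouched. Returns whether anything changed.
+    bool set_mangled_name(const std::string& name, const std::string& m) {
+        auto it = mangled.find(name);
+        if (it == mangled.end()) return false;
+        it->second = m;
+        return true;
+    }
+
+    // Walk outward to the innermost binding of `name`; fall back to the
+    // original name when no mangled name was recorded.
+    std::string get_mangled_name(const std::string& name) const {
+        auto it = mangled.find(name);
+        if (it != mangled.end()) {
+            return it->second.empty() ? name : it->second;
+        }
+        return parent ? parent->get_mangled_name(name) : name;
+    }
+
+private:
+    std::shared_ptr<mangling_scope> parent;
+    std::map<std::string, std::string> mangled;
+};
+
+// Hypothetical helper producing Pack, Pack_1, Pack_2, ... for one base name.
+class name_counter {
+public:
+    std::string fresh(const std::string& base) {
+        int n = counts[base]++;
+        return n == 0 ? base : base + "_" + std::to_string(n);
+    }
+
+private:
+    std::map<std::string, int> counts;
+};
+```
+
+Suppose the scopes of `packOne` and `packTwo` each bind `Pack`, and each
+records a fresh name from the same counter. Lookups in the second scope, or
+in any scope nested inside it, then yield `Pack_1`, while the first yields
+`Pack`. This matches the lifted program shown earlier.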
+
+#### New AST Nodes
+Finally, it's time for us to add new AST nodes to our language.
+Specifically, these nodes are `ast_let` (for `let/in` expressions)
+and `ast_lambda` (for lambda functions). We declare them as follows:
+
+{{< codelines "C++" "compiler/12/ast.hpp" 131 166 >}}
+
+In `ast_let`, the `definitions` field corresponds to the original definitions
+given by the user in the program, and the `in` field corresponds to the
+expression which uses these definitions. In the process of lifting, though,
+we eventually transfer each of the definitions to the global scope, replacing
+their right hand sides with partial applications. After this transformation,
+all the data type definitions are effectively gone, and all the function
+definitions are converted into the simple form `x = f a1 ... an`. We hold
+these post-transformation equations in the `translated_definitions` field,
+and it is these equations that we compile in this node's `compile` method.
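+
+The shape of this transformation for a single let-bound function can be
+sketched with plain data structures. The names `local_function`,
+`lifted_function`, `translated_equation`, and `lift` are hypothetical. We
+also assume that the free variables have already been computed (with
+globals excluded) and that a mangled global name has already been chosen.
+The free variables become the leading parameters of the global function,
+and the residual equation partially applies that function to them.
+
+```cpp
+#include <cassert>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical description of a let-bound function after free-variable
+// analysis: its name, its parameters, and its (non-global) free variables.
+struct local_function {
+    std::string name;
+    std::vector<std::string> params;
+    std::vector<std::string> free_vars;
+};
+
+// The global definition that replaces it.
+struct lifted_function {
+    std::string global_name;
+    std::vector<std::string> params;
+};
+
+// The residual equation kept in the let: name = head a1 ... an.
+struct translated_equation {
+    std::string name;
+    std::string head;
+    std::vector<std::string> args;
+
+    std::string to_string() const {
+        std::string out = name + " = " + head;
+        for (const auto& arg : args) out += " " + arg;
+        return out;
+    }
+};
+
+// Lift one local function, given the global name chosen for it.
+std::pair<lifted_function, translated_equation>
+lift(const local_function& f, const std::string& global_name) {
+    // Free variables come first, followed by the original parameters.
+    lifted_function global{global_name, f.free_vars};
+    global.params.insert(global.params.end(), f.params.begin(), f.params.end());
+    // Partially apply the global function to the captured variables.
+    translated_equation equation{f.name, global_name, f.free_vars};
+    return {global, equation};
+}
+```
+
+For example, a local `defn scale x = { x * factor }` that captures `factor`
+would become a global `scale_1` that takes `factor` and then `x`, leaving
+behind the equation `scale = scale_1 factor`.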
+
+In `ast_lambda`, we allow multiple parameters (like Haskell's `\x y -> x + y`).
+We store these parameters in the `params` field, and we store the lambda's
+expression in the `body` field. Just like `definition_defn`,
+the `ast_lambda` node maintains a separate environment in which its children
+have been bound, and a list of variables that occur freely in its body. The
+former is used for typechecking, while the latter is used for lifting.
+Finally, the `translated` field holds the lambda function's form
+after its body has been transformed into a global function. Like the
+translated definitions in `ast_let`, this form is `f a1 ... an`.
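+
+Computing a lambda's free variables amounts to walking its body and
+collecting every variable that is neither one of its parameters nor global.
+The sketch below shows this on a hypothetical miniature AST rather than on
+the compiler's `ast` classes. The AST contains only variables and
+applications, with operators written as ordinary functions. `collect_free`,
+`translate_lambda`, and the name `lambda_1` are all made up for
+illustration. A real implementation would also need to handle binders
+nested inside the body.
+
+```cpp
+#include <cassert>
+#include <memory>
+#include <set>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical miniature AST: a node is either a variable (only `name`
+// is set) or an application of `left` to `right`.
+struct expr {
+    std::string name;
+    std::shared_ptr<expr> left;
+    std::shared_ptr<expr> right;
+};
+using expr_ptr = std::shared_ptr<expr>;
+
+expr_ptr var(const std::string& name) {
+    return std::make_shared<expr>(expr{name, nullptr, nullptr});
+}
+
+expr_ptr app(expr_ptr f, expr_ptr arg) {
+    return std::make_shared<expr>(expr{"", std::move(f), std::move(arg)});
+}
+
+// Add to `out` every variable in `e` that is neither bound by the
+// lambda nor global.
+void collect_free(const expr_ptr& e, const std::set<std::string>& bound,
+                  const std::set<std::string>& globals,
+                  std::set<std::string>& out) {
+    if (!e->left) {
+        if (!bound.count(e->name) && !globals.count(e->name)) {
+            out.insert(e->name);
+        }
+        return;
+    }
+    collect_free(e->left, bound, globals, out);
+    collect_free(e->right, bound, globals, out);
+}
+
+// Produce the translated form of \params -> body: a call to the lifted
+// global function, partially applied to the lambda's free variables.
+std::string translate_lambda(const std::vector<std::string>& params,
+                             const expr_ptr& body,
+                             const std::set<std::string>& globals,
+                             const std::string& lifted_name) {
+    std::set<std::string> bound(params.begin(), params.end());
+    std::set<std::string> free_vars;
+    collect_free(body, bound, globals, free_vars);
+    std::string result = lifted_name;
+    for (const auto& v : free_vars) result += " " + v;
+    return result;
+}
+```
+
+Consider `\x y -> add x (mul y z)` with `add` and `mul` global. Its only
+free variable is `z`, so the node would be replaced by `lambda_1 z`, and
+the lifted global function would take `z`, `x`, and `y` as parameters.
+Sorting the free variables, which here falls out of using `std::set`, is
+just one way to fix a deterministic order. What matters is that the call
+site and the lifted definition agree on it.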
+
+The observant reader will have noticed that we have a new method: `translate`.
+This is a new method for all `ast` descendants, and will implement the
+steps of moving definitions to the global scope and transforming the 
+program. Before we get to it, though, let's quickly see the parsing
+rules for `ast_let` and `ast_lambda`:
+
+{{< codelines "text" "compiler/12/parser.y" 107 115 >}}
+
+This is pretty similar to the rest of the grammar, so I will give this no
+further explanation.
+
+{{< todo >}}
+Explain typechecking for lambda functions and let/in expressions.
+{{< /todo >}}
+
+{{< todo >}}
+Explain free variable detection for lambda functions and let/in expressions.
+{{< /todo >}}
+
 #### Translation
 While collecting all of the definitions into a global list, we can
 also do some program transformations. Let's return to our earlier example: