|
|
@ -6,9 +6,9 @@ description: "In this post, we extend our language with let/in expressions and l
|
|
|
|
draft: true
|
|
|
|
draft: true
|
|
|
|
---
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
Now that our language's type system is more fleshed out and pleasant to use, it's time to shift our focus to the ergonomics of the language itself. I've been mentioning `let/in` and __lambda__ expressions for a while now. The former will let us create names for expressions that are limited to a certain scope (without having to create global variable bindings), while the latter will allow us to create functions without giving them any name at all.
|
|
|
|
Now that our language's type system is more fleshed out and pleasant to use, it's time to shift our focus to the ergonomics of the language itself. I've been mentioning `let/in` expressions and __lambda expressions__ for a while now. The former will let us create names for expressions that are limited to a certain scope (without having to create global variable bindings), while the latter will allow us to create functions without giving them any name at all.
|
|
|
|
|
|
|
|
|
|
|
|
Let's take a look at `let/in` expressions first, to make sure we're all on the same page about what it is we're trying to implement. Let's start with some rather basic examples, and then move on to more complex ones. A very basic use of a `let/in` expression is, in Haskell:
|
|
|
|
Let's take a look at `let/in` expressions first, to make sure we're all on the same page about what it is we're trying to implement. Let's start with some rather basic examples, and then move on to more complex examples. The most basic use of a `let/in` expression is, in Haskell:
|
|
|
|
|
|
|
|
|
|
|
|
```Haskell
|
|
|
|
```Haskell
|
|
|
|
let x = 5 in x + x
|
|
|
|
let x = 5 in x + x
|
|
|
@ -93,7 +93,7 @@ addSingle6 x = 6 + x
|
|
|
|
-- ... and so on ...
|
|
|
|
-- ... and so on ...
|
|
|
|
```
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
But now, we end up creating several functions with almost identical bodies, with the exception of the free variables themselves. Wouldn't it be better to perform the well-known strategy of reducing code duplication by factoring out parameters, and leaving only one instance of the repeated code? We would end up with:
|
|
|
|
But now, we end up creating several functions with almost identical bodies, with the exception of the free variables themselves. Wouldn't it be better to perform the well-known strategy of reducing code duplication by factoring out parameters, and leaving only instance of the repeated code? We would end up with:
|
|
|
|
|
|
|
|
|
|
|
|
```Haskell {linenos=table}
|
|
|
|
```Haskell {linenos=table}
|
|
|
|
addToAll n xs = map (addSingle n) xs
|
|
|
|
addToAll n xs = map (addSingle n) xs
|
|
|
@ -145,48 +145,11 @@ to `let/in`, and that's what we'll be using in our language.
|
|
|
|
|
|
|
|
|
|
|
|
This technique of replacing captured variables with arguments, and pulling closures into the global scope to aid compilation, is called [Lambda Lifting](https://en.wikipedia.org/wiki/Lambda_lifting). Its name is no coincidence - lambda functions need to undergo the same kind of transformation as our nested definitions (unlike nested definitions, though, lambda functions need to be named). This is why they are included in this post together with `let/in`!
|
|
|
|
This technique of replacing captured variables with arguments, and pulling closures into the global scope to aid compilation, is called [Lambda Lifting](https://en.wikipedia.org/wiki/Lambda_lifting). Its name is no coincidence - lambda functions need to undergo the same kind of transformation as our nested definitions (unlike nested definitions, though, lambda functions need to be named). This is why they are included in this post together with `let/in`!
|
|
|
|
|
|
|
|
|
|
|
|
What are lambda functions, by the way? A lambda function is just a function
|
|
|
|
|
|
|
|
expression that doesn't have a name. For example, if we had Haskell code like
|
|
|
|
|
|
|
|
this:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```Haskell
|
|
|
|
|
|
|
|
double x = x + x
|
|
|
|
|
|
|
|
doubleList xs = map double xs
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We could rewrite it using a lambda function as follows:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```Haskell
|
|
|
|
|
|
|
|
doubleList xs = map (\x -> x + x) xs
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As you can see, a lambda is an expression in the form `\x -> y` where `x` can
|
|
|
|
|
|
|
|
be any variable and `y` can be any expression (including another lambda).
|
|
|
|
|
|
|
|
This represents a function that, when applied to a value `x`, will perform
|
|
|
|
|
|
|
|
the computation given by `y`. Lambdas are useful when creating single-use
|
|
|
|
|
|
|
|
functions that we don't want to make globally available.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lifting lambda functions will effectively rewrite our program in the
|
|
|
|
|
|
|
|
opposite direction to the one shown, replacing the lambda with a reference
|
|
|
|
|
|
|
|
to a global declaration which will hold the function's body. Just like
|
|
|
|
|
|
|
|
with `let/in`, we will represent captured variables using arguments
|
|
|
|
|
|
|
|
and partial application. For instance, when starting with:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```Haskell
|
|
|
|
|
|
|
|
addToAll n xs = map (\x -> n + x) xs
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We would output the following:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```Haskell
|
|
|
|
|
|
|
|
addToAll n xs = map (lambda n) xs
|
|
|
|
|
|
|
|
lambda n x = n + x
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Implementation
|
|
|
|
### Implementation
|
|
|
|
Now that we understand what we have to do, it's time to jump straight into
|
|
|
|
Now that we understand what we have to do, it's time to jump straight into
|
|
|
|
doing it. First, we need to refactor our current code to allow for the changes
|
|
|
|
doing it. First, we need to refactor our current code so allow for the changes
|
|
|
|
we're going to make; then, we will use the new tools we defined to implement `let/in` expressions and lambda functions.
|
|
|
|
we're going to make; then, we can implement `let/in` expressions; finally,
|
|
|
|
|
|
|
|
we'll tackle lambda functions.
|
|
|
|
|
|
|
|
|
|
|
|
#### Infrastructure Changes
|
|
|
|
#### Infrastructure Changes
|
|
|
|
When finding captured variables, the notion of _free variables_ once again
|
|
|
|
When finding captured variables, the notion of _free variables_ once again
|
|
|
@ -205,8 +168,8 @@ since it's not defined locally.
|
|
|
|
The algorithm that we used for computing free variables was rather biased.
|
|
|
|
The algorithm that we used for computing free variables was rather biased.
|
|
|
|
Previously, we only cared about the difference between a local variable
|
|
|
|
Previously, we only cared about the difference between a local variable
|
|
|
|
(defined somewhere in a function's body, or referring to one of the function's
|
|
|
|
(defined somewhere in a function's body, or referring to one of the function's
|
|
|
|
parameters) and a global variable (referring to a global function).
|
|
|
|
parameters) and a global variable (referring to a function name). This shows in
|
|
|
|
This shows in our code for `find_free`. Consider, for example, this snippet:
|
|
|
|
our code for `find_free`. Consider, for example, this segment:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/11/ast.cpp" 33 36 >}}
|
|
|
|
{{< codelines "C++" "compiler/11/ast.cpp" 33 36 >}}
|
|
|
|
|
|
|
|
|
|
|
@ -486,17 +449,17 @@ we're trying to operate on is global or not? I propose a flag in our
|
|
|
|
this, we update the implementation of `type_env` to map variables to
|
|
|
|
this, we update the implementation of `type_env` to map variables to
|
|
|
|
values of a struct `variable_data`:
|
|
|
|
values of a struct `variable_data`:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.hpp" 14 23 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.hpp" 13 22 >}}
|
|
|
|
|
|
|
|
|
|
|
|
The `visibility` enum is defined as follows:
|
|
|
|
The `visibility` enum is defined as follows:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.hpp" 11 11 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.hpp" 10 10 >}}
|
|
|
|
|
|
|
|
|
|
|
|
As you can see from the above snippet, we also added a `mangled_name` field
|
|
|
|
As you can see from the above snippet, we also added a `mangled_name` field
|
|
|
|
to the new `variable_data` struct. We will be using this field shortly. We
|
|
|
|
to the new `variable_data` struct. We will be using this field shortly. We
|
|
|
|
also add a few methods to our `type_env`, and end up with the following:
|
|
|
|
also add a few methods to our `type_env`, and end up with the following:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.hpp" 32 45 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.hpp" 31 44 >}}
|
|
|
|
|
|
|
|
|
|
|
|
We will come back to `find_free` and `find_free_except`, as well as
|
|
|
|
We will come back to `find_free` and `find_free_except`, as well as
|
|
|
|
`set_mangled_name` and `get_mangled_name`. For now, we just adjust `bind` to
|
|
|
|
`set_mangled_name` and `get_mangled_name`. For now, we just adjust `bind` to
|
|
|
@ -573,7 +536,7 @@ And the latter:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.cpp" 39 45 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.cpp" 39 45 >}}
|
|
|
|
|
|
|
|
|
|
|
|
We don't allow `set_mangled_name` to affect variables that are declared
|
|
|
|
We don't allow the `set_mangled_name` to affect variables that are declared
|
|
|
|
above the receiving `type_env`, and use the empty string as a 'none' value.
|
|
|
|
above the receiving `type_env`, and use the empty string as a 'none' value.
|
|
|
|
Now, when lifting data type constructors, we'll be able to use
|
|
|
|
Now, when lifting data type constructors, we'll be able to use
|
|
|
|
`set_mangled_name` to make sure constructor calls are made correctly. We
|
|
|
|
`set_mangled_name` to make sure constructor calls are made correctly. We
|
|
|
@ -667,7 +630,7 @@ void ast::translate(global_scope& scope);
|
|
|
|
|
|
|
|
|
|
|
|
The `scope` parameter and its `add_function` and `add_constructor` methods will
|
|
|
|
The `scope` parameter and its `add_function` and `add_constructor` methods will
|
|
|
|
be used to add definitions to the global scope. Each AST node will also
|
|
|
|
be used to add definitions to the global scope. Each AST node will also
|
|
|
|
use this method to implement the second step. Currently, only
|
|
|
|
uses this method to implement the second step. Currently, only
|
|
|
|
`ast_let` and `ast_lambda` will need to modify themselves - all other
|
|
|
|
`ast_let` and `ast_lambda` will need to modify themselves - all other
|
|
|
|
nodes will simply recursively call this method on their children. Let's jump
|
|
|
|
nodes will simply recursively call this method on their children. Let's jump
|
|
|
|
straight into implementing this method for `ast_let`:
|
|
|
|
straight into implementing this method for `ast_let`:
|
|
|
@ -676,7 +639,7 @@ straight into implementing this method for `ast_let`:
|
|
|
|
|
|
|
|
|
|
|
|
Since data type definitions don't really depend on anything else, we process
|
|
|
|
Since data type definitions don't really depend on anything else, we process
|
|
|
|
them first. This amounts to simply calling the `definition_data::into_globals`
|
|
|
|
them first. This amounts to simply calling the `definition_data::into_globals`
|
|
|
|
method, which itself simply calls `global_scope::add_constructor`:
|
|
|
|
methd, which itself simply calls `global_scope::add_constructor`:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/definition.cpp" 86 92 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/definition.cpp" 86 92 >}}
|
|
|
|
|
|
|
|
|
|
|
@ -696,7 +659,7 @@ First, this method collects all the non-global free variables in
|
|
|
|
its body, which will need to be passed to the global definition
|
|
|
|
its body, which will need to be passed to the global definition
|
|
|
|
as arguments. It then combines this list with the arguments
|
|
|
|
as arguments. It then combines this list with the arguments
|
|
|
|
the user explicitly added to it, recursively translates
|
|
|
|
the user explicitly added to it, recursively translates
|
|
|
|
its body, and creates a new global definition using `add_function`.
|
|
|
|
its body, creates a new global definition using `add_function`.
|
|
|
|
|
|
|
|
|
|
|
|
We return to `ast_let::translate` at line 299. Here,
|
|
|
|
We return to `ast_let::translate` at line 299. Here,
|
|
|
|
we determine how many variables ended up being captured, by
|
|
|
|
we determine how many variables ended up being captured, by
|
|
|
@ -712,7 +675,7 @@ of the function, but this seems inelegant, especially since we
|
|
|
|
already keep track of mangling information in `type_env`. Instead,
|
|
|
|
already keep track of mangling information in `type_env`. Instead,
|
|
|
|
we create a new, local environment, in which we place an updated
|
|
|
|
we create a new, local environment, in which we place an updated
|
|
|
|
binding for the function, marking it global, and setting
|
|
|
|
binding for the function, marking it global, and setting
|
|
|
|
its mangled name to the one generated by `global_scope`. This work is done
|
|
|
|
its mangled name to one generated by `global_scope`. This work is done
|
|
|
|
on lines 301-303. We create a reference to the global function
|
|
|
|
on lines 301-303. We create a reference to the global function
|
|
|
|
using the new environment on lines 305 and 306, and apply it to
|
|
|
|
using the new environment on lines 305 and 306, and apply it to
|
|
|
|
all the implicit arguments on lines 307-313. Finally, we
|
|
|
|
all the implicit arguments on lines 307-313. Finally, we
|
|
|
@ -767,7 +730,7 @@ closer to the top of the G-machine stack. Thus, when we
|
|
|
|
iterate the definitions again, this time to compile their
|
|
|
|
iterate the definitions again, this time to compile their
|
|
|
|
bodies, we have to do so starting with the highest offset,
|
|
|
|
bodies, we have to do so starting with the highest offset,
|
|
|
|
and working our way down to __Update__-ing the top of the stack.
|
|
|
|
and working our way down to __Update__-ing the top of the stack.
|
|
|
|
Once the definitions have been compiled, we proceed to compiling
|
|
|
|
One the definitions have been compiled, we proceed to compiling
|
|
|
|
the `in` part of the expression as normal, using our updated
|
|
|
|
the `in` part of the expression as normal, using our updated
|
|
|
|
environment. Finally, we use __Slide__ to get rid of the definition
|
|
|
|
environment. Finally, we use __Slide__ to get rid of the definition
|
|
|
|
graphs, cleaning up the stack.
|
|
|
|
graphs, cleaning up the stack.
|
|
|
@ -775,16 +738,16 @@ graphs, cleaning up the stack.
|
|
|
|
Compiling the `ast_lambda` is far more straightforward. We just
|
|
|
|
Compiling the `ast_lambda` is far more straightforward. We just
|
|
|
|
compile the resulting partial application as we normally would have:
|
|
|
|
compile the resulting partial application as we normally would have:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/ast.cpp" 394 396 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/ast.cpp" 393 395 >}}
|
|
|
|
|
|
|
|
|
|
|
|
One more thing. Let's adopt the convention of storing __mangled__
|
|
|
|
One more thing. Let's adopt the convention of storing __mangled__
|
|
|
|
names into the compilation environment. This way, rather than looking up
|
|
|
|
names into the environment. This way, rather than looking up
|
|
|
|
mangled names only for global functions, which would be a 'gotcha'
|
|
|
|
mangled names only for global functions, which would be a 'gotcha'
|
|
|
|
for anyone working on the compiler, we will always use the mangled
|
|
|
|
for anyone working on the compiler, we will always use the mangled
|
|
|
|
names during compilation. To make this change, we make sure that
|
|
|
|
names during compilation. To make this change, we make sure that
|
|
|
|
`ast_case` also uses `mangled_name`:
|
|
|
|
`ast_case` also uses `mangled_name`:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/ast.cpp" 242 242 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/ast.cpp" 228 228 >}}
|
|
|
|
|
|
|
|
|
|
|
|
We also update the logic for `ast_lid::compile` to use the mangled
|
|
|
|
We also update the logic for `ast_lid::compile` to use the mangled
|
|
|
|
name information:
|
|
|
|
name information:
|
|
|
@ -812,12 +775,9 @@ void type_env::find_free_except(const type_mgr& mgr, const std::string& avoid,
|
|
|
|
Why `find_free_except`? When generalizing a variable whose type was already
|
|
|
|
Why `find_free_except`? When generalizing a variable whose type was already
|
|
|
|
stored in the environment, all the type variables we could generalize would
|
|
|
|
stored in the environment, all the type variables we could generalize would
|
|
|
|
not be 'free'. If they only occur in the type we're generalizing, though,
|
|
|
|
not be 'free'. If they only occur in the type we're generalizing, though,
|
|
|
|
we shouldn't let that stop us! More generally, if we see type variables that
|
|
|
|
we shouldn't let that stop us! Thus, when finding free type variables, we will
|
|
|
|
are only found in the same mutually recursive group as the binding we're
|
|
|
|
avoid looking at the particular variable whose type is being generalized. The
|
|
|
|
generalizing, we are free to generalize them too. Thus, we pass in
|
|
|
|
implementations of the two methods are straightforward:
|
|
|
|
a reference to a `group`, and check if a variable is a member of that group
|
|
|
|
|
|
|
|
before searching it for free type variables. The implementations of the two
|
|
|
|
|
|
|
|
methods are straightforward:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.cpp" 4 18 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/type_env.cpp" 4 18 >}}
|
|
|
|
|
|
|
|
|
|
|
@ -827,15 +787,7 @@ that have the same name as the variable we're generalizing, but aren't found
|
|
|
|
in the same scope. As far as we're concerned, they're different variables!
|
|
|
|
in the same scope. As far as we're concerned, they're different variables!
|
|
|
|
The two methods use another `find_free` method which we add to `type_mgr`:
|
|
|
|
The two methods use another `find_free` method which we add to `type_mgr`:
|
|
|
|
|
|
|
|
|
|
|
|
{{< codelines "C++" "compiler/12/type.cpp" 206 219 >}}
|
|
|
|
{{< codelines "C++" "compiler/12/type.cpp" 206 213 >}}
|
|
|
|
|
|
|
|
|
|
|
|
This one is a bit of a hack. Typically, while running `find_free`, a
|
|
|
|
|
|
|
|
`type_mgr` will resolve any type variables. However, variables from the
|
|
|
|
|
|
|
|
`forall` quantifier of a type scheme should not be resolved, since they
|
|
|
|
|
|
|
|
are explicitly generic. To prevent the type manager from erroneously resolving
|
|
|
|
|
|
|
|
such type variables, we create a new type manager that does not have
|
|
|
|
|
|
|
|
these variables bound to anything, and thus marks them as free. We then
|
|
|
|
|
|
|
|
filter these variables out of the final list of free variables.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Finally, `generalize` makes sure not to use variables that it finds free:
|
|
|
|
Finally, `generalize` makes sure not to use variables that it finds free:
|
|
|
|
|
|
|
|
|
|
|
@ -907,7 +859,7 @@ in our language, perhaps to create an infinite list of ones:
|
|
|
|
|
|
|
|
|
|
|
|
We want `sumTwo` to take the first two elements from the list,
|
|
|
|
We want `sumTwo` to take the first two elements from the list,
|
|
|
|
and return their sum. For an infinite list of ones, we expect
|
|
|
|
and return their sum. For an infinite list of ones, we expect
|
|
|
|
this sum to be equal to 2, and it is:
|
|
|
|
this sum to equal to 2, and so it does:
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
```
|
|
|
|
Result: 2
|
|
|
|
Result: 2
|
|
|
@ -921,9 +873,9 @@ dependency tracking works as expected:
|
|
|
|
{{< codeblock "text" "compiler/12/examples/letin.txt" >}}
|
|
|
|
{{< codeblock "text" "compiler/12/examples/letin.txt" >}}
|
|
|
|
|
|
|
|
|
|
|
|
Here, we have a function `mergeUntil` which, given two lists
|
|
|
|
Here, we have a function `mergeUntil` which, given two lists
|
|
|
|
and a predicate, combines the two lists as long as
|
|
|
|
and a predicate, combines the two lists until as long as
|
|
|
|
the predicate returns `True`. It does so using a convoluted
|
|
|
|
the predicate returns `True`. It does so using a convoluted
|
|
|
|
pair of mutually recursive functions, one of which
|
|
|
|
pair of two mutually recursive functions, one of which
|
|
|
|
unpacks the left list, and the other the right. Each of the
|
|
|
|
unpacks the left list, and the other the right. Each of the
|
|
|
|
functions calls the global function `if`. We also use two
|
|
|
|
functions calls the global function `if`. We also use two
|
|
|
|
definitions inside of `main` to create the two lists we're
|
|
|
|
definitions inside of `main` to create the two lists we're
|
|
|
|