---
title: Compiling a Functional Language Using C++, Part 12 - Let/In and Lambdas
date: 2020-04-20T20:15:16-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
---
Now that our language's type system is more fleshed out and pleasant to use,
it's time to shift our focus to the ergonomics of the language itself. I've been
mentioning `let/in` expressions and __lambda expressions__ for a while now.
The former will let us create names for expressions that are limited to
a certain scope (without having to create global variable bindings), while
the latter will allow us to create functions without giving them any name at
all.

Let's take a look at `let/in` expressions first, to make sure we're all on
the same page about what we're trying to implement. We'll start with some
basic examples and then move on to more complex ones. The most basic use of
a `let/in` expression, written in Haskell, looks like this:
```Haskell
let x = 5 in x + x
```
In the above example, we bind the variable `x` to the value `5`, and then
refer to `x` twice in the expression after the `in`. The whole snippet is a single
expression, which evaluates to whatever the `in` part evaluates to (here, `10`).
Additionally, the variable `x` does not escape the expression:
{{< sidenote "right" "used-note" "it cannot be used anywhere else." >}}
Unless, of course, you bind it elsewhere; naturally, binding <code>x</code>
here does not forbid you from reusing the name.
{{< /sidenote >}}
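
One way to model this scoping rule is a chain of environments. The following is a minimal C++ sketch, not the compiler's actual code: the `scope` class and the `eval_let_example` function are hypothetical names, and values are plain `int`s to keep things short. Handling a `let` creates a child scope whose lookups fall back to the enclosing one; once the `in` body is done, the child scope is discarded, so `x` never becomes visible outside.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical environment: maps names to values (plain ints, for brevity)
// and falls back to the enclosing scope when a name isn't bound locally.
class scope {
public:
    explicit scope(const scope* parent = nullptr) : parent_(parent) {}

    void bind(const std::string& name, int value) { bindings_[name] = value; }

    // Search this scope first, then each enclosing scope in turn.
    std::optional<int> lookup(const std::string& name) const {
        for (const scope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->bindings_.find(name);
            if (it != s->bindings_.end()) return it->second;
        }
        return std::nullopt;
    }

private:
    const scope* parent_;
    std::map<std::string, int> bindings_;
};

// Evaluates `let x = 5 in x + x` inside `outer`. The child scope only
// lives for the duration of this call, so `x` is never bound in `outer`.
int eval_let_example(const scope& outer) {
    scope inner(&outer);
    inner.bind("x", 5);
    std::optional<int> x = inner.lookup("x");
    assert(x.has_value());
    return *x + *x;
}
```

A typechecker can use the same shape, mapping names to types rather than values.
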

Now, consider a slightly more complicated `let/in` expression:
```Haskell
let sum xs = foldl (+) 0 xs in sum [1,2,3]
```
Here, we're defining a _function_ `sum`,
{{< sidenote "right" "eta-note" "which takes a single argument:" >}}
Those who favor the
<a href="https://en.wikipedia.org/wiki/Tacit_programming#Functional_programming">point-free</a>
programming style may be slightly twitching right now, the words
<em>eta reduction</em> swirling in their mind. What do you know,
<code>fold</code>-based <code>sum</code> is even one of the examples
on the Wikipedia page! I assure you, I left the code as you see it
deliberately, to demonstrate a principle.
{{< /sidenote >}} the list to be summed. We will want this to be valid
in our language, as well. We will soon see how this particular feature
is related to lambda functions, and why I'm covering these two features
in the same post.

Let's step up the difficulty a bit more, with an example that,
{{< sidenote "left" "translate-note" "though it does not immediately translate to our language," >}}
The part that doesn't translate well is the whole deal with patterns in
function arguments, as well as the notion of having more than one equation
for a single function, as is the case with <code>safeLast</code>.
<br><br>
It's not that these things are <em>impossible</em> to translate; it's just
that translating them may be worthy of a post in and of itself, and would only
serve to bloat and complicate this part. What can be implemented with
pattern arguments can just as well be implemented using regular case expressions;
I dare say most "big" functional languages actually just convert from the
former to the latter as part of the compilation process.
{{< /sidenote >}} illustrates another important principle:
```Haskell
let
    safeLast [] = Nothing
    safeLast [x] = Just x
    safeLast (_:xs) = safeLast xs
    myLast = safeLast [1,2,3,4]
in
    myLast
```

The principle here is that definitions in `let/in` can be __recursive and
polymorphic__: the list function above calls itself, and nothing about it
depends on the type of the list's elements. Remember the note in
[part 10]({{< relref "10_compiler_polymorphism.md" >}}) about
[let-polymorphism](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system#Let-polymorphism)? This is it: we're allowing polymorphic variable bindings,
but only when they're bound in a `let/in` expression (or at the top level).

The principles demonstrated by the last two snippets mean that compiling `let/in` expressions, at least with the power we want to give them, will require the same kind of dependency analysis we had to go through when we implemented polymorphically typed functions. That is, we will need to analyze which functions call which other functions, and typecheck the callees before the callers. We will continue to represent callee-caller relationships using a dependency graph, in which nodes represent functions, and an edge from one function node to another means that the former function calls the latter. Below is an image of one such graph:

{{< figure src="fig_graph.png" caption="Example dependency graph without `let/in` expressions." >}}
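
Constructing such a graph is mostly bookkeeping. Here's a small C++ sketch, assuming (hypothetically) that we've already collected, for each function, the identifiers its body mentions; a name only becomes an edge if it refers to another function. The `mentions`, `dep_graph`, and `build_graph` names are illustrative, and the sketch ignores shadowing, which a real implementation would have to respect.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: each function's name, mapped to every identifier its
// body mentions (a real compiler would collect these by walking the AST).
using mentions = std::map<std::string, std::vector<std::string>>;

// Adjacency list: an edge from f to g means "f calls g".
using dep_graph = std::map<std::string, std::set<std::string>>;

dep_graph build_graph(const mentions& bodies) {
    dep_graph graph;
    for (const auto& entry : bodies) {
        // Create the node even if the function calls nothing.
        std::set<std::string>& callees = graph[entry.first];
        for (const auto& name : entry.second) {
            // Only names of functions become edges; parameters,
            // constructors, and other identifiers are skipped.
            // (Shadowing is ignored in this sketch.)
            if (bodies.count(name) != 0) callees.insert(name);
        }
    }
    return graph;
}
```

If `f` mentions `g`, `x`, and `Cons`, only the edge from `f` to `g` is kept.
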

Since we want to typecheck callees first, we effectively want to traverse the graph in reverse topological order. However, there's a slight issue: a topological order is only defined for acyclic graphs, and it is entirely possible for functions in our language to call each other mutually. To deal with this, we have to find groups of mutually recursive functions (the graph's _strongly connected components_) and treat each group as a single unit, thereby eliminating cycles. In the above graph, there are two such groups:

{{< figure src="fig_colored_ordered.png" caption="Previous dependency graph with mutually recursive groups highlighted." >}}

As seen in the second image, the reverse topological order of this graph has us typecheck the blue group, which contains three functions, first: the sole function in the orange group calls one of the blue functions, so it can only be typechecked once the blue group is done.
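
Groups of mutually recursive functions are exactly the strongly connected components of the dependency graph. One standard way to find them is Tarjan's algorithm, which has a convenient property: it emits a component only after every component reachable from it, so with caller-to-callee edges the groups come out callees-first, in the order we want to typecheck them. The C++ sketch below uses Tarjan's algorithm as an illustration; it isn't necessarily the approach this series' compiler takes, and `dep_graph`, `group`, and `scc_finder` are hypothetical names. It assumes every callee also appears as a key in the graph.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Edge from f to g means "f calls g". Every callee must also be a key.
using dep_graph = std::map<std::string, std::set<std::string>>;
// One group of mutually recursive functions.
using group = std::vector<std::string>;

// Tarjan's strongly connected components algorithm. A component is emitted
// only after all components reachable from it, so with caller -> callee
// edges the groups come out callees-first: a valid typechecking order.
class scc_finder {
public:
    explicit scc_finder(const dep_graph& graph) : graph_(graph) {}

    std::vector<group> run() {
        for (const auto& entry : graph_)
            if (index_.count(entry.first) == 0) visit(entry.first);
        return groups_;
    }

private:
    void visit(const std::string& node) {
        assert(graph_.count(node) != 0 && "every callee needs its own node");
        // `index_` is discovery order; `low_` is the smallest index known
        // to be reachable from `node` while still on the stack.
        index_[node] = low_[node] = counter_++;
        stack_.push_back(node);
        on_stack_.insert(node);
        for (const auto& callee : graph_.at(node)) {
            if (index_.count(callee) == 0) {
                visit(callee);
                low_[node] = std::min(low_[node], low_[callee]);
            } else if (on_stack_.count(callee) != 0) {
                low_[node] = std::min(low_[node], index_[callee]);
            }
        }
        // `node` is the root of a component: pop the whole component.
        if (low_[node] == index_[node]) {
            group members;
            std::string member;
            do {
                member = stack_.back();
                stack_.pop_back();
                on_stack_.erase(member);
                members.push_back(member);
            } while (member != node);
            groups_.push_back(std::move(members));
        }
    }

    const dep_graph& graph_;
    std::map<std::string, int> index_, low_;
    std::vector<std::string> stack_;
    std::set<std::string> on_stack_;
    std::vector<group> groups_;
    int counter_ = 0;
};
```

On a graph in which `a`, `b`, and `c` call one another in a cycle, `d` calls `a`, and `e` calls nothing, this yields the group `{a, b, c}` first, then `{d}`, then `{e}`. As with the blue and orange groups, the calling group is only produced after the group it calls into.
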

Things are more complicated now that `let/in` expressions are able to introduce their own polymorphic and recursive declarations. However, there is a single invariant we can establish: function definitions can only depend on functions defined alongside them, in the same scope. That is, for our purposes, functions declared in the global scope can only depend on other functions declared in the global scope, and functions declared in a `let/in` expression can only depend on other functions declared in that same expression. That's not to say that a function declared in a `let/in` block inside some function `f` can't call a globally declared function `g`; we allow this, but treat the situation as though `f` depends on `g`. In contrast, it's not at all possible for a global function to depend on a local function, because bindings created in a `let/in` expression do not escape the expression itself. This invariant tells us that in the presence of nested function definitions, the situation looks like this:

{{< figure src="fig_subgraphs.png" caption="Previous dependency graph augmented with `let/in` subgraphs." >}}
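
To see how a call made by a local definition ends up attributed to the enclosing global function, here's a C++ sketch over a hypothetical, stripped-down expression tree (`expr`, `definition`, `var`, `app`, `let_in`, `free_names`, and `global_deps` are illustrative names, not the compiler's). Walking `f`'s body, we drop names bound by nested `let` blocks and by parameters; whatever global function names remain are `f`'s dependencies, however deeply nested the call is.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, stripped-down expression tree: variables, applications,
// and `let/in` blocks whose definitions may take parameters.
struct expr;
using expr_ptr = std::shared_ptr<expr>;

struct definition {
    std::string name;
    std::vector<std::string> params;
    expr_ptr body;
};

struct expr {
    enum kind_t { VAR, APP, LET } kind = VAR;
    std::string name;              // VAR: the variable's name
    expr_ptr left, right;          // APP: function and argument
    std::vector<definition> defs;  // LET: the bindings
    expr_ptr in;                   // LET: the body after `in`
};

expr_ptr var(std::string name) {
    auto e = std::make_shared<expr>();
    e->kind = expr::VAR;
    e->name = std::move(name);
    return e;
}

expr_ptr app(expr_ptr left, expr_ptr right) {
    auto e = std::make_shared<expr>();
    e->kind = expr::APP;
    e->left = std::move(left);
    e->right = std::move(right);
    return e;
}

expr_ptr let_in(std::vector<definition> defs, expr_ptr in) {
    auto e = std::make_shared<expr>();
    e->kind = expr::LET;
    e->defs = std::move(defs);
    e->in = std::move(in);
    return e;
}

// Adds to `out` every name `e` mentions that isn't in `bound`.
void free_names(const expr_ptr& e, std::set<std::string> bound,
                std::set<std::string>& out) {
    assert(e && "null expression");
    switch (e->kind) {
    case expr::VAR:
        if (bound.count(e->name) == 0) out.insert(e->name);
        break;
    case expr::APP:
        free_names(e->left, bound, out);
        free_names(e->right, bound, out);
        break;
    case expr::LET:
        // Local definitions may be recursive, so their names are in scope
        // in every definition body as well as in the `in` body.
        for (const auto& d : e->defs) bound.insert(d.name);
        for (const auto& d : e->defs) {
            std::set<std::string> inner = bound;
            inner.insert(d.params.begin(), d.params.end());
            free_names(d.body, inner, out);
        }
        free_names(e->in, bound, out);
        break;
    }
}

// A global function's dependencies: every global function its body
// mentions, however deeply nested inside `let/in` blocks the mention is.
std::set<std::string> global_deps(const definition& f,
                                  const std::set<std::string>& globals) {
    std::set<std::string> bound(f.params.begin(), f.params.end());
    std::set<std::string> found, deps;
    free_names(f.body, bound, found);
    for (const auto& name : found)
        if (globals.count(name) != 0) deps.insert(name);
    return deps;
}
```

For `f x = let h y = g y in h x`, built with these hypothetical constructors, `global_deps` reports `{g}`: the call to `g` made by the local `h` counts as a dependency of `f`, while `h` itself never shows up, since it doesn't escape the `let`.
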

In the augmented graph, some of the original nodes now contain other, smaller graphs. Those subgraphs are the graphs created by function declarations in `let/in` expressions. Just like our top-level nodes, the nodes of these smaller graphs can depend on other nodes of the same subgraph, and even form cycles. Within each subgraph, we will have to perform the same kind of cycle detection, resulting in something like this:

{{< figure src="fig_subgraphs_colored_all.png" caption="Augmented dependency graph with mutually recursive groups highlighted." >}}
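
Because of the invariant described earlier, each subgraph can be handled on its own: group the definitions of one scope, and handle the scopes nested inside a definition as part of handling that definition. The C++ sketch below illustrates that recursion under stated assumptions; `scope`, `scope_def`, `order_groups`, and `analyze` are hypothetical names, and the grouping step again uses Tarjan's algorithm as one possible choice. It assumes each definition's `calls` only names definitions from the same scope, with calls to outer definitions already attributed to the enclosing function as described earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of nested scopes. A scope is the global scope or one
// `let/in` block; each of its definitions lists the same-scope definitions
// it calls and the `let/in` scopes nested inside its body.
struct scope;
struct scope_def {
    std::set<std::string> calls;
    std::vector<scope> nested;  // std::vector allows an incomplete element type (C++17)
};
struct scope {
    std::map<std::string, scope_def> defs;
};

using group = std::vector<std::string>;

// Groups one scope's definitions into mutually recursive units with
// Tarjan's algorithm; groups come out callees-first.
std::vector<group> order_groups(const scope& s) {
    std::map<std::string, int> index, low;
    std::vector<std::string> stack;
    std::set<std::string> on_stack;
    std::vector<group> groups;
    int counter = 0;
    std::function<void(const std::string&)> visit = [&](const std::string& n) {
        index[n] = low[n] = counter++;
        stack.push_back(n);
        on_stack.insert(n);
        for (const auto& m : s.defs.at(n).calls) {
            assert(s.defs.count(m) != 0 && "calls must stay within one scope");
            if (index.count(m) == 0) {
                visit(m);
                low[n] = std::min(low[n], low[m]);
            } else if (on_stack.count(m) != 0) {
                low[n] = std::min(low[n], index[m]);
            }
        }
        if (low[n] == index[n]) {
            group g;
            std::string member;
            do {
                member = stack.back();
                stack.pop_back();
                on_stack.erase(member);
                g.push_back(member);
            } while (member != n);
            groups.push_back(std::move(g));
        }
    };
    for (const auto& entry : s.defs)
        if (index.count(entry.first) == 0) visit(entry.first);
    return groups;
}

// Handles a scope's groups in order; the scopes nested in a group's
// definitions are handled before that group is recorded as finished.
// `trace` records (nesting depth, group) pairs in completion order.
void analyze(const scope& s, int depth,
             std::vector<std::pair<int, group>>& trace) {
    for (const group& g : order_groups(s)) {
        for (const auto& name : g)
            for (const scope& inner : s.defs.at(name).nested)
                analyze(inner, depth + 1, trace);
        trace.emplace_back(depth, g);
    }
}
```

For example, if a global `h` contains a `let/in` block whose `a` and `b` call each other and whose `c` calls `a`, the inner groups `{a, b}` and `{c}` are finished before `{h}` itself, while an unrelated global group such as `{f, g}` is unaffected.
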