---
title: Compiling a Functional Language Using C++, Part 0 - Intro
date: 2019-08-03T01:02:30-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
---
During my last academic term, I was enrolled in a compilers course.
We had a final project - develop a compiler for a basic Python subset,
using LLVM. It was a little boring - virtually nothing about the compiler
was __not__ covered in class, and it felt more like putting two puzzle
pieces together than building a real project.

Instead, I chose to implement a compiler for a functional programming language,
based on a wonderful book by Simon Peyton Jones, _Implementing functional languages:
a tutorial_. Since the class required the use of tools based on C++,
that's what I used for my compiler. It was a neat little project, and I
wanted to share with everyone else how one might go about writing their
own functional language.

### Motivation
There are two main motivating factors for this series.

First, whenever I stumble on a compiler implementation tutorial,
the language created is always imperative, inspired by C, C++, JavaScript,
or Python. There are many interesting things about compiling such a language.
However, I think that the compilation of a functional language (including
features like lazy evaluation) is just as interesting, and is rarely covered.

Second, I'm inspired by books such as _Software Foundations_ that use
source code as their text. The entire content of _Software Foundations_,
for instance, is written as comments in Coq source files. This means
that you can not only read the book, but also run the code and interact with it.
This makes it very engaging to read. Because of this, I want to provide for
each post a "snapshot" of the project code. All the code in the posts
will directly mirror that snapshot. The code you'll be reading will be
runnable and open.

### Overview
Let's go over some preliminary information before we embark on this journey.

#### The "classic" stages of a compiler
Let's take a look at a high-level overview of what a compiler does.
Conceptually, the components of a compiler are pretty cleanly separated.
They are as follows:

1. Tokenizing / lexical analysis
2. Parsing
3. Analysis / optimization
4. Code generation

There are many variations on this structure. Some compilers don't optimize
at all, while others translate the program text into an intermediate representation,
an alternative way of representing the program that isn't machine code.
In some compilers, the stages of parsing and analysis can overlap.
In short, just like the pirate's code, it's more of a guideline than a rule.

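To make the four stages a little more concrete, here is a minimal C++ sketch for a toy
language that is far simpler than the one in this series: sums of non-negative integers,
such as `1 + 2 + 3`. Every name in it (`Token`, `Sum`, `tokenize`, `parse`, `optimize`,
`generate`, `compile`) and the imaginary stack machine with its `push` and `add`
instructions are assumptions made up for illustration. The actual project relies on
Flex, Bison, and LLVM rather than hand-written stages like these.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy pipeline; none of these names come from the series' code.
struct Token {
    enum Kind { Number, Plus } kind;
    int value; // only meaningful when kind == Number
};

// Stage 1: tokenizing. Turn raw text into a flat list of tokens.
std::vector<Token> tokenize(const std::string& text) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        if (std::isdigit(static_cast<unsigned char>(text[i]))) {
            int value = 0;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
                value = value * 10 + (text[i] - '0');
                ++i;
            }
            tokens.push_back({Token::Number, value});
        } else {
            if (text[i] == '+') tokens.push_back({Token::Plus, 0});
            ++i; // whitespace and unrecognized characters are simply skipped
        }
    }
    return tokens;
}

// Stage 2: parsing. The "tree" of a sum is just its list of operands.
// This toy parser does no error checking; it only collects the numbers.
struct Sum {
    std::vector<int> operands;
};

Sum parse(const std::vector<Token>& tokens) {
    Sum sum;
    for (const Token& token : tokens) {
        if (token.kind == Token::Number) sum.operands.push_back(token.value);
    }
    return sum;
}

// Stage 3: analysis / optimization. Fold the whole sum into one constant.
Sum optimize(const Sum& sum) {
    int total = 0;
    for (int operand : sum.operands) total += operand;
    return Sum{{total}};
}

// Stage 4: code generation for an imaginary stack machine.
std::vector<std::string> generate(const Sum& sum) {
    std::vector<std::string> code;
    for (std::size_t i = 0; i < sum.operands.size(); ++i) {
        code.push_back("push " + std::to_string(sum.operands[i]));
        if (i > 0) code.push_back("add");
    }
    return code;
}

// The stages chained together; optimization is optional, as noted above.
std::vector<std::string> compile(const std::string& text, bool fold) {
    Sum sum = parse(tokenize(text));
    if (fold) sum = optimize(sum);
    return generate(sum);
}
```

With this sketch, `compile("1 + 2 + 3", false)` yields `push 1`, `push 2`, `add`, `push 3`,
`add`, while passing `true` folds the sum first and yields only `push 6`.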
#### What we'll cover
We'll go through the stages of a compiler, starting from scratch
and building up our project. We'll cover:

* Tokenizing using regular expressions and Flex.
* Parsing using context-free grammars and Bison.
* Monomorphic type checking (including typing rules).
* Evaluation using graph reduction and the G-Machine.
* Compiling G-Machine instructions to machine code using LLVM.

We'll be creating a __lazily evaluated__, __functional__ language.

#### The syntax of our language
Simon Peyton Jones, in his two works regarding compiling functional languages, remarks
that most functional languages are very similar, and vary largely in syntax. That's
our main degree of freedom. We want to represent the following things, for sure:

* Defining functions
* Applying functions
* Arithmetic
* Algebraic data types (to represent lists, pairs, and the like)
* Pattern matching (to operate on data types)

We can additionally support anonymous (lambda) functions, but compiling those
is actually a bit trickier, so we will skip those for now. Arithmetic is the simplest to
define - let's define it as we would expect: `3` is a number, `3+2*6` evaluates to 15.
Function application isn't much more difficult - `f x` means "apply f to x", and
`f x + g x` means "sum the results of applying f to x and g to x". That is, function
application has higher precedence, or __binds tighter__, than binary operators like plus.

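The precedence rules above can be sketched as a tiny hand-written recursive descent
parser in C++. This is only an illustration: the `Grouper` class and the `group` function
are hypothetical names, the sketch handles just identifiers, numbers, `+`, and `*` (no
parentheses or error handling), and the series itself will use Bison for parsing. Instead
of building a tree, it re-prints the expression with explicit parentheses so that the
grouping is visible.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: one function per precedence level, lowest first.
class Grouper {
public:
    explicit Grouper(std::string text) : src(std::move(text)) {}

    std::string run() { return sum(); }

private:
    std::string src;
    std::size_t pos = 0;

    bool at_alnum() const {
        return pos < src.size() && std::isalnum(static_cast<unsigned char>(src[pos]));
    }

    void skip_spaces() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Lowest precedence: a + b
    std::string sum() {
        std::string left = product();
        skip_spaces();
        while (pos < src.size() && src[pos] == '+') {
            ++pos;
            std::string right = product();
            left = "(" + left + " + " + right + ")";
            skip_spaces();
        }
        return left;
    }

    // Middle precedence: a * b
    std::string product() {
        std::string left = application();
        skip_spaces();
        while (pos < src.size() && src[pos] == '*') {
            ++pos;
            std::string right = application();
            left = "(" + left + " * " + right + ")";
            skip_spaces();
        }
        return left;
    }

    // Highest precedence: function application by juxtaposition, as in `f x`
    std::string application() {
        std::string left = atom();
        skip_spaces();
        while (at_alnum()) {
            std::string argument = atom();
            left = "(" + left + " " + argument + ")";
            skip_spaces();
        }
        return left;
    }

    // A single identifier or number
    std::string atom() {
        skip_spaces();
        std::size_t start = pos;
        while (at_alnum()) ++pos;
        return src.substr(start, pos - start);
    }
};

std::string group(const std::string& text) { return Grouper(text).run(); }
```

Here `group("3+2*6")` returns `(3 + (2 * 6))`, and `group("f x + g x")` returns
`((f x) + (g x))`: application is resolved before either binary operator gets a chance
to claim its operands.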

Next, let's define the syntax for declaring a function. Why not:
```
defn f x = { x + x }
```

As for declaring data types:
```
data List = { Nil, Cons Int List }
```
Notice that we are avoiding polymorphism here: this `List` can only hold `Int`s.

Let's also define a syntax for pattern matching:
```
case l of {
    Nil -> { 0 }
    Cons x xs -> { x }
}
```
The above means "if the list `l` is `Nil`, then return 0, otherwise, if it's
constructed from an integer and another list (as defined in our `data` example),
return the integer".

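For readers more at home in C++, here is a rough analogue of the `data` declaration and
the `case` expression, assuming C++17 for `std::variant`. The names `Nil`, `Cons`, and
`List` mirror the example above, but `make_nil`, `make_cons`, and `first_or_zero` are
hypothetical helpers, and this is not how the compiler in this series will represent data
at runtime. In particular, nothing here is lazily evaluated.

```cpp
#include <memory>
#include <variant>

struct List;
using ListPtr = std::shared_ptr<const List>;

// One struct per constructor of `data List = { Nil, Cons Int List }`.
struct Nil {};
struct Cons {
    int head;     // the Int field
    ListPtr tail; // the List field, behind a pointer because the type is recursive
};

// A List is exactly one of its constructors.
struct List {
    std::variant<Nil, Cons> value;
};

// Hypothetical convenience constructors.
List make_nil() { return List{Nil{}}; }

List make_cons(int head, const List& tail) {
    return List{Cons{head, std::make_shared<const List>(tail)}};
}

// Analogue of: case l of { Nil -> { 0 }  Cons x xs -> { x } }
int first_or_zero(const List& l) {
    if (const Cons* cons = std::get_if<Cons>(&l.value)) {
        return cons->head; // the `Cons x xs` branch; xs (cons->tail) is unused
    }
    return 0; // the `Nil` branch
}
```

Calling `first_or_zero(make_nil())` gives 0, and
`first_or_zero(make_cons(7, make_nil()))` gives 7, matching the two branches of the
`case` expression.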

That's it for the introduction! In the next post, we'll cover tokenizing, which is
the first step in converting source code into an executable program.

### Navigation
Here are the posts that I've written so far for this series:

* [Tokenizing]({{< relref "01_compiler_tokenizing.md" >}})
* [Parsing]({{< relref "02_compiler_parsing.md" >}})
* [Typechecking]({{< relref "03_compiler_typechecking.md" >}})