Add an introduction post, and update other posts to match
117
content/blog/00_compiler_intro.md
Normal file
@@ -0,0 +1,117 @@
---
title: Compiling a Functional Language Using C++, Part 0 - Intro
date: 2019-08-03T01:02:30-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
---
During my last academic term, I was enrolled in a compilers course.
We had a final project: develop a compiler for a basic Python subset,
using LLVM. It was a little boring: virtually __every__ part of the compiler
had already been covered in class, and it felt more like putting two puzzle
pieces together than building a real project.

Instead, I chose to implement a compiler for a functional programming language,
based on a wonderful book by Simon Peyton Jones, _Implementing functional languages:
a tutorial_. Since the class required the use of C++-based tools,
that's what I used for my compiler. It was a neat little project, and I
wanted to share with everyone else how one might go about writing their
own functional language.
### Motivation
There are two main motivating factors for this series.

First, whenever I stumble on a compiler implementation tutorial,
the language created is always imperative, inspired by C, C++, JavaScript,
or Python. There are many interesting things about compiling such a language.
However, I also think that the compilation of a functional language (including
features like lazy evaluation) is interesting in its own right, yet rarely covered.

Second, I'm inspired by books such as _Software Foundations_ that use
source code as their text. The entire content of _Software Foundations_,
for instance, is written as comments in Coq source files. This means
that you can not only read the book, but also run the code and interact with it.
This makes it very engaging to read. Because of this, I want to provide
a "snapshot" of the project code for each post. All the code in the posts
will directly mirror that snapshot. The code you'll be reading will be
runnable and open.
### Overview
Let's go over some preliminary information before we embark on this journey.

#### The "classic" stages of a compiler
Let's take a look at the high-level overview of what a compiler does.
Conceptually, the components of a compiler are pretty cleanly separated.
They are as follows:

1. Tokenizing / lexical analysis
2. Parsing
3. Analysis / optimization
4. Code generation

There are many variations on this structure. Some compilers don't optimize
at all, and some translate the program text into an intermediate representation,
an alternative way of representing the program that isn't machine code.
In some compilers, the stages of parsing and analysis can overlap.
In short, just like the pirate's code, it's more of a guideline than a rule.
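To make the four stages concrete, here is a toy sketch in C++ for arithmetic over integer literals with `+` and `*` only. Every name in it (`tokenize`, `Parser`, `fold`, `emit`, `compile`) is hypothetical and unrelated to the series' actual code. The "analysis / optimization" stage is stood in for by simple constant folding, and the target is an imaginary stack machine rather than real machine code.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stage 1: tokenizing. Turn characters into integer and operator tokens.
struct Token {
    enum Kind { Int, Plus, Times } kind;
    int value;  // meaningful only for Int tokens
};

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    for (std::size_t i = 0; i < src.size();) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '+' || c == '*') { tokens.push_back({c == '+' ? Token::Plus : Token::Times, 0}); ++i; continue; }
        if (!std::isdigit(c)) throw std::runtime_error("unexpected character");
        int v = 0;
        while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
            v = v * 10 + (src[i++] - '0');
        tokens.push_back({Token::Int, v});
    }
    return tokens;
}

// Stage 2: parsing. Build a tree in which '*' binds tighter than '+'.
struct Expr {
    char op;    // '+', '*', or 'n' for a number literal
    int value;  // meaningful only when op == 'n'
    std::unique_ptr<Expr> left, right;
};
using ExprPtr = std::unique_ptr<Expr>;
ExprPtr make_num(int v) { return ExprPtr(new Expr{'n', v, nullptr, nullptr}); }
ExprPtr make_op(char op, ExprPtr l, ExprPtr r) {
    return ExprPtr(new Expr{op, 0, std::move(l), std::move(r)});
}

class Parser {
public:
    explicit Parser(const std::vector<Token>& toks) : toks_(toks) {}
    ExprPtr parse() {
        ExprPtr e = sum();
        if (pos_ != toks_.size()) throw std::runtime_error("trailing tokens");
        return e;
    }
private:
    const std::vector<Token>& toks_;
    std::size_t pos_ = 0;
    bool accept(Token::Kind k) {
        if (pos_ >= toks_.size() || toks_[pos_].kind != k) return false;
        ++pos_;
        return true;
    }
    ExprPtr sum() {      // sum := product ('+' product)*
        ExprPtr e = product();
        while (accept(Token::Plus)) e = make_op('+', std::move(e), product());
        return e;
    }
    ExprPtr product() {  // product := number ('*' number)*
        ExprPtr e = number();
        while (accept(Token::Times)) e = make_op('*', std::move(e), number());
        return e;
    }
    ExprPtr number() {
        if (pos_ >= toks_.size() || toks_[pos_].kind != Token::Int) throw std::runtime_error("expected a number");
        return make_num(toks_[pos_++].value);
    }
};

// Stage 3: analysis/optimization. Constant folding replaces an operator whose
// operands are both literals with the literal it evaluates to.
ExprPtr fold(ExprPtr e) {
    if (e->op == 'n') return e;
    e->left = fold(std::move(e->left));
    e->right = fold(std::move(e->right));
    if (e->left->op == 'n' && e->right->op == 'n') {
        int l = e->left->value, r = e->right->value;
        return make_num(e->op == '+' ? l + r : l * r);
    }
    return e;
}

// Stage 4: code generation for a hypothetical stack machine (operands, then operator).
void emit(const Expr& e, std::vector<std::string>& out) {
    if (e.op == 'n') { out.push_back("PUSH " + std::to_string(e.value)); return; }
    emit(*e.left, out);
    emit(*e.right, out);
    out.push_back(e.op == '+' ? "ADD" : "MUL");
}

std::vector<std::string> compile(const std::string& src, bool optimize) {
    std::vector<Token> tokens = tokenize(src);   // stage 1
    ExprPtr tree = Parser(tokens).parse();       // stage 2
    if (optimize) tree = fold(std::move(tree));  // stage 3
    std::vector<std::string> code;
    emit(*tree, code);                           // stage 4
    return code;
}
```

Here `compile("3+2*6", false)` yields `PUSH 3`, `PUSH 2`, `PUSH 6`, `MUL`, `ADD`, while with folding enabled the whole expression collapses to `PUSH 15` before code generation runs. Each stage only consumes the previous stage's output, which is what makes the stages easy to separate (or, as noted above, to merge).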
#### What we'll cover
We'll go through the stages of a compiler, starting from scratch
and building up our project. We'll cover:

* Tokenizing using regular expressions and Flex.
* Parsing using context-free grammars and Bison.
* Monomorphic type checking (including typing rules).
* Evaluation using graph reduction and the G-Machine.
* Compiling G-Machine instructions to machine code using LLVM.

We'll be creating a __lazily evaluated__, __functional__ language.
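In a lazily evaluated language, an expression is not computed until its value is actually needed, and once computed, its result is reused rather than recomputed. C++ itself evaluates eagerly, so one rough way to picture this is a memoizing "thunk". The sketch below is only an illustration, assuming C++17 for `std::optional`; `Thunk` is a hypothetical name, and the series implements laziness through graph reduction rather than a class like this.

```cpp
#include <functional>
#include <optional>
#include <utility>

// A suspended computation that runs at most once and caches its result.
template <typename T>
class Thunk {
public:
    explicit Thunk(std::function<T()> compute) : compute_(std::move(compute)) {}

    // Forces the thunk: runs the computation on first use, then reuses the cached value.
    const T& force() {
        if (!value_) {
            value_ = compute_();
            compute_ = nullptr;  // the closure is no longer needed once evaluated
        }
        return *value_;
    }

    bool evaluated() const { return value_.has_value(); }

private:
    std::function<T()> compute_;
    std::optional<T> value_;
};
```

Creating `Thunk<int> t([] { return 3 + 2 * 6; });` does no arithmetic at all; the first `t.force()` computes 15, and every later call returns the cached value. A thunk that is never forced never runs, which is what lets a lazy language work with values that would be expensive, or even impossible, to compute eagerly.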
#### The syntax of our language
Simon Peyton Jones, in his two works regarding compiling functional languages, remarks
that most functional languages are very similar, and differ largely in syntax. That's
our main degree of freedom. We want to represent the following things, for sure:

* Defining functions
* Applying functions
* Arithmetic
* Algebraic data types (to represent lists, pairs, and the like)
* Pattern matching (to operate on data types)

We can additionally support anonymous (lambda) functions, but compiling those
is actually a bit trickier, so we will skip those for now. Arithmetic is the simplest to
define, so let's define it as we would expect: `3` is a number, and `3+2*6` evaluates to 15.
Function application isn't much more difficult: `f x` means "apply f to x", and
`f x + g x` means "sum the results of applying f to x and g to x". That is, function
application has higher precedence, or __binds tighter__, than binary operators like plus.
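The series builds its parser with Bison, but the precedence rule itself is easy to see in a small hand-written recursive-descent parser. In this illustrative sketch (the class name `AppParser` and its grammar levels are hypothetical), application sits on its own grammar level below `*` and `+`, so it is parsed first and groups tightest. The parser returns an s-expression string so the grouping is visible; for simplicity, identifiers and integers are both read as runs of letters and digits.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar, from loosest to tightest binding:
//   sum         := product ('+' product)*
//   product     := application ('*' application)*
//   application := atom atom*          (left-associative: f x y = (f x) y)
//   atom        := name-or-number | '(' sum ')'
class AppParser {
public:
    explicit AppParser(std::string src) : src_(std::move(src)) {}

    std::string parse() {
        std::string e = sum();
        skip_spaces();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return e;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skip_spaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    bool accept(char c) {
        skip_spaces();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    std::string sum() {
        std::string e = product();
        while (accept('+')) e = "(+ " + e + " " + product() + ")";
        return e;
    }
    std::string product() {
        std::string e = application();
        while (accept('*')) e = "(* " + e + " " + application() + ")";
        return e;
    }
    std::string application() {
        std::string e = atom();
        while (starts_atom()) e = "(" + e + " " + atom() + ")";
        return e;
    }
    bool starts_atom() {
        skip_spaces();
        return pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '(');
    }
    std::string atom() {
        if (accept('(')) {
            std::string e = sum();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return e;
        }
        skip_spaces();
        std::size_t start = pos_;
        while (pos_ < src_.size() && std::isalnum(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a name or number");
        return src_.substr(start, pos_ - start);
    }
};
```

With this, `AppParser("f x + g x").parse()` gives `(+ (f x) (g x))` and `AppParser("3+2*6").parse()` gives `(+ 3 (* 2 6))`: the operator `+` only ever sees fully built applications as its operands.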
Next, let's define the syntax for declaring a function. Why not:
```
defn f x = { x + x }
```

As for declaring data types:
```
data List = { Nil, Cons Int List }
```
Notice that we are avoiding polymorphism here.
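Underneath, a value of a type like `List` only needs to remember which constructor built it, plus that constructor's fields. The following is a minimal C++ sketch of one such tagged representation; the names `ListNode`, `nil`, `cons`, and `length` are hypothetical, and later posts may well represent data differently (for instance as nodes in the G-Machine's heap).

```cpp
#include <memory>
#include <utility>

// A value of `data List = { Nil, Cons Int List }`: a constructor tag plus the
// fields that constructor carries. Nil has no fields; Cons has an Int and a List.
struct ListNode {
    enum class Tag { Nil, Cons } tag;
    int head;                               // used only when tag == Cons
    std::shared_ptr<const ListNode> tail;   // used only when tag == Cons
};
using List = std::shared_ptr<const ListNode>;

// Constructor functions, one per constructor in the `data` declaration.
List nil() { return std::make_shared<ListNode>(ListNode{ListNode::Tag::Nil, 0, nullptr}); }
List cons(int head, List tail) {
    return std::make_shared<ListNode>(ListNode{ListNode::Tag::Cons, head, std::move(tail)});
}

// Counts the Cons cells, following tails until reaching Nil.
int length(const List& l) {
    int n = 0;
    for (const ListNode* node = l.get(); node->tag == ListNode::Tag::Cons; node = node->tail.get()) ++n;
    return n;
}
```

Because values are immutable, `std::shared_ptr` lets several lists share a common tail: `cons(0, xs)` does not copy `xs`. Since the type is monomorphic, `head` can simply be an `int`; supporting polymorphism would require storing fields of arbitrary type.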
Let's also define a syntax for pattern matching:
```
case l of {
    Nil -> { 0 }
    Cons x xs -> { x }
}
```
The above means "if the list `l` is `Nil`, then return 0; otherwise, if it's
constructed from an integer and another list (as defined in our `data` example),
return the integer".
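In terms of a tagged representation of `List` (restated here so the block stands alone; `ListNode`, `nil`, `cons`, and `first_or_zero` are all hypothetical names), evaluating this `case` expression amounts to switching on the constructor tag and binding the constructor's fields to the pattern variables `x` and `xs`:

```cpp
#include <memory>
#include <utility>

struct ListNode {
    enum class Tag { Nil, Cons } tag;
    int head;                               // used only when tag == Cons
    std::shared_ptr<const ListNode> tail;   // used only when tag == Cons
};
using List = std::shared_ptr<const ListNode>;

List nil() { return std::make_shared<ListNode>(ListNode{ListNode::Tag::Nil, 0, nullptr}); }
List cons(int head, List tail) {
    return std::make_shared<ListNode>(ListNode{ListNode::Tag::Cons, head, std::move(tail)});
}

// case l of { Nil -> { 0 } Cons x xs -> { x } }
int first_or_zero(const List& l) {
    switch (l->tag) {
        case ListNode::Tag::Nil:
            return 0;                       // Nil -> { 0 }
        case ListNode::Tag::Cons: {
            int x = l->head;                // bind x to the first field
            const List& xs = l->tail;       // bind xs to the second field
            (void)xs;                       // xs is unused by this branch's body
            return x;                       // Cons x xs -> { x }
        }
    }
    return 0;  // unreachable: both constructors are handled above
}
```

For example, `first_or_zero(nil())` returns 0 and `first_or_zero(cons(7, cons(8, nil())))` returns 7.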
That's it for the introduction! In the next post, we'll cover tokenizing, which is
the first step in converting source code into an executable program.

### Navigation
Here are the posts that I've written so far for this series:

* [Tokenizing]({{< relref "01_compiler_tokenizing.md" >}})
* [Parsing]({{< relref "02_compiler_parsing.md" >}})
* [Typechecking]({{< relref "03_compiler_typechecking.md" >}})