Add design document
This commit is contained in:
parent
7ee084385c
commit
03581f7af0
87
DESIGN.md
Normal file
87
DESIGN.md
Normal file
|
@ -0,0 +1,87 @@
|
||||||
|
# Lily Design
|
||||||
|
Lily is a lazily evaluated functional language based on basic graph reduction. The graph construction is performed via a G-Machine (an abstract virtual stack machine), and is then mapped to LLVM constructs. Due to the fact that the G-Machine uses a stack, some stack operations are very common, and are written in C in a provided "runtime" file.
|
||||||
|
|
||||||
|
## Compiling The Lily Compiler
|
||||||
|
On my local machine, I use CMake with LLVM's config program. This doesn't seem to work on the server, so I have written a makefile that should be able to compile lily just the same.
|
||||||
|
|
||||||
|
## Compiling Lily Code
|
||||||
|
To compile a lily file in code.lily:
|
||||||
|
```
|
||||||
|
make # Build the compiler
|
||||||
|
./lily < code.lily
|
||||||
|
gcc runtime.c lily.o -o prog
|
||||||
|
./prog
|
||||||
|
```
|
||||||
|
Lily requires a "main" function to link properly, which has to have 0 parameters. Lily will only print the result of evaluating the program if it is a number.
|
||||||
|
|
||||||
|
## Quirks
|
||||||
|
1) The stack size is fixed to 16. That's pretty small, and might cause assertion errors.
|
||||||
|
2) Not having a main function will cause a linker error.
|
||||||
|
3) If a main function has arguments, this will cause a stack assertion error.
|
||||||
|
|
||||||
|
## Stages of the Compilation Process
|
||||||
|
### LALR(1) Parsing Via Pegasus
|
||||||
|
Pegasus is a project I worked on during fall term, which implements a language agnostic LALR parser generator. Since unlike Bioson, Pegasus is not available on OSU servers (or most other machines, for that matter, as it is a homebrew program), I provide a pre-generated parser file, as well as a source grammar file. A limitaton (or rather, a design choice) of pegasus is that it does not perform semantic actions, and simply generates a parse tree. Code generated by pegasus is in `src/parser.c`. After a parse tree is generated, it is converted into an Abstract Syntax Tree in `src/parser.cpp`.
|
||||||
|
### Type Checking and Inference
|
||||||
|
Lily does not require the user to speciy types for variables. As such, it needs to infer these types from the code, and verify that programs are type-correct, as this enables more efficient code generation later. For instance, the function
|
||||||
|
```
|
||||||
|
defn add x y = { x + y }
|
||||||
|
```
|
||||||
|
Is inferred to be a function of `Int->Int->Int`, as `(+)` requires its operands to be numbers. Lily's numbers are closed under all binary operators (that is, all binary operators are of type `Int->Int->Int`). Type checking is done using simple substitution. When a function is created and the types of the parameters are not known, it is assumed to be `? -> ?`. Then, as more information is provided (for instance, when a parameter is used in an addition, implying that it is a number), the gaps are filled in.
|
||||||
|
### G-Machine Code Generation
|
||||||
|
A G-Machine is used to quickly construct graphs. The evaluation of functional programs is frequently backed by graph reduction, and every time a function is called a new instance of a graph is to be constructed. Since this always follows the same steps (the function body is always the same structure), this can be optimized into a series of stack-based operations similar in a way to converting arithmetic to reverse polish notation. Lily's compiler uses an "environment" to keep track of the positions of variables on the ever-changing stack, which is represented simply by a linked list, with each node increasing the distance from the top of the stack. Generated instructions may look something like the following:
|
||||||
|
```
|
||||||
|
push 1
|
||||||
|
push 1
|
||||||
|
pushglobal 2 @add
|
||||||
|
mkapp
|
||||||
|
mkapp
|
||||||
|
```
|
||||||
|
In this case, a variable from the stack at offset 1 is pushed onto the stack, followed by the variable that was formerly at offset 0 (now at offset 1 after the first variable was pushed). Then, a "global variable" is pushed. mkapp pops two values from the stack and converts them into a graph node representing function application. Running mkapp twice results in the application of the function @add to two parameters.
|
||||||
|
### LLVM Code Generation
|
||||||
|
G-Machine instructions can fairly easy be converted into a sequence of LLVM IR instructions. The only problen is that LLVM is not stack-based, while the G-Machine is. This leads to a large amount of `push`/`pop` calls in the generated code, even ones that could potentially be optimized. Because `push`, `pop`, and memory allocation instructions are so common, and because they are always the same, they are placed in a separate C file called `runtime.c`. Stubs for these functions are generated on the LLVM side, allowing the IR to call these functions without knowing their body. These calls are resolved when the program is linked.
|
||||||
|
|
||||||
|
Because graph nodes can be of several different types, they are best represented using a tagged union. However, since LLVM does not support tagged unions (or unions in general), we simply declare several structs that are semantically not related. For instance:
|
||||||
|
```
|
||||||
|
struct node_parent {
|
||||||
|
char tag;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct node_num {
|
||||||
|
char tag;
|
||||||
|
int value;
|
||||||
|
};
|
||||||
|
```
|
||||||
|
These structs all have the same "initial structure", allowing them to be, in a way, compatible with each other. The "tag" of a node is checked to properly cast a node (LLVM's pointer cast is used for this). To get the fields in the struct, LLVM's GetElementPtr instruction is used, which allows the IR to load / store a value inside the struct. The most complicated instructions are `op`, which unwraps two number nodes and adds their values, `eq`, which unwraps two number nodes, compares their values, and pushes depending on that value either a "True" or "False", and `jump`, which is used for case analysis.
|
||||||
|
|
||||||
|
### Linking
|
||||||
|
Finally, the `runtime.c` library expects there to be a function `main` in the Lily source code, which takes no parameters. This function will be used as the entry point - after the runtime sets up the stack, the execution will be transferred to the LLVM-generated code (which will frequently call back to C to use malloc or to manipulate the stack).
|
||||||
|
|
||||||
|
## Sample Programs
|
||||||
|
The most basic Lily program can just return 0:
|
||||||
|
```
|
||||||
|
main = { 0 }
|
||||||
|
```
|
||||||
|
This is boring. We can use addition and subtraction to spice it up!
|
||||||
|
```
|
||||||
|
main = { 1 - 1 + 3 }
|
||||||
|
```
|
||||||
|
Lily supports arbitrary data type definitions (if you've used Haskell, these should be familliar). For instance, we can define and use a `Maybe` data structure as follows:
|
||||||
|
```
|
||||||
|
data Maybe = { Ok(Int), Nothing }
|
||||||
|
defn maybeOr42 m = {
|
||||||
|
case m of {
|
||||||
|
Ok(i) -> { i }
|
||||||
|
Nothing -> { 42 }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
defn main = { maybeOr42 (Ok 3) }
|
||||||
|
```
|
||||||
|
Notice how during declaration, constructors for data types have the form "C(P1, P2)", while in an expression, they are treated like normal functions.
|
||||||
|
|
||||||
|
Finally, Lily has lazy evaluation, allowing for the construction of infinite data structures:
|
||||||
|
```
|
||||||
|
data IntList = { Nil, Cons(Int, IntList) }
|
||||||
|
defn ones = { Cons 1 ones }
|
||||||
|
```
|
||||||
|
This will not take any computation time unless something else uses the values from the list.
|
Loading…
Reference in New Issue
Block a user