Compare commits: 17c5486551...master (4 commits)

| SHA1 |
|---|
| bca70ce2a0 |
| 03581f7af0 |
| 7ee084385c |
| d32d92e241 |

DESIGN.md (Normal file, 87 lines added)
@@ -0,0 +1,87 @@
# Lily Design

Lily is a lazily evaluated functional language based on basic graph reduction. Graph construction is performed via a G-Machine (an abstract stack machine), and the resulting G-Machine code is then mapped to LLVM constructs. Because the G-Machine uses a stack, some stack operations are very common; these are written in C in a provided runtime file, `runtime.c`.

## Compiling The Lily Compiler

On my local machine, I use CMake with LLVM's config program. This doesn't seem to work on the server, so I have written a Makefile that should build Lily just the same.

## Compiling Lily Code

To compile a Lily source file `code.lily`:

```
make                          # Build the compiler
./lily < code.lily            # Compile code.lily (read from standard input) into lily.o
gcc runtime.c lily.o -o prog  # Link the generated code with the C runtime
./prog                        # Run the program
```

A Lily program must define a `main` function in order to link properly, and `main` must take no parameters. Lily only prints the result of evaluating the program if that result is a number.

## Quirks

1) The stack size is fixed at 16. That's quite small, and may cause assertion errors.
2) Not having a `main` function causes a linker error.
3) If `main` has parameters, this causes a stack assertion error.

## Stages of the Compilation Process

### LALR(1) Parsing Via Pegasus

Pegasus is a project I worked on during fall term: a language-agnostic LALR parser generator. Unlike Bison, Pegasus is not available on OSU servers (or on most other machines, for that matter, since it is a homebrew program), so I provide a pre-generated parser file as well as the source grammar file. A limitation (or rather, a design choice) of Pegasus is that it does not perform semantic actions; it simply generates a parse tree. The code generated by Pegasus is in `src/parser.c`. After the parse tree is generated, it is converted into an abstract syntax tree in `src/parser.cpp`.

### Type Checking and Inference

Lily does not require the user to specify types for variables. Instead, it infers these types from the code and verifies that programs are type-correct, which enables more efficient code generation later. For instance, the function

```
defn add x y = { x + y }
```

is inferred to have type `Int->Int->Int`, since `(+)` requires its operands to be numbers. Lily's numbers are closed under all binary operators (that is, all binary operators have type `Int->Int->Int`). Type checking is done using simple substitution. When a function is created and the types of its parameters are not yet known, the function is assumed to have type `? -> ?`. Then, as more information becomes available (for instance, when a parameter is used in an addition, implying that it is a number), the gaps are filled in.

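The substitution idea can be illustrated with a small C sketch. This is not Lily's type checker. The `ty` representation, `resolve`, `unify` and `infer_add` are hypothetical names, and the sketch leaves out the occurs check that a full implementation would need. Unknown types are variables that get bound ("filled in") when they are unified with something more specific. That is how both parameters of `add` end up as `Int`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type representation (not Lily's): an unknown "?",
   the built-in Int, or a function type `from -> to`. */
typedef enum { TY_VAR, TY_INT, TY_ARROW } ty_kind;

typedef struct ty {
    ty_kind kind;
    struct ty *bound;      /* TY_VAR only: the type this "?" was filled in with */
    struct ty *from, *to;  /* TY_ARROW only */
} ty;

static ty *ty_new(ty_kind kind, ty *from, ty *to) {
    ty *t = calloc(1, sizeof *t);   /* bound starts out NULL */
    assert(t != NULL);
    t->kind = kind;
    t->from = from;
    t->to = to;
    return t;
}

/* Follow the substitution until reaching a concrete type or an unfilled "?". */
static ty *resolve(ty *t) {
    while (t->kind == TY_VAR && t->bound != NULL) t = t->bound;
    return t;
}

/* Make two types equal by filling in unknowns. Returns 0 on a type error.
   A real checker would also need an occurs check; it is omitted here. */
static int unify(ty *a, ty *b) {
    a = resolve(a);
    b = resolve(b);
    if (a == b) return 1;
    if (a->kind == TY_VAR) { a->bound = b; return 1; }
    if (b->kind == TY_VAR) { b->bound = a; return 1; }
    if (a->kind != b->kind) return 0;
    if (a->kind == TY_INT) return 1;
    return unify(a->from, b->from) && unify(a->to, b->to);
}

/* Infer `defn add x y = { x + y }`: start from ? -> ? -> ?, then require
   the use of (+) to match its known type Int -> Int -> Int. */
static ty *infer_add(void) {
    ty *x = ty_new(TY_VAR, NULL, NULL);
    ty *y = ty_new(TY_VAR, NULL, NULL);
    ty *result = ty_new(TY_VAR, NULL, NULL);
    ty *add = ty_new(TY_ARROW, x, ty_new(TY_ARROW, y, result));

    ty *int_ty = ty_new(TY_INT, NULL, NULL);
    ty *plus = ty_new(TY_ARROW, int_ty, ty_new(TY_ARROW, int_ty, int_ty));
    ty *use = ty_new(TY_ARROW, x, ty_new(TY_ARROW, y, result)); /* how (+) is used */
    return unify(plus, use) ? add : NULL;
}
```

After `infer_add` returns, `x`, `y` and the result all resolve to `Int`, which gives `Int->Int->Int`.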
### G-Machine Code Generation

A G-Machine is used to quickly construct graphs. The evaluation of functional programs is frequently backed by graph reduction, and every time a function is called, a new instance of the graph for its body must be constructed. Since this always follows the same steps (the function body always has the same structure), it can be compiled into a series of stack-based operations, much like converting arithmetic to reverse Polish notation. Lily's compiler uses an "environment" to keep track of the positions of variables on the ever-changing stack. The environment is represented simply as a linked list, with each node increasing the distance from the top of the stack. Generated instructions may look something like the following:

```
push 1
push 1
pushglobal 2 @add
mkapp
mkapp
```

In this case, the variable at stack offset 1 is pushed onto the stack, followed by the variable that was formerly at offset 0 (now at offset 1, after the first variable was pushed). Then, a "global variable", the function `@add`, is pushed. `mkapp` pops two values from the stack and combines them into a graph node representing function application. Running `mkapp` twice results in the application of `@add` to two parameters.

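The environment bookkeeping and the instruction sequence above can be simulated in a short C sketch. Everything in it is hypothetical and only illustrates the idea: `env`, `env_offset`, the node layout, and the `push`/`pushglobal`/`mkapp` helpers are not Lily's code. It assumes that the parameters `x` and `y` start at offsets 0 and 1. It also assumes that `mkapp` pops the function first and its argument second, so the final graph is `add x y`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical graph nodes: a number, a global function, or an application. */
typedef enum { N_NUM, N_GLOBAL, N_APP } node_tag;

typedef struct node {
    node_tag tag;
    int num;                    /* N_NUM */
    const char *name;           /* N_GLOBAL */
    struct node *fun, *arg;     /* N_APP */
} node;

/* Environment: a linked list of stack slots, top of the stack first.
   Each step down the list is one position further from the top. */
typedef struct env {
    const char *name;           /* NULL for an anonymous (temporary) slot */
    const struct env *next;
} env;

/* Offset of a variable from the top of the stack, or -1 if it is not local. */
static int env_offset(const env *e, const char *name) {
    for (int offset = 0; e != NULL; e = e->next, offset++)
        if (e->name != NULL && strcmp(e->name, name) == 0) return offset;
    return -1;
}

#define STACK_MAX 16
static node *stack[STACK_MAX];   /* stack[sp - 1] is the top (offset 0) */
static int sp = 0;

static node *new_node(node_tag tag) {
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->tag = tag;
    return n;
}

/* push n: copy the value at offset n onto the top of the stack. */
static void push(int offset) {
    assert(offset < sp && sp < STACK_MAX);
    node *n = stack[sp - 1 - offset];
    stack[sp++] = n;
}

static void pushglobal(const char *name) {
    assert(sp < STACK_MAX);
    node *n = new_node(N_GLOBAL);
    n->name = name;
    stack[sp++] = n;
}

/* mkapp: pop a function, then its argument, and push their application. */
static void mkapp(void) {
    assert(sp >= 2);
    node *app = new_node(N_APP);
    app->fun = stack[--sp];
    app->arg = stack[--sp];
    stack[sp++] = app;
}

/* Run the example sequence with x at offset 0 and y at offset 1. */
static node *build_add_x_y(node *x, node *y) {
    sp = 0;
    stack[sp++] = y;
    stack[sp++] = x;
    push(1);            /* y */
    push(1);            /* x, which was at offset 0 before the previous push */
    pushglobal("add");
    mkapp();            /* add x */
    mkapp();            /* (add x) y */
    return stack[sp - 1];
}
```

When the compiler emits a push, it can prepend an anonymous entry to the environment. After that, every named variable's offset grows by one, which is why the second `push` in the listing is also `push 1`.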
### LLVM Code Generation

G-Machine instructions can fairly easily be converted into sequences of LLVM IR instructions. The only problem is that LLVM IR is not stack-based, while the G-Machine is. This leads to a large number of `push`/`pop` calls in the generated code, including some that could potentially be optimized away. Because `push`, `pop`, and memory allocation are so common, and because they always behave the same way, they are placed in a separate C file called `runtime.c`. Declarations (stubs) for these functions are generated on the LLVM side, allowing the IR to call them without knowing their bodies. These calls are resolved when the program is linked.

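A minimal sketch of what such runtime helpers might look like in C follows. The names (`stack_push`, `stack_pop`, `stack_peek`, `alloc_node`) and the `struct stack` layout are assumptions, not the contents of `runtime.c`. The fixed capacity of 16 comes from the Quirks section. On the LLVM side, only declarations with matching signatures would be emitted, and the linker would connect them to definitions like these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical runtime helpers; runtime.c's real names may differ. */
#define STACK_SIZE 16   /* fixed capacity, as described under Quirks */

struct node_base { char tag; };   /* every node starts with a tag */

struct stack {
    size_t count;
    struct node_base *data[STACK_SIZE];
};

void stack_init(struct stack *s) { s->count = 0; }

void stack_push(struct stack *s, struct node_base *n) {
    /* An overflow check like this one turns a too-deep computation
       into an assertion failure instead of memory corruption. */
    assert(s->count < STACK_SIZE);
    s->data[s->count++] = n;
}

struct node_base *stack_pop(struct stack *s) {
    assert(s->count > 0);
    return s->data[--s->count];
}

/* Offset 0 is the top of the stack, matching the G-Machine's `push n`. */
struct node_base *stack_peek(const struct stack *s, size_t offset) {
    assert(offset < s->count);
    return s->data[s->count - 1 - offset];
}

/* Allocate a node of `size` bytes and set its tag. */
struct node_base *alloc_node(size_t size, char tag) {
    assert(size >= sizeof(struct node_base));
    struct node_base *n = malloc(size);
    assert(n != NULL);
    n->tag = tag;
    return n;
}
```

Keeping these helpers in C means the code generator only has to emit calls, at the cost of the call overhead the paragraph above mentions.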
Because graph nodes can be of several different types, they are best represented using a tagged union. However, since LLVM IR does not support tagged unions (or unions in general), we simply declare several separate struct types that LLVM treats as unrelated. For instance:

```
struct node_parent {
    char tag;   /* identifies what kind of node this is */
};

struct node_num {
    char tag;   /* same first field as node_parent */
    int value;  /* the number this node holds */
};
```

These structs all share the same initial field (the `tag`), which makes them compatible with each other in a limited way. The tag of any node can be read the same way, and it is checked in order to cast the node to its proper type (LLVM's pointer cast is used for this). To access the fields of a struct, LLVM's GetElementPtr instruction is used, which allows the IR to load and store values inside the struct. The most complicated instructions are:

- `op`, which unwraps two number nodes and adds their values;
- `eq`, which unwraps two number nodes, compares their values, and pushes either "True" or "False" depending on the result;
- `jump`, which is used for case analysis.

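The tag-then-cast pattern can be illustrated in C, where casts stand in for LLVM's pointer cast and GetElementPtr. This sketch is hypothetical. The tag values, `node_data`, the helper names, and the encoding of True and False as constructor indices 0 and 1 are all assumptions. It shows an `op`-style addition and an `eq`-style comparison built on that pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tag and constructor numbering, for illustration only. */
enum { TAG_NUM = 0, TAG_DATA = 1 };
enum { CTOR_TRUE = 0, CTOR_FALSE = 1 };

struct node_parent { char tag; };
struct node_num    { char tag; int value; };
struct node_data   { char tag; int ctor; };   /* e.g. True or False */

static struct node_parent *make_num(int value) {
    struct node_num *n = malloc(sizeof *n);
    assert(n != NULL);
    n->tag = TAG_NUM;
    n->value = value;
    return (struct node_parent *)n;
}

static struct node_parent *make_data(int ctor) {
    struct node_data *n = malloc(sizeof *n);
    assert(n != NULL);
    n->tag = TAG_DATA;
    n->ctor = ctor;
    return (struct node_parent *)n;
}

/* Check the tag through the shared prefix, then cast to the real struct;
   in LLVM this would be a pointer cast followed by GetElementPtr and a load. */
static int unwrap_num(const struct node_parent *p) {
    assert(p->tag == TAG_NUM);
    return ((const struct node_num *)p)->value;
}

/* In the spirit of `op` for addition: unwrap two numbers, build a new one. */
static struct node_parent *op_add(const struct node_parent *a,
                                  const struct node_parent *b) {
    return make_num(unwrap_num(a) + unwrap_num(b));
}

/* In the spirit of `eq`: compare two numbers and produce True or False. */
static struct node_parent *op_eq(const struct node_parent *a,
                                 const struct node_parent *b) {
    return make_data(unwrap_num(a) == unwrap_num(b) ? CTOR_TRUE : CTOR_FALSE);
}
```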
### Linking

Finally, `runtime.c` expects the Lily source code to define a function `main` that takes no parameters. This function is used as the entry point: after the runtime sets up the stack, execution is transferred to the LLVM-generated code (which frequently calls back into C to use `malloc` or to manipulate the stack).

## Sample Programs

The most basic Lily program can just return 0:

```
defn main = { 0 }
```

This is boring. We can use addition and subtraction to spice it up!

```
defn main = { 1 - 1 + 3 }
```

Lily supports arbitrary data type definitions (if you've used Haskell, these should look familiar). For instance, we can define and use a `Maybe` data type as follows:

```
data Maybe = { Ok(Int), Nothing }

defn maybeOr42 m = {
    case m of {
        Ok(i) -> { i }
        Nothing -> { 42 }
    }
}

defn main = { maybeOr42 (Ok 3) }
```

Notice that in a data type declaration (and in case patterns), constructors have the form `C(P1, P2)`, while in expressions they are applied like normal functions.

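Case analysis (the job of `jump`) can be pictured as a branch on which constructor built a node. In the `type_data::create_constructor` hunk further down this page, each constructor's `id` is set to `constructors.size()`. Ids therefore follow declaration order, so `Ok` gets 0 and `Nothing` gets 1. The C sketch below relies on that numbering. The node layout, `make_ctor` and `maybe_or_42` are hypothetical, and the constructor's `Int` argument is stored directly rather than as a pointer to another graph node.

```c
#include <assert.h>
#include <stdlib.h>

/* Constructor ids in declaration order for
   `data Maybe = { Ok(Int), Nothing }`: Ok is 0, Nothing is 1. */
enum { MAYBE_OK = 0, MAYBE_NOTHING = 1 };

/* Hypothetical constructor node: which constructor, plus its arguments.
   Arguments are plain ints here; in a real graph they would be nodes. */
struct node_data {
    char tag;
    int ctor;
    int nargs;
    int args[];          /* C99 flexible array member */
};

static struct node_data *make_ctor(int ctor, int nargs, const int *args) {
    struct node_data *n = malloc(sizeof *n + (size_t)nargs * sizeof(int));
    assert(n != NULL);
    n->tag = 0;          /* node-kind tag; not used in this sketch */
    n->ctor = ctor;
    n->nargs = nargs;
    for (int i = 0; i < nargs; i++) n->args[i] = args[i];
    return n;
}

/* maybeOr42: branch on the constructor id, as `jump` would. */
static int maybe_or_42(const struct node_data *m) {
    switch (m->ctor) {
    case MAYBE_OK:                   /* Ok(i) -> { i } */
        assert(m->nargs == 1);
        return m->args[0];
    case MAYBE_NOTHING:              /* Nothing -> { 42 } */
        return 42;
    default:
        assert(!"unknown constructor");
        return 0;
    }
}
```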
Finally, Lily has lazy evaluation, allowing for the construction of infinite data structures:

```
data IntList = { Nil, Cons(Int, IntList) }
defn ones = { Cons 1 ones }
```

This will not take any computation time unless something else uses the values from the list.

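One way to see why `ones` costs nothing until it is used is to model it as a graph node that is evaluated on demand and then overwritten with its result. This is the usual update step in graph reduction. The C sketch below uses hypothetical names throughout and is not Lily's runtime. Forcing `ones` once turns it into a `Cons` cell whose tail points back to itself. The infinite list thus becomes a finite cycle, and later traversals do no further work.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds for a lazily evaluated list. */
enum kind { THUNK, CONS, NIL };

struct node {
    enum kind kind;
    int head;                           /* CONS */
    struct node *tail;                  /* CONS */
    void (*code)(struct node *self);    /* THUNK: overwrites self with its value */
};

static int evaluations = 0;   /* how many times the body of `ones` has run */

/* Body of `defn ones = { Cons 1 ones }`. The node for `ones` is updated in
   place, so its tail points back to itself: the infinite list is a cycle. */
static void ones_code(struct node *self) {
    evaluations++;
    self->kind = CONS;
    self->head = 1;
    self->tail = self;
}

/* Evaluate a node only if it has not been evaluated yet. */
static struct node *force(struct node *n) {
    if (n->kind == THUNK) n->code(n);
    return n;
}

static struct node *make_ones(void) {
    struct node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = THUNK;
    n->code = ones_code;
    return n;
}

/* Sum the first `count` elements, forcing only the cells that are visited. */
static int sum_prefix(struct node *list, int count) {
    int total = 0;
    for (int i = 0; i < count; i++) {
        struct node *cell = force(list);
        if (cell->kind != CONS) break;
        total += cell->head;
        list = cell->tail;
    }
    return total;
}
```

Creating the node with `make_ones` runs nothing. Summing the first five elements runs the body of `ones` exactly once.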
Makefile (Normal file, 19 lines added)
@@ -0,0 +1,19 @@
CPPFLAGS := -std=c++11 `llvm-config-7.0-64 --cppflags --ldflags --libs --system-libs all`
CC := g++
TARGET := lily

# Collect every .cpp file in src/
SRCS := $(wildcard src/*.cpp)
# Map src/foo.cpp to foo.o (objects are built in the current directory)
OBJS := $(patsubst src/%.cpp,%.o,$(SRCS))

all: $(TARGET)
$(TARGET): $(OBJS) parser-c.o
	$(CC) -o $@ $^ $(CPPFLAGS)
%.o: src/%.cpp
	$(CC) $(CPPFLAGS) -c $<
clean:
	rm -rf $(TARGET) *.o
parser-c.o: src/parser.c
	gcc -c -o parser-c.o src/parser.c
.PHONY: all clean
programs/fib.lily (Normal file, 14 lines added)
@@ -0,0 +1,14 @@
defn if c t e = {
    case c of {
        True -> { t }
        False -> { e }
    }
}

defn fibtr a b n = {
    if (eq n 0) a (fibtr b (a+b) (n-1))
}

defn fib n = { fibtr 1 1 n }

defn main = { fib 40 }
@@ -145,7 +145,7 @@ namespace lily {
public:
template <typename T, typename ... Ts>
T* add_instruction(Ts ... ts) {
-auto new_inst = std::make_unique<T>(ts...);
+auto new_inst = std::unique_ptr<T>(new T(ts...));
T* raw = new_inst.get();
instructions.push_back(std::move(new_inst));
return raw;
@@ -216,7 +216,7 @@ namespace lily {
}

static program_ptr build_program(pgs_tree* tree, const char* source) {
-program_ptr prog = std::make_unique<program>();
+program_ptr prog = std::unique_ptr<program>(new program);
pgs_tree* program = PGS_TREE_NT_CHILD(*tree, 0);

do {
@@ -4,7 +4,7 @@
namespace lily {
type_data::constructor* type_data::create_constructor(const std::string& name,
std::vector<type*>&& params) {
-auto new_constructor = std::make_unique<constructor>();
+auto new_constructor = std::unique_ptr<constructor>(new constructor());
new_constructor->id = constructors.size();
new_constructor->parent = this;
new_constructor->params = std::move(params);
@@ -2,6 +2,7 @@
#include <string>
#include <vector>
#include <map>
#include <memory>

namespace lily {
enum reserved_types {
@@ -1,5 +1,6 @@
#pragma once
#include <map>
#include <memory>
#include "type.hpp"
#include "ast.hpp"

@@ -12,7 +12,7 @@ namespace lily {
}

type_internal* type_manager::create_int_type() {
-auto new_type = std::make_unique<type_internal>(next_id++);
+auto new_type = std::unique_ptr<type_internal>(new type_internal(next_id++));
type_internal* raw_ptr = new_type.get();
types.push_back(std::move(new_type));
type_names["Int"] = raw_ptr;
@@ -20,7 +20,7 @@ namespace lily {
}

type_internal* type_manager::create_str_type() {
-auto new_type = std::make_unique<type_internal>(next_id++);
+auto new_type = std::unique_ptr<type_internal>(new type_internal(next_id++));
type_internal* raw_ptr = new_type.get();
types.push_back(std::move(new_type));
type_names["Str"] = raw_ptr;
@@ -29,7 +29,7 @@ namespace lily {

type_data* type_manager::create_data_type(const std::string& name) {
if(type_names.count(name)) throw error("redefinition of type");
-auto new_type = std::make_unique<type_data>(next_id++);
+auto new_type = std::unique_ptr<type_data>(new type_data(next_id++));
type_data* raw_ptr = new_type.get();
types.push_back(std::move(new_type));
type_names[name] = raw_ptr;
@@ -21,7 +21,7 @@ namespace lily {
type_data* create_data_type(const std::string& name);
template <typename T, typename ... Args>
T* create_type(Args...as) {
-auto new_type = std::make_unique<T>(as...);
+auto new_type = std::unique_ptr<T>(new T(as...));
T* raw_ptr = new_type.get();
types.push_back(std::move(new_type));
return raw_ptr;