60 Commits

Author SHA1 Message Date
f75a47e273 Add post about sidenotes 2019-12-08 23:47:52 -08:00
9eae560cae Make sidenotes mobile-friendly 2019-12-07 00:16:59 -08:00
b0529a9124 Add initial implementation of sidenotes 2019-12-06 00:10:26 -08:00
3df9c57482 Fix naming issue (this is really a compiler bug) 2019-12-05 19:36:47 -08:00
cb5163e1d9 Add gitignore file 2019-12-04 14:35:23 -08:00
c309ac4c14 Rename page and add pop instruction to part 5 of compiler series 2019-11-14 11:05:17 -08:00
58c9d5f982 Fix another typo in compiler series 2019-11-13 13:51:32 -08:00
dc9a68ad10 Fix mistakes in blog posts 2019-11-13 13:49:47 -08:00
db16dbda18 Fix incorrect CMakeLists.txt 2019-11-13 13:47:04 -08:00
172630c2ee Stop justifying titles and add lang attribute 2019-11-11 15:36:59 -08:00
6dc7734c70 Remove more draft labels 2019-11-06 22:32:57 -08:00
19a1ffbc98 Remove draft label 2019-11-06 22:32:21 -08:00
2cce2859bb Fix some typos 2019-11-06 22:27:52 -08:00
654239e29f Fix last sentence of compiler post 2019-11-06 21:34:29 -08:00
50fbe3e196 Finish draft of post 8 in compiler series 2019-11-06 21:10:53 -08:00
1a8a1c3052 Work on writing up the rest of part 8 in compiler series 2019-11-06 14:44:53 -08:00
2994f8983d Add the push operation in code in compiler series 2019-11-06 13:23:59 -08:00
64227f2873 Finish implementation of compiler 2019-11-06 12:52:42 -08:00
9aef499deb Factor out definition into separate file in compiler series 2019-11-05 10:40:51 -08:00
c79b5a4120 Start writing actual compillation code in compiler series 2019-11-05 00:42:33 -08:00
81ee50d0d4 Implement function and type creation, add text to blog in compiler series 2019-11-04 18:25:54 -08:00
43b140285f Fix missing line in runtime header in compiler series 2019-11-04 13:30:18 -08:00
adb894869e Remove code border. 2019-11-02 17:53:31 -07:00
1f6032a30e Start work on chapter 8 code for compilers 2019-11-02 17:53:15 -07:00
9531f4d8e3 Add chapter 8 starting code for compiler series 2019-11-02 16:38:11 -07:00
37097d3a40 Change SCSS to use darken, and remove input styles. 2019-10-31 14:41:52 -07:00
3aa468c2f6 Remove debug printf 2019-10-31 14:38:06 -07:00
c704187012 Use darken to specify link color 2019-10-31 14:32:34 -07:00
a834fd578e Finish initial draft of runtime posts. 2019-10-30 14:21:13 -07:00
4b5e2f4454 Write some more about runetime 2019-10-30 00:19:56 -07:00
7812b1064b Make progress on compiler posts 2019-10-26 20:30:29 -07:00
65b9f385cf Start working on runtime chapter 2019-10-15 11:13:13 -07:00
ed88d54aa6 Add post about LSP idea 2019-10-12 13:12:18 -07:00
d1b515ec5b Make max width higher 2019-10-12 00:31:43 -07:00
1ffc43af98 Link compiler posts together 2019-10-10 18:15:37 -07:00
b27dc19e57 Finish draft of part 6 of compiler series 2019-10-10 18:00:13 -07:00
df0b819b0e Fix bug from small improvements 2019-10-10 17:59:44 -07:00
21f90d85c5 Add finishing touches to code for part 6 of compiler series 2019-10-10 13:14:00 -07:00
18e3f2af55 Fix definition to resolve its own types 2019-10-09 22:51:19 -07:00
3901c9b115 Add print methods to instructions 2019-10-09 22:46:17 -07:00
d9486d08ae Fix type in compiler blog 2019-10-08 23:50:21 -07:00
d90993a93c Implement ast_case::compile for compiler series and reference code 2019-10-08 23:46:35 -07:00
7e9bd95846 Write explanations of AST refactor in compiler series 2019-10-08 21:42:25 -07:00
d3d73e0e9c Fix up compile in compiler blog part 6, and add more text. 2019-10-08 14:10:05 -07:00
d9c151d774 Continue implementation of compilation 2019-10-01 23:23:52 -07:00
64f4abb8d6 Start writing up the implementation 2019-10-01 14:35:28 -07:00
bcaa67cc7a Begin implementation of new environment 2019-10-01 14:34:38 -07:00
8c0a6c834e Create new 'branch' for part 6 of compiler series 2019-10-01 11:05:21 -07:00
a69f9f633e Continue work on part 6 of compiler series 2019-09-16 01:57:15 -07:00
77cfeda60d Add another paragraph to the 6th part of the compiler series 2019-09-04 21:45:39 -07:00
44abf877b2 Add beginning of part 6 of compiler series 2019-09-04 21:10:06 -07:00
1fdccd6fae Add links to part 5 in compiler series 2019-09-04 01:55:48 -07:00
e3fd13c0c1 Remove TODOS from part 5 of compiler series 2019-09-04 01:31:31 -07:00
3bce9743e5 Add dump to other rules in compiler posts 2019-09-04 00:27:58 -07:00
c0b5d67c3d Add missing shortcodes for G-machine display 2019-09-03 21:23:45 -07:00
216e9e89b4 Finish draft of part 5 of compiler series. 2019-09-02 23:38:27 -07:00
a1244f201a Add G-machine graph creation instructions to Part 5 2019-09-02 17:51:36 -07:00
4d8d806706 Start working on Part 5 of compiler posts. 2019-08-28 21:11:34 -07:00
05af1350c8 Add errors ection to Part 4 of compiler posts 2019-08-28 15:34:13 -07:00
df1101a14c Make changes suggested by Ryan 2019-08-28 13:34:35 -07:00
114 changed files with 7753 additions and 68 deletions

1
.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
**/build/*

44
assets/scss/gmachine.scss Normal file
View File

@@ -0,0 +1,44 @@
@import "style.scss";
.gmachine-instruction {
display: flex;
@include bordered-block;
}
.gmachine-instruction-name {
padding: 10px;
border-right: $standard-border;
flex-grow: 1;
flex-basis: 20%;
text-align: center;
}
.gmachine-instruction-sem {
width: 100%;
flex-grow: 4;
}
.gmachine-inner {
border-bottom: $standard-border;
width: 100%;
&:last-child {
border-bottom: none;
}
}
.gmachine-inner-label {
padding: 10px;
font-weight: bold;
}
.gmachine-inner-text {
padding: 10px;
text-align: right;
flex-grow: 1;
}
.gmachine-instruction-name, .gmachine-inner-label, .gmachine-inner {
display: flex;
align-items: center;
}

View File

@@ -16,6 +16,7 @@ add_executable(compiler
ast.cpp ast.hpp definition.cpp ast.cpp ast.hpp definition.cpp
env.cpp env.hpp env.cpp env.hpp
type.cpp type.hpp type.cpp type.hpp
error.cpp error.hpp
${BISON_parser_OUTPUTS} ${BISON_parser_OUTPUTS}
${FLEX_scanner_OUTPUTS} ${FLEX_scanner_OUTPUTS}
main.cpp main.cpp

View File

@@ -1,5 +1,6 @@
#include "ast.hpp" #include "ast.hpp"
#include <ostream> #include <ostream>
#include "error.hpp"
std::string op_name(binop op) { std::string op_name(binop op) {
switch(op) { switch(op) {
@@ -8,7 +9,7 @@ std::string op_name(binop op) {
case TIMES: return "*"; case TIMES: return "*";
case DIVIDE: return "/"; case DIVIDE: return "/";
} }
throw 0; return "??";
} }
void print_indent(int n, std::ostream& to) { void print_indent(int n, std::ostream& to) {
@@ -53,7 +54,7 @@ type_ptr ast_binop::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr ltype = left->typecheck(mgr, env); type_ptr ltype = left->typecheck(mgr, env);
type_ptr rtype = right->typecheck(mgr, env); type_ptr rtype = right->typecheck(mgr, env);
type_ptr ftype = env.lookup(op_name(op)); type_ptr ftype = env.lookup(op_name(op));
if(!ftype) throw 0; if(!ftype) throw type_error(std::string("unknown binary operator ") + op_name(op));
type_ptr return_type = mgr.new_type(); type_ptr return_type = mgr.new_type();
type_ptr arrow_one = type_ptr(new type_arr(rtype, return_type)); type_ptr arrow_one = type_ptr(new type_arr(rtype, return_type));
@@ -92,7 +93,8 @@ void ast_case::print(int indent, std::ostream& to) const {
} }
type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const { type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr case_type = of->typecheck(mgr, env); type_var* var;
type_ptr case_type = mgr.resolve(of->typecheck(mgr, env), var);
type_ptr branch_type = mgr.new_type(); type_ptr branch_type = mgr.new_type();
for(auto& branch : branches) { for(auto& branch : branches) {
@@ -102,6 +104,11 @@ type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const {
mgr.unify(branch_type, curr_branch_type); mgr.unify(branch_type, curr_branch_type);
} }
case_type = mgr.resolve(case_type, var);
if(!dynamic_cast<type_base*>(case_type.get())) {
throw type_error("attempting case analysis of non-data type");
}
return branch_type; return branch_type;
} }
@@ -122,17 +129,17 @@ void pattern_constr::print(std::ostream& to) const {
void pattern_constr::match(type_ptr t, type_mgr& mgr, type_env& env) const { void pattern_constr::match(type_ptr t, type_mgr& mgr, type_env& env) const {
type_ptr constructor_type = env.lookup(constr); type_ptr constructor_type = env.lookup(constr);
if(!constructor_type) throw 0; if(!constructor_type) {
throw type_error(std::string("pattern using unknown constructor ") + constr);
}
for(int i = 0; i < params.size(); i++) { for(int i = 0; i < params.size(); i++) {
type_arr* arr = dynamic_cast<type_arr*>(constructor_type.get()); type_arr* arr = dynamic_cast<type_arr*>(constructor_type.get());
if(!arr) throw 0; if(!arr) throw type_error("too many parameters in constructor pattern");
env.bind(params[i], arr->left); env.bind(params[i], arr->left);
constructor_type = arr->right; constructor_type = arr->right;
} }
mgr.unify(t, constructor_type); mgr.unify(t, constructor_type);
type_base* result_type = dynamic_cast<type_base*>(constructor_type.get());
if(!result_type) throw 0;
} }

View File

@@ -0,0 +1,5 @@
#include "error.hpp"
const char* type_error::what() const noexcept {
return "an error occured while checking the types of the program";
}

View File

@@ -0,0 +1,21 @@
#pragma once
#include <exception>
#include "type.hpp"
struct type_error : std::exception {
std::string description;
type_error(std::string d)
: description(std::move(d)) {}
const char* what() const noexcept override;
};
struct unification_error : public type_error {
type_ptr left;
type_ptr right;
unification_error(type_ptr l, type_ptr r)
: left(std::move(l)), right(std::move(r)),
type_error("failed to unify types") {}
};

View File

@@ -1,7 +1,8 @@
#include "ast.hpp" #include "ast.hpp"
#include "parser.hpp"
#include "type.hpp"
#include <iostream> #include <iostream>
#include "parser.hpp"
#include "error.hpp"
#include "type.hpp"
void yy::parser::error(const std::string& msg) { void yy::parser::error(const std::string& msg) {
std::cout << "An error occured: " << msg << std::endl; std::cout << "An error occured: " << msg << std::endl;
@@ -9,10 +10,9 @@ void yy::parser::error(const std::string& msg) {
extern std::vector<definition_ptr> program; extern std::vector<definition_ptr> program;
void typecheck_program(const std::vector<definition_ptr>& prog) { void typecheck_program(
type_mgr mgr; const std::vector<definition_ptr>& prog,
type_env env; type_mgr& mgr, type_env& env) {
type_ptr int_type = type_ptr(new type_base("Int")); type_ptr int_type = type_ptr(new type_base("Int"));
type_ptr binop_type = type_ptr(new type_arr( type_ptr binop_type = type_ptr(new type_arr(
int_type, int_type,
@@ -40,6 +40,9 @@ void typecheck_program(const std::vector<definition_ptr>& prog) {
int main() { int main() {
yy::parser parser; yy::parser parser;
type_mgr mgr;
type_env env;
parser.parse(); parser.parse();
for(auto& definition : program) { for(auto& definition : program) {
definition_defn* def = dynamic_cast<definition_defn*>(definition.get()); definition_defn* def = dynamic_cast<definition_defn*>(definition.get());
@@ -51,6 +54,17 @@ int main() {
def->body->print(1, std::cout); def->body->print(1, std::cout);
} }
typecheck_program(program); try {
std::cout << program.size() << std::endl; typecheck_program(program, mgr, env);
} catch(unification_error& err) {
std::cout << "failed to unify types: " << std::endl;
std::cout << " (1) \033[34m";
err.left->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
std::cout << " (2) \033[32m";
err.right->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
} catch(type_error& err) {
std::cout << "failed to type check program: " << err.description << std::endl;
}
} }

View File

@@ -1,6 +1,7 @@
#include "type.hpp" #include "type.hpp"
#include <sstream> #include <sstream>
#include <algorithm> #include <algorithm>
#include "error.hpp"
void type_var::print(const type_mgr& mgr, std::ostream& to) const { void type_var::print(const type_mgr& mgr, std::ostream& to) const {
auto it = mgr.types.find(name); auto it = mgr.types.find(name);
@@ -87,7 +88,7 @@ void type_mgr::unify(type_ptr l, type_ptr r) {
if(lid->name == rid->name) return; if(lid->name == rid->name) return;
} }
throw 0; throw unification_error(l, r);
} }
void type_mgr::bind(const std::string& s, type_ptr t) { void type_mgr::bind(const std::string& s, type_ptr t) {

View File

@@ -0,0 +1,28 @@
cmake_minimum_required(VERSION 3.1)
project(compiler)
find_package(BISON)
find_package(FLEX)
bison_target(parser
${CMAKE_CURRENT_SOURCE_DIR}/parser.y
${CMAKE_CURRENT_BINARY_DIR}/parser.cpp
COMPILE_FLAGS "-d")
flex_target(scanner
${CMAKE_CURRENT_SOURCE_DIR}/scanner.l
${CMAKE_CURRENT_BINARY_DIR}/scanner.cpp)
add_flex_bison_dependency(scanner parser)
add_executable(compiler
ast.cpp ast.hpp definition.cpp
type_env.cpp type_env.hpp
env.cpp env.hpp
type.cpp type.hpp
error.cpp error.hpp
binop.cpp binop.hpp
instruction.cpp instruction.hpp
${BISON_parser_OUTPUTS}
${FLEX_scanner_OUTPUTS}
main.cpp
)
target_include_directories(compiler PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_include_directories(compiler PUBLIC ${CMAKE_CURRENT_BINARY_DIR})

262
code/compiler/06/ast.cpp Normal file
View File

@@ -0,0 +1,262 @@
#include "ast.hpp"
#include <ostream>
#include "error.hpp"
static void print_indent(int n, std::ostream& to) {
while(n--) to << " ";
}
type_ptr ast::typecheck_common(type_mgr& mgr, const type_env& env) {
node_type = typecheck(mgr, env);
return node_type;
}
void ast::resolve_common(const type_mgr& mgr) {
type_var* var;
type_ptr resolved_type = mgr.resolve(node_type, var);
if(var) throw type_error("ambiguously typed program");
resolve(mgr);
node_type = std::move(resolved_type);
}
void ast_int::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "INT: " << value << std::endl;
}
type_ptr ast_int::typecheck(type_mgr& mgr, const type_env& env) const {
return type_ptr(new type_base("Int"));
}
void ast_int::resolve(const type_mgr& mgr) const {
}
void ast_int::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(new instruction_pushint(value)));
}
void ast_lid::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "LID: " << id << std::endl;
}
type_ptr ast_lid::typecheck(type_mgr& mgr, const type_env& env) const {
return env.lookup(id);
}
void ast_lid::resolve(const type_mgr& mgr) const {
}
void ast_lid::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(
env->has_variable(id) ?
(instruction*) new instruction_push(env->get_offset(id)) :
(instruction*) new instruction_pushglobal(id)));
}
void ast_uid::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "UID: " << id << std::endl;
}
type_ptr ast_uid::typecheck(type_mgr& mgr, const type_env& env) const {
return env.lookup(id);
}
void ast_uid::resolve(const type_mgr& mgr) const {
}
void ast_uid::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(new instruction_pushglobal(id)));
}
void ast_binop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "BINOP: " << op_name(op) << std::endl;
left->print(indent + 1, to);
right->print(indent + 1, to);
}
type_ptr ast_binop::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr ltype = left->typecheck_common(mgr, env);
type_ptr rtype = right->typecheck_common(mgr, env);
type_ptr ftype = env.lookup(op_name(op));
if(!ftype) throw type_error(std::string("unknown binary operator ") + op_name(op));
type_ptr return_type = mgr.new_type();
type_ptr arrow_one = type_ptr(new type_arr(rtype, return_type));
type_ptr arrow_two = type_ptr(new type_arr(ltype, arrow_one));
mgr.unify(arrow_two, ftype);
return return_type;
}
void ast_binop::resolve(const type_mgr& mgr) const {
left->resolve_common(mgr);
right->resolve_common(mgr);
}
void ast_binop::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
right->compile(env, into);
left->compile(env_ptr(new env_offset(1, env)), into);
into.push_back(instruction_ptr(new instruction_pushglobal(op_action(op))));
into.push_back(instruction_ptr(new instruction_mkapp()));
into.push_back(instruction_ptr(new instruction_mkapp()));
}
void ast_app::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "APP:" << std::endl;
left->print(indent + 1, to);
right->print(indent + 1, to);
}
type_ptr ast_app::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr ltype = left->typecheck_common(mgr, env);
type_ptr rtype = right->typecheck_common(mgr, env);
type_ptr return_type = mgr.new_type();
type_ptr arrow = type_ptr(new type_arr(rtype, return_type));
mgr.unify(arrow, ltype);
return return_type;
}
void ast_app::resolve(const type_mgr& mgr) const {
left->resolve_common(mgr);
right->resolve_common(mgr);
}
void ast_app::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
right->compile(env, into);
left->compile(env_ptr(new env_offset(1, env)), into);
into.push_back(instruction_ptr(new instruction_mkapp()));
}
void ast_case::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "CASE: " << std::endl;
for(auto& branch : branches) {
print_indent(indent + 1, to);
branch->pat->print(to);
to << std::endl;
branch->expr->print(indent + 2, to);
}
}
type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const {
type_var* var;
type_ptr case_type = mgr.resolve(of->typecheck_common(mgr, env), var);
type_ptr branch_type = mgr.new_type();
for(auto& branch : branches) {
type_env new_env = env.scope();
branch->pat->match(case_type, mgr, new_env);
type_ptr curr_branch_type = branch->expr->typecheck_common(mgr, new_env);
mgr.unify(branch_type, curr_branch_type);
}
case_type = mgr.resolve(case_type, var);
if(!dynamic_cast<type_data*>(case_type.get())) {
throw type_error("attempting case analysis of non-data type");
}
return branch_type;
}
void ast_case::resolve(const type_mgr& mgr) const {
of->resolve_common(mgr);
for(auto& branch : branches) {
branch->expr->resolve_common(mgr);
}
}
void ast_case::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
type_data* type = dynamic_cast<type_data*>(of->node_type.get());
of->compile(env, into);
into.push_back(instruction_ptr(new instruction_eval()));
instruction_jump* jump_instruction = new instruction_jump();
into.push_back(instruction_ptr(jump_instruction));
for(auto& branch : branches) {
std::vector<instruction_ptr> branch_instructions;
pattern_var* vpat;
pattern_constr* cpat;
if((vpat = dynamic_cast<pattern_var*>(branch->pat.get()))) {
branch->expr->compile(env_ptr(new env_offset(1, env)), branch_instructions);
for(auto& constr_pair : type->constructors) {
if(jump_instruction->tag_mappings.find(constr_pair.second.tag) !=
jump_instruction->tag_mappings.end())
break;
jump_instruction->tag_mappings[constr_pair.second.tag] =
jump_instruction->branches.size();
}
jump_instruction->branches.push_back(std::move(branch_instructions));
} else if((cpat = dynamic_cast<pattern_constr*>(branch->pat.get()))) {
env_ptr new_env = env;
for(auto it = cpat->params.rbegin(); it != cpat->params.rend(); it++) {
new_env = env_ptr(new env_var(*it, new_env));
}
branch_instructions.push_back(instruction_ptr(new instruction_split()));
branch->expr->compile(new_env, branch_instructions);
branch_instructions.push_back(instruction_ptr(new instruction_slide(
cpat->params.size())));
int new_tag = type->constructors[cpat->constr].tag;
if(jump_instruction->tag_mappings.find(new_tag) !=
jump_instruction->tag_mappings.end())
throw type_error("technically not a type error: duplicate pattern");
jump_instruction->tag_mappings[new_tag] =
jump_instruction->branches.size();
jump_instruction->branches.push_back(std::move(branch_instructions));
}
}
for(auto& constr_pair : type->constructors) {
if(jump_instruction->tag_mappings.find(constr_pair.second.tag) ==
jump_instruction->tag_mappings.end())
throw type_error("non-total pattern");
}
}
void pattern_var::print(std::ostream& to) const {
to << var;
}
void pattern_var::match(type_ptr t, type_mgr& mgr, type_env& env) const {
env.bind(var, t);
}
void pattern_constr::print(std::ostream& to) const {
to << constr;
for(auto& param : params) {
to << " " << param;
}
}
void pattern_constr::match(type_ptr t, type_mgr& mgr, type_env& env) const {
type_ptr constructor_type = env.lookup(constr);
if(!constructor_type) {
throw type_error(std::string("pattern using unknown constructor ") + constr);
}
for(int i = 0; i < params.size(); i++) {
type_arr* arr = dynamic_cast<type_arr*>(constructor_type.get());
if(!arr) throw type_error("too many parameters in constructor pattern");
env.bind(params[i], arr->left);
constructor_type = arr->right;
}
mgr.unify(t, constructor_type);
}

197
code/compiler/06/ast.hpp Normal file
View File

@@ -0,0 +1,197 @@
#pragma once
#include <memory>
#include <vector>
#include "type.hpp"
#include "type_env.hpp"
#include "binop.hpp"
#include "instruction.hpp"
#include "env.hpp"
struct ast {
type_ptr node_type;
virtual ~ast() = default;
virtual void print(int indent, std::ostream& to) const = 0;
virtual type_ptr typecheck(type_mgr& mgr, const type_env& env) const = 0;
virtual void resolve(const type_mgr& mgr) const = 0;
virtual void compile(const env_ptr& env,
std::vector<instruction_ptr>& into) const = 0;
type_ptr typecheck_common(type_mgr& mgr, const type_env& env);
void resolve_common(const type_mgr& mgr);
};
using ast_ptr = std::unique_ptr<ast>;
struct pattern {
virtual ~pattern() = default;
virtual void print(std::ostream& to) const = 0;
virtual void match(type_ptr t, type_mgr& mgr, type_env& env) const = 0;
};
using pattern_ptr = std::unique_ptr<pattern>;
struct branch {
pattern_ptr pat;
ast_ptr expr;
branch(pattern_ptr p, ast_ptr a)
: pat(std::move(p)), expr(std::move(a)) {}
};
using branch_ptr = std::unique_ptr<branch>;
struct constructor {
std::string name;
std::vector<std::string> types;
int8_t tag;
constructor(std::string n, std::vector<std::string> ts)
: name(std::move(n)), types(std::move(ts)) {}
};
using constructor_ptr = std::unique_ptr<constructor>;
struct definition {
virtual ~definition() = default;
virtual void typecheck_first(type_mgr& mgr, type_env& env) = 0;
virtual void typecheck_second(type_mgr& mgr, const type_env& env) const = 0;
virtual void resolve(const type_mgr& mgr) = 0;
virtual void compile() = 0;
};
using definition_ptr = std::unique_ptr<definition>;
struct ast_int : public ast {
int value;
explicit ast_int(int v)
: value(v) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_lid : public ast {
std::string id;
explicit ast_lid(std::string i)
: id(std::move(i)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_uid : public ast {
std::string id;
explicit ast_uid(std::string i)
: id(std::move(i)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_binop : public ast {
binop op;
ast_ptr left;
ast_ptr right;
ast_binop(binop o, ast_ptr l, ast_ptr r)
: op(o), left(std::move(l)), right(std::move(r)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_app : public ast {
ast_ptr left;
ast_ptr right;
ast_app(ast_ptr l, ast_ptr r)
: left(std::move(l)), right(std::move(r)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_case : public ast {
ast_ptr of;
std::vector<branch_ptr> branches;
ast_case(ast_ptr o, std::vector<branch_ptr> b)
: of(std::move(o)), branches(std::move(b)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct pattern_var : public pattern {
std::string var;
pattern_var(std::string v)
: var(std::move(v)) {}
void print(std::ostream &to) const;
void match(type_ptr t, type_mgr& mgr, type_env& env) const;
};
struct pattern_constr : public pattern {
std::string constr;
std::vector<std::string> params;
pattern_constr(std::string c, std::vector<std::string> p)
: constr(std::move(c)), params(std::move(p)) {}
void print(std::ostream &to) const;
void match(type_ptr t, type_mgr&, type_env& env) const;
};
struct definition_defn : public definition {
std::string name;
std::vector<std::string> params;
ast_ptr body;
type_ptr return_type;
std::vector<type_ptr> param_types;
std::vector<instruction_ptr> instructions;
definition_defn(std::string n, std::vector<std::string> p, ast_ptr b)
: name(std::move(n)), params(std::move(p)), body(std::move(b)) {
}
void typecheck_first(type_mgr& mgr, type_env& env);
void typecheck_second(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr);
void compile();
};
struct definition_data : public definition {
std::string name;
std::vector<constructor_ptr> constructors;
definition_data(std::string n, std::vector<constructor_ptr> cs)
: name(std::move(n)), constructors(std::move(cs)) {}
void typecheck_first(type_mgr& mgr, type_env& env);
void typecheck_second(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr);
void compile();
};

View File

@@ -0,0 +1,21 @@
#include "binop.hpp"
std::string op_name(binop op) {
switch(op) {
case PLUS: return "+";
case MINUS: return "-";
case TIMES: return "*";
case DIVIDE: return "/";
}
return "??";
}
std::string op_action(binop op) {
switch(op) {
case PLUS: return "plus";
case MINUS: return "minus";
case TIMES: return "times";
case DIVIDE: return "divide";
}
return "??";
}

View File

@@ -0,0 +1,12 @@
#pragma once
#include <string>
enum binop {
PLUS,
MINUS,
TIMES,
DIVIDE
};
std::string op_name(binop op);
std::string op_action(binop op);

View File

@@ -0,0 +1,83 @@
#include "ast.hpp"
#include "error.hpp"
void definition_defn::typecheck_first(type_mgr& mgr, type_env& env) {
return_type = mgr.new_type();
type_ptr full_type = return_type;
for(auto it = params.rbegin(); it != params.rend(); it++) {
type_ptr param_type = mgr.new_type();
full_type = type_ptr(new type_arr(param_type, full_type));
param_types.push_back(param_type);
}
env.bind(name, full_type);
}
void definition_defn::typecheck_second(type_mgr& mgr, const type_env& env) const {
type_env new_env = env.scope();
auto param_it = params.begin();
auto type_it = param_types.rbegin();
while(param_it != params.end() && type_it != param_types.rend()) {
new_env.bind(*param_it, *type_it);
param_it++;
type_it++;
}
type_ptr body_type = body->typecheck_common(mgr, new_env);
mgr.unify(return_type, body_type);
}
void definition_defn::resolve(const type_mgr& mgr) {
type_var* var;
body->resolve_common(mgr);
return_type = mgr.resolve(return_type, var);
if(var) throw type_error("ambiguously typed program");
for(auto& param_type : param_types) {
param_type = mgr.resolve(param_type, var);
if(var) throw type_error("ambiguously typed program");
}
}
void definition_defn::compile() {
env_ptr new_env = env_ptr(new env_offset(0, nullptr));
for(auto it = params.rbegin(); it != params.rend(); it++) {
new_env = env_ptr(new env_var(*it, new_env));
}
body->compile(new_env, instructions);
instructions.push_back(instruction_ptr(new instruction_update(params.size())));
instructions.push_back(instruction_ptr(new instruction_pop(params.size())));
}
void definition_data::typecheck_first(type_mgr& mgr, type_env& env) {
type_data* this_type = new type_data(name);
type_ptr return_type = type_ptr(this_type);
int next_tag = 0;
for(auto& constructor : constructors) {
constructor->tag = next_tag;
this_type->constructors[constructor->name] = { next_tag++ };
type_ptr full_type = return_type;
for(auto it = constructor->types.rbegin(); it != constructor->types.rend(); it++) {
type_ptr type = type_ptr(new type_base(*it));
full_type = type_ptr(new type_arr(type, full_type));
}
env.bind(constructor->name, full_type);
}
}
void definition_data::typecheck_second(type_mgr& mgr, const type_env& env) const {
// Nothing
}
void definition_data::resolve(const type_mgr& mgr) {
// Nothing
}
void definition_data::compile() {
}

23
code/compiler/06/env.cpp Normal file
View File

@@ -0,0 +1,23 @@
#include "env.hpp"
int env_var::get_offset(const std::string& name) const {
if(name == this->name) return 0;
if(parent) return parent->get_offset(name) + 1;
throw 0;
}
bool env_var::has_variable(const std::string& name) const {
if(name == this->name) return true;
if(parent) return parent->has_variable(name);
return false;
}
int env_offset::get_offset(const std::string& name) const {
if(parent) return parent->get_offset(name) + offset;
throw 0;
}
bool env_offset::has_variable(const std::string& name) const {
if(parent) return parent->has_variable(name);
return false;
}

34
code/compiler/06/env.hpp Normal file
View File

@@ -0,0 +1,34 @@
#pragma once
#include <memory>
#include <string>
struct env {
virtual ~env() = default;
virtual int get_offset(const std::string& name) const = 0;
virtual bool has_variable(const std::string& name) const = 0;
};
using env_ptr = std::shared_ptr<env>;
struct env_var : public env {
std::string name;
env_ptr parent;
env_var(std::string& n, env_ptr p)
: name(std::move(n)), parent(std::move(p)) {}
int get_offset(const std::string& name) const;
bool has_variable(const std::string& name) const;
};
struct env_offset : public env {
int offset;
env_ptr parent;
env_offset(int o, env_ptr p)
: offset(o), parent(std::move(p)) {}
int get_offset(const std::string& name) const;
bool has_variable(const std::string& name) const;
};

View File

@@ -0,0 +1,5 @@
#include "error.hpp"
const char* type_error::what() const noexcept {
return "an error occured while checking the types of the program";
}

View File

@@ -0,0 +1,21 @@
#pragma once
#include <exception>
#include "type.hpp"
struct type_error : std::exception {
std::string description;
type_error(std::string d)
: description(std::move(d)) {}
const char* what() const noexcept override;
};
struct unification_error : public type_error {
type_ptr left;
type_ptr right;
unification_error(type_ptr l, type_ptr r)
: left(std::move(l)), right(std::move(r)),
type_error("failed to unify types") {}
};

View File

@@ -0,0 +1,2 @@
data Bool = { True, False }
defn main = { 3 + True }

View File

@@ -0,0 +1 @@
defn main = { 1 2 3 4 5 }

View File

@@ -0,0 +1,8 @@
data List = { Nil, Cons Int List }
defn head l = {
case l of {
Nil -> { 0 }
Cons x y z -> { x }
}
}

View File

@@ -0,0 +1,2 @@
defn main = { plus 320 6 }
defn plus x y = { x + y }

View File

@@ -0,0 +1,3 @@
defn add x y = { x + y }
defn double x = { add x x }
defn main = { double 163 }

View File

@@ -0,0 +1,7 @@
data List = { Nil, Cons Int List }
defn length l = {
case l of {
Nil -> { 0 }
Cons x xs -> { 1 + length xs }
}
}

View File

@@ -0,0 +1,83 @@
#include "instruction.hpp"
static void print_indent(int n, std::ostream& to) {
while(n--) to << " ";
}
void instruction_pushint::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "PushInt(" << value << ")" << std::endl;
}
void instruction_pushglobal::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "PushGlobal(" << name << ")" << std::endl;
}
void instruction_push::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Push(" << offset << ")" << std::endl;
}
void instruction_pop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Pop(" << count << ")" << std::endl;
}
void instruction_mkapp::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "MkApp()" << std::endl;
}
void instruction_update::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Update(" << offset << ")" << std::endl;
}
void instruction_pack::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Pack(" << tag << ", " << size << ")" << std::endl;
}
void instruction_split::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Split()" << std::endl;
}
void instruction_jump::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Jump(" << std::endl;
for(auto& instruction_set : branches) {
for(auto& instruction : instruction_set) {
instruction->print(indent + 2, to);
}
to << std::endl;
}
print_indent(indent, to);
to << ")" << std::endl;
}
void instruction_slide::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Slide(" << offset << ")" << std::endl;
}
void instruction_binop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "BinOp(" << op_action(op) << ")" << std::endl;
}
void instruction_eval::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Eval()" << std::endl;
}
void instruction_alloc::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Alloc(" << amount << ")" << std::endl;
}
void instruction_unwind::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Unwind()" << std::endl;
}

View File

@@ -0,0 +1,120 @@
#pragma once
#include <string>
#include <memory>
#include <vector>
#include <map>
#include <ostream>
#include "binop.hpp"
struct instruction {
virtual ~instruction() = default;
virtual void print(int indent, std::ostream& to) const = 0;
};
using instruction_ptr = std::unique_ptr<instruction>;
struct instruction_pushint : public instruction {
int value;
instruction_pushint(int v)
: value(v) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_pushglobal : public instruction {
std::string name;
instruction_pushglobal(std::string n)
: name(std::move(n)) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_push : public instruction {
int offset;
instruction_push(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_pop : public instruction {
int count;
instruction_pop(int c)
: count(c) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_mkapp : public instruction {
void print(int indent, std::ostream& to) const;
};
struct instruction_update : public instruction {
int offset;
instruction_update(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_pack : public instruction {
int tag;
int size;
instruction_pack(int t, int s)
: tag(t), size(s) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_split : public instruction {
void print(int indent, std::ostream& to) const;
};
struct instruction_jump : public instruction {
std::vector<std::vector<instruction_ptr>> branches;
std::map<int, int> tag_mappings;
void print(int indent, std::ostream& to) const;
};
struct instruction_slide : public instruction {
int offset;
instruction_slide(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_binop : public instruction {
binop op;
instruction_binop(binop o)
: op(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_eval : public instruction {
void print(int indent, std::ostream& to) const;
};
struct instruction_alloc : public instruction {
int amount;
instruction_alloc(int a)
: amount(a) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_unwind : public instruction {
void print(int indent, std::ostream& to) const;
};

88
code/compiler/06/main.cpp Normal file
View File

@@ -0,0 +1,88 @@
#include "ast.hpp"
#include <iostream>
#include "parser.hpp"
#include "error.hpp"
#include "type.hpp"
void yy::parser::error(const std::string& msg) {
std::cout << "An error occured: " << msg << std::endl;
}
extern std::vector<definition_ptr> program;
void typecheck_program(
const std::vector<definition_ptr>& prog,
type_mgr& mgr, type_env& env) {
type_ptr int_type = type_ptr(new type_base("Int"));
type_ptr binop_type = type_ptr(new type_arr(
int_type,
type_ptr(new type_arr(int_type, int_type))));
env.bind("+", binop_type);
env.bind("-", binop_type);
env.bind("*", binop_type);
env.bind("/", binop_type);
for(auto& def : prog) {
def->typecheck_first(mgr, env);
}
for(auto& def : prog) {
def->typecheck_second(mgr, env);
}
for(auto& pair : env.names) {
std::cout << pair.first << ": ";
pair.second->print(mgr, std::cout);
std::cout << std::endl;
}
for(auto& def : prog) {
def->resolve(mgr);
}
}
void compile_program(const std::vector<definition_ptr>& prog) {
for(auto& def : prog) {
def->compile();
definition_defn* defn = dynamic_cast<definition_defn*>(def.get());
if(!defn) continue;
for(auto& instruction : defn->instructions) {
instruction->print(0, std::cout);
}
std::cout << std::endl;
}
}
int main() {
yy::parser parser;
type_mgr mgr;
type_env env;
parser.parse();
for(auto& definition : program) {
definition_defn* def = dynamic_cast<definition_defn*>(definition.get());
if(!def) continue;
std::cout << def->name;
for(auto& param : def->params) std::cout << " " << param;
std::cout << ":" << std::endl;
def->body->print(1, std::cout);
}
try {
typecheck_program(program, mgr, env);
compile_program(program);
} catch(unification_error& err) {
std::cout << "failed to unify types: " << std::endl;
std::cout << " (1) \033[34m";
err.left->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
std::cout << " (2) \033[32m";
err.right->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
} catch(type_error& err) {
std::cout << "failed to type check program: " << err.description << std::endl;
}
}

140
code/compiler/06/parser.y Normal file
View File

@@ -0,0 +1,140 @@
%{
#include <string>
#include <iostream>
#include "ast.hpp"
#include "parser.hpp"
std::vector<definition_ptr> program;
extern yy::parser::symbol_type yylex();
%}
%token PLUS
%token TIMES
%token MINUS
%token DIVIDE
%token <int> INT
%token DEFN
%token DATA
%token CASE
%token OF
%token OCURLY
%token CCURLY
%token OPAREN
%token CPAREN
%token COMMA
%token ARROW
%token EQUAL
%token <std::string> LID
%token <std::string> UID
%language "c++"
%define api.value.type variant
%define api.token.constructor
%type <std::vector<std::string>> lowercaseParams uppercaseParams
%type <std::vector<definition_ptr>> program definitions
%type <std::vector<branch_ptr>> branches
%type <std::vector<constructor_ptr>> constructors
%type <ast_ptr> aAdd aMul case app appBase
%type <definition_ptr> definition defn data
%type <branch_ptr> branch
%type <pattern_ptr> pattern
%type <constructor_ptr> constructor
%start program
%%
program
: definitions { program = std::move($1); }
;
definitions
: definitions definition { $$ = std::move($1); $$.push_back(std::move($2)); }
| definition { $$ = std::vector<definition_ptr>(); $$.push_back(std::move($1)); }
;
definition
: defn { $$ = std::move($1); }
| data { $$ = std::move($1); }
;
defn
: DEFN LID lowercaseParams EQUAL OCURLY aAdd CCURLY
{ $$ = definition_ptr(
new definition_defn(std::move($2), std::move($3), std::move($6))); }
;
lowercaseParams
: %empty { $$ = std::vector<std::string>(); }
| lowercaseParams LID { $$ = std::move($1); $$.push_back(std::move($2)); }
;
uppercaseParams
: %empty { $$ = std::vector<std::string>(); }
| uppercaseParams UID { $$ = std::move($1); $$.push_back(std::move($2)); }
;
aAdd
: aAdd PLUS aMul { $$ = ast_ptr(new ast_binop(PLUS, std::move($1), std::move($3))); }
| aAdd MINUS aMul { $$ = ast_ptr(new ast_binop(MINUS, std::move($1), std::move($3))); }
| aMul { $$ = std::move($1); }
;
aMul
: aMul TIMES app { $$ = ast_ptr(new ast_binop(TIMES, std::move($1), std::move($3))); }
| aMul DIVIDE app { $$ = ast_ptr(new ast_binop(DIVIDE, std::move($1), std::move($3))); }
| app { $$ = std::move($1); }
;
app
: app appBase { $$ = ast_ptr(new ast_app(std::move($1), std::move($2))); }
| appBase { $$ = std::move($1); }
;
appBase
: INT { $$ = ast_ptr(new ast_int($1)); }
| LID { $$ = ast_ptr(new ast_lid(std::move($1))); }
| UID { $$ = ast_ptr(new ast_uid(std::move($1))); }
| OPAREN aAdd CPAREN { $$ = std::move($2); }
| case { $$ = std::move($1); }
;
case
: CASE aAdd OF OCURLY branches CCURLY
{ $$ = ast_ptr(new ast_case(std::move($2), std::move($5))); }
;
branches
: branches branch { $$ = std::move($1); $$.push_back(std::move($2)); }
| branch { $$ = std::vector<branch_ptr>(); $$.push_back(std::move($1));}
;
branch
: pattern ARROW OCURLY aAdd CCURLY
{ $$ = branch_ptr(new branch(std::move($1), std::move($4))); }
;
pattern
: LID { $$ = pattern_ptr(new pattern_var(std::move($1))); }
| UID lowercaseParams
{ $$ = pattern_ptr(new pattern_constr(std::move($1), std::move($2))); }
;
data
: DATA UID EQUAL OCURLY constructors CCURLY
{ $$ = definition_ptr(new definition_data(std::move($2), std::move($5))); }
;
constructors
: constructors COMMA constructor { $$ = std::move($1); $$.push_back(std::move($3)); }
| constructor
{ $$ = std::vector<constructor_ptr>(); $$.push_back(std::move($1)); }
;
constructor
: UID uppercaseParams
{ $$ = constructor_ptr(new constructor(std::move($1), std::move($2))); }
;

View File

@@ -0,0 +1,34 @@
%option noyywrap
%{
#include <iostream>
#include "ast.hpp"
#include "parser.hpp"
#define YY_DECL yy::parser::symbol_type yylex()
%}
%%
[ \n]+ {}
\+ { return yy::parser::make_PLUS(); }
\* { return yy::parser::make_TIMES(); }
- { return yy::parser::make_MINUS(); }
\/ { return yy::parser::make_DIVIDE(); }
[0-9]+ { return yy::parser::make_INT(atoi(yytext)); }
defn { return yy::parser::make_DEFN(); }
data { return yy::parser::make_DATA(); }
case { return yy::parser::make_CASE(); }
of { return yy::parser::make_OF(); }
\{ { return yy::parser::make_OCURLY(); }
\} { return yy::parser::make_CCURLY(); }
\( { return yy::parser::make_OPAREN(); }
\) { return yy::parser::make_CPAREN(); }
, { return yy::parser::make_COMMA(); }
-> { return yy::parser::make_ARROW(); }
= { return yy::parser::make_EQUAL(); }
[a-z][a-zA-Z]* { return yy::parser::make_LID(std::string(yytext)); }
[A-Z][a-zA-Z]* { return yy::parser::make_UID(std::string(yytext)); }
%%

99
code/compiler/06/type.cpp Normal file
View File

@@ -0,0 +1,99 @@
#include "type.hpp"
#include <sstream>
#include <algorithm>
#include "error.hpp"
void type_var::print(const type_mgr& mgr, std::ostream& to) const {
auto it = mgr.types.find(name);
if(it != mgr.types.end()) {
it->second->print(mgr, to);
} else {
to << name;
}
}
void type_base::print(const type_mgr& mgr, std::ostream& to) const {
to << name;
}
void type_arr::print(const type_mgr& mgr, std::ostream& to) const {
left->print(mgr, to);
to << " -> (";
right->print(mgr, to);
to << ")";
}
std::string type_mgr::new_type_name() {
int temp = last_id++;
std::string str = "";
while(temp != -1) {
str += (char) ('a' + (temp % 26));
temp = temp / 26 - 1;
}
std::reverse(str.begin(), str.end());
return str;
}
type_ptr type_mgr::new_type() {
return type_ptr(new type_var(new_type_name()));
}
type_ptr type_mgr::new_arrow_type() {
return type_ptr(new type_arr(new_type(), new_type()));
}
type_ptr type_mgr::resolve(type_ptr t, type_var*& var) const {
type_var* cast;
var = nullptr;
while((cast = dynamic_cast<type_var*>(t.get()))) {
auto it = types.find(cast->name);
if(it == types.end()) {
var = cast;
break;
}
t = it->second;
}
return t;
}
void type_mgr::unify(type_ptr l, type_ptr r) {
type_var* lvar;
type_var* rvar;
type_arr* larr;
type_arr* rarr;
type_base* lid;
type_base* rid;
l = resolve(l, lvar);
r = resolve(r, rvar);
if(lvar) {
bind(lvar->name, r);
return;
} else if(rvar) {
bind(rvar->name, l);
return;
} else if((larr = dynamic_cast<type_arr*>(l.get())) &&
(rarr = dynamic_cast<type_arr*>(r.get()))) {
unify(larr->left, rarr->left);
unify(larr->right, rarr->right);
return;
} else if((lid = dynamic_cast<type_base*>(l.get())) &&
(rid = dynamic_cast<type_base*>(r.get()))) {
if(lid->name == rid->name) return;
}
throw unification_error(l, r);
}
void type_mgr::bind(const std::string& s, type_ptr t) {
type_var* other = dynamic_cast<type_var*>(t.get());
if(other && other->name == s) return;
types[s] = t;
}

65
code/compiler/06/type.hpp Normal file
View File

@@ -0,0 +1,65 @@
#pragma once
#include <memory>
#include <map>
struct type_mgr;
struct type {
virtual ~type() = default;
virtual void print(const type_mgr& mgr, std::ostream& to) const = 0;
};
using type_ptr = std::shared_ptr<type>;
struct type_var : public type {
std::string name;
type_var(std::string n)
: name(std::move(n)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_base : public type {
std::string name;
type_base(std::string n)
: name(std::move(n)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_data : public type_base {
struct constructor {
int tag;
};
std::map<std::string, constructor> constructors;
type_data(std::string n)
: type_base(std::move(n)) {}
};
struct type_arr : public type {
type_ptr left;
type_ptr right;
type_arr(type_ptr l, type_ptr r)
: left(std::move(l)), right(std::move(r)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_mgr {
int last_id = 0;
std::map<std::string, type_ptr> types;
std::string new_type_name();
type_ptr new_type();
type_ptr new_arrow_type();
void unify(type_ptr l, type_ptr r);
type_ptr resolve(type_ptr t, type_var*& var) const;
void bind(const std::string& s, type_ptr t);
};

View File

@@ -0,0 +1,16 @@
#include "type_env.hpp"
type_ptr type_env::lookup(const std::string& name) const {
auto it = names.find(name);
if(it != names.end()) return it->second;
if(parent) return parent->lookup(name);
return nullptr;
}
void type_env::bind(const std::string& name, type_ptr t) {
names[name] = t;
}
type_env type_env::scope() const {
return type_env(this);
}

View File

@@ -0,0 +1,16 @@
#pragma once
#include <map>
#include "type.hpp"
struct type_env {
std::map<std::string, type_ptr> names;
type_env const* parent = nullptr;
type_env(type_env const* p)
: parent(p) {}
type_env() : type_env(nullptr) {}
type_ptr lookup(const std::string& name) const;
void bind(const std::string& name, type_ptr t);
type_env scope() const;
};

View File

@@ -0,0 +1,28 @@
cmake_minimum_required(VERSION 3.1)
project(compiler)
find_package(BISON)
find_package(FLEX)
bison_target(parser
${CMAKE_CURRENT_SOURCE_DIR}/parser.y
${CMAKE_CURRENT_BINARY_DIR}/parser.cpp
COMPILE_FLAGS "-d")
flex_target(scanner
${CMAKE_CURRENT_SOURCE_DIR}/scanner.l
${CMAKE_CURRENT_BINARY_DIR}/scanner.cpp)
add_flex_bison_dependency(scanner parser)
add_executable(compiler
ast.cpp ast.hpp definition.cpp
type_env.cpp type_env.hpp
env.cpp env.hpp
type.cpp type.hpp
error.cpp error.hpp
binop.cpp binop.hpp
instruction.cpp instruction.hpp
${BISON_parser_OUTPUTS}
${FLEX_scanner_OUTPUTS}
main.cpp
)
target_include_directories(compiler PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_include_directories(compiler PUBLIC ${CMAKE_CURRENT_BINARY_DIR})

262
code/compiler/07/ast.cpp Normal file
View File

@@ -0,0 +1,262 @@
#include "ast.hpp"
#include <ostream>
#include "error.hpp"
static void print_indent(int n, std::ostream& to) {
while(n--) to << " ";
}
type_ptr ast::typecheck_common(type_mgr& mgr, const type_env& env) {
node_type = typecheck(mgr, env);
return node_type;
}
void ast::resolve_common(const type_mgr& mgr) {
type_var* var;
type_ptr resolved_type = mgr.resolve(node_type, var);
if(var) throw type_error("ambiguously typed program");
resolve(mgr);
node_type = std::move(resolved_type);
}
void ast_int::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "INT: " << value << std::endl;
}
type_ptr ast_int::typecheck(type_mgr& mgr, const type_env& env) const {
return type_ptr(new type_base("Int"));
}
void ast_int::resolve(const type_mgr& mgr) const {
}
void ast_int::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(new instruction_pushint(value)));
}
void ast_lid::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "LID: " << id << std::endl;
}
type_ptr ast_lid::typecheck(type_mgr& mgr, const type_env& env) const {
return env.lookup(id);
}
void ast_lid::resolve(const type_mgr& mgr) const {
}
void ast_lid::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(
env->has_variable(id) ?
(instruction*) new instruction_push(env->get_offset(id)) :
(instruction*) new instruction_pushglobal(id)));
}
void ast_uid::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "UID: " << id << std::endl;
}
type_ptr ast_uid::typecheck(type_mgr& mgr, const type_env& env) const {
return env.lookup(id);
}
void ast_uid::resolve(const type_mgr& mgr) const {
}
void ast_uid::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(new instruction_pushglobal(id)));
}
void ast_binop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "BINOP: " << op_name(op) << std::endl;
left->print(indent + 1, to);
right->print(indent + 1, to);
}
type_ptr ast_binop::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr ltype = left->typecheck_common(mgr, env);
type_ptr rtype = right->typecheck_common(mgr, env);
type_ptr ftype = env.lookup(op_name(op));
if(!ftype) throw type_error(std::string("unknown binary operator ") + op_name(op));
type_ptr return_type = mgr.new_type();
type_ptr arrow_one = type_ptr(new type_arr(rtype, return_type));
type_ptr arrow_two = type_ptr(new type_arr(ltype, arrow_one));
mgr.unify(arrow_two, ftype);
return return_type;
}
void ast_binop::resolve(const type_mgr& mgr) const {
left->resolve_common(mgr);
right->resolve_common(mgr);
}
void ast_binop::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
right->compile(env, into);
left->compile(env_ptr(new env_offset(1, env)), into);
into.push_back(instruction_ptr(new instruction_pushglobal(op_action(op))));
into.push_back(instruction_ptr(new instruction_mkapp()));
into.push_back(instruction_ptr(new instruction_mkapp()));
}
void ast_app::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "APP:" << std::endl;
left->print(indent + 1, to);
right->print(indent + 1, to);
}
type_ptr ast_app::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr ltype = left->typecheck_common(mgr, env);
type_ptr rtype = right->typecheck_common(mgr, env);
type_ptr return_type = mgr.new_type();
type_ptr arrow = type_ptr(new type_arr(rtype, return_type));
mgr.unify(arrow, ltype);
return return_type;
}
void ast_app::resolve(const type_mgr& mgr) const {
left->resolve_common(mgr);
right->resolve_common(mgr);
}
void ast_app::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
right->compile(env, into);
left->compile(env_ptr(new env_offset(1, env)), into);
into.push_back(instruction_ptr(new instruction_mkapp()));
}
void ast_case::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "CASE: " << std::endl;
for(auto& branch : branches) {
print_indent(indent + 1, to);
branch->pat->print(to);
to << std::endl;
branch->expr->print(indent + 2, to);
}
}
type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const {
type_var* var;
type_ptr case_type = mgr.resolve(of->typecheck_common(mgr, env), var);
type_ptr branch_type = mgr.new_type();
for(auto& branch : branches) {
type_env new_env = env.scope();
branch->pat->match(case_type, mgr, new_env);
type_ptr curr_branch_type = branch->expr->typecheck_common(mgr, new_env);
mgr.unify(branch_type, curr_branch_type);
}
case_type = mgr.resolve(case_type, var);
if(!dynamic_cast<type_data*>(case_type.get())) {
throw type_error("attempting case analysis of non-data type");
}
return branch_type;
}
void ast_case::resolve(const type_mgr& mgr) const {
of->resolve_common(mgr);
for(auto& branch : branches) {
branch->expr->resolve_common(mgr);
}
}
void ast_case::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
type_data* type = dynamic_cast<type_data*>(of->node_type.get());
of->compile(env, into);
into.push_back(instruction_ptr(new instruction_eval()));
instruction_jump* jump_instruction = new instruction_jump();
into.push_back(instruction_ptr(jump_instruction));
for(auto& branch : branches) {
std::vector<instruction_ptr> branch_instructions;
pattern_var* vpat;
pattern_constr* cpat;
if((vpat = dynamic_cast<pattern_var*>(branch->pat.get()))) {
branch->expr->compile(env_ptr(new env_offset(1, env)), branch_instructions);
for(auto& constr_pair : type->constructors) {
if(jump_instruction->tag_mappings.find(constr_pair.second.tag) !=
jump_instruction->tag_mappings.end())
break;
jump_instruction->tag_mappings[constr_pair.second.tag] =
jump_instruction->branches.size();
}
jump_instruction->branches.push_back(std::move(branch_instructions));
} else if((cpat = dynamic_cast<pattern_constr*>(branch->pat.get()))) {
env_ptr new_env = env;
for(auto it = cpat->params.rbegin(); it != cpat->params.rend(); it++) {
new_env = env_ptr(new env_var(*it, new_env));
}
branch_instructions.push_back(instruction_ptr(new instruction_split()));
branch->expr->compile(new_env, branch_instructions);
branch_instructions.push_back(instruction_ptr(new instruction_slide(
cpat->params.size())));
int new_tag = type->constructors[cpat->constr].tag;
if(jump_instruction->tag_mappings.find(new_tag) !=
jump_instruction->tag_mappings.end())
throw type_error("technically not a type error: duplicate pattern");
jump_instruction->tag_mappings[new_tag] =
jump_instruction->branches.size();
jump_instruction->branches.push_back(std::move(branch_instructions));
}
}
for(auto& constr_pair : type->constructors) {
if(jump_instruction->tag_mappings.find(constr_pair.second.tag) ==
jump_instruction->tag_mappings.end())
throw type_error("non-total pattern");
}
}
void pattern_var::print(std::ostream& to) const {
to << var;
}
void pattern_var::match(type_ptr t, type_mgr& mgr, type_env& env) const {
env.bind(var, t);
}
void pattern_constr::print(std::ostream& to) const {
to << constr;
for(auto& param : params) {
to << " " << param;
}
}
void pattern_constr::match(type_ptr t, type_mgr& mgr, type_env& env) const {
type_ptr constructor_type = env.lookup(constr);
if(!constructor_type) {
throw type_error(std::string("pattern using unknown constructor ") + constr);
}
for(int i = 0; i < params.size(); i++) {
type_arr* arr = dynamic_cast<type_arr*>(constructor_type.get());
if(!arr) throw type_error("too many parameters in constructor pattern");
env.bind(params[i], arr->left);
constructor_type = arr->right;
}
mgr.unify(t, constructor_type);
}

197
code/compiler/07/ast.hpp Normal file
View File

@@ -0,0 +1,197 @@
#pragma once
#include <memory>
#include <vector>
#include "type.hpp"
#include "type_env.hpp"
#include "binop.hpp"
#include "instruction.hpp"
#include "env.hpp"
struct ast {
type_ptr node_type;
virtual ~ast() = default;
virtual void print(int indent, std::ostream& to) const = 0;
virtual type_ptr typecheck(type_mgr& mgr, const type_env& env) const = 0;
virtual void resolve(const type_mgr& mgr) const = 0;
virtual void compile(const env_ptr& env,
std::vector<instruction_ptr>& into) const = 0;
type_ptr typecheck_common(type_mgr& mgr, const type_env& env);
void resolve_common(const type_mgr& mgr);
};
using ast_ptr = std::unique_ptr<ast>;
struct pattern {
virtual ~pattern() = default;
virtual void print(std::ostream& to) const = 0;
virtual void match(type_ptr t, type_mgr& mgr, type_env& env) const = 0;
};
using pattern_ptr = std::unique_ptr<pattern>;
struct branch {
pattern_ptr pat;
ast_ptr expr;
branch(pattern_ptr p, ast_ptr a)
: pat(std::move(p)), expr(std::move(a)) {}
};
using branch_ptr = std::unique_ptr<branch>;
struct constructor {
std::string name;
std::vector<std::string> types;
int8_t tag;
constructor(std::string n, std::vector<std::string> ts)
: name(std::move(n)), types(std::move(ts)) {}
};
using constructor_ptr = std::unique_ptr<constructor>;
struct definition {
virtual ~definition() = default;
virtual void typecheck_first(type_mgr& mgr, type_env& env) = 0;
virtual void typecheck_second(type_mgr& mgr, const type_env& env) const = 0;
virtual void resolve(const type_mgr& mgr) = 0;
virtual void compile() = 0;
};
using definition_ptr = std::unique_ptr<definition>;
struct ast_int : public ast {
int value;
explicit ast_int(int v)
: value(v) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_lid : public ast {
std::string id;
explicit ast_lid(std::string i)
: id(std::move(i)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_uid : public ast {
std::string id;
explicit ast_uid(std::string i)
: id(std::move(i)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_binop : public ast {
binop op;
ast_ptr left;
ast_ptr right;
ast_binop(binop o, ast_ptr l, ast_ptr r)
: op(o), left(std::move(l)), right(std::move(r)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_app : public ast {
ast_ptr left;
ast_ptr right;
ast_app(ast_ptr l, ast_ptr r)
: left(std::move(l)), right(std::move(r)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_case : public ast {
ast_ptr of;
std::vector<branch_ptr> branches;
ast_case(ast_ptr o, std::vector<branch_ptr> b)
: of(std::move(o)), branches(std::move(b)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct pattern_var : public pattern {
std::string var;
pattern_var(std::string v)
: var(std::move(v)) {}
void print(std::ostream &to) const;
void match(type_ptr t, type_mgr& mgr, type_env& env) const;
};
struct pattern_constr : public pattern {
std::string constr;
std::vector<std::string> params;
pattern_constr(std::string c, std::vector<std::string> p)
: constr(std::move(c)), params(std::move(p)) {}
void print(std::ostream &to) const;
void match(type_ptr t, type_mgr&, type_env& env) const;
};
struct definition_defn : public definition {
std::string name;
std::vector<std::string> params;
ast_ptr body;
type_ptr return_type;
std::vector<type_ptr> param_types;
std::vector<instruction_ptr> instructions;
definition_defn(std::string n, std::vector<std::string> p, ast_ptr b)
: name(std::move(n)), params(std::move(p)), body(std::move(b)) {
}
void typecheck_first(type_mgr& mgr, type_env& env);
void typecheck_second(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr);
void compile();
};
struct definition_data : public definition {
std::string name;
std::vector<constructor_ptr> constructors;
definition_data(std::string n, std::vector<constructor_ptr> cs)
: name(std::move(n)), constructors(std::move(cs)) {}
void typecheck_first(type_mgr& mgr, type_env& env);
void typecheck_second(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr);
void compile();
};

View File

@@ -0,0 +1,21 @@
#include "binop.hpp"
std::string op_name(binop op) {
switch(op) {
case PLUS: return "+";
case MINUS: return "-";
case TIMES: return "*";
case DIVIDE: return "/";
}
return "??";
}
std::string op_action(binop op) {
switch(op) {
case PLUS: return "plus";
case MINUS: return "minus";
case TIMES: return "times";
case DIVIDE: return "divide";
}
return "??";
}

View File

@@ -0,0 +1,12 @@
#pragma once
#include <string>
enum binop {
PLUS,
MINUS,
TIMES,
DIVIDE
};
std::string op_name(binop op);
std::string op_action(binop op);

View File

@@ -0,0 +1,83 @@
#include "ast.hpp"
#include "error.hpp"
void definition_defn::typecheck_first(type_mgr& mgr, type_env& env) {
return_type = mgr.new_type();
type_ptr full_type = return_type;
for(auto it = params.rbegin(); it != params.rend(); it++) {
type_ptr param_type = mgr.new_type();
full_type = type_ptr(new type_arr(param_type, full_type));
param_types.push_back(param_type);
}
env.bind(name, full_type);
}
void definition_defn::typecheck_second(type_mgr& mgr, const type_env& env) const {
type_env new_env = env.scope();
auto param_it = params.begin();
auto type_it = param_types.rbegin();
while(param_it != params.end() && type_it != param_types.rend()) {
new_env.bind(*param_it, *type_it);
param_it++;
type_it++;
}
type_ptr body_type = body->typecheck_common(mgr, new_env);
mgr.unify(return_type, body_type);
}
void definition_defn::resolve(const type_mgr& mgr) {
type_var* var;
body->resolve_common(mgr);
return_type = mgr.resolve(return_type, var);
if(var) throw type_error("ambiguously typed program");
for(auto& param_type : param_types) {
param_type = mgr.resolve(param_type, var);
if(var) throw type_error("ambiguously typed program");
}
}
void definition_defn::compile() {
env_ptr new_env = env_ptr(new env_offset(0, nullptr));
for(auto it = params.rbegin(); it != params.rend(); it++) {
new_env = env_ptr(new env_var(*it, new_env));
}
body->compile(new_env, instructions);
instructions.push_back(instruction_ptr(new instruction_update(params.size())));
instructions.push_back(instruction_ptr(new instruction_pop(params.size())));
}
void definition_data::typecheck_first(type_mgr& mgr, type_env& env) {
type_data* this_type = new type_data(name);
type_ptr return_type = type_ptr(this_type);
int next_tag = 0;
for(auto& constructor : constructors) {
constructor->tag = next_tag;
this_type->constructors[constructor->name] = { next_tag++ };
type_ptr full_type = return_type;
for(auto it = constructor->types.rbegin(); it != constructor->types.rend(); it++) {
type_ptr type = type_ptr(new type_base(*it));
full_type = type_ptr(new type_arr(type, full_type));
}
env.bind(constructor->name, full_type);
}
}
void definition_data::typecheck_second(type_mgr& mgr, const type_env& env) const {
// Nothing
}
void definition_data::resolve(const type_mgr& mgr) {
// Nothing
}
void definition_data::compile() {
}

23
code/compiler/07/env.cpp Normal file
View File

@@ -0,0 +1,23 @@
#include "env.hpp"
int env_var::get_offset(const std::string& name) const {
if(name == this->name) return 0;
if(parent) return parent->get_offset(name) + 1;
throw 0;
}
bool env_var::has_variable(const std::string& name) const {
if(name == this->name) return true;
if(parent) return parent->has_variable(name);
return false;
}
int env_offset::get_offset(const std::string& name) const {
if(parent) return parent->get_offset(name) + offset;
throw 0;
}
bool env_offset::has_variable(const std::string& name) const {
if(parent) return parent->has_variable(name);
return false;
}

34
code/compiler/07/env.hpp Normal file
View File

@@ -0,0 +1,34 @@
#pragma once
#include <memory>
#include <string>
struct env {
virtual ~env() = default;
virtual int get_offset(const std::string& name) const = 0;
virtual bool has_variable(const std::string& name) const = 0;
};
using env_ptr = std::shared_ptr<env>;
struct env_var : public env {
std::string name;
env_ptr parent;
env_var(std::string& n, env_ptr p)
: name(std::move(n)), parent(std::move(p)) {}
int get_offset(const std::string& name) const;
bool has_variable(const std::string& name) const;
};
struct env_offset : public env {
int offset;
env_ptr parent;
env_offset(int o, env_ptr p)
: offset(o), parent(std::move(p)) {}
int get_offset(const std::string& name) const;
bool has_variable(const std::string& name) const;
};

View File

@@ -0,0 +1,5 @@
#include "error.hpp"
const char* type_error::what() const noexcept {
return "an error occured while checking the types of the program";
}

View File

@@ -0,0 +1,21 @@
#pragma once
#include <exception>
#include "type.hpp"
struct type_error : std::exception {
std::string description;
type_error(std::string d)
: description(std::move(d)) {}
const char* what() const noexcept override;
};
struct unification_error : public type_error {
type_ptr left;
type_ptr right;
unification_error(type_ptr l, type_ptr r)
: left(std::move(l)), right(std::move(r)),
type_error("failed to unify types") {}
};

View File

@@ -0,0 +1,2 @@
data Bool = { True, False }
defn main = { 3 + True }

View File

@@ -0,0 +1 @@
defn main = { 1 2 3 4 5 }

View File

@@ -0,0 +1,8 @@
data List = { Nil, Cons Int List }
defn head l = {
case l of {
Nil -> { 0 }
Cons x y z -> { x }
}
}

View File

@@ -0,0 +1,31 @@
#include "../runtime.h"
void f_add(struct stack* s) {
struct node_num* left = (struct node_num*) eval(stack_peek(s, 0));
struct node_num* right = (struct node_num*) eval(stack_peek(s, 1));
stack_push(s, (struct node_base*) alloc_num(left->value + right->value));
}
void f_main(struct stack* s) {
// PushInt 320
stack_push(s, (struct node_base*) alloc_num(320));
// PushInt 6
stack_push(s, (struct node_base*) alloc_num(6));
// PushGlobal f_add (the function for +)
stack_push(s, (struct node_base*) alloc_global(f_add, 2));
struct node_base* left;
struct node_base* right;
// MkApp
left = stack_pop(s);
right = stack_pop(s);
stack_push(s, (struct node_base*) alloc_app(left, right));
// MkApp
left = stack_pop(s);
right = stack_pop(s);
stack_push(s, (struct node_base*) alloc_app(left, right));
}

View File

@@ -0,0 +1,2 @@
defn main = { plus 320 6 }
defn plus x y = { x + y }

View File

@@ -0,0 +1,3 @@
defn add x y = { x + y }
defn double x = { add x x }
defn main = { double 163 }

View File

@@ -0,0 +1,7 @@
data List = { Nil, Cons Int List }
defn length l = {
case l of {
Nil -> { 0 }
Cons x xs -> { 1 + length xs }
}
}

View File

@@ -0,0 +1,83 @@
#include "instruction.hpp"
static void print_indent(int n, std::ostream& to) {
while(n--) to << " ";
}
void instruction_pushint::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "PushInt(" << value << ")" << std::endl;
}
void instruction_pushglobal::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "PushGlobal(" << name << ")" << std::endl;
}
void instruction_push::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Push(" << offset << ")" << std::endl;
}
void instruction_pop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Pop(" << count << ")" << std::endl;
}
void instruction_mkapp::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "MkApp()" << std::endl;
}
void instruction_update::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Update(" << offset << ")" << std::endl;
}
void instruction_pack::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Pack(" << tag << ", " << size << ")" << std::endl;
}
void instruction_split::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Split()" << std::endl;
}
void instruction_jump::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Jump(" << std::endl;
for(auto& instruction_set : branches) {
for(auto& instruction : instruction_set) {
instruction->print(indent + 2, to);
}
to << std::endl;
}
print_indent(indent, to);
to << ")" << std::endl;
}
void instruction_slide::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Slide(" << offset << ")" << std::endl;
}
void instruction_binop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "BinOp(" << op_action(op) << ")" << std::endl;
}
void instruction_eval::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Eval()" << std::endl;
}
void instruction_alloc::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Alloc(" << amount << ")" << std::endl;
}
void instruction_unwind::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Unwind()" << std::endl;
}

View File

@@ -0,0 +1,120 @@
#pragma once
#include <string>
#include <memory>
#include <vector>
#include <map>
#include <ostream>
#include "binop.hpp"
struct instruction {
virtual ~instruction() = default;
virtual void print(int indent, std::ostream& to) const = 0;
};
using instruction_ptr = std::unique_ptr<instruction>;
struct instruction_pushint : public instruction {
int value;
instruction_pushint(int v)
: value(v) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_pushglobal : public instruction {
std::string name;
instruction_pushglobal(std::string n)
: name(std::move(n)) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_push : public instruction {
int offset;
instruction_push(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_pop : public instruction {
int count;
instruction_pop(int c)
: count(c) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_mkapp : public instruction {
void print(int indent, std::ostream& to) const;
};
struct instruction_update : public instruction {
int offset;
instruction_update(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_pack : public instruction {
int tag;
int size;
instruction_pack(int t, int s)
: tag(t), size(s) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_split : public instruction {
void print(int indent, std::ostream& to) const;
};
struct instruction_jump : public instruction {
std::vector<std::vector<instruction_ptr>> branches;
std::map<int, int> tag_mappings;
void print(int indent, std::ostream& to) const;
};
struct instruction_slide : public instruction {
int offset;
instruction_slide(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_binop : public instruction {
binop op;
instruction_binop(binop o)
: op(o) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_eval : public instruction {
void print(int indent, std::ostream& to) const;
};
struct instruction_alloc : public instruction {
int amount;
instruction_alloc(int a)
: amount(a) {}
void print(int indent, std::ostream& to) const;
};
struct instruction_unwind : public instruction {
void print(int indent, std::ostream& to) const;
};

88
code/compiler/07/main.cpp Normal file
View File

@@ -0,0 +1,88 @@
#include "ast.hpp"
#include <iostream>
#include "parser.hpp"
#include "error.hpp"
#include "type.hpp"
void yy::parser::error(const std::string& msg) {
std::cout << "An error occured: " << msg << std::endl;
}
extern std::vector<definition_ptr> program;
void typecheck_program(
const std::vector<definition_ptr>& prog,
type_mgr& mgr, type_env& env) {
type_ptr int_type = type_ptr(new type_base("Int"));
type_ptr binop_type = type_ptr(new type_arr(
int_type,
type_ptr(new type_arr(int_type, int_type))));
env.bind("+", binop_type);
env.bind("-", binop_type);
env.bind("*", binop_type);
env.bind("/", binop_type);
for(auto& def : prog) {
def->typecheck_first(mgr, env);
}
for(auto& def : prog) {
def->typecheck_second(mgr, env);
}
for(auto& pair : env.names) {
std::cout << pair.first << ": ";
pair.second->print(mgr, std::cout);
std::cout << std::endl;
}
for(auto& def : prog) {
def->resolve(mgr);
}
}
void compile_program(const std::vector<definition_ptr>& prog) {
for(auto& def : prog) {
def->compile();
definition_defn* defn = dynamic_cast<definition_defn*>(def.get());
if(!defn) continue;
for(auto& instruction : defn->instructions) {
instruction->print(0, std::cout);
}
std::cout << std::endl;
}
}
int main() {
yy::parser parser;
type_mgr mgr;
type_env env;
parser.parse();
for(auto& definition : program) {
definition_defn* def = dynamic_cast<definition_defn*>(definition.get());
if(!def) continue;
std::cout << def->name;
for(auto& param : def->params) std::cout << " " << param;
std::cout << ":" << std::endl;
def->body->print(1, std::cout);
}
try {
typecheck_program(program, mgr, env);
compile_program(program);
} catch(unification_error& err) {
std::cout << "failed to unify types: " << std::endl;
std::cout << " (1) \033[34m";
err.left->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
std::cout << " (2) \033[32m";
err.right->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
} catch(type_error& err) {
std::cout << "failed to type check program: " << err.description << std::endl;
}
}

140
code/compiler/07/parser.y Normal file
View File

@@ -0,0 +1,140 @@
%{
#include <string>
#include <iostream>
#include "ast.hpp"
#include "parser.hpp"
std::vector<definition_ptr> program;
extern yy::parser::symbol_type yylex();
%}
%token PLUS
%token TIMES
%token MINUS
%token DIVIDE
%token <int> INT
%token DEFN
%token DATA
%token CASE
%token OF
%token OCURLY
%token CCURLY
%token OPAREN
%token CPAREN
%token COMMA
%token ARROW
%token EQUAL
%token <std::string> LID
%token <std::string> UID
%language "c++"
%define api.value.type variant
%define api.token.constructor
%type <std::vector<std::string>> lowercaseParams uppercaseParams
%type <std::vector<definition_ptr>> program definitions
%type <std::vector<branch_ptr>> branches
%type <std::vector<constructor_ptr>> constructors
%type <ast_ptr> aAdd aMul case app appBase
%type <definition_ptr> definition defn data
%type <branch_ptr> branch
%type <pattern_ptr> pattern
%type <constructor_ptr> constructor
%start program
%%
program
: definitions { program = std::move($1); }
;
definitions
: definitions definition { $$ = std::move($1); $$.push_back(std::move($2)); }
| definition { $$ = std::vector<definition_ptr>(); $$.push_back(std::move($1)); }
;
definition
: defn { $$ = std::move($1); }
| data { $$ = std::move($1); }
;
defn
: DEFN LID lowercaseParams EQUAL OCURLY aAdd CCURLY
{ $$ = definition_ptr(
new definition_defn(std::move($2), std::move($3), std::move($6))); }
;
lowercaseParams
: %empty { $$ = std::vector<std::string>(); }
| lowercaseParams LID { $$ = std::move($1); $$.push_back(std::move($2)); }
;
uppercaseParams
: %empty { $$ = std::vector<std::string>(); }
| uppercaseParams UID { $$ = std::move($1); $$.push_back(std::move($2)); }
;
aAdd
: aAdd PLUS aMul { $$ = ast_ptr(new ast_binop(PLUS, std::move($1), std::move($3))); }
| aAdd MINUS aMul { $$ = ast_ptr(new ast_binop(MINUS, std::move($1), std::move($3))); }
| aMul { $$ = std::move($1); }
;
aMul
: aMul TIMES app { $$ = ast_ptr(new ast_binop(TIMES, std::move($1), std::move($3))); }
| aMul DIVIDE app { $$ = ast_ptr(new ast_binop(DIVIDE, std::move($1), std::move($3))); }
| app { $$ = std::move($1); }
;
app
: app appBase { $$ = ast_ptr(new ast_app(std::move($1), std::move($2))); }
| appBase { $$ = std::move($1); }
;
appBase
: INT { $$ = ast_ptr(new ast_int($1)); }
| LID { $$ = ast_ptr(new ast_lid(std::move($1))); }
| UID { $$ = ast_ptr(new ast_uid(std::move($1))); }
| OPAREN aAdd CPAREN { $$ = std::move($2); }
| case { $$ = std::move($1); }
;
case
: CASE aAdd OF OCURLY branches CCURLY
{ $$ = ast_ptr(new ast_case(std::move($2), std::move($5))); }
;
branches
: branches branch { $$ = std::move($1); $$.push_back(std::move($2)); }
| branch { $$ = std::vector<branch_ptr>(); $$.push_back(std::move($1));}
;
branch
: pattern ARROW OCURLY aAdd CCURLY
{ $$ = branch_ptr(new branch(std::move($1), std::move($4))); }
;
pattern
: LID { $$ = pattern_ptr(new pattern_var(std::move($1))); }
| UID lowercaseParams
{ $$ = pattern_ptr(new pattern_constr(std::move($1), std::move($2))); }
;
data
: DATA UID EQUAL OCURLY constructors CCURLY
{ $$ = definition_ptr(new definition_data(std::move($2), std::move($5))); }
;
constructors
: constructors COMMA constructor { $$ = std::move($1); $$.push_back(std::move($3)); }
| constructor
{ $$ = std::vector<constructor_ptr>(); $$.push_back(std::move($1)); }
;
constructor
: UID uppercaseParams
{ $$ = constructor_ptr(new constructor(std::move($1), std::move($2))); }
;

159
code/compiler/07/runtime.c Normal file
View File

@@ -0,0 +1,159 @@
#include <stdint.h>
#include <assert.h>
#include <memory.h>
#include "runtime.h"
struct node_base* alloc_node() {
struct node_base* new_node = malloc(sizeof(struct node_app));
assert(new_node != NULL);
return new_node;
}
struct node_app* alloc_app(struct node_base* l, struct node_base* r) {
struct node_app* node = (struct node_app*) alloc_node();
node->base.tag = NODE_APP;
node->left = l;
node->right = r;
return node;
}
struct node_num* alloc_num(int32_t n) {
struct node_num* node = (struct node_num*) alloc_node();
node->base.tag = NODE_NUM;
node->value = n;
return node;
}
struct node_global* alloc_global(void (*f)(struct stack*), int32_t a) {
struct node_global* node = (struct node_global*) alloc_node();
node->base.tag = NODE_GLOBAL;
node->arity = a;
node->function = f;
return node;
}
struct node_ind* alloc_ind(struct node_base* n) {
struct node_ind* node = (struct node_ind*) alloc_node();
node->base.tag = NODE_IND;
node->next = n;
return node;
}
void stack_init(struct stack* s) {
s->size = 4;
s->count = 0;
s->data = malloc(sizeof(*s->data) * s->size);
assert(s->data != NULL);
}
void stack_free(struct stack* s) {
free(s->data);
}
void stack_push(struct stack* s, struct node_base* n) {
while(s->count >= s->size) {
s->data = realloc(s->data, sizeof(*s->data) * (s->size *= 2));
assert(s->data != NULL);
}
s->data[s->count++] = n;
}
struct node_base* stack_pop(struct stack* s) {
assert(s->count > 0);
return s->data[--s->count];
}
struct node_base* stack_peek(struct stack* s, size_t o) {
assert(s->count > o);
return s->data[s->count - o - 1];
}
void stack_popn(struct stack* s, size_t n) {
assert(s->count >= n);
s->count -= n;
}
void stack_slide(struct stack* s, size_t n) {
assert(s->count > n);
s->data[s->count - n - 1] = s->data[s->count - 1];
s->count -= n;
}
void stack_update(struct stack* s, size_t o) {
assert(s->count > o + 1);
struct node_ind* ind = (struct node_ind*) s->data[s->count - o - 2];
ind->base.tag = NODE_IND;
ind->next = s->data[s->count -= 1];
}
void stack_alloc(struct stack* s, size_t o) {
while(o--) {
stack_push(s, (struct node_base*) alloc_ind(NULL));
}
}
void stack_pack(struct stack* s, size_t n, int8_t t) {
assert(s->count >= n);
struct node_base** data = malloc(sizeof(*data) * n);
assert(data != NULL);
memcpy(data, &s->data[s->count - 1 - n], n * sizeof(*data));
struct node_data* new_node = (struct node_data*) alloc_node();
new_node->array = data;
new_node->base.tag = NODE_DATA;
new_node->tag = t;
stack_popn(s, n);
stack_push(s, (struct node_base*) new_node);
}
void stack_split(struct stack* s, size_t n) {
struct node_data* node = (struct node_data*) stack_pop(s);
for(size_t i = 0; i < n; i++) {
stack_push(s, node->array[i]);
}
}
void unwind(struct stack* s) {
while(1) {
struct node_base* peek = stack_peek(s, 0);
if(peek->tag == NODE_APP) {
struct node_app* n = (struct node_app*) peek;
stack_push(s, n->left);
} else if(peek->tag == NODE_GLOBAL) {
struct node_global* n = (struct node_global*) peek;
assert(s->count > n->arity);
for(size_t i = 1; i <= n->arity; i++) {
s->data[s->count - i]
= ((struct node_app*) s->data[s->count - i - 1])->right;
}
n->function(s);
} else if(peek->tag == NODE_IND) {
struct node_ind* n = (struct node_ind*) peek;
stack_pop(s);
stack_push(s, n->next);
} else {
break;
}
}
}
struct node_base* eval(struct node_base* n) {
struct stack program_stack;
stack_init(&program_stack);
stack_push(&program_stack, n);
unwind(&program_stack);
struct node_base* result = stack_pop(&program_stack);
stack_free(&program_stack);
return result;
}
extern void f_main(struct stack* s);
int main(int argc, char** argv) {
struct node_global* first_node = alloc_global(f_main, 0);
struct node_base* result = eval((struct node_base*) first_node);
}

View File

@@ -0,0 +1,70 @@
#pragma once
#include <stdlib.h>
struct stack;
enum node_tag {
NODE_APP,
NODE_NUM,
NODE_GLOBAL,
NODE_IND,
NODE_DATA
};
struct node_base {
enum node_tag tag;
};
struct node_app {
struct node_base base;
struct node_base* left;
struct node_base* right;
};
struct node_num {
struct node_base base;
int32_t value;
};
struct node_global {
struct node_base base;
int32_t arity;
void (*function)(struct stack*);
};
struct node_ind {
struct node_base base;
struct node_base* next;
};
struct node_data {
struct node_base base;
int8_t tag;
struct node_base** array;
};
struct node_base* alloc_node();
struct node_app* alloc_app(struct node_base* l, struct node_base* r);
struct node_num* alloc_num(int32_t n);
struct node_global* alloc_global(void (*f)(struct stack*), int32_t a);
struct node_ind* alloc_ind(struct node_base* n);
struct stack {
size_t size;
size_t count;
struct node_base** data;
};
void stack_init(struct stack* s);
void stack_free(struct stack* s);
void stack_push(struct stack* s, struct node_base* n);
struct node_base* stack_pop(struct stack* s);
struct node_base* stack_peek(struct stack* s, size_t o);
void stack_popn(struct stack* s, size_t n);
void stack_slide(struct stack* s, size_t n);
void stack_update(struct stack* s, size_t o);
void stack_alloc(struct stack* s, size_t o);
void stack_pack(struct stack* s, size_t n, int8_t t);
void stack_split(struct stack* s, size_t n);
struct node_base* eval(struct node_base* n);

View File

@@ -0,0 +1,34 @@
%option noyywrap
%{
#include <iostream>
#include "ast.hpp"
#include "parser.hpp"
#define YY_DECL yy::parser::symbol_type yylex()
%}
%%
[ \n]+ {}
\+ { return yy::parser::make_PLUS(); }
\* { return yy::parser::make_TIMES(); }
- { return yy::parser::make_MINUS(); }
\/ { return yy::parser::make_DIVIDE(); }
[0-9]+ { return yy::parser::make_INT(atoi(yytext)); }
defn { return yy::parser::make_DEFN(); }
data { return yy::parser::make_DATA(); }
case { return yy::parser::make_CASE(); }
of { return yy::parser::make_OF(); }
\{ { return yy::parser::make_OCURLY(); }
\} { return yy::parser::make_CCURLY(); }
\( { return yy::parser::make_OPAREN(); }
\) { return yy::parser::make_CPAREN(); }
, { return yy::parser::make_COMMA(); }
-> { return yy::parser::make_ARROW(); }
= { return yy::parser::make_EQUAL(); }
[a-z][a-zA-Z]* { return yy::parser::make_LID(std::string(yytext)); }
[A-Z][a-zA-Z]* { return yy::parser::make_UID(std::string(yytext)); }
%%

99
code/compiler/07/type.cpp Normal file
View File

@@ -0,0 +1,99 @@
#include "type.hpp"
#include <sstream>
#include <algorithm>
#include "error.hpp"
void type_var::print(const type_mgr& mgr, std::ostream& to) const {
auto it = mgr.types.find(name);
if(it != mgr.types.end()) {
it->second->print(mgr, to);
} else {
to << name;
}
}
void type_base::print(const type_mgr& mgr, std::ostream& to) const {
to << name;
}
void type_arr::print(const type_mgr& mgr, std::ostream& to) const {
left->print(mgr, to);
to << " -> (";
right->print(mgr, to);
to << ")";
}
std::string type_mgr::new_type_name() {
int temp = last_id++;
std::string str = "";
while(temp != -1) {
str += (char) ('a' + (temp % 26));
temp = temp / 26 - 1;
}
std::reverse(str.begin(), str.end());
return str;
}
type_ptr type_mgr::new_type() {
return type_ptr(new type_var(new_type_name()));
}
type_ptr type_mgr::new_arrow_type() {
return type_ptr(new type_arr(new_type(), new_type()));
}
type_ptr type_mgr::resolve(type_ptr t, type_var*& var) const {
type_var* cast;
var = nullptr;
while((cast = dynamic_cast<type_var*>(t.get()))) {
auto it = types.find(cast->name);
if(it == types.end()) {
var = cast;
break;
}
t = it->second;
}
return t;
}
void type_mgr::unify(type_ptr l, type_ptr r) {
type_var* lvar;
type_var* rvar;
type_arr* larr;
type_arr* rarr;
type_base* lid;
type_base* rid;
l = resolve(l, lvar);
r = resolve(r, rvar);
if(lvar) {
bind(lvar->name, r);
return;
} else if(rvar) {
bind(rvar->name, l);
return;
} else if((larr = dynamic_cast<type_arr*>(l.get())) &&
(rarr = dynamic_cast<type_arr*>(r.get()))) {
unify(larr->left, rarr->left);
unify(larr->right, rarr->right);
return;
} else if((lid = dynamic_cast<type_base*>(l.get())) &&
(rid = dynamic_cast<type_base*>(r.get()))) {
if(lid->name == rid->name) return;
}
throw unification_error(l, r);
}
void type_mgr::bind(const std::string& s, type_ptr t) {
type_var* other = dynamic_cast<type_var*>(t.get());
if(other && other->name == s) return;
types[s] = t;
}

65
code/compiler/07/type.hpp Normal file
View File

@@ -0,0 +1,65 @@
#pragma once
#include <memory>
#include <map>
struct type_mgr;
struct type {
virtual ~type() = default;
virtual void print(const type_mgr& mgr, std::ostream& to) const = 0;
};
using type_ptr = std::shared_ptr<type>;
struct type_var : public type {
std::string name;
type_var(std::string n)
: name(std::move(n)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_base : public type {
std::string name;
type_base(std::string n)
: name(std::move(n)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_data : public type_base {
struct constructor {
int tag;
};
std::map<std::string, constructor> constructors;
type_data(std::string n)
: type_base(std::move(n)) {}
};
struct type_arr : public type {
type_ptr left;
type_ptr right;
type_arr(type_ptr l, type_ptr r)
: left(std::move(l)), right(std::move(r)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_mgr {
int last_id = 0;
std::map<std::string, type_ptr> types;
std::string new_type_name();
type_ptr new_type();
type_ptr new_arrow_type();
void unify(type_ptr l, type_ptr r);
type_ptr resolve(type_ptr t, type_var*& var) const;
void bind(const std::string& s, type_ptr t);
};

View File

@@ -0,0 +1,16 @@
#include "type_env.hpp"
type_ptr type_env::lookup(const std::string& name) const {
auto it = names.find(name);
if(it != names.end()) return it->second;
if(parent) return parent->lookup(name);
return nullptr;
}
void type_env::bind(const std::string& name, type_ptr t) {
names[name] = t;
}
type_env type_env::scope() const {
return type_env(this);
}

View File

@@ -0,0 +1,16 @@
#pragma once
#include <map>
#include "type.hpp"
struct type_env {
std::map<std::string, type_ptr> names;
type_env const* parent = nullptr;
type_env(type_env const* p)
: parent(p) {}
type_env() : type_env(nullptr) {}
type_ptr lookup(const std::string& name) const;
void bind(const std::string& name, type_ptr t);
type_env scope() const;
};

View File

@@ -0,0 +1,42 @@
cmake_minimum_required(VERSION 3.1)
project(compiler)
# Find all the required packages
find_package(BISON)
find_package(FLEX)
find_package(LLVM REQUIRED CONFIG)
# Set up the flex and bison targets
bison_target(parser
${CMAKE_CURRENT_SOURCE_DIR}/parser.y
${CMAKE_CURRENT_BINARY_DIR}/parser.cpp
COMPILE_FLAGS "-d")
flex_target(scanner
${CMAKE_CURRENT_SOURCE_DIR}/scanner.l
${CMAKE_CURRENT_BINARY_DIR}/scanner.cpp)
add_flex_bison_dependency(scanner parser)
# Find all the relevant LLVM components
llvm_map_components_to_libnames(LLVM_LIBS core x86asmparser x86codegen)
# Create compiler executable
add_executable(compiler
ast.cpp ast.hpp definition.cpp
llvm_context.cpp llvm_context.hpp
type_env.cpp type_env.hpp
env.cpp env.hpp
type.cpp type.hpp
error.cpp error.hpp
binop.cpp binop.hpp
instruction.cpp instruction.hpp
${BISON_parser_OUTPUTS}
${FLEX_scanner_OUTPUTS}
main.cpp
)
# Configure compiler executable
target_include_directories(compiler PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_include_directories(compiler PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
target_include_directories(compiler PUBLIC ${LLVM_INCLUDE_DIRS})
target_compile_definitions(compiler PUBLIC ${LLVM_DEFINITIONS})
target_link_libraries(compiler ${LLVM_LIBS})

264
code/compiler/08/ast.cpp Normal file
View File

@@ -0,0 +1,264 @@
#include "ast.hpp"
#include <ostream>
#include "binop.hpp"
#include "error.hpp"
static void print_indent(int n, std::ostream& to) {
while(n--) to << " ";
}
type_ptr ast::typecheck_common(type_mgr& mgr, const type_env& env) {
node_type = typecheck(mgr, env);
return node_type;
}
void ast::resolve_common(const type_mgr& mgr) {
type_var* var;
type_ptr resolved_type = mgr.resolve(node_type, var);
if(var) throw type_error("ambiguously typed program");
resolve(mgr);
node_type = std::move(resolved_type);
}
void ast_int::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "INT: " << value << std::endl;
}
type_ptr ast_int::typecheck(type_mgr& mgr, const type_env& env) const {
return type_ptr(new type_base("Int"));
}
void ast_int::resolve(const type_mgr& mgr) const {
}
void ast_int::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(new instruction_pushint(value)));
}
void ast_lid::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "LID: " << id << std::endl;
}
type_ptr ast_lid::typecheck(type_mgr& mgr, const type_env& env) const {
return env.lookup(id);
}
void ast_lid::resolve(const type_mgr& mgr) const {
}
void ast_lid::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(
env->has_variable(id) ?
(instruction*) new instruction_push(env->get_offset(id)) :
(instruction*) new instruction_pushglobal(id)));
}
void ast_uid::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "UID: " << id << std::endl;
}
type_ptr ast_uid::typecheck(type_mgr& mgr, const type_env& env) const {
return env.lookup(id);
}
void ast_uid::resolve(const type_mgr& mgr) const {
}
void ast_uid::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
into.push_back(instruction_ptr(new instruction_pushglobal(id)));
}
void ast_binop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "BINOP: " << op_name(op) << std::endl;
left->print(indent + 1, to);
right->print(indent + 1, to);
}
type_ptr ast_binop::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr ltype = left->typecheck_common(mgr, env);
type_ptr rtype = right->typecheck_common(mgr, env);
type_ptr ftype = env.lookup(op_name(op));
if(!ftype) throw type_error(std::string("unknown binary operator ") + op_name(op));
type_ptr return_type = mgr.new_type();
type_ptr arrow_one = type_ptr(new type_arr(rtype, return_type));
type_ptr arrow_two = type_ptr(new type_arr(ltype, arrow_one));
mgr.unify(arrow_two, ftype);
return return_type;
}
void ast_binop::resolve(const type_mgr& mgr) const {
left->resolve_common(mgr);
right->resolve_common(mgr);
}
void ast_binop::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
right->compile(env, into);
left->compile(env_ptr(new env_offset(1, env)), into);
into.push_back(instruction_ptr(new instruction_pushglobal(op_action(op))));
into.push_back(instruction_ptr(new instruction_mkapp()));
into.push_back(instruction_ptr(new instruction_mkapp()));
}
void ast_app::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "APP:" << std::endl;
left->print(indent + 1, to);
right->print(indent + 1, to);
}
type_ptr ast_app::typecheck(type_mgr& mgr, const type_env& env) const {
type_ptr ltype = left->typecheck_common(mgr, env);
type_ptr rtype = right->typecheck_common(mgr, env);
type_ptr return_type = mgr.new_type();
type_ptr arrow = type_ptr(new type_arr(rtype, return_type));
mgr.unify(arrow, ltype);
return return_type;
}
void ast_app::resolve(const type_mgr& mgr) const {
left->resolve_common(mgr);
right->resolve_common(mgr);
}
void ast_app::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
right->compile(env, into);
left->compile(env_ptr(new env_offset(1, env)), into);
into.push_back(instruction_ptr(new instruction_mkapp()));
}
void ast_case::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "CASE: " << std::endl;
for(auto& branch : branches) {
print_indent(indent + 1, to);
branch->pat->print(to);
to << std::endl;
branch->expr->print(indent + 2, to);
}
}
type_ptr ast_case::typecheck(type_mgr& mgr, const type_env& env) const {
type_var* var;
type_ptr case_type = mgr.resolve(of->typecheck_common(mgr, env), var);
type_ptr branch_type = mgr.new_type();
for(auto& branch : branches) {
type_env new_env = env.scope();
branch->pat->match(case_type, mgr, new_env);
type_ptr curr_branch_type = branch->expr->typecheck_common(mgr, new_env);
mgr.unify(branch_type, curr_branch_type);
}
case_type = mgr.resolve(case_type, var);
if(!dynamic_cast<type_data*>(case_type.get())) {
throw type_error("attempting case analysis of non-data type");
}
return branch_type;
}
void ast_case::resolve(const type_mgr& mgr) const {
of->resolve_common(mgr);
for(auto& branch : branches) {
branch->expr->resolve_common(mgr);
}
}
void ast_case::compile(const env_ptr& env, std::vector<instruction_ptr>& into) const {
type_data* type = dynamic_cast<type_data*>(of->node_type.get());
of->compile(env, into);
into.push_back(instruction_ptr(new instruction_eval()));
instruction_jump* jump_instruction = new instruction_jump();
into.push_back(instruction_ptr(jump_instruction));
for(auto& branch : branches) {
std::vector<instruction_ptr> branch_instructions;
pattern_var* vpat;
pattern_constr* cpat;
if((vpat = dynamic_cast<pattern_var*>(branch->pat.get()))) {
branch->expr->compile(env_ptr(new env_offset(1, env)), branch_instructions);
for(auto& constr_pair : type->constructors) {
if(jump_instruction->tag_mappings.find(constr_pair.second.tag) !=
jump_instruction->tag_mappings.end())
break;
jump_instruction->tag_mappings[constr_pair.second.tag] =
jump_instruction->branches.size();
}
jump_instruction->branches.push_back(std::move(branch_instructions));
} else if((cpat = dynamic_cast<pattern_constr*>(branch->pat.get()))) {
env_ptr new_env = env;
for(auto it = cpat->params.rbegin(); it != cpat->params.rend(); it++) {
new_env = env_ptr(new env_var(*it, new_env));
}
branch_instructions.push_back(instruction_ptr(new instruction_split(
cpat->params.size())));
branch->expr->compile(new_env, branch_instructions);
branch_instructions.push_back(instruction_ptr(new instruction_slide(
cpat->params.size())));
int new_tag = type->constructors[cpat->constr].tag;
if(jump_instruction->tag_mappings.find(new_tag) !=
jump_instruction->tag_mappings.end())
throw type_error("technically not a type error: duplicate pattern");
jump_instruction->tag_mappings[new_tag] =
jump_instruction->branches.size();
jump_instruction->branches.push_back(std::move(branch_instructions));
}
}
for(auto& constr_pair : type->constructors) {
if(jump_instruction->tag_mappings.find(constr_pair.second.tag) ==
jump_instruction->tag_mappings.end())
throw type_error("non-total pattern");
}
}
void pattern_var::print(std::ostream& to) const {
to << var;
}
void pattern_var::match(type_ptr t, type_mgr& mgr, type_env& env) const {
env.bind(var, t);
}
void pattern_constr::print(std::ostream& to) const {
to << constr;
for(auto& param : params) {
to << " " << param;
}
}
void pattern_constr::match(type_ptr t, type_mgr& mgr, type_env& env) const {
type_ptr constructor_type = env.lookup(constr);
if(!constructor_type) {
throw type_error(std::string("pattern using unknown constructor ") + constr);
}
for(int i = 0; i < params.size(); i++) {
type_arr* arr = dynamic_cast<type_arr*>(constructor_type.get());
if(!arr) throw type_error("too many parameters in constructor pattern");
env.bind(params[i], arr->left);
constructor_type = arr->right;
}
mgr.unify(t, constructor_type);
}

141
code/compiler/08/ast.hpp Normal file
View File

@@ -0,0 +1,141 @@
#pragma once
#include <memory>
#include <vector>
#include "type.hpp"
#include "type_env.hpp"
#include "binop.hpp"
#include "instruction.hpp"
#include "env.hpp"
struct ast {
type_ptr node_type;
virtual ~ast() = default;
virtual void print(int indent, std::ostream& to) const = 0;
virtual type_ptr typecheck(type_mgr& mgr, const type_env& env) const = 0;
virtual void resolve(const type_mgr& mgr) const = 0;
virtual void compile(const env_ptr& env,
std::vector<instruction_ptr>& into) const = 0;
type_ptr typecheck_common(type_mgr& mgr, const type_env& env);
void resolve_common(const type_mgr& mgr);
};
using ast_ptr = std::unique_ptr<ast>;
struct pattern {
virtual ~pattern() = default;
virtual void print(std::ostream& to) const = 0;
virtual void match(type_ptr t, type_mgr& mgr, type_env& env) const = 0;
};
using pattern_ptr = std::unique_ptr<pattern>;
struct branch {
pattern_ptr pat;
ast_ptr expr;
branch(pattern_ptr p, ast_ptr a)
: pat(std::move(p)), expr(std::move(a)) {}
};
using branch_ptr = std::unique_ptr<branch>;
struct ast_int : public ast {
int value;
explicit ast_int(int v)
: value(v) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_lid : public ast {
std::string id;
explicit ast_lid(std::string i)
: id(std::move(i)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_uid : public ast {
std::string id;
explicit ast_uid(std::string i)
: id(std::move(i)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_binop : public ast {
binop op;
ast_ptr left;
ast_ptr right;
ast_binop(binop o, ast_ptr l, ast_ptr r)
: op(o), left(std::move(l)), right(std::move(r)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_app : public ast {
ast_ptr left;
ast_ptr right;
ast_app(ast_ptr l, ast_ptr r)
: left(std::move(l)), right(std::move(r)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct ast_case : public ast {
ast_ptr of;
std::vector<branch_ptr> branches;
ast_case(ast_ptr o, std::vector<branch_ptr> b)
: of(std::move(o)), branches(std::move(b)) {}
void print(int indent, std::ostream& to) const;
type_ptr typecheck(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr) const;
void compile(const env_ptr& env, std::vector<instruction_ptr>& into) const;
};
struct pattern_var : public pattern {
std::string var;
pattern_var(std::string v)
: var(std::move(v)) {}
void print(std::ostream &to) const;
void match(type_ptr t, type_mgr& mgr, type_env& env) const;
};
struct pattern_constr : public pattern {
std::string constr;
std::vector<std::string> params;
pattern_constr(std::string c, std::vector<std::string> p)
: constr(std::move(c)), params(std::move(p)) {}
void print(std::ostream &to) const;
void match(type_ptr t, type_mgr&, type_env& env) const;
};

View File

@@ -0,0 +1,21 @@
#include "binop.hpp"
std::string op_name(binop op) {
switch(op) {
case PLUS: return "+";
case MINUS: return "-";
case TIMES: return "*";
case DIVIDE: return "/";
}
return "??";
}
std::string op_action(binop op) {
switch(op) {
case PLUS: return "plus";
case MINUS: return "minus";
case TIMES: return "times";
case DIVIDE: return "divide";
}
return "??";
}

View File

@@ -0,0 +1,12 @@
#pragma once
#include <string>
enum binop {
PLUS,
MINUS,
TIMES,
DIVIDE
};
std::string op_name(binop op);
std::string op_action(binop op);

View File

@@ -0,0 +1,116 @@
#include "definition.hpp"
#include "error.hpp"
#include "ast.hpp"
#include "llvm_context.hpp"
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/Type.h>
void definition_defn::typecheck_first(type_mgr& mgr, type_env& env) {
return_type = mgr.new_type();
type_ptr full_type = return_type;
for(auto it = params.rbegin(); it != params.rend(); it++) {
type_ptr param_type = mgr.new_type();
full_type = type_ptr(new type_arr(param_type, full_type));
param_types.push_back(param_type);
}
env.bind(name, full_type);
}
void definition_defn::typecheck_second(type_mgr& mgr, const type_env& env) const {
type_env new_env = env.scope();
auto param_it = params.begin();
auto type_it = param_types.rbegin();
while(param_it != params.end() && type_it != param_types.rend()) {
new_env.bind(*param_it, *type_it);
param_it++;
type_it++;
}
type_ptr body_type = body->typecheck_common(mgr, new_env);
mgr.unify(return_type, body_type);
}
void definition_defn::resolve(const type_mgr& mgr) {
type_var* var;
body->resolve_common(mgr);
return_type = mgr.resolve(return_type, var);
if(var) throw type_error("ambiguously typed program");
for(auto& param_type : param_types) {
param_type = mgr.resolve(param_type, var);
if(var) throw type_error("ambiguously typed program");
}
}
void definition_defn::compile() {
env_ptr new_env = env_ptr(new env_offset(0, nullptr));
for(auto it = params.rbegin(); it != params.rend(); it++) {
new_env = env_ptr(new env_var(*it, new_env));
}
body->compile(new_env, instructions);
instructions.push_back(instruction_ptr(new instruction_update(params.size())));
instructions.push_back(instruction_ptr(new instruction_pop(params.size())));
}
void definition_defn::gen_llvm_first(llvm_context& ctx) {
generated_function = ctx.create_custom_function(name, params.size());
}
void definition_defn::gen_llvm_second(llvm_context& ctx) {
ctx.builder.SetInsertPoint(&generated_function->getEntryBlock());
for(auto& instruction : instructions) {
instruction->gen_llvm(ctx, generated_function);
}
ctx.builder.CreateRetVoid();
}
void definition_data::typecheck_first(type_mgr& mgr, type_env& env) {
type_data* this_type = new type_data(name);
type_ptr return_type = type_ptr(this_type);
int next_tag = 0;
for(auto& constructor : constructors) {
constructor->tag = next_tag;
this_type->constructors[constructor->name] = { next_tag++ };
type_ptr full_type = return_type;
for(auto it = constructor->types.rbegin(); it != constructor->types.rend(); it++) {
type_ptr type = type_ptr(new type_base(*it));
full_type = type_ptr(new type_arr(type, full_type));
}
env.bind(constructor->name, full_type);
}
}
void definition_data::typecheck_second(type_mgr& mgr, const type_env& env) const {
// Nothing
}
void definition_data::resolve(const type_mgr& mgr) {
// Nothing
}
void definition_data::compile() {
}
void definition_data::gen_llvm_first(llvm_context& ctx) {
for(auto& constructor : constructors) {
auto new_function =
ctx.create_custom_function(constructor->name, constructor->types.size());
ctx.builder.SetInsertPoint(&new_function->getEntryBlock());
ctx.create_pack(new_function,
ctx.create_size(constructor->types.size()),
ctx.create_i8(constructor->tag)
);
ctx.builder.CreateRetVoid();
}
}
void definition_data::gen_llvm_second(llvm_context& ctx) {
// Nothing
}

View File

@@ -0,0 +1,73 @@
#pragma once
#include <memory>
#include <vector>
#include "instruction.hpp"
#include "llvm_context.hpp"
#include "type_env.hpp"
struct ast;
using ast_ptr = std::unique_ptr<ast>;
struct definition {
virtual ~definition() = default;
virtual void typecheck_first(type_mgr& mgr, type_env& env) = 0;
virtual void typecheck_second(type_mgr& mgr, const type_env& env) const = 0;
virtual void resolve(const type_mgr& mgr) = 0;
virtual void compile() = 0;
virtual void gen_llvm_first(llvm_context& ctx) = 0;
virtual void gen_llvm_second(llvm_context& ctx) = 0;
};
using definition_ptr = std::unique_ptr<definition>;
struct constructor {
std::string name;
std::vector<std::string> types;
int8_t tag;
constructor(std::string n, std::vector<std::string> ts)
: name(std::move(n)), types(std::move(ts)) {}
};
using constructor_ptr = std::unique_ptr<constructor>;
struct definition_defn : public definition {
std::string name;
std::vector<std::string> params;
ast_ptr body;
type_ptr return_type;
std::vector<type_ptr> param_types;
std::vector<instruction_ptr> instructions;
llvm::Function* generated_function;
definition_defn(std::string n, std::vector<std::string> p, ast_ptr b)
: name(std::move(n)), params(std::move(p)), body(std::move(b)) {
}
void typecheck_first(type_mgr& mgr, type_env& env);
void typecheck_second(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr);
void compile();
void gen_llvm_first(llvm_context& ctx);
void gen_llvm_second(llvm_context& ctx);
};
struct definition_data : public definition {
std::string name;
std::vector<constructor_ptr> constructors;
definition_data(std::string n, std::vector<constructor_ptr> cs)
: name(std::move(n)), constructors(std::move(cs)) {}
void typecheck_first(type_mgr& mgr, type_env& env);
void typecheck_second(type_mgr& mgr, const type_env& env) const;
void resolve(const type_mgr& mgr);
void compile();
void gen_llvm_first(llvm_context& ctx);
void gen_llvm_second(llvm_context& ctx);
};

23
code/compiler/08/env.cpp Normal file
View File

@@ -0,0 +1,23 @@
#include "env.hpp"
int env_var::get_offset(const std::string& name) const {
if(name == this->name) return 0;
if(parent) return parent->get_offset(name) + 1;
throw 0;
}
bool env_var::has_variable(const std::string& name) const {
if(name == this->name) return true;
if(parent) return parent->has_variable(name);
return false;
}
int env_offset::get_offset(const std::string& name) const {
if(parent) return parent->get_offset(name) + offset;
throw 0;
}
bool env_offset::has_variable(const std::string& name) const {
if(parent) return parent->has_variable(name);
return false;
}

34
code/compiler/08/env.hpp Normal file
View File

@@ -0,0 +1,34 @@
#pragma once
#include <memory>
#include <string>
struct env {
virtual ~env() = default;
virtual int get_offset(const std::string& name) const = 0;
virtual bool has_variable(const std::string& name) const = 0;
};
using env_ptr = std::shared_ptr<env>;
struct env_var : public env {
std::string name;
env_ptr parent;
env_var(std::string& n, env_ptr p)
: name(std::move(n)), parent(std::move(p)) {}
int get_offset(const std::string& name) const;
bool has_variable(const std::string& name) const;
};
struct env_offset : public env {
int offset;
env_ptr parent;
env_offset(int o, env_ptr p)
: offset(o), parent(std::move(p)) {}
int get_offset(const std::string& name) const;
bool has_variable(const std::string& name) const;
};

View File

@@ -0,0 +1,5 @@
#include "error.hpp"
const char* type_error::what() const noexcept {
return "an error occured while checking the types of the program";
}

View File

@@ -0,0 +1,21 @@
#pragma once
#include <exception>
#include "type.hpp"
struct type_error : std::exception {
std::string description;
type_error(std::string d)
: description(std::move(d)) {}
const char* what() const noexcept override;
};
struct unification_error : public type_error {
type_ptr left;
type_ptr right;
unification_error(type_ptr l, type_ptr r)
: left(std::move(l)), right(std::move(r)),
type_error("failed to unify types") {}
};

View File

@@ -0,0 +1,2 @@
data Bool = { True, False }
defn main = { 3 + True }

View File

@@ -0,0 +1 @@
defn main = { 1 2 3 4 5 }

View File

@@ -0,0 +1,8 @@
data List = { Nil, Cons Int List }
defn head l = {
case l of {
Nil -> { 0 }
Cons x y z -> { x }
}
}

View File

@@ -0,0 +1,31 @@
#include "../runtime.h"
void f_add(struct stack* s) {
struct node_num* left = (struct node_num*) eval(stack_peek(s, 0));
struct node_num* right = (struct node_num*) eval(stack_peek(s, 1));
stack_push(s, (struct node_base*) alloc_num(left->value + right->value));
}
void f_main(struct stack* s) {
// PushInt 320
stack_push(s, (struct node_base*) alloc_num(320));
// PushInt 6
stack_push(s, (struct node_base*) alloc_num(6));
// PushGlobal f_add (the function for +)
stack_push(s, (struct node_base*) alloc_global(f_add, 2));
struct node_base* left;
struct node_base* right;
// MkApp
left = stack_pop(s);
right = stack_pop(s);
stack_push(s, (struct node_base*) alloc_app(left, right));
// MkApp
left = stack_pop(s);
right = stack_pop(s);
stack_push(s, (struct node_base*) alloc_app(left, right));
}

View File

@@ -0,0 +1,3 @@
defn main = { sum 320 6 }
defn sum x y = { x + y }

View File

@@ -0,0 +1,3 @@
defn add x y = { x + y }
defn double x = { add x x }
defn main = { double 163 }

View File

@@ -0,0 +1,8 @@
data List = { Nil, Cons Int List }
defn length l = {
case l of {
Nil -> { 0 }
Cons x xs -> { 1 + length xs }
}
}
defn main = { length (Cons 1 (Cons 2 (Cons 3 Nil))) }

View File

@@ -0,0 +1,16 @@
data List = { Nil, Cons Int List }
defn add x y = { x + y }
defn mul x y = { x * y }
defn foldr f b l = {
case l of {
Nil -> { b }
Cons x xs -> { f x (foldr f b xs) }
}
}
defn main = {
foldr add 0 (Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil)))) +
foldr mul 1 (Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil))))
}

View File

@@ -0,0 +1,17 @@
data List = { Nil, Cons Int List }
defn sumZip l m = {
case l of {
Nil -> { 0 }
Cons x xs -> {
case m of {
Nil -> { 0 }
Cons y ys -> { x + y + sumZip xs ys }
}
}
}
}
defn ones = { Cons 1 ones }
defn main = { sumZip ones (Cons 1 (Cons 2 (Cons 3 Nil))) }

View File

@@ -0,0 +1,177 @@
#include "instruction.hpp"
#include "llvm_context.hpp"
#include <llvm/IR/BasicBlock.h>
#include <llvm/IR/Function.h>
using namespace llvm;
static void print_indent(int n, std::ostream& to) {
while(n--) to << " ";
}
void instruction_pushint::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "PushInt(" << value << ")" << std::endl;
}
void instruction_pushint::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_push(f, ctx.create_num(ctx.create_i32(value)));
}
void instruction_pushglobal::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "PushGlobal(" << name << ")" << std::endl;
}
void instruction_pushglobal::gen_llvm(llvm_context& ctx, Function* f) const {
auto& global_f = ctx.custom_functions.at("f_" + name);
auto arity = ctx.create_i32(global_f->arity);
ctx.create_push(f, ctx.create_global(global_f->function, arity));
}
void instruction_push::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Push(" << offset << ")" << std::endl;
}
void instruction_push::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_push(f, ctx.create_peek(f, ctx.create_size(offset)));
}
void instruction_pop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Pop(" << count << ")" << std::endl;
}
void instruction_pop::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_popn(f, ctx.create_size(count));
}
void instruction_mkapp::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "MkApp()" << std::endl;
}
void instruction_mkapp::gen_llvm(llvm_context& ctx, Function* f) const {
auto left = ctx.create_pop(f);
auto right = ctx.create_pop(f);
ctx.create_push(f, ctx.create_app(left, right));
}
void instruction_update::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Update(" << offset << ")" << std::endl;
}
void instruction_update::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_update(f, ctx.create_size(offset));
}
void instruction_pack::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Pack(" << tag << ", " << size << ")" << std::endl;
}
void instruction_pack::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_pack(f, ctx.create_size(size), ctx.create_i8(tag));
}
void instruction_split::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Split()" << std::endl;
}
void instruction_split::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_split(f, ctx.create_size(size));
}
void instruction_jump::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Jump(" << std::endl;
for(auto& instruction_set : branches) {
for(auto& instruction : instruction_set) {
instruction->print(indent + 2, to);
}
to << std::endl;
}
print_indent(indent, to);
to << ")" << std::endl;
}
void instruction_jump::gen_llvm(llvm_context& ctx, Function* f) const {
auto top_node = ctx.create_peek(f, ctx.create_size(0));
auto tag = ctx.unwrap_data_tag(top_node);
auto safety_block = BasicBlock::Create(ctx.ctx, "safety", f);
auto switch_op = ctx.builder.CreateSwitch(tag, safety_block, tag_mappings.size());
std::vector<BasicBlock*> blocks;
for(auto& branch : branches) {
auto branch_block = BasicBlock::Create(ctx.ctx, "branch", f);
ctx.builder.SetInsertPoint(branch_block);
for(auto& instruction : branch) {
instruction->gen_llvm(ctx, f);
}
ctx.builder.CreateBr(safety_block);
blocks.push_back(branch_block);
}
for(auto& mapping : tag_mappings) {
switch_op->addCase(ctx.create_i8(mapping.first), blocks[mapping.second]);
}
ctx.builder.SetInsertPoint(safety_block);
}
void instruction_slide::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Slide(" << offset << ")" << std::endl;
}
void instruction_slide::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_slide(f, ctx.create_size(offset));
}
void instruction_binop::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "BinOp(" << op_action(op) << ")" << std::endl;
}
void instruction_binop::gen_llvm(llvm_context& ctx, Function* f) const {
auto left_int = ctx.unwrap_num(ctx.create_pop(f));
auto right_int = ctx.unwrap_num(ctx.create_pop(f));
llvm::Value* result;
switch(op) {
case PLUS: result = ctx.builder.CreateAdd(left_int, right_int); break;
case MINUS: result = ctx.builder.CreateSub(left_int, right_int); break;
case TIMES: result = ctx.builder.CreateMul(left_int, right_int); break;
case DIVIDE: result = ctx.builder.CreateSDiv(left_int, right_int); break;
}
ctx.create_push(f, ctx.create_num(result));
}
void instruction_eval::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Eval()" << std::endl;
}
void instruction_eval::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_push(f, ctx.create_eval(ctx.create_pop(f)));
}
void instruction_alloc::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Alloc(" << amount << ")" << std::endl;
}
void instruction_alloc::gen_llvm(llvm_context& ctx, Function* f) const {
ctx.create_alloc(f, ctx.create_size(amount));
}
void instruction_unwind::print(int indent, std::ostream& to) const {
print_indent(indent, to);
to << "Unwind()" << std::endl;
}
void instruction_unwind::gen_llvm(llvm_context& ctx, Function* f) const {
// Nothing
}

View File

@@ -0,0 +1,142 @@
#pragma once
#include <llvm/IR/Function.h>
#include <string>
#include <memory>
#include <vector>
#include <map>
#include <ostream>
#include "binop.hpp"
#include "llvm_context.hpp"
struct instruction {
virtual ~instruction() = default;
virtual void print(int indent, std::ostream& to) const = 0;
virtual void gen_llvm(llvm_context& ctx, llvm::Function* f) const = 0;
};
using instruction_ptr = std::unique_ptr<instruction>;
struct instruction_pushint : public instruction {
int value;
instruction_pushint(int v)
: value(v) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_pushglobal : public instruction {
std::string name;
instruction_pushglobal(std::string n)
: name(std::move(n)) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_push : public instruction {
int offset;
instruction_push(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_pop : public instruction {
int count;
instruction_pop(int c)
: count(c) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_mkapp : public instruction {
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_update : public instruction {
int offset;
instruction_update(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_pack : public instruction {
int tag;
int size;
instruction_pack(int t, int s)
: tag(t), size(s) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_split : public instruction {
int size;
instruction_split(int s)
: size(s) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_jump : public instruction {
std::vector<std::vector<instruction_ptr>> branches;
std::map<int, int> tag_mappings;
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_slide : public instruction {
int offset;
instruction_slide(int o)
: offset(o) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_binop : public instruction {
binop op;
instruction_binop(binop o)
: op(o) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_eval : public instruction {
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_alloc : public instruction {
int amount;
instruction_alloc(int a)
: amount(a) {}
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};
struct instruction_unwind : public instruction {
void print(int indent, std::ostream& to) const;
void gen_llvm(llvm_context& ctx, llvm::Function* f) const;
};

View File

@@ -0,0 +1,252 @@
#include "llvm_context.hpp"
#include <llvm/IR/DerivedTypes.h>
using namespace llvm;
void llvm_context::create_types() {
stack_type = StructType::create(ctx, "stack");
stack_ptr_type = PointerType::getUnqual(stack_type);
tag_type = IntegerType::getInt8Ty(ctx);
struct_types["node_base"] = StructType::create(ctx, "node_base");
struct_types["node_app"] = StructType::create(ctx, "node_app");
struct_types["node_num"] = StructType::create(ctx, "node_num");
struct_types["node_global"] = StructType::create(ctx, "node_global");
struct_types["node_ind"] = StructType::create(ctx, "node_ind");
struct_types["node_data"] = StructType::create(ctx, "node_data");
node_ptr_type = PointerType::getUnqual(struct_types.at("node_base"));
function_type = FunctionType::get(Type::getVoidTy(ctx), { stack_ptr_type }, false);
struct_types.at("node_base")->setBody(
IntegerType::getInt32Ty(ctx)
);
struct_types.at("node_app")->setBody(
struct_types.at("node_base"),
node_ptr_type,
node_ptr_type
);
struct_types.at("node_num")->setBody(
struct_types.at("node_base"),
IntegerType::getInt32Ty(ctx)
);
struct_types.at("node_global")->setBody(
struct_types.at("node_base"),
FunctionType::get(Type::getVoidTy(ctx), { stack_ptr_type }, false)
);
struct_types.at("node_ind")->setBody(
struct_types.at("node_base"),
node_ptr_type
);
struct_types.at("node_data")->setBody(
struct_types.at("node_base"),
IntegerType::getInt8Ty(ctx),
PointerType::getUnqual(node_ptr_type)
);
}
void llvm_context::create_functions() {
auto void_type = Type::getVoidTy(ctx);
auto sizet_type = IntegerType::get(ctx, sizeof(size_t) * 8);
functions["stack_init"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_init",
&module
);
functions["stack_free"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_free",
&module
);
functions["stack_push"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type, node_ptr_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_push",
&module
);
functions["stack_pop"] = Function::Create(
FunctionType::get(node_ptr_type, { stack_ptr_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_pop",
&module
);
functions["stack_peek"] = Function::Create(
FunctionType::get(node_ptr_type, { stack_ptr_type, sizet_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_peek",
&module
);
functions["stack_popn"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type, sizet_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_popn",
&module
);
functions["stack_slide"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type, sizet_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_slide",
&module
);
functions["stack_update"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type, sizet_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_update",
&module
);
functions["stack_alloc"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type, sizet_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_alloc",
&module
);
functions["stack_pack"] = Function::Create(
FunctionType::get(void_type, { stack_ptr_type, sizet_type, tag_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_pack",
&module
);
functions["stack_split"] = Function::Create(
FunctionType::get(node_ptr_type, { stack_ptr_type, sizet_type }, false),
Function::LinkageTypes::ExternalLinkage,
"stack_split",
&module
);
auto int32_type = IntegerType::getInt32Ty(ctx);
functions["alloc_app"] = Function::Create(
FunctionType::get(node_ptr_type, { node_ptr_type, node_ptr_type }, false),
Function::LinkageTypes::ExternalLinkage,
"alloc_app",
&module
);
functions["alloc_num"] = Function::Create(
FunctionType::get(node_ptr_type, { int32_type }, false),
Function::LinkageTypes::ExternalLinkage,
"alloc_num",
&module
);
functions["alloc_global"] = Function::Create(
FunctionType::get(node_ptr_type, { function_type, int32_type }, false),
Function::LinkageTypes::ExternalLinkage,
"alloc_global",
&module
);
functions["alloc_ind"] = Function::Create(
FunctionType::get(node_ptr_type, { node_ptr_type }, false),
Function::LinkageTypes::ExternalLinkage,
"alloc_ind",
&module
);
functions["eval"] = Function::Create(
FunctionType::get(node_ptr_type, { node_ptr_type }, false),
Function::LinkageTypes::ExternalLinkage,
"eval",
&module
);
}
ConstantInt* llvm_context::create_i8(int8_t i) {
return ConstantInt::get(ctx, APInt(8, i));
}
ConstantInt* llvm_context::create_i32(int32_t i) {
return ConstantInt::get(ctx, APInt(32, i));
}
ConstantInt* llvm_context::create_size(size_t i) {
return ConstantInt::get(ctx, APInt(sizeof(size_t) * 8, i));
}
Value* llvm_context::create_pop(Function* f) {
auto pop_f = functions.at("stack_pop");
return builder.CreateCall(pop_f, { f->arg_begin() });
}
Value* llvm_context::create_peek(Function* f, Value* off) {
auto peek_f = functions.at("stack_peek");
return builder.CreateCall(peek_f, { f->arg_begin(), off });
}
void llvm_context::create_push(Function* f, Value* v) {
auto push_f = functions.at("stack_push");
builder.CreateCall(push_f, { f->arg_begin(), v });
}
void llvm_context::create_popn(Function* f, Value* off) {
auto popn_f = functions.at("stack_popn");
builder.CreateCall(popn_f, { f->arg_begin(), off });
}
void llvm_context::create_update(Function* f, Value* off) {
auto update_f = functions.at("stack_update");
builder.CreateCall(update_f, { f->arg_begin(), off });
}
void llvm_context::create_pack(Function* f, Value* c, Value* t) {
auto pack_f = functions.at("stack_pack");
builder.CreateCall(pack_f, { f->arg_begin(), c, t });
}
void llvm_context::create_split(Function* f, Value* c) {
auto split_f = functions.at("stack_split");
builder.CreateCall(split_f, { f->arg_begin(), c });
}
void llvm_context::create_slide(Function* f, Value* off) {
auto slide_f = functions.at("stack_slide");
builder.CreateCall(slide_f, { f->arg_begin(), off });
}
void llvm_context::create_alloc(Function* f, Value* n) {
auto alloc_f = functions.at("stack_alloc");
builder.CreateCall(alloc_f, { f->arg_begin(), n });
}
Value* llvm_context::create_eval(Value* e) {
auto eval_f = functions.at("eval");
return builder.CreateCall(eval_f, { e });
}
Value* llvm_context::unwrap_num(Value* v) {
auto num_ptr_type = PointerType::getUnqual(struct_types.at("node_num"));
auto cast = builder.CreatePointerCast(v, num_ptr_type);
auto offset_0 = create_i32(0);
auto offset_1 = create_i32(1);
auto int_ptr = builder.CreateGEP(cast, { offset_0, offset_1 });
return builder.CreateLoad(int_ptr);
}
Value* llvm_context::create_num(Value* v) {
auto alloc_num_f = functions.at("alloc_num");
return builder.CreateCall(alloc_num_f, { v });
}
Value* llvm_context::unwrap_data_tag(Value* v) {
auto data_ptr_type = PointerType::getUnqual(struct_types.at("node_data"));
auto cast = builder.CreatePointerCast(v, data_ptr_type);
auto offset_0 = create_i32(0);
auto offset_1 = create_i32(1);
auto tag_ptr = builder.CreateGEP(cast, { offset_0, offset_1 });
return builder.CreateLoad(tag_ptr);
}
Value* llvm_context::create_global(Value* f, Value* a) {
auto alloc_global_f = functions.at("alloc_global");
return builder.CreateCall(alloc_global_f, { f, a });
}
Value* llvm_context::create_app(Value* l, Value* r) {
auto alloc_app_f = functions.at("alloc_app");
return builder.CreateCall(alloc_app_f, { l, r });
}
llvm::Function* llvm_context::create_custom_function(std::string name, int32_t arity) {
auto void_type = llvm::Type::getVoidTy(ctx);
auto function_type =
llvm::FunctionType::get(void_type, { stack_ptr_type }, false);
auto new_function = llvm::Function::Create(
function_type,
llvm::Function::LinkageTypes::ExternalLinkage,
"f_" + name,
&module
);
auto start_block = llvm::BasicBlock::Create(ctx, "entry", new_function);
auto new_custom_f = custom_function_ptr(new custom_function());
new_custom_f->arity = arity;
new_custom_f->function = new_function;
custom_functions["f_" + name] = std::move(new_custom_f);
return new_function;
}

View File

@@ -0,0 +1,66 @@
#pragma once
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/Function.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/Module.h>
#include <map>
struct llvm_context {
struct custom_function {
llvm::Function* function;
int32_t arity;
};
using custom_function_ptr = std::unique_ptr<custom_function>;
llvm::LLVMContext ctx;
llvm::IRBuilder<> builder;
llvm::Module module;
std::map<std::string, custom_function_ptr> custom_functions;
std::map<std::string, llvm::Function*> functions;
std::map<std::string, llvm::StructType*> struct_types;
llvm::StructType* stack_type;
llvm::PointerType* stack_ptr_type;
llvm::PointerType* node_ptr_type;
llvm::IntegerType* tag_type;
llvm::FunctionType* function_type;
llvm_context()
: builder(ctx), module("bloglang", ctx) {
create_types();
create_functions();
}
void create_types();
void create_functions();
llvm::ConstantInt* create_i8(int8_t);
llvm::ConstantInt* create_i32(int32_t);
llvm::ConstantInt* create_size(size_t);
llvm::Value* create_pop(llvm::Function*);
llvm::Value* create_peek(llvm::Function*, llvm::Value*);
void create_push(llvm::Function*, llvm::Value*);
void create_popn(llvm::Function*, llvm::Value*);
void create_update(llvm::Function*, llvm::Value*);
void create_pack(llvm::Function*, llvm::Value*, llvm::Value*);
void create_split(llvm::Function*, llvm::Value*);
void create_slide(llvm::Function*, llvm::Value*);
void create_alloc(llvm::Function*, llvm::Value*);
llvm::Value* create_eval(llvm::Value*);
llvm::Value* unwrap_num(llvm::Value*);
llvm::Value* create_num(llvm::Value*);
llvm::Value* unwrap_data_tag(llvm::Value*);
llvm::Value* create_global(llvm::Value*, llvm::Value*);
llvm::Value* create_app(llvm::Value*, llvm::Value*);
llvm::Function* create_custom_function(std::string name, int32_t arity);
};

174
code/compiler/08/main.cpp Normal file
View File

@@ -0,0 +1,174 @@
#include "ast.hpp"
#include <iostream>
#include "binop.hpp"
#include "definition.hpp"
#include "instruction.hpp"
#include "llvm_context.hpp"
#include "parser.hpp"
#include "error.hpp"
#include "type.hpp"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/Target/TargetMachine.h"
void yy::parser::error(const std::string& msg) {
std::cout << "An error occured: " << msg << std::endl;
}
extern std::vector<definition_ptr> program;
void typecheck_program(
const std::vector<definition_ptr>& prog,
type_mgr& mgr, type_env& env) {
type_ptr int_type = type_ptr(new type_base("Int"));
type_ptr binop_type = type_ptr(new type_arr(
int_type,
type_ptr(new type_arr(int_type, int_type))));
env.bind("+", binop_type);
env.bind("-", binop_type);
env.bind("*", binop_type);
env.bind("/", binop_type);
for(auto& def : prog) {
def->typecheck_first(mgr, env);
}
for(auto& def : prog) {
def->typecheck_second(mgr, env);
}
for(auto& pair : env.names) {
std::cout << pair.first << ": ";
pair.second->print(mgr, std::cout);
std::cout << std::endl;
}
for(auto& def : prog) {
def->resolve(mgr);
}
}
void compile_program(const std::vector<definition_ptr>& prog) {
for(auto& def : prog) {
def->compile();
definition_defn* defn = dynamic_cast<definition_defn*>(def.get());
if(!defn) continue;
for(auto& instruction : defn->instructions) {
instruction->print(0, std::cout);
}
std::cout << std::endl;
}
}
void gen_llvm_internal_op(llvm_context& ctx, binop op) {
auto new_function = ctx.create_custom_function(op_action(op), 2);
std::vector<instruction_ptr> instructions;
instructions.push_back(instruction_ptr(new instruction_push(1)));
instructions.push_back(instruction_ptr(new instruction_eval()));
instructions.push_back(instruction_ptr(new instruction_push(1)));
instructions.push_back(instruction_ptr(new instruction_eval()));
instructions.push_back(instruction_ptr(new instruction_binop(op)));
ctx.builder.SetInsertPoint(&new_function->getEntryBlock());
for(auto& instruction : instructions) {
instruction->gen_llvm(ctx, new_function);
}
ctx.builder.CreateRetVoid();
}
void output_llvm(llvm_context& ctx, const std::string& filename) {
std::string targetTriple = llvm::sys::getDefaultTargetTriple();
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmParser();
llvm::InitializeNativeTargetAsmPrinter();
std::string error;
const llvm::Target* target =
llvm::TargetRegistry::lookupTarget(targetTriple, error);
if (!target) {
std::cerr << error << std::endl;
} else {
std::string cpu = "generic";
std::string features = "";
llvm::TargetOptions options;
llvm::TargetMachine* targetMachine =
target->createTargetMachine(targetTriple, cpu, features,
options, llvm::Optional<llvm::Reloc::Model>());
ctx.module.setDataLayout(targetMachine->createDataLayout());
ctx.module.setTargetTriple(targetTriple);
std::error_code ec;
llvm::raw_fd_ostream file(filename, ec, llvm::sys::fs::F_None);
if (ec) {
throw 0;
} else {
llvm::TargetMachine::CodeGenFileType type = llvm::TargetMachine::CGFT_ObjectFile;
llvm::legacy::PassManager pm;
if (targetMachine->addPassesToEmitFile(pm, file, NULL, type)) {
throw 0;
} else {
pm.run(ctx.module);
file.close();
}
}
}
}
void gen_llvm(const std::vector<definition_ptr>& prog) {
llvm_context ctx;
gen_llvm_internal_op(ctx, PLUS);
gen_llvm_internal_op(ctx, MINUS);
gen_llvm_internal_op(ctx, TIMES);
gen_llvm_internal_op(ctx, DIVIDE);
for(auto& definition : prog) {
definition->gen_llvm_first(ctx);
}
for(auto& definition : prog) {
definition->gen_llvm_second(ctx);
}
ctx.module.print(llvm::outs(), nullptr);
output_llvm(ctx, "program.o");
}
int main() {
yy::parser parser;
type_mgr mgr;
type_env env;
parser.parse();
for(auto& definition : program) {
definition_defn* def = dynamic_cast<definition_defn*>(definition.get());
if(!def) continue;
std::cout << def->name;
for(auto& param : def->params) std::cout << " " << param;
std::cout << ":" << std::endl;
def->body->print(1, std::cout);
}
try {
typecheck_program(program, mgr, env);
compile_program(program);
gen_llvm(program);
} catch(unification_error& err) {
std::cout << "failed to unify types: " << std::endl;
std::cout << " (1) \033[34m";
err.left->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
std::cout << " (2) \033[32m";
err.right->print(mgr, std::cout);
std::cout << "\033[0m" << std::endl;
} catch(type_error& err) {
std::cout << "failed to type check program: " << err.description << std::endl;
}
}

141
code/compiler/08/parser.y Normal file
View File

@@ -0,0 +1,141 @@
%{
#include <string>
#include <iostream>
#include "ast.hpp"
#include "definition.hpp"
#include "parser.hpp"
std::vector<definition_ptr> program;
extern yy::parser::symbol_type yylex();
%}
%token PLUS
%token TIMES
%token MINUS
%token DIVIDE
%token <int> INT
%token DEFN
%token DATA
%token CASE
%token OF
%token OCURLY
%token CCURLY
%token OPAREN
%token CPAREN
%token COMMA
%token ARROW
%token EQUAL
%token <std::string> LID
%token <std::string> UID
%language "c++"
%define api.value.type variant
%define api.token.constructor
%type <std::vector<std::string>> lowercaseParams uppercaseParams
%type <std::vector<definition_ptr>> program definitions
%type <std::vector<branch_ptr>> branches
%type <std::vector<constructor_ptr>> constructors
%type <ast_ptr> aAdd aMul case app appBase
%type <definition_ptr> definition defn data
%type <branch_ptr> branch
%type <pattern_ptr> pattern
%type <constructor_ptr> constructor
%start program
%%
program
: definitions { program = std::move($1); }
;
definitions
: definitions definition { $$ = std::move($1); $$.push_back(std::move($2)); }
| definition { $$ = std::vector<definition_ptr>(); $$.push_back(std::move($1)); }
;
definition
: defn { $$ = std::move($1); }
| data { $$ = std::move($1); }
;
defn
: DEFN LID lowercaseParams EQUAL OCURLY aAdd CCURLY
{ $$ = definition_ptr(
new definition_defn(std::move($2), std::move($3), std::move($6))); }
;
lowercaseParams
: %empty { $$ = std::vector<std::string>(); }
| lowercaseParams LID { $$ = std::move($1); $$.push_back(std::move($2)); }
;
uppercaseParams
: %empty { $$ = std::vector<std::string>(); }
| uppercaseParams UID { $$ = std::move($1); $$.push_back(std::move($2)); }
;
aAdd
: aAdd PLUS aMul { $$ = ast_ptr(new ast_binop(PLUS, std::move($1), std::move($3))); }
| aAdd MINUS aMul { $$ = ast_ptr(new ast_binop(MINUS, std::move($1), std::move($3))); }
| aMul { $$ = std::move($1); }
;
aMul
: aMul TIMES app { $$ = ast_ptr(new ast_binop(TIMES, std::move($1), std::move($3))); }
| aMul DIVIDE app { $$ = ast_ptr(new ast_binop(DIVIDE, std::move($1), std::move($3))); }
| app { $$ = std::move($1); }
;
app
: app appBase { $$ = ast_ptr(new ast_app(std::move($1), std::move($2))); }
| appBase { $$ = std::move($1); }
;
appBase
: INT { $$ = ast_ptr(new ast_int($1)); }
| LID { $$ = ast_ptr(new ast_lid(std::move($1))); }
| UID { $$ = ast_ptr(new ast_uid(std::move($1))); }
| OPAREN aAdd CPAREN { $$ = std::move($2); }
| case { $$ = std::move($1); }
;
case
: CASE aAdd OF OCURLY branches CCURLY
{ $$ = ast_ptr(new ast_case(std::move($2), std::move($5))); }
;
branches
: branches branch { $$ = std::move($1); $$.push_back(std::move($2)); }
| branch { $$ = std::vector<branch_ptr>(); $$.push_back(std::move($1));}
;
branch
: pattern ARROW OCURLY aAdd CCURLY
{ $$ = branch_ptr(new branch(std::move($1), std::move($4))); }
;
pattern
: LID { $$ = pattern_ptr(new pattern_var(std::move($1))); }
| UID lowercaseParams
{ $$ = pattern_ptr(new pattern_constr(std::move($1), std::move($2))); }
;
data
: DATA UID EQUAL OCURLY constructors CCURLY
{ $$ = definition_ptr(new definition_data(std::move($2), std::move($5))); }
;
constructors
: constructors COMMA constructor { $$ = std::move($1); $$.push_back(std::move($3)); }
| constructor
{ $$ = std::vector<constructor_ptr>(); $$.push_back(std::move($1)); }
;
constructor
: UID uppercaseParams
{ $$ = constructor_ptr(new constructor(std::move($1), std::move($2))); }
;

183
code/compiler/08/runtime.c Normal file
View File

@@ -0,0 +1,183 @@
#include <stdint.h>
#include <assert.h>
#include <memory.h>
#include <stdio.h>
#include "runtime.h"
struct node_base* alloc_node() {
struct node_base* new_node = malloc(sizeof(struct node_app));
assert(new_node != NULL);
return new_node;
}
struct node_app* alloc_app(struct node_base* l, struct node_base* r) {
struct node_app* node = (struct node_app*) alloc_node();
node->base.tag = NODE_APP;
node->left = l;
node->right = r;
return node;
}
struct node_num* alloc_num(int32_t n) {
struct node_num* node = (struct node_num*) alloc_node();
node->base.tag = NODE_NUM;
node->value = n;
return node;
}
struct node_global* alloc_global(void (*f)(struct stack*), int32_t a) {
struct node_global* node = (struct node_global*) alloc_node();
node->base.tag = NODE_GLOBAL;
node->arity = a;
node->function = f;
return node;
}
struct node_ind* alloc_ind(struct node_base* n) {
struct node_ind* node = (struct node_ind*) alloc_node();
node->base.tag = NODE_IND;
node->next = n;
return node;
}
void stack_init(struct stack* s) {
s->size = 4;
s->count = 0;
s->data = malloc(sizeof(*s->data) * s->size);
assert(s->data != NULL);
}
void stack_free(struct stack* s) {
free(s->data);
}
void stack_push(struct stack* s, struct node_base* n) {
while(s->count >= s->size) {
s->data = realloc(s->data, sizeof(*s->data) * (s->size *= 2));
assert(s->data != NULL);
}
s->data[s->count++] = n;
}
struct node_base* stack_pop(struct stack* s) {
assert(s->count > 0);
return s->data[--s->count];
}
struct node_base* stack_peek(struct stack* s, size_t o) {
assert(s->count > o);
return s->data[s->count - o - 1];
}
void stack_popn(struct stack* s, size_t n) {
assert(s->count >= n);
s->count -= n;
}
void stack_slide(struct stack* s, size_t n) {
assert(s->count > n);
s->data[s->count - n - 1] = s->data[s->count - 1];
s->count -= n;
}
void stack_update(struct stack* s, size_t o) {
assert(s->count > o + 1);
struct node_ind* ind = (struct node_ind*) s->data[s->count - o - 2];
ind->base.tag = NODE_IND;
ind->next = s->data[s->count -= 1];
}
void stack_alloc(struct stack* s, size_t o) {
while(o--) {
stack_push(s, (struct node_base*) alloc_ind(NULL));
}
}
void stack_pack(struct stack* s, size_t n, int8_t t) {
assert(s->count >= n);
struct node_base** data = malloc(sizeof(*data) * n);
assert(data != NULL);
memcpy(data, &s->data[s->count - n], n * sizeof(*data));
struct node_data* new_node = (struct node_data*) alloc_node();
new_node->array = data;
new_node->base.tag = NODE_DATA;
new_node->tag = t;
stack_popn(s, n);
stack_push(s, (struct node_base*) new_node);
}
void stack_split(struct stack* s, size_t n) {
struct node_data* node = (struct node_data*) stack_pop(s);
for(size_t i = 0; i < n; i++) {
stack_push(s, node->array[i]);
}
}
void unwind(struct stack* s) {
while(1) {
struct node_base* peek = stack_peek(s, 0);
if(peek->tag == NODE_APP) {
struct node_app* n = (struct node_app*) peek;
stack_push(s, n->left);
} else if(peek->tag == NODE_GLOBAL) {
struct node_global* n = (struct node_global*) peek;
assert(s->count > n->arity);
for(size_t i = 1; i <= n->arity; i++) {
s->data[s->count - i]
= ((struct node_app*) s->data[s->count - i - 1])->right;
}
n->function(s);
} else if(peek->tag == NODE_IND) {
struct node_ind* n = (struct node_ind*) peek;
stack_pop(s);
stack_push(s, n->next);
} else {
break;
}
}
}
struct node_base* eval(struct node_base* n) {
struct stack program_stack;
stack_init(&program_stack);
stack_push(&program_stack, n);
unwind(&program_stack);
struct node_base* result = stack_pop(&program_stack);
stack_free(&program_stack);
return result;
}
extern void f_main(struct stack* s);
void print_node(struct node_base* n) {
if(n->tag == NODE_APP) {
struct node_app* app = (struct node_app*) n;
print_node(app->left);
putchar(' ');
print_node(app->right);
} else if(n->tag == NODE_DATA) {
printf("(Packed)");
} else if(n->tag == NODE_GLOBAL) {
struct node_global* global = (struct node_global*) n;
printf("(Global: %p)", global->function);
} else if(n->tag == NODE_IND) {
print_node(((struct node_ind*) n)->next);
} else if(n->tag == NODE_NUM) {
struct node_num* num = (struct node_num*) n;
printf("%d", num->value);
}
}
int main(int argc, char** argv) {
struct node_global* first_node = alloc_global(f_main, 0);
struct node_base* result = eval((struct node_base*) first_node);
printf("Result: ");
print_node(result);
putchar('\n');
}

View File

@@ -0,0 +1,70 @@
#pragma once
#include <stdlib.h>
struct stack;
enum node_tag {
NODE_APP,
NODE_NUM,
NODE_GLOBAL,
NODE_IND,
NODE_DATA
};
struct node_base {
enum node_tag tag;
};
struct node_app {
struct node_base base;
struct node_base* left;
struct node_base* right;
};
struct node_num {
struct node_base base;
int32_t value;
};
struct node_global {
struct node_base base;
int32_t arity;
void (*function)(struct stack*);
};
struct node_ind {
struct node_base base;
struct node_base* next;
};
struct node_data {
struct node_base base;
int8_t tag;
struct node_base** array;
};
struct node_base* alloc_node();
struct node_app* alloc_app(struct node_base* l, struct node_base* r);
struct node_num* alloc_num(int32_t n);
struct node_global* alloc_global(void (*f)(struct stack*), int32_t a);
struct node_ind* alloc_ind(struct node_base* n);
struct stack {
size_t size;
size_t count;
struct node_base** data;
};
void stack_init(struct stack* s);
void stack_free(struct stack* s);
void stack_push(struct stack* s, struct node_base* n);
struct node_base* stack_pop(struct stack* s);
struct node_base* stack_peek(struct stack* s, size_t o);
void stack_popn(struct stack* s, size_t n);
void stack_slide(struct stack* s, size_t n);
void stack_update(struct stack* s, size_t o);
void stack_alloc(struct stack* s, size_t o);
void stack_pack(struct stack* s, size_t n, int8_t t);
void stack_split(struct stack* s, size_t n);
struct node_base* eval(struct node_base* n);

View File

@@ -0,0 +1,35 @@
%option noyywrap
%{
#include <iostream>
#include "ast.hpp"
#include "definition.hpp"
#include "parser.hpp"
#define YY_DECL yy::parser::symbol_type yylex()
%}
%%
[ \n]+ {}
\+ { return yy::parser::make_PLUS(); }
\* { return yy::parser::make_TIMES(); }
- { return yy::parser::make_MINUS(); }
\/ { return yy::parser::make_DIVIDE(); }
[0-9]+ { return yy::parser::make_INT(atoi(yytext)); }
defn { return yy::parser::make_DEFN(); }
data { return yy::parser::make_DATA(); }
case { return yy::parser::make_CASE(); }
of { return yy::parser::make_OF(); }
\{ { return yy::parser::make_OCURLY(); }
\} { return yy::parser::make_CCURLY(); }
\( { return yy::parser::make_OPAREN(); }
\) { return yy::parser::make_CPAREN(); }
, { return yy::parser::make_COMMA(); }
-> { return yy::parser::make_ARROW(); }
= { return yy::parser::make_EQUAL(); }
[a-z][a-zA-Z]* { return yy::parser::make_LID(std::string(yytext)); }
[A-Z][a-zA-Z]* { return yy::parser::make_UID(std::string(yytext)); }
%%

99
code/compiler/08/type.cpp Normal file
View File

@@ -0,0 +1,99 @@
#include "type.hpp"
#include <sstream>
#include <algorithm>
#include "error.hpp"
void type_var::print(const type_mgr& mgr, std::ostream& to) const {
auto it = mgr.types.find(name);
if(it != mgr.types.end()) {
it->second->print(mgr, to);
} else {
to << name;
}
}
void type_base::print(const type_mgr& mgr, std::ostream& to) const {
to << name;
}
void type_arr::print(const type_mgr& mgr, std::ostream& to) const {
left->print(mgr, to);
to << " -> (";
right->print(mgr, to);
to << ")";
}
std::string type_mgr::new_type_name() {
int temp = last_id++;
std::string str = "";
while(temp != -1) {
str += (char) ('a' + (temp % 26));
temp = temp / 26 - 1;
}
std::reverse(str.begin(), str.end());
return str;
}
type_ptr type_mgr::new_type() {
return type_ptr(new type_var(new_type_name()));
}
type_ptr type_mgr::new_arrow_type() {
return type_ptr(new type_arr(new_type(), new_type()));
}
type_ptr type_mgr::resolve(type_ptr t, type_var*& var) const {
type_var* cast;
var = nullptr;
while((cast = dynamic_cast<type_var*>(t.get()))) {
auto it = types.find(cast->name);
if(it == types.end()) {
var = cast;
break;
}
t = it->second;
}
return t;
}
void type_mgr::unify(type_ptr l, type_ptr r) {
type_var* lvar;
type_var* rvar;
type_arr* larr;
type_arr* rarr;
type_base* lid;
type_base* rid;
l = resolve(l, lvar);
r = resolve(r, rvar);
if(lvar) {
bind(lvar->name, r);
return;
} else if(rvar) {
bind(rvar->name, l);
return;
} else if((larr = dynamic_cast<type_arr*>(l.get())) &&
(rarr = dynamic_cast<type_arr*>(r.get()))) {
unify(larr->left, rarr->left);
unify(larr->right, rarr->right);
return;
} else if((lid = dynamic_cast<type_base*>(l.get())) &&
(rid = dynamic_cast<type_base*>(r.get()))) {
if(lid->name == rid->name) return;
}
throw unification_error(l, r);
}
void type_mgr::bind(const std::string& s, type_ptr t) {
type_var* other = dynamic_cast<type_var*>(t.get());
if(other && other->name == s) return;
types[s] = t;
}

65
code/compiler/08/type.hpp Normal file
View File

@@ -0,0 +1,65 @@
#pragma once
#include <memory>
#include <map>
struct type_mgr;
struct type {
virtual ~type() = default;
virtual void print(const type_mgr& mgr, std::ostream& to) const = 0;
};
using type_ptr = std::shared_ptr<type>;
struct type_var : public type {
std::string name;
type_var(std::string n)
: name(std::move(n)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_base : public type {
std::string name;
type_base(std::string n)
: name(std::move(n)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_data : public type_base {
struct constructor {
int tag;
};
std::map<std::string, constructor> constructors;
type_data(std::string n)
: type_base(std::move(n)) {}
};
struct type_arr : public type {
type_ptr left;
type_ptr right;
type_arr(type_ptr l, type_ptr r)
: left(std::move(l)), right(std::move(r)) {}
void print(const type_mgr& mgr, std::ostream& to) const;
};
struct type_mgr {
int last_id = 0;
std::map<std::string, type_ptr> types;
std::string new_type_name();
type_ptr new_type();
type_ptr new_arrow_type();
void unify(type_ptr l, type_ptr r);
type_ptr resolve(type_ptr t, type_var*& var) const;
void bind(const std::string& s, type_ptr t);
};

View File

@@ -0,0 +1,16 @@
#include "type_env.hpp"
type_ptr type_env::lookup(const std::string& name) const {
auto it = names.find(name);
if(it != names.end()) return it->second;
if(parent) return parent->lookup(name);
return nullptr;
}
void type_env::bind(const std::string& name, type_ptr t) {
names[name] = t;
}
type_env type_env::scope() const {
return type_env(this);
}

View File

@@ -0,0 +1,16 @@
#pragma once
#include <map>
#include "type.hpp"
struct type_env {
std::map<std::string, type_ptr> names;
type_env const* parent = nullptr;
type_env(type_env const* p)
: parent(p) {}
type_env() : type_env(nullptr) {}
type_ptr lookup(const std::string& name) const;
void bind(const std::string& name, type_ptr t);
type_env scope() const;
};

View File

@@ -2,12 +2,11 @@
title: Compiling a Functional Language Using C++, Part 0 - Intro title: Compiling a Functional Language Using C++, Part 0 - Intro
date: 2019-08-03T01:02:30-07:00 date: 2019-08-03T01:02:30-07:00
tags: ["C and C++", "Functional Languages", "Compilers"] tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
--- ---
During my last academic term, I was enrolled in a compilers course. During my last academic term, I was enrolled in a compilers course.
We had a final project - develop a compiler for a basic Python subset, We had a final project - develop a compiler for a basic Python subset,
using LLVM. It was a little boring - virtually nothing about the compiler using LLVM. It was a little boring - virtually nothing about the compiler
was __not__ covered in class, and it felt more like putting two puzzles was __not__ covered in class, and it felt more like putting two puzzle
pieces together than building a real project. pieces together than building a real project.
Instead, I chose to implement a compiler for a functional programming language, Instead, I chose to implement a compiler for a functional programming language,
@@ -54,7 +53,7 @@ an alternative way of representing the program that isn't machine code.
In some compilers, the stages of parsing and analysis can overlap. In some compilers, the stages of parsing and analysis can overlap.
In short, just like the pirate's code, it's more of a guideline than a rule. In short, just like the pirate's code, it's more of a guideline than a rule.
#### What we'll cover #### What we will cover
We'll go through the stages of a compiler, starting from scratch We'll go through the stages of a compiler, starting from scratch
and building up our project. We'll cover: and building up our project. We'll cover:
@@ -128,7 +127,7 @@ constructed from an integer and another list (as defined in our `data` example),
return the integer". return the integer".
That's it for the introduction! In the next post, we'll cover tokenizng, which is That's it for the introduction! In the next post, we'll cover tokenizng, which is
the first step in coverting source code into an executable program. the first step in converting source code into an executable program.
### Navigation ### Navigation
Here are the posts that I've written so far for this series: Here are the posts that I've written so far for this series:
@@ -136,3 +135,8 @@ Here are the posts that I've written so far for this series:
* [Tokenizing]({{< relref "01_compiler_tokenizing.md" >}}) * [Tokenizing]({{< relref "01_compiler_tokenizing.md" >}})
* [Parsing]({{< relref "02_compiler_parsing.md" >}}) * [Parsing]({{< relref "02_compiler_parsing.md" >}})
* [Typechecking]({{< relref "03_compiler_typechecking.md" >}}) * [Typechecking]({{< relref "03_compiler_typechecking.md" >}})
* [Small Improvements]({{< relref "04_compiler_improvements.md" >}})
* [Execution]({{< relref "05_compiler_execution.md" >}})
* [Compilation]({{< relref "06_compiler_compilation.md" >}})
* [Runtime]({{< relref "07_compiler_runtime.md" >}})
* [LLVM]({{< relref "08_compiler_llvm.md" >}})

View File

@@ -2,7 +2,6 @@
title: Compiling a Functional Language Using C++, Part 1 - Tokenizing title: Compiling a Functional Language Using C++, Part 1 - Tokenizing
date: 2019-08-03T01:02:30-07:00 date: 2019-08-03T01:02:30-07:00
tags: ["C and C++", "Functional Languages", "Compilers"] tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
--- ---
It makes sense to build a compiler bit by bit, following the stages we outlined in It makes sense to build a compiler bit by bit, following the stages we outlined in
the first post of the series. This is because these stages are essentially a pipeline, the first post of the series. This is because these stages are essentially a pipeline,
@@ -48,7 +47,7 @@ are fairly simple - one or more digits is an integer, a few letters together
are a variable name. In order to be able to efficiently break text up into are a variable name. In order to be able to efficiently break text up into
such tokens, we restrict ourselves to __regular languages__. A language such tokens, we restrict ourselves to __regular languages__. A language
is defined as a set of strings (potentially infinite), and a regular is defined as a set of strings (potentially infinite), and a regular
language for which we can write a __regular expression__ to check if language is one for which we can write a __regular expression__ to check if
a string is in the set. Regular expressions are a way of representing a string is in the set. Regular expressions are a way of representing
patterns that a string has to match. We define regular expressions patterns that a string has to match. We define regular expressions
as follows: as follows:
@@ -77,7 +76,7 @@ Let's see some examples. An integer, such as 326, can be represented with \\([0-
This means, one or more characters between 0 or 9. Some (most) regex implementations This means, one or more characters between 0 or 9. Some (most) regex implementations
have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable, have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable,
starting with a lowercase letter and containing lowercase or uppercase letters after it, starting with a lowercase letter and containing lowercase or uppercase letters after it,
can be written as \\(\[a-z\]([a-z]+)?\\). Again, most regex implementations provide can be written as \\(\[a-z\]([a-zA-Z]+)?\\). Again, most regex implementations provide
a special operator for \\((r_1+)?\\), written as \\(r_1*\\). a special operator for \\((r_1+)?\\), written as \\(r_1*\\).
So how does one go about checking if a regular expression matches a string? An efficient way is to So how does one go about checking if a regular expression matches a string? An efficient way is to
@@ -115,8 +114,8 @@ represent numbers directly into numbers, and do other small tasks.
So, what tokens do we have? From our arithmetic definition, we see that we have integers. So, what tokens do we have? From our arithmetic definition, we see that we have integers.
Let's use the regex `[0-9]+` for those. We also have the operators `+`, `-`, `*`, and `/`. Let's use the regex `[0-9]+` for those. We also have the operators `+`, `-`, `*`, and `/`.
`-` is simple enough: the corresponding regex is `-`. We need to The regex for `-` is simple enough: it's just `-`. However, we need to
preface our `/`, `+` and `*` with a backslash, though, since they happen to also be modifiers preface our `/`, `+` and `*` with a backslash, since they happen to also be modifiers
in flex's regular expressions: `\/`, `\+`, `\*`. in flex's regular expressions: `\/`, `\+`, `\*`.
Let's also represent some reserved keywords. We'll say that `defn`, `data`, `case`, and `of` Let's also represent some reserved keywords. We'll say that `defn`, `data`, `case`, and `of`
@@ -142,7 +141,7 @@ C++ code that should be executed when the regular expression is matched.
The first token: whitespace. This includes the space character, The first token: whitespace. This includes the space character,
and the newline character. We ignore it, so its rule is empty. After that, and the newline character. We ignore it, so its rule is empty. After that,
we have the regular expressions for the tokens we've talked about. For each, I just we have the regular expressions for the tokens we've talked about. For each, I just
print a description of the token that matched. This will change we we hook this up to print a description of the token that matched. This will change when we hook this up to
a parser, but for now, this works fine. Notice that the variable `yytext` contains a parser, but for now, this works fine. Notice that the variable `yytext` contains
the string matched by our regular expression. This variable is set by the code flex the string matched by our regular expression. This variable is set by the code flex
generates, and we can use it to get the extract text that matched a regex. This is generates, and we can use it to get the extract text that matched a regex. This is
@@ -170,3 +169,6 @@ TIMES
NUMBER: 6 NUMBER: 6
``` ```
Hooray! We have tokenizing. Hooray! We have tokenizing.
With our text neatly divided into meaningful chunks, we
can continue on to [Part 2 - Parsing]({{< relref "02_compiler_parsing.md" >}}).

View File

@@ -2,7 +2,6 @@
title: Compiling a Functional Language Using C++, Part 2 - Parsing title: Compiling a Functional Language Using C++, Part 2 - Parsing
date: 2019-08-03T01:02:30-07:00 date: 2019-08-03T01:02:30-07:00
tags: ["C and C++", "Functional Languages", "Compilers"] tags: ["C and C++", "Functional Languages", "Compilers"]
draft: true
--- ---
In the previous post, we covered tokenizing. We learned how to convert an input string into logical segments, and even wrote up a tokenizer to do it according to the rules of our language. Now, it's time to make sense of the tokens, and parse our language. In the previous post, we covered tokenizing. We learned how to convert an input string into logical segments, and even wrote up a tokenizer to do it according to the rules of our language. Now, it's time to make sense of the tokens, and parse our language.
@@ -38,7 +37,7 @@ $$
In practice, there are many ways of using a CFG to parse a programming language. Various parsing algorithms support various subsets In practice, there are many ways of using a CFG to parse a programming language. Various parsing algorithms support various subsets
of context free languages. For instance, top down parsers follow nearly exactly the structure that we had. They try to parse of context free languages. For instance, top down parsers follow nearly exactly the structure that we had. They try to parse
a nonterminal by trying to match each symbol in its body. In the rule \\(S \\rightarrow \\alpha \\beta \\gamma\\), it will a nonterminal by trying to match each symbol in its body. In the rule \\(S \\rightarrow \\alpha \\beta \\gamma\\), it will
first try to match \\(alpha\\), then \\(beta\\), and so on. If one of the three contains a nonterminal, it will attempt to parse first try to match \\(\\alpha\\), then \\(\\beta\\), and so on. If one of the three contains a nonterminal, it will attempt to parse
that nonterminal following the same strategy. However, this leaves a flaw - For instance, consider the grammar that nonterminal following the same strategy. However, this leaves a flaw - For instance, consider the grammar
$$ $$
\\begin{align} \\begin{align}
@@ -105,7 +104,7 @@ A\_{add} & \\rightarrow A\_{add}-A\_{mult} \\\\\\
A\_{add} & \\rightarrow A\_{mult} A\_{add} & \\rightarrow A\_{mult}
\\end{align} \\end{align}
$$ $$
The first rule matches another addition, added to the result of another addition. We use the addition in the body The first rule matches another addition, added to the result of a multiplication. Similarly, the second rule matches another addition, from which the result of a multiplication is then subtracted. We use the \\(A\_{add}\\) on the left side of \\(+\\) and \\(-\\) in the body
because we want to be able to parse strings like `1+2+3+4`, which we want to view as `((1+2)+3)+4` (mostly because because we want to be able to parse strings like `1+2+3+4`, which we want to view as `((1+2)+3)+4` (mostly because
subtraction is [left-associative](https://en.wikipedia.org/wiki/Operator_associativity)). So, we want the top level subtraction is [left-associative](https://en.wikipedia.org/wiki/Operator_associativity)). So, we want the top level
of the tree to be the rightmost `+` or `-`, since that means it will be the "last" operation. You may be asking, of the tree to be the rightmost `+` or `-`, since that means it will be the "last" operation. You may be asking,
@@ -150,7 +149,7 @@ What's the last \\(C\\)? We also want a "thing" to be a case expression. Here ar
$$ $$
\\begin{align} \\begin{align}
C & \\rightarrow \\text{case} \\; A\_{add} \\; \\text{of} \\; \\{ L\_B\\} \\\\\\ C & \\rightarrow \\text{case} \\; A\_{add} \\; \\text{of} \\; \\{ L\_B\\} \\\\\\
L\_B & \\rightarrow R \\; , \\; L\_B \\\\\\ L\_B & \\rightarrow R \\; L\_B \\\\\\
L\_B & \\rightarrow R \\\\\\ L\_B & \\rightarrow R \\\\\\
R & \\rightarrow N \\; \\text{arrow} \\; \\{ A\_{add} \\} \\\\\\ R & \\rightarrow N \\; \\text{arrow} \\; \\{ A\_{add} \\} \\\\\\
N & \\rightarrow \\text{lowerVar} \\\\\\ N & \\rightarrow \\text{lowerVar} \\\\\\
@@ -289,5 +288,6 @@ wrong:
}{ }{
``` ```
We are told an error occured. Excellent! We're not really sure what our tree looks like, though. We are told an error occured. Excellent! We're not really sure what our tree looks like, though.
We just know there's __stuff__ in the list of definitions. We will revisit our trees We just know there's __stuff__ in the list of definitions. Having read our source code into
in the next post, adding code to print them and to verify that our programs make some sense. a structure we're more equipped to handle, we can now try to verify that the code
makes sense in [Part 3 - Type Checking]({{< relref "03_compiler_typechecking.md" >}}).

View File

@@ -1,7 +1,6 @@
--- ---
title: Compiling a Functional Language Using C++, Part 3 - Type Checking title: Compiling a Functional Language Using C++, Part 3 - Type Checking
date: 2019-08-06T14:26:38-07:00 date: 2019-08-06T14:26:38-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"] tags: ["C and C++", "Functional Languages", "Compilers"]
--- ---
I think tokenizing and parsing are boring. The thing is, looking at syntax I think tokenizing and parsing are boring. The thing is, looking at syntax
@@ -604,3 +603,6 @@ as well as these two:
All of our examples print the number of declarations in the program, All of our examples print the number of declarations in the program,
which means they don't throw 0. And so, we have typechecking! which means they don't throw 0. And so, we have typechecking!
Before we look at how we will execute our source code,
we will slow down and make quality of life improvements
in our codebase in [Part 4 - Small Improvements]({{< relref "04_compiler_improvements.md" >}}).

View File

@@ -1,7 +1,6 @@
--- ---
title: Compiling a Functional Language Using C++, Part 4 - Small Improvements title: Compiling a Functional Language Using C++, Part 4 - Small Improvements
date: 2019-08-06T14:26:38-07:00 date: 2019-08-06T14:26:38-07:00
draft: true
tags: ["C and C++", "Functional Languages", "Compilers"] tags: ["C and C++", "Functional Languages", "Compilers"]
--- ---
We've done quite a big push in the previous post. We defined We've done quite a big push in the previous post. We defined
@@ -47,29 +46,29 @@ virtual void print(std::ostream& to) const;
Let's take a look at the implementation. For `ast_int`, Let's take a look at the implementation. For `ast_int`,
`ast_lid`, and `ast_uid`: `ast_lid`, and `ast_uid`:
{{< codelines "C++" "compiler/04/ast.cpp" 18 21 >}} {{< codelines "C++" "compiler/04/ast.cpp" 19 22 >}}
{{< codelines "C++" "compiler/04/ast.cpp" 27 30 >}} {{< codelines "C++" "compiler/04/ast.cpp" 28 31 >}}
{{< codelines "C++" "compiler/04/ast.cpp" 36 39 >}} {{< codelines "C++" "compiler/04/ast.cpp" 37 40 >}}
With `ast_binop` things get a bit more interesting. With `ast_binop` things get a bit more interesting.
We call `print` recursively on the children of the We call `print` recursively on the children of the
`binop` node: `binop` node:
{{< codelines "C++" "compiler/04/ast.cpp" 45 50 >}} {{< codelines "C++" "compiler/04/ast.cpp" 46 51 >}}
The same idea for `ast_app`: The same idea for `ast_app`:
{{< codelines "C++" "compiler/04/ast.cpp" 66 71 >}} {{< codelines "C++" "compiler/04/ast.cpp" 67 72 >}}
Finally, just like `ast_case::typecheck` called Finally, just like `ast_case::typecheck` called
`pattern::match`, `ast_case::print` calls `pattern::print`: `pattern::match`, `ast_case::print` calls `pattern::print`:
{{< codelines "C++" "compiler/04/ast.cpp" 83 92 >}} {{< codelines "C++" "compiler/04/ast.cpp" 84 93 >}}
We follow the same implementation strategy for patterns, We follow the same implementation strategy for patterns,
but we don't need indentation, or recursion: but we don't need indentation, or recursion:
{{< codelines "C++" "compiler/04/ast.cpp" 108 110 >}} {{< codelines "C++" "compiler/04/ast.cpp" 115 117 >}}
{{< codelines "C++" "compiler/04/ast.cpp" 116 121 >}} {{< codelines "C++" "compiler/04/ast.cpp" 123 128 >}}
Let's print the bodies of each function we receive from the parser: In `main`, let's print the bodies of each function we receive from the parser:
{{< codelines "C++" "compiler/04/main.cpp" 41 56 >}} {{< codelines "C++" "compiler/04/main.cpp" 47 56 >}}
### Printing Types ### Printing Types
Types are another thing that we want to be able to inspect, so let's Types are another thing that we want to be able to inspect, so let's
@@ -79,7 +78,7 @@ virtual void print(const type_mgr& mgr, std::ostream& to) const;
``` ```
We need the type manager so we can follow substitutions. We need the type manager so we can follow substitutions.
The implementation is simple enough: The implementation is simple enough:
{{< codelines "C++" "compiler/04/type.cpp" 5 24 >}} {{< codelines "C++" "compiler/04/type.cpp" 6 24 >}}
Let's also print out the types we infer. We'll make it a separate loop Let's also print out the types we infer. We'll make it a separate loop
at the bottom of the `typecheck_program` function, because it's mostly just at the bottom of the `typecheck_program` function, because it's mostly just
@@ -127,6 +126,72 @@ Remember how we build the function type backwards in Part 3? We have to do the s
We replace the fragment with the proper reverse iteration: We replace the fragment with the proper reverse iteration:
{{< codelines "C++" "compiler/04/definition.cpp" 37 40 >}} {{< codelines "C++" "compiler/04/definition.cpp" 37 40 >}}
### Throwing Exceptions
Throwing 0 is never a good idea. Such an exception doesn't contain any information
that we may find useful in debugging, nor any information that would benefit
the users of the compiler. Instead, let's define our own exception classes,
and throw them instead. We'll make two:
{{< codeblock "C++" "compiler/04/error.hpp" >}}
Only one function needs to be implemented, and it's pretty boring:
{{< codeblock "C++" "compiler/04/error.cpp" >}}
It's time to throw these instead of 0. Let's take a look at the places
we do so.
First, we throw 0 in `type.cpp`, in the `type_mgr::unify` method. This is
where our `unification_error` comes in. The error will
contain the two types that we failed to unify, which we will
later report to the user:
{{< codelines "C++" "compiler/04/type.cpp" 91 91 >}}
Next up, we have a few throws in `ast.cpp`. The first is in `op_string`, but
we will simply replace it with `return "??"`, which will be caught later on
(either way, the case expression falling through would be a compiler bug,
since the user has no way of providing an invalid binary operator). The
first throw we need to address is in `ast_binop::typecheck`, in the case
that we don't find a type for a binary operator. We report this
directly:
{{< codelines "C++" "compiler/04/ast.cpp" 57 57 >}}
We will introduce a new exception into `ast_case::typecheck`. Previously,
we simply pass the type of the expression to be case analyzed into
the pattern matching method. However, since we don't want
case analysis on functions, we ensure that the type of the expression
is `type_base`. If not, we report this:
{{< codelines "C++" "compiler/04/ast.cpp" 107 110 >}}
The next exception is in `pattern_constr::match`. It occurs
when the pattern has a constructor we don't recognize, and
that's exactly what we report:
{{< codelines "C++" "compiler/04/ast.cpp" 132 134 >}}
The next exception occurs in a loop, when we bind
types for each of the constructor pattern's variables.
We throw when we are unable to cast the remaining
constructor type to a `type_arr`. Conceptually,
this means that the pattern wants to apply the
constructor to more parameters than it actually
takes:
{{< codelines "C++" "compiler/04/ast.cpp" 138 138 >}}
We remove the last throw at the bottom of `pattern_constr::match`.
This is because once unification succeeds, we know
that the return type of the pattern is a base type since
we know the type of the case expression is a base type
(we know this because we added that check to `ast_case::typecheck`).
Finally, let's catch and report these exceptions. We could do it
in `typecheck_program`, but I think doing so in `main` is neater.
Since printing types requires a `type_mgr`, we'll move the
declarations of both `type_mgr` and `type_env` to the top of
main, and pass them to `typecheck_program` as parameters. Then,
we can surround the call to `typecheck_program` with
try/catch:
{{< codelines "C++" "compiler/04/main.cpp" 57 69 >}}
We use some [ANSI escape codes](https://en.wikipedia.org/wiki/ANSI_escape_code)
to color the types in the case of a unification error.
### Setting up CMake ### Setting up CMake
We will set up CMake as our build system. This would be extremely easy We will set up CMake as our build system. This would be extremely easy
@@ -147,16 +212,16 @@ in order to compile. We add this dependency:
Finally, we add our source code to a CMake target. We use Finally, we add our source code to a CMake target. We use
the `BISON_parser_OUTPUTS` and `FLEX_scanner_OUTPUTS` to the `BISON_parser_OUTPUTS` and `FLEX_scanner_OUTPUTS` to
pass in the source files generated by Flex and Bison. pass in the source files generated by Flex and Bison.
{{< codelines "CMake" "compiler/04/CMakeLists.txt" 15 22 >}} {{< codelines "CMake" "compiler/04/CMakeLists.txt" 15 23 >}}
Almost there! `parser.cpp` will be generated in the `build` directory Almost there! `parser.cpp` will be generated in the `build` directory
during an out-of-source build, and so will `parser.hpp`. When building, during an out-of-source build, and so will `parser.hpp`. When building,
`parser.cpp` will try to look for `ast.hpp`, and `main.cpp` will look for `parser.cpp` will try to look for `ast.hpp`, and `main.cpp` will look for
`parser.hpp`. We want them to be able to find each other, so we `parser.hpp`. We want them to be able to find each other, so we
add both the source directory and the build (binary) directory to add both the source directory and the build (binary) directory to
the list of includes directories: the list of include directories:
{{< codelines "CMake" "compiler/04/CMakeLists.txt" 23 24 >}} {{< codelines "CMake" "compiler/04/CMakeLists.txt" 24 25 >}}
That's it for CMake! Let's try our build: That's it for CMake! Let's try our build:
``` ```
@@ -164,5 +229,10 @@ cmake -S . -B build
cd build && make -j8 cd build && make -j8
``` ```
We get an executable called `compiler`. Excellent! Here's the whole file: ### Updated Code
{{< codeblock "CMake" "compiler/04/CMakeLists.txt" >}} We've made a lot of changes to the codebase, and I've only shown snippets of the code
so far. If you'de like to see the whole codebase, you can go to my site's git repository
and check out [the code so far](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/compiler/04).
Having taken this little break, it's time for our next push. We will define
how our programs will be evaluated in [Part 5 - Execution]({{< relref "05_compiler_execution.md" >}}).

View File

@@ -0,0 +1,614 @@
---
title: Compiling a Functional Language Using C++, Part 5 - Execution
date: 2019-08-06T14:26:38-07:00
tags: ["C and C++", "Functional Languages", "Compilers"]
---
{{< gmachine_css >}}
We now have trees representing valid programs in our language,
and it's time to think about how to compile them into machine code,
to be executed on hardware. But __how should we execute programs__?
The programs we define are actually lists of definitions. But
you can't evaluate definitions - they just tell you, well,
how things are defined. Expressions, on the other hand,
can be simplified. So, let's start by evaluating
the body of the function called `main`, similarly
to how C/C++ programs start.
Alright, we've made it past that hurdle. Next,
to figure out how to evaluate expressions. It's easy
enough with binary operators: `3+2*6` becomes `3+12`,
and `3+12` becomes `15`. Functions are when things
get interesting. Consider:
```
double (160+3)
```
There's many perfectly valid ways to evaluate the program.
When we get to a function application, we can first evaluate
the arguments, and then expand the function definition:
```
double (160+3)
double 163
163+163
326
```
Let's come up with a more interesting program to illustrate
execution. How about:
```
data Pair = { P Int Int }
defn fst p = {
case p of {
P x y -> { x }
}
}
defn snd p = {
case p of {
P x y -> { y }
}
}
defn slow x = { returns x after waiting for 1 second }
defn main = { fst (P (slow 320) (slow 6)) }
```
If we follow our rules for evaluating functions,
the execution will follow the following steps:
```
fst (P (slow 320) (slow 6))
fst (P 320 (slow 6)) <- after 1 second
fst (P 320 6) <- after 1 second
320
```
We waited for two seconds, even though we really only
needed to wait one. To avoid this, we could instead
define our function application to substitute in
the parameters of a function before evaluating them:
```
fst (P (slow 320) (slow 6))
(slow 320)
320 <- after 1 second
```
This seems good, until we try doubling an expression again:
```
double (slow 163)
(slow 163) + (slow 163)
163 + (slow 163) <- after 1 second
163 + 163 <- after 1 second
326
```
With ony one argument, we've actually spent two seconds on the
evaluation! If we instead tried to triple using addition,
we'd spend three seconds.
Observe that with these new rules (called "call by name" in programming language theory),
we only waste time because we evaluate an expression that was passed in more than 1 time.
What if we didn't have to do that? Since we have a functional language, there's no way
that two expressions that are the same evaluate to a different value. Thus,
once we know the result of an expression, we can replace all occurences of that expression
with the result:
```
double (slow 163)
(slow 163) + (slow 163)
163 + 163 <- after 1 second
326
```
We're back down to one second, and since we're still substituting parameters
before we evaluate them, we still only take one second.
Alright, this all sounds good. How do we go about implementing this?
Since we're substituting variables for whole expressions, we can't
just use values. Instead, because expressions are represented with trees,
we might as well consider operating on trees. When we evaluate a tree,
we can substitute it in-place with what it evaluates to. We'll do this
depth-first, replacing the children of a node with their reduced trees,
and then moving on to the parent.
There's only one problem with this: if we substitute a variable that occurs many times
with the same expression tree, we no longer have a tree! Trees, by definition,
have only one path from the root to any other node. Since we now have
many ways to reach that expression we substituted, we instead have a __graph__.
Indeed, the way we will be executing our functional code is called __graph reduction__.
### Building Graphs
Naively, we might consider creating a tree for each function at the beginning of our
program, and then, when that function is called, substituting the variables
in it with the parameters of the application. But that approach quickly goes out
the window when we realize that we could be applying a function
multiple times - in fact, an arbitrary number of times. This means we can't
have a single tree, and we must build a new tree every time we call a function.
The question, then, is: how do we construct a new graph? We could
reach into Plato's [Theory of Forms](https://en.wikipedia.org/wiki/Theory_of_forms) and
have a "reference" tree which we then copy every time we apply the function.
But how do you copy a tree? Copying a tree is usually a recursive function,
and __every__ time that we copy a tree, we'll have to look at each node
and decide whether or not to visit its children (or if it has any at all).
If we copy a tree 100 times, we will have to look at each "reference"
node 100 times. Since the reference tree doesn't change, __we'd
be following the exact same sequence of decisions 100 times__. That's
no good!
An alternative approach, one that we'll use from now on, is to instead
convert each function's expression tree into a sequence of instructions
that you can follow to build an identical tree. Every time we have
to apply a function, we'll follow the corresponding recipe for
that function, and end up with a new tree that we continue evaluating.
### G-machine
"Instructions" is a very generic term. Specifically, we will be creating instructions
for a [G-machine](https://link.springer.com/chapter/10.1007/3-540-15975-4_50),
an abstract architecture which we will use to reduce our graphs. The G-machine
is stack-based - all operations push and pop items from a stack. The machine
will also have a "dump", which is a stack of stacks; this will help with
separating evaluation of various graphs.
We will follow the same notation as Simon Peyton Jones in
[his book](https://www.microsoft.com/en-us/research/wp-content/uploads/1992/01/student.pdf)
, which was my source of truth when implementing my compiler. The machine
will be executing instructions that we give it, and as such, it must have
an instruction queue, which we will reference as \\(i\\). We will write
\\(x:i\\) to mean "an instruction queue that starts with
an instruction x and ends with instructions \\(i\\)". A stack machine
obviously needs to have a stack - we will call it \\(s\\), and will
adopt a similar notation to the instruction queue: \\(a\_1, a\_2, a\_3 : s\\)
will mean "a stack with the top values \\(a\_1\\), \\(a\_2\\), and \\(a\_3\\),
and remaining instructions \\(s\\)". Finally, as we said, our stack
machine has a dump, which we will write as \\(d\\). On this dump,
we will push not only the current stack, but also the current
instructions that we are executing, so we may resume execution
later. We will write \\(\\langle i, s \\rangle : d\\) to mean
"a dump with instructions \\(i\\) and stack \\(s\\) on top,
followed by instructions and stacks in \\(d\\)".
There's one more thing the G-machine will have that we've not yet discussed at all,
and it's needed because of the following quip earlier in the post:
> When we evaluate a tree, we can substitute it in-place with what it evaluates to.
How can we substitute a value in place? Surely we won't iterate over the entire
tree and look for an occurence of the tree we evaluted. Rather, wouldn't it be
nice if we could update all references to a tree to be something else? Indeed,
we can achieve this effect by using __pointers__. I don't mean specifically
C/C++ pointers - I mean the more general concept of "an address in memory".
The G-machine has a __heap__, much like the heap of a C/C++ process. We
can create a tree node on the heap, and then get an __address__ of the node.
We then have trees use these addresses to link their child nodes.
If we want to replace a tree node with its reduced form, we keep
its address the same, but change the value on the heap.
This way, all trees that reference the node we change become updated,
without us having to change them - their child address remains the same,
but the child has now been updated. We represent the heap
using \\(h\\). We write \\(h[a : v]\\) to say "the address \\(a\\) points
to value \\(v\\) in the heap \\(h\\)". Now you also know why we used
the letter \\(a\\) when describing values on the stack - the stack contains
addresses of (or pointers to) tree nodes.
_Compiling Functional Languages: a tutorial_ also keeps another component
of the G-machine, the __global map__, which maps function names to addresses of nodes
that represent them. We'll stick with this, and call this global map \\(m\\).
Finally, let's talk about what kind of nodes our trees will be made of.
We don't have to include every node that we've defined as a subclass of
`ast` - some nodes we can compile to instructions, without having to build
them. We will also include nodes that we didn't need for to represent expressions.
Here's the list of nodes types we'll have:
* `NInt` - represents an integer.
* `NApp` - represents an application (has two children).
* `NGlobal` - represents a global function (like the `f` in `f x`).
* `NInd` - an "indrection" node that points to another node. This will help with "replacing" a node.
* `NData` - a "packed" node that will represent a constructor with all the arguments.
With these nodes in mind, let's try defining some instructions for the G-machine.
We start with instructions we'll use to assemble new version of function body trees as we discussed above.
First up is __PushInt__:
{{< gmachine "PushInt" >}}
{{< gmachine_inner "Before">}}
\( \text{PushInt} \; n : i \quad s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad d \quad h[a : \text{NInt} \; n] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Push an integer \(n\) onto the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}
Let's go through this. We start with an instruction queue
with `PushInt n` on top. We allocate a new `NInt` with the
number `n` on the heap at address \\(a\\). We then push
the address of the `NInt` node on top of the stack. Next,
__PushGlobal__:
{{< gmachine "PushGlobal" >}}
{{< gmachine_inner "Before">}}
\( \text{PushGlobal} \; f : i \quad s \quad d \quad h \quad m[f : a] \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad d \quad h \quad m[f : a] \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Push a global function \(f\) onto the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}
We don't allocate anything new on the heap for this one -
we already have a node for the global function. Next up,
__Push__:
{{< gmachine "Push" >}}
{{< gmachine_inner "Before">}}
\( \text{Push} \; n : i \quad a_0, a_1, ..., a_n : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a_n, a_0, a_1, ..., a_n : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Push a value at offset \(n\) from the top of the stack onto the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}
We define this instruction to work if and only if there exists an address
on the stack at offset \\(n\\). We take the value at that offset, and
push it onto the stack again. This can be helpful for something like
`f x x`, where we use the same tree twice. Speaking of that - let's
define an instruction to combine two nodes into an application:
{{< gmachine "MkApp" >}}
{{< gmachine_inner "Before">}}
\( \text{MkApp} : i \quad a_0, a_1 : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad d \quad h[ a : \text{NApp} \; a_0 \; a_1] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Apply a function at the top of the stack to a value after it.
{{< /gmachine_inner >}}
{{< /gmachine >}}
We pop two things off the stack: first, the thing we're applying, then
the thing we apply it to. We then create a new node on the heap
that is an `NApp` node, with its two children being the nodes we popped off.
Finally, we push it onto the stack.
Let's try use these instructions to get a feel for it. In
order to conserve space, let's use \\(\\text{G}\\) for PushGlobal,
\\(\\text{I}\\) for PushInt, and \\(\\text{A}\\) for PushApp.
Let's say we want to construct a graph for `double 326`. We'll
use the instructions \\(\\text{I} \; 326\\), \\(\\text{G} \; \\text{double}\\),
and \\(\\text{A}\\). Let's watch these instructions play out:
$$
\\begin{align}
[\\text{I} \; 326, \\text{G} \; \text{double}, \\text{A}] & \\quad s \\quad & d \\quad & h \\quad & m[\\text{double} : a\_d] \\\\\\
[\\text{G} \; \text{double}, \\text{A}] & \\quad a\_1 : s \\quad & d \\quad & h[a\_1 : \\text{NInt} \; 326] \\quad & m[\\text{double} : a\_d] \\\\\\
[\\text{A}] & \\quad a\_d, a\_1 : s \\quad & d \\quad & h[a\_1 : \\text{NInt} \; 326] \\quad & m[\\text{double} : a\_d] \\\\\\
[] & \\quad a\_2 : s \\quad & d \\quad & h[\\substack{\\begin{aligned}a\_1 & : \\text{NInt} \; 326 \\\\ a\_2 & : \\text{NApp} \; a\_d \; a\_1 \\end{aligned}}] \\quad & m[\\text{double} : a\_d] \\\\\\
\\end{align}
$$
How did we come up with these instructions? We'll answer this question with
more generality later, but let's take a look at this particular expression
right now. We know it's an application, so we'll be using MkApp eventually.
We also know that MkApp expects two values on top of the stack from
which to make an application. The node on top has to be the function, and the next
node is the value to be passed into that function. Since a stack is first-in-last-out,
for the function (`double`, in our case) to be on top, we need
to push it last. Thus, we push `double` first, then 326. Finally,
we call MkApp now that the stack is in the right state.
Having defined instructions to __build__ graphs, it's now time
to move on to instructions to __reduce__ graphs - after all,
we're performing graph reduction. A crucial instruction for the
G-machine is __Unwind__. What Unwind does depends on what
nodes are on the stack. Its name comes from how it behaves
when the top of the stack is an `NApp` node that is at
the top of a potentially long chain of applications: given
an application node, it pushes its left hand side onto the stack.
It then __continues to run Unwind__. This is effectively a while loop:
applications nodes continue to be expanded this way until the left
hand side of an application is finally something
that __isn't__ an application. Let's write this rule as follows:
{{< gmachine "Unwind-App" >}}
{{< gmachine_inner "Before">}}
\( \text{Unwind} : i \quad a : s \quad d \quad h[a : \text{NApp} \; a_0 \; a_1] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( \text{Unwind} : i \quad a_0, a : s \quad d \quad h[ a : \text{NApp} \; a_0 \; a_1] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Unwind an application by pushing its left node.
{{< /gmachine_inner >}}
{{< /gmachine >}}
Let's talk about what happens when Unwind hits a node that isn't an application. Of all nodes
we have described, `NGlobal` seems to be the most likely to be on top of the stack after
an application chain has finished unwinding. In this case we want to run the instructions
for building the referenced global function. Naturally, these instructions
may reference the arguments of the application. We can find the first argument
by looking at offset 1 on the stack, which will be an `NApp` node, and then going
to its right child. The same can be done for the second and third arguments, if
they exist. But this doesn't feel right - we don't want to constantly be looking
at the right child of a node on the stack. Instead, we replace each application
node on the stack with its right child. Once that's done, we run the actual
code for the global function:
{{< gmachine "Unwind-Global" >}}
{{< gmachine_inner "Before">}}
\( \text{Unwind} : i \quad a, a_0, a_1, ..., a_n : s \quad d \quad h[\substack{a : \text{NGlobal} \; n \; c \\ a_k : \text{NApp} \; a_{k-1} \; a_k'}] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( c \quad a_0', a_1', ..., a_n', a_n : s \quad d \quad h[\substack{a : \text{NGlobal} \; n \; c \\ a_k : \text{NApp} \; a_{k-1} \; a_k'}] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Call a global function.
{{< /gmachine_inner >}}
{{< /gmachine >}}
In this rule, we used a general rule for \\(a\_k\\), in which \\(k\\) is any number
between 0 and \\(n\\). We also expect the `NGlobal` node to contain two parameters,
\\(n\\) and \\(c\\). \\(n\\) is the arity of the function (the number of arguments
it expects), and \\(c\\) are the instructions to construct the function's tree.
The attentive reader will have noticed a catch: we kept \\(a\_n\\) on the stack!
This once again goes back to replacing a node in-place. \\(a\_n\\) is the address of the "root" of the
whole expression we're simplifying. Thus, to replace the value at this address, we need to keep
the address until we have something to replace it with.
There's one more thing that can be found at the leftmost end of a tree of applications: `NInd`.
We simply replace `NInd` with the node it points to, and resume Unwind:
{{< gmachine "Unwind-Ind" >}}
{{< gmachine_inner "Before">}}
\( \text{Unwind} : i \quad a : s \quad d \quad h[a : \text{NInd} \; a' ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( \text{Unwind} : i \quad a' : s \quad d \quad h[a : \text{NInd} \; a' ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Replace indirection node with its target.
{{< /gmachine_inner >}}
{{< /gmachine >}}
We've talked about replacing a node, and we've talked about indirection, but we
haven't yet an instruction to perform these actions. Let's do so now:
{{< gmachine "Update" >}}
{{< gmachine_inner "Before">}}
\( \text{Update} \; n : i \quad a,a_0,a_1,...a_n : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a_0,a_1,...,a_n : s \quad d \quad h[a_n : \text{NInd} \; a ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Transform node at offset into an indirection.
{{< /gmachine_inner >}}
{{< /gmachine >}}
This instruction pops an address from the top of the stack, and replaces
a node at the given offset with an indirection to the popped node. After
we evaluate a function call, we will use `update` to make sure it's
not evaluated again.
Now, let's talk about data structures. We have mentioned an `NData` node,
but we've given no explanation of how it will work. Obviously, we need
to distinguish values of a type created by different constructors:
If we have a value of type `List`, it could have been created either
using `Nil` or `Cons`. Depending on which constructor was used to
create a value of a type, we might treat it differently. Furthermore,
it's not always possible to know what constructor was used to
create what value at compile time. So, we need a way to know,
at runtime, how the value was constructed. We do this using
a __tag__. A tag is an integer value that will be contained in
the `NData` node. We assign a tag number to each constructor,
and when we create a node with that constructor, we set
the node's tag accordingly. This way, we can easily
tell if a `List` value is a `Nil` or a `Cons`, or
if a `Tree` value is a `Node` or a `Leaf`.
To operate on `NData` nodes, we will need two primitive operations: __Pack__ and __Split__.
Pack will create an `NData` node with a tag from some number of nodes
on the stack. These nodes will be placed into a dynamically
allocated array:
{{< gmachine "Pack" >}}
{{< gmachine_inner "Before">}}
\( \text{Pack} \; t \; n : i \quad a_1,a_2,...a_n : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad d \quad h[a : \text{NData} \; t \; [a_1, a_2, ..., a_n] ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Pack \(n\) nodes from the stack into a node with tag \(t\).
{{< /gmachine_inner >}}
{{< /gmachine >}}
Split will do the opposite, by popping
of an `NData` node and moving the contents of its
array onto the stack:
{{< gmachine "Split" >}}
{{< gmachine_inner "Before">}}
\( \text{Split} : i \quad a : s \quad d \quad h[a : \text{NData} \; t \; [a_1, a_2, ..., a_n] ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a_1, a_2, ...,a_n : s \quad d \quad h[a : \text{NData} \; t \; [a_1, a_2, ..., a_n] ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Unpack a data node on top of the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}
These two instructions are a good start, but we're missing something
fairly big: case analysis. After we've constructed a data type,
to perform operations on it, we want to figure out what
constructor and values which were used to create it. In order
to implement patterns and case expressions, we'll need another
instruction that's capable of making a decision based on
the tag of an `NData` node. We'll call this instruction __Jump__,
and define it to contain a mapping from tags to instructions
to be executed for a value of that tag. For instance,
if the constructor `Nil` has tag 0, and `Cons` has tag 1,
the mapping for the case expression of a length function
could be written as \\([0 \\rightarrow [\\text{PushInt} \; 0], 1 \\rightarrow [\\text{PushGlobal} \; \\text{length}, ...] ]\\).
Let's define the rule for it:
{{< gmachine "Jump" >}}
{{< gmachine_inner "Before">}}
\( \text{Jump} [..., t \rightarrow i_t, ...] : i \quad a : s \quad d \quad h[a : \text{NData} \; t \; as ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i_t, i \quad a : s \quad d \quad h[a : \text{NData} \; t \; as ] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Execute instructions corresponding to a tag.
{{< /gmachine_inner >}}
{{< /gmachine >}}
Alright, we've made it through the interesting instructions,
but there's still a few that are needed, but less shiny and cool.
For instance: imagine we've made a function call. As per the
rules for Unwind, we've placed the right hand sides of all applications
on the stack, and ran the instructions provided by the function,
creating a final graph. We then continue to reduce this final
graph. But we've left the function parameters on the stack!
This is untidy. We define a __Slide__ instruction,
which keeps the address at the top of the stack, but gets
rid of the next \\(n\\) addresses:
{{< gmachine "Slide" >}}
{{< gmachine_inner "Before">}}
\( \text{Slide} \; n : i \quad a_0, a_1, ..., a_n : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a_0 : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Remove \(n\) addresses after the top from the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}
Just a few more. Next up, we observe that we have not
defined any way for our G-machine to perform arithmetic,
or indeed, any primitive operations. Since we've
not defined any built-in type for booleans,
let's avoid talking about operations like `<`, `==`,
and so on (in fact, we've omitted them from the grammar so far).
So instead, let's talk about the [closed](https://en.wikipedia.org/wiki/Closure_(mathematics)) operations,
namely `+`, `-`, `*`, and `/`. We'll define a special instruction for
them, called __BinOp__:
{{< gmachine "BinOp" >}}
{{< gmachine_inner "Before">}}
\( \text{BinOp} \; \text{op} : i \quad a_0, a_1 : s \quad d \quad h[\substack{a_0 : \text{NInt} \; n \\ a_1 : \text{NInt} \; m}] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad a : s \quad d \quad h[\substack{a_0 : \text{NInt} \; n \\ a_1 : \text{NInt} \; m \\ a : \text{NInt} \; (\text{op} \; n \; m)}] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Apply a binary operator on integers.
{{< /gmachine_inner >}}
{{< /gmachine >}}
Nothing should be particularly surprising here:
the instruction pops two integers off the stack, applies the given
binary operation to them, and places the result on the stack.
We're not yet done with primitive operations, though.
We have a lazy graph reduction machine, which means
something like the expression `3*(2+6)` might not
be a binary operator applied to two `NInt` nodes.
We keep around graphs until they __really__ need to
be reduced. So now we need an instruction to trigger
reducing a graph, to say, "we need this value now".
We call this instruction __Eval__. This is where
the dump finally comes in!
When we execute Eval, another graph becomes our "focus", and we switch
to a new stack. We obviously want to return from this once we've finished
evaluating what we "focused" on, so we must store the program state somewhere -
on the dump. Here's the rule:
{{< gmachine "Eval" >}}
{{< gmachine_inner "Before">}}
\( \text{Eval} : i \quad a : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( [\text{Unwind}] \quad [a] \quad \langle i, s\rangle : d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Evaluate graph to its normal form.
{{< /gmachine_inner >}}
{{< /gmachine >}}
We store the current set of instructions and the current stack on the dump,
and start with only Unwind and the value we want to evaluate.
That does the job, but we're missing one thing - a way to return to
the state we placed onto the dump. To do this, we add __another__
rule to Unwind:
{{< gmachine "Unwind-Return" >}}
{{< gmachine_inner "Before">}}
\( \text{Unwind} : i \quad a : s \quad \langle i', s'\rangle : d \quad h[a : \text{NInt} \; n] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i' \quad a : s' \quad d \quad h[a : \text{NInt} \; n] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Return from Eval instruction.
{{< /gmachine_inner >}}
{{< /gmachine >}}
Just a couple more special-purpose instructions, and we're done!
Sometimes, it's possible for a tree node to reference itself.
For instance, Haskell defines the
[fixpoint combinator](https://en.wikipedia.org/wiki/Fixed-point_combinator)
as follows:
```Haskell
fix f = let x = f x in x
```
In order to do this, an address that references a node must be present
while the node is being constructed. We define an instruction,
__Alloc__, which helps with that:
{{< gmachine "Alloc" >}}
{{< gmachine_inner "Before">}}
\( \text{Alloc} \; n : i \quad s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad s \quad d \quad h[a_k : \text{NInd} \; \text{null}] \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Allocate indirection nodes.
{{< /gmachine_inner >}}
{{< /gmachine >}}
We can allocate an indirection on the stack, and call Update on it when
we've constructed a node. While we're constructing the tree, we can
refer to the indirection when a self-reference is required.
Lastly, we also define a Pop instruction, which just removes
some number of nodes from the stack. We want this because
calling Update at the end of a function modifies a node further up the stack,
leaving anything on top of the stack after that node as scratch work. We get
rid of that scratch work simply by popping it.
{{< gmachine "Pop" >}}
{{< gmachine_inner "Before">}}
\( \text{Pop} \; n : i \quad a_1, a_2, ..., a_n : s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "After" >}}
\( i \quad s \quad d \quad h \quad m \)
{{< /gmachine_inner >}}
{{< gmachine_inner "Description" >}}
Pop \(n\) nodes from the stack.
{{< /gmachine_inner >}}
{{< /gmachine >}}
That's it for the instructions. Knowing them, however, doesn't
tell us what to do with our `ast` structs. We'll need to define
rules to translate trees into these instructions, and I've already
alluded to this when we went over `double 326`.
However, this has already gotten pretty long,
so we'll do it in the next post: [Part 6 - Compilation]({{< relref "06_compiler_compilation.md" >}}).

Some files were not shown because too many files have changed in this diff Show More