Add complete assignment description and starter code.

Rob Hess 2019-05-14 12:00:45 -07:00
commit ee407b6450
16 changed files with 798 additions and 0 deletions

5
.gitignore vendored Normal file

@ -0,0 +1,5 @@
parse
scanner.cpp
parser.cpp
parser.hpp
parser.output

13
Makefile Normal file

@ -0,0 +1,13 @@
all: parse
parser.cpp parser.hpp: parser.y
bison -d -o parser.cpp parser.y
scanner.cpp: scanner.l
flex -o scanner.cpp scanner.l
parse: main.cpp parser.cpp scanner.cpp
g++ main.cpp parser.cpp scanner.cpp -o parse
clean:
rm -f parse scanner.cpp parser.cpp parser.hpp

93
README.md Normal file

@ -0,0 +1,93 @@
# Assignment 3
**Due by 11:59pm on Monday, 5/27/2019**
**Demo due by 11:59pm on Monday, 6/10/2019**
In this assignment, we'll work on generating an intermediate representation for a source program, to be passed on to later phases of the compiler. Specifically, we'll modify the Python parser from assignment 2 to generate an abstract syntax tree (AST) representing the source program. In addition, we'll take a first stab at generating code using this AST by generating a GraphViz specification of the AST itself. This will enable us to visualize the AST.
There are a few major parts to this assignment, described below. To get you started, you are provided with a Flex scanner specification in `scanner.l` and a Bison parser specification in `parser.y` that, together with the `main()` function in `main.cpp`, solve the problem defined in assignment 2. There is also a makefile that specifies compilation for the parser. Instead of using these files, you may also start with your own solution to assignment 2, if you'd like.
## 1. Implement one or more data structures for building an AST
An abstract syntax tree (AST) is a tree-based representation of the source program in which each node in the tree represents a coherent semantic construct in the source program. An AST closely resembles a parse tree but is more compact because it eliminates or contracts many of the nodes corresponding to nonterminals. For example, an entire if statement might be represented by a single node in an AST instead of by separate nodes for each of its individual components, as you'd see in a typical parse tree. You can see visualizations of example ASTs in the `example_output/` directory.
Your first task in this assignment is to implement a set of one or more data structures that allow you to construct an AST for a source program written in the same subset of Python we worked with for assignment 2. To do this, you'll have to figure out what specific language constructs need to be represented by AST nodes, what data might be associated with each of these nodes, and how each type of node connects to other nodes in the AST.
There are different approaches to this problem. For example, you might implement a single, general purpose C++ class or C structure, capable of representing all programmatic constructs in the source language. Then, each time your parser encountered any kind of program construct, it would create a new object of this class/structure, add the appropriate data to represent the recognized construct, and pass the newly-created node up in the parse tree to be combined with higher-level nodes.
Alternatively, you might implement separate classes/structures to represent the various different kinds of programmatic constructs in the source language. For example, you might have a C++ class or C structure to represent all statements, one to represent all expressions, etc. Each of these might have further specializations. For example, you could implement a C++ class to represent all statements, and then use inheritance to implement derived classes representing assignment statements, if/elif/else statements, while statements, etc.
An implementation taking this latter approach might, for example, include a C++ class or C structure to represent an AST node corresponding to a binary expression (e.g. `expr OP expr`) in the source language. Your class/structure might contain a field to represent the specific operator associated with the expression, and it might contain two pointers to other AST nodes, one representing the left-hand side of the binary expression and another representing the right-hand side of the binary expression. These LHS and RHS nodes might have additional children, or they could represent identifiers, floats, integers, etc. that have no children. If this implementation also had classes/structures for nodes representing higher-level language constructs, such as assignment statements, the binary expression node could be a child of one of these higher-level nodes.
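As a rough illustration of this latter approach, here is a minimal sketch of a small class hierarchy. All of the names here (`ASTNode`, `LeafNode`, `BinaryExprNode`, `AssignmentNode`, `label()`) are assumptions made up for this example and are not part of the starter code. The sketch also assumes a compiler with C++14 support (for `std::make_unique` in the usage below).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for every AST node.
struct ASTNode {
  virtual ~ASTNode() = default;
  // Short description of the construct, e.g. "PLUS" or "Identifier: x".
  virtual std::string label() const = 0;
};

// Leaf node: an identifier, float, integer, or boolean with no children.
struct LeafNode : ASTNode {
  std::string kind;   // e.g. "Identifier" or "Integer"
  std::string value;  // the text recognized by the scanner
  LeafNode(std::string k, std::string v)
      : kind(std::move(k)), value(std::move(v)) {}
  std::string label() const override { return kind + ": " + value; }
};

// Interior node for a binary expression of the form `expr OP expr`.
struct BinaryExprNode : ASTNode {
  std::string op;                // e.g. "PLUS" or "TIMES"
  std::unique_ptr<ASTNode> lhs;  // left-hand operand
  std::unique_ptr<ASTNode> rhs;  // right-hand operand
  BinaryExprNode(std::string o, std::unique_ptr<ASTNode> l,
                 std::unique_ptr<ASTNode> r)
      : op(std::move(o)), lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);  // a binary expression always has both operands
  }
  std::string label() const override { return op; }
};

// Higher-level node: an assignment whose right-hand side may be any
// expression node, including a BinaryExprNode.
struct AssignmentNode : ASTNode {
  std::unique_ptr<ASTNode> lhs;  // the identifier being assigned to
  std::unique_ptr<ASTNode> rhs;  // the expression being assigned
  AssignmentNode(std::unique_ptr<ASTNode> l, std::unique_ptr<ASTNode> r)
      : lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);
  }
  std::string label() const override { return "Assignment"; }
};
```

With these definitions, the statement `fi = f0 + f1` could be represented by an `AssignmentNode` whose `rhs` is a `BinaryExprNode` with `op` set to `"PLUS"` and two `LeafNode` children. Using `std::unique_ptr` here is one design choice among several. Bison semantic values are often plain pointers, so a raw-pointer design may be easier to connect to the parser.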
## 2. Modify the parser to use your data structures to build an AST
Your next task is to modify the parser included in the starter code (or your own parser from assignment 2) to build an AST using the data structures you defined above. The general idea here is to modify the actions associated with each of the parser's grammar rules to return an AST node instead of returning a string of C++ code. This node will then potentially be combined with other nodes when a higher-level construct is recognized.
Though the end goal here is different from the goal of assignment 2, the mechanics will be similar to generating a C++ translation. In particular, to generate a string containing the C++ translation for the language construct on the left-hand side of a particular grammar production, you assumed that you had C++ strings for the language constructs on the right-hand side of the production (specifically in `$1`, `$2`, `$3`, etc.), and you concatenated these together (along with proper punctuation, etc.) to form the left-hand side's C++ string (i.e. `$$`).
Similarly, when building an AST node for a language construct on the left-hand side of a grammar production, you should assume that you have AST nodes for the relevant language constructs on the right-hand side of the production (in the `$n`'s), and you can use these to generate the node for the production's left-hand side construct (i.e. `$$`). For example, you could pass the `$n`'s to a function or class constructor that generates an AST node for the production's left-hand-side language construct and then assign that generated node to `$$`.
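To make this concrete, here is a minimal sketch of helper functions that grammar actions could call. It assumes the single, general-purpose node approach with raw pointers. The names `Node`, `make_leaf_node()`, and `make_binary_node()` are hypothetical and do not appear in the starter code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical general-purpose AST node that owns its children.
struct Node {
  std::string label;
  std::vector<Node*> children;
  explicit Node(const std::string& l) : label(l) {}
  // Copying is disabled so two nodes never own the same children.
  Node(const Node&) = delete;
  Node& operator=(const Node&) = delete;
  ~Node() {
    for (Node* child : children) delete child;
  }
};

// Builds a leaf node from a terminal's std::string* value, then frees the
// string, since the node keeps its own copy of the text.
Node* make_leaf_node(const std::string& kind, std::string* text) {
  assert(text != nullptr);
  Node* node = new Node(kind + ": " + *text);
  delete text;
  return node;
}

// Builds a node for `expr OP expr` from two already-built operand nodes.
Node* make_binary_node(const std::string& op, Node* lhs, Node* rhs) {
  assert(lhs != nullptr && rhs != nullptr);
  Node* node = new Node(op);
  node->children.push_back(lhs);
  node->children.push_back(rhs);
  return node;
}

// Grammar actions in parser.y could then look roughly like this
// (an illustration only, not the required solution):
//
//   | IDENTIFIER                  { $$ = make_leaf_node("Identifier", $1); }
//   | expression PLUS expression  { $$ = make_binary_node("PLUS", $1, $3); }
```

Compared with the string-based actions from assignment 2, the operand values are no longer deleted after they are combined. Instead, the new parent node takes ownership of them.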
You'll have to do a few other things in the parser to get this all working, as well. For example, the current parser uses the type `std::string*` for all grammar symbols. You'll instead have to use the appropriate AST node class/structure as the type for each nonterminal symbol. You'll likely want to continue to represent all terminals with `std::string*`, since the values for the terminals will come from the scanner, and you don't necessarily want your scanner to worry about building AST nodes. The scanner may or may not need minor changes to make everything work.
Once you have the parser building your AST, save the root node of the AST in a global variable, similar to the way the current parser provided in the starter code saves the entire translated program string in a global variable. If you'd like, you can modify the `main()` function to print out some information about the AST using this global variable.
## 3. Use your AST to generate a GraphViz specification to visualize the AST
Finally, to get practice generating code from an AST, implement functionality to use your AST to generate its own [GraphViz](http://www.graphviz.org/) specification. You should specifically write a specification that can be passed to the [`dot`](https://graphviz.gitlab.io/_pages/pdf/dotguide.pdf) program (which is installed on the ENGR servers) to generate a visualization of the AST.
GraphViz uses a very simple notation for representing trees and graphs. Here's an example of a simple GraphViz tree specification:
```
digraph G {
a [label="Node A"];
b [label="Node B"];
c [label="Node C"];
d [label="Node D"];
e [label="Node E"];
f [label="Node F"];
g [label="Node G"];
h [label="Node H"];
a -> b;
a -> c;
a -> d;
c -> e;
d -> f;
e -> g;
e -> h;
}
```
Here, we create a new directed graph (i.e. a "digraph") in which each node is first declared and assigned a label (i.e. a string to be printed inside the node in the visualization of the tree). Following the node declarations, all (directed) edges in the tree are represented using the `->` operator.
Assuming this file was named `tree.gv`, we could use it to generate a PNG image visualizing the specified tree with the following command:
```
dot -Tpng -otree.png tree.gv
```
This is the image that would be produced (in `tree.png`):
![Tree visualization](example_output/tree.png)
The GraphViz specification is flexible, and nodes and edges can be defined in any order, with edges even possibly appearing before the corresponding nodes are declared. One important constraint on the GraphViz specification is that each node must have a unique identifier. More info on additional visualization options is available in the documentation linked above.
To generate a GraphViz specification for your AST, you should write one or more functions that generate GraphViz code for each of your AST node classes/structures (e.g. one function per class/structure). Each of these functions should generate the relevant GraphViz code for the node itself (e.g. a label for the node indicating the language construct it represents and specifications of the edges to the node's children). They should then recursively call other functions to generate GraphViz code for each of the node's children. The practice of recursively traversing the AST in this way to generate GraphViz code closely resembles the way assembly code is generated from an AST, e.g. using LLVM.
Two primary design considerations when writing your traversal/code generation functions will be how to ensure that each AST node is given a unique identifier in the GraphViz specification and how to concatenate together all of the GraphViz code produced for the individual AST nodes.
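One possible way to handle both considerations is sketched below. It is only an illustration, and the names `Node`, `generate_gv()`, and `generate_gv_document()` are assumptions rather than anything required by the assignment. Each node's identifier is built from its parent's identifier plus the child's index, which keeps identifiers unique, and all output is written to a single stream, which avoids concatenating strings by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical general-purpose AST node that owns its children.
struct Node {
  std::string label;
  std::vector<Node*> children;
  explicit Node(const std::string& l) : label(l) {}
  Node(const Node&) = delete;
  Node& operator=(const Node&) = delete;
  ~Node() {
    for (Node* child : children) delete child;
  }
};

// Writes the declaration for `node`, then an edge to each child, and
// recurses into each child. A child's identifier is the parent's identifier
// plus "_<index>", so no two nodes in the tree share an identifier.
// Note: labels are written as-is; a label containing a double quote would
// need escaping, which this sketch does not do.
void generate_gv(const Node* node, const std::string& id, std::ostream& out) {
  assert(node != nullptr);
  out << "  " << id << " [label=\"" << node->label << "\"];\n";
  for (std::size_t i = 0; i < node->children.size(); i++) {
    std::string child_id = id + "_" + std::to_string(i);
    out << "  " << id << " -> " << child_id << ";\n";
    generate_gv(node->children[i], child_id, out);
  }
}

// Wraps the output of the recursive traversal in a complete digraph.
std::string generate_gv_document(const Node* root) {
  std::ostringstream out;
  out << "digraph G {\n";
  generate_gv(root, "n0", out);
  out << "}\n";
  return out.str();
}
```

The files in `example_output/` use more descriptive suffixes (e.g. `_lhs`, `_rhs`, `_cond`), which a per-class generation function could supply in place of the numeric index used here. A global counter that hands out `n0`, `n1`, `n2`, etc. would also work.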
You should use your code generation functions by invoking them on the root node of the AST from your `main()` function, and you should output the generated GraphViz code to stdout, so it can be inspected or saved to a file.
## Testing your code
There are some simple Python programs you may use for testing your AST builder included in the `testing_code/` directory. Example outputs (both a `.png` visualization and a `.gv` GraphViz specification) for these programs are included in the `example_output/` directory. Note that the ASTs your parser generates may be slightly different from the ones included here, depending on how you choose to represent nodes in the AST. This is OK.
## Submission
We'll be using GitHub Classroom for this assignment, and you will submit your assignment via GitHub. Make sure your completed files are committed and pushed by the assignment's deadline to the master branch of the GitHub repo that was created for you by GitHub Classroom. A good way to check whether your files are safely submitted is to look at the master branch of your assignment repo on the github.com website (i.e. https://github.com/osu-cs480-sp19/assignment-3-YourGitHubUsername/). If your changes show up there, you can consider your files submitted.
## Grading criteria
The TAs will grade your assignment by compiling and running it on one of the ENGR servers, e.g. `flip.engr.oregonstate.edu`, so you should make sure your code works as expected there. `bison` and `flex` are installed on the ENGR servers (as is `dot`). If your code does not compile and run on the ENGR servers, the TAs will deduct at least 25 points from your score.
This assignment is worth 100 points total, broken down as follows:
* 25 points: a set of classes/structures is defined for representing an AST of the relevant subset of Python
* 50 points: the parser/scanner are modified to build an AST from the source program using these classes/structures
* 25 points: your compiler outputs a GraphViz specification of the constructed AST

87
example_output/p1.gv Normal file

@ -0,0 +1,87 @@
digraph G {
n0 [label="Block"];
n0 -> n0_0;
n0_0 [label="Assignment"];
n0_0 -> n0_0_lhs;
n0_0_lhs [shape=box,label="Identifier: pi"];
n0_0 -> n0_0_rhs;
n0_0_rhs [shape=box,label="Float: 3.1415"];
n0 -> n0_1;
n0_1 [label="Assignment"];
n0_1 -> n0_1_lhs;
n0_1_lhs [shape=box,label="Identifier: r"];
n0_1 -> n0_1_rhs;
n0_1_rhs [shape=box,label="Float: 8"];
n0 -> n0_2;
n0_2 [label="Assignment"];
n0_2 -> n0_2_lhs;
n0_2_lhs [shape=box,label="Identifier: circle_area"];
n0_2 -> n0_2_rhs;
n0_2_rhs [label="TIMES"];
n0_2_rhs -> n0_2_rhs_lhs;
n0_2_rhs_lhs [label="TIMES"];
n0_2_rhs_lhs -> n0_2_rhs_lhs_lhs;
n0_2_rhs_lhs_lhs [shape=box,label="Identifier: pi"];
n0_2_rhs_lhs -> n0_2_rhs_lhs_rhs;
n0_2_rhs_lhs_rhs [shape=box,label="Identifier: r"];
n0_2_rhs -> n0_2_rhs_rhs;
n0_2_rhs_rhs [shape=box,label="Identifier: r"];
n0 -> n0_3;
n0_3 [label="Assignment"];
n0_3 -> n0_3_lhs;
n0_3_lhs [shape=box,label="Identifier: circle_circum"];
n0_3 -> n0_3_rhs;
n0_3_rhs [label="TIMES"];
n0_3_rhs -> n0_3_rhs_lhs;
n0_3_rhs_lhs [label="TIMES"];
n0_3_rhs_lhs -> n0_3_rhs_lhs_lhs;
n0_3_rhs_lhs_lhs [shape=box,label="Identifier: pi"];
n0_3_rhs_lhs -> n0_3_rhs_lhs_rhs;
n0_3_rhs_lhs_rhs [shape=box,label="Integer: 2"];
n0_3_rhs -> n0_3_rhs_rhs;
n0_3_rhs_rhs [shape=box,label="Identifier: r"];
n0 -> n0_4;
n0_4 [label="Assignment"];
n0_4 -> n0_4_lhs;
n0_4_lhs [shape=box,label="Identifier: sphere_vol"];
n0_4 -> n0_4_rhs;
n0_4_rhs [label="TIMES"];
n0_4_rhs -> n0_4_rhs_lhs;
n0_4_rhs_lhs [label="TIMES"];
n0_4_rhs_lhs -> n0_4_rhs_lhs_lhs;
n0_4_rhs_lhs_lhs [label="TIMES"];
n0_4_rhs_lhs_lhs -> n0_4_rhs_lhs_lhs_lhs;
n0_4_rhs_lhs_lhs_lhs [label="TIMES"];
n0_4_rhs_lhs_lhs_lhs -> n0_4_rhs_lhs_lhs_lhs_lhs;
n0_4_rhs_lhs_lhs_lhs_lhs [label="DIVIDEDBY"];
n0_4_rhs_lhs_lhs_lhs_lhs -> n0_4_rhs_lhs_lhs_lhs_lhs_lhs;
n0_4_rhs_lhs_lhs_lhs_lhs_lhs [shape=box,label="Float: 4"];
n0_4_rhs_lhs_lhs_lhs_lhs -> n0_4_rhs_lhs_lhs_lhs_lhs_rhs;
n0_4_rhs_lhs_lhs_lhs_lhs_rhs [shape=box,label="Float: 3"];
n0_4_rhs_lhs_lhs_lhs -> n0_4_rhs_lhs_lhs_lhs_rhs;
n0_4_rhs_lhs_lhs_lhs_rhs [shape=box,label="Identifier: pi"];
n0_4_rhs_lhs_lhs -> n0_4_rhs_lhs_lhs_rhs;
n0_4_rhs_lhs_lhs_rhs [shape=box,label="Identifier: r"];
n0_4_rhs_lhs -> n0_4_rhs_lhs_rhs;
n0_4_rhs_lhs_rhs [shape=box,label="Identifier: r"];
n0_4_rhs -> n0_4_rhs_rhs;
n0_4_rhs_rhs [shape=box,label="Identifier: r"];
n0 -> n0_5;
n0_5 [label="Assignment"];
n0_5 -> n0_5_lhs;
n0_5_lhs [shape=box,label="Identifier: sphere_surf_area"];
n0_5 -> n0_5_rhs;
n0_5_rhs [label="TIMES"];
n0_5_rhs -> n0_5_rhs_lhs;
n0_5_rhs_lhs [label="TIMES"];
n0_5_rhs_lhs -> n0_5_rhs_lhs_lhs;
n0_5_rhs_lhs_lhs [label="TIMES"];
n0_5_rhs_lhs_lhs -> n0_5_rhs_lhs_lhs_lhs;
n0_5_rhs_lhs_lhs_lhs [shape=box,label="Integer: 4"];
n0_5_rhs_lhs_lhs -> n0_5_rhs_lhs_lhs_rhs;
n0_5_rhs_lhs_lhs_rhs [shape=box,label="Identifier: pi"];
n0_5_rhs_lhs -> n0_5_rhs_lhs_rhs;
n0_5_rhs_lhs_rhs [shape=box,label="Identifier: r"];
n0_5_rhs -> n0_5_rhs_rhs;
n0_5_rhs_rhs [shape=box,label="Identifier: r"];
}

BIN
example_output/p1.png Normal file

Binary file not shown.

Size: 105 KiB

87
example_output/p2.gv Normal file

@ -0,0 +1,87 @@
digraph G {
n0 [label="Block"];
n0 -> n0_0;
n0_0 [label="Assignment"];
n0_0 -> n0_0_lhs;
n0_0_lhs [shape=box,label="Identifier: a"];
n0_0 -> n0_0_rhs;
n0_0_rhs [shape=box,label="Boolean: 1"];
n0 -> n0_1;
n0_1 [label="Assignment"];
n0_1 -> n0_1_lhs;
n0_1_lhs [shape=box,label="Identifier: b"];
n0_1 -> n0_1_rhs;
n0_1_rhs [shape=box,label="Boolean: 0"];
n0 -> n0_2;
n0_2 [label="Assignment"];
n0_2 -> n0_2_lhs;
n0_2_lhs [shape=box,label="Identifier: x"];
n0_2 -> n0_2_rhs;
n0_2_rhs [shape=box,label="Integer: 7"];
n0 -> n0_3;
n0_3 [label="If"];
n0_3 -> n0_3_cond;
n0_3_cond [shape=box,label="Identifier: a"];
n0_3 -> n0_3_if;
n0_3_if [label="Block"];
n0_3_if -> n0_3_if_0;
n0_3_if_0 [label="Assignment"];
n0_3_if_0 -> n0_3_if_0_lhs;
n0_3_if_0_lhs [shape=box,label="Identifier: x"];
n0_3_if_0 -> n0_3_if_0_rhs;
n0_3_if_0_rhs [shape=box,label="Integer: 5"];
n0_3_if -> n0_3_if_1;
n0_3_if_1 [label="If"];
n0_3_if_1 -> n0_3_if_1_cond;
n0_3_if_1_cond [shape=box,label="Identifier: b"];
n0_3_if_1 -> n0_3_if_1_if;
n0_3_if_1_if [label="Block"];
n0_3_if_1_if -> n0_3_if_1_if_0;
n0_3_if_1_if_0 [label="Assignment"];
n0_3_if_1_if_0 -> n0_3_if_1_if_0_lhs;
n0_3_if_1_if_0_lhs [shape=box,label="Identifier: y"];
n0_3_if_1_if_0 -> n0_3_if_1_if_0_rhs;
n0_3_if_1_if_0_rhs [shape=box,label="Integer: 4"];
n0_3_if_1 -> n0_3_if_1_else;
n0_3_if_1_else [label="Block"];
n0_3_if_1_else -> n0_3_if_1_else_0;
n0_3_if_1_else_0 [label="Assignment"];
n0_3_if_1_else_0 -> n0_3_if_1_else_0_lhs;
n0_3_if_1_else_0_lhs [shape=box,label="Identifier: y"];
n0_3_if_1_else_0 -> n0_3_if_1_else_0_rhs;
n0_3_if_1_else_0_rhs [shape=box,label="Integer: 2"];
n0 -> n0_4;
n0_4 [label="Assignment"];
n0_4 -> n0_4_lhs;
n0_4_lhs [shape=box,label="Identifier: z"];
n0_4 -> n0_4_rhs;
n0_4_rhs [label="DIVIDEDBY"];
n0_4_rhs -> n0_4_rhs_lhs;
n0_4_rhs_lhs [label="TIMES"];
n0_4_rhs_lhs -> n0_4_rhs_lhs_lhs;
n0_4_rhs_lhs_lhs [label="TIMES"];
n0_4_rhs_lhs_lhs -> n0_4_rhs_lhs_lhs_lhs;
n0_4_rhs_lhs_lhs_lhs [shape=box,label="Identifier: x"];
n0_4_rhs_lhs_lhs -> n0_4_rhs_lhs_lhs_rhs;
n0_4_rhs_lhs_lhs_rhs [shape=box,label="Integer: 3"];
n0_4_rhs_lhs -> n0_4_rhs_lhs_rhs;
n0_4_rhs_lhs_rhs [shape=box,label="Integer: 7"];
n0_4_rhs -> n0_4_rhs_rhs;
n0_4_rhs_rhs [shape=box,label="Identifier: y"];
n0 -> n0_5;
n0_5 [label="If"];
n0_5 -> n0_5_cond;
n0_5_cond [label="GT"];
n0_5_cond -> n0_5_cond_lhs;
n0_5_cond_lhs [shape=box,label="Identifier: z"];
n0_5_cond -> n0_5_cond_rhs;
n0_5_cond_rhs [shape=box,label="Integer: 10"];
n0_5 -> n0_5_if;
n0_5_if [label="Block"];
n0_5_if -> n0_5_if_0;
n0_5_if_0 [label="Assignment"];
n0_5_if_0 -> n0_5_if_0_lhs;
n0_5_if_0_lhs [shape=box,label="Identifier: y"];
n0_5_if_0 -> n0_5_if_0_rhs;
n0_5_if_0_rhs [shape=box,label="Integer: 5"];
}

BIN
example_output/p2.png Normal file

Binary file not shown.

Size: 110 KiB

83
example_output/p3.gv Normal file

@ -0,0 +1,83 @@
digraph G {
n0 [label="Block"];
n0 -> n0_0;
n0_0 [label="Assignment"];
n0_0 -> n0_0_lhs;
n0_0_lhs [shape=box,label="Identifier: n"];
n0_0 -> n0_0_rhs;
n0_0_rhs [shape=box,label="Integer: 6"];
n0 -> n0_1;
n0_1 [label="Assignment"];
n0_1 -> n0_1_lhs;
n0_1_lhs [shape=box,label="Identifier: f0"];
n0_1 -> n0_1_rhs;
n0_1_rhs [shape=box,label="Integer: 0"];
n0 -> n0_2;
n0_2 [label="Assignment"];
n0_2 -> n0_2_lhs;
n0_2_lhs [shape=box,label="Identifier: f1"];
n0_2 -> n0_2_rhs;
n0_2_rhs [shape=box,label="Integer: 1"];
n0 -> n0_3;
n0_3 [label="Assignment"];
n0_3 -> n0_3_lhs;
n0_3_lhs [shape=box,label="Identifier: i"];
n0_3 -> n0_3_rhs;
n0_3_rhs [shape=box,label="Integer: 0"];
n0 -> n0_4;
n0_4 [label="While"];
n0_4 -> n0_4_cond;
n0_4_cond [shape=box,label="Boolean: 1"];
n0_4 -> n0_4_while;
n0_4_while [label="Block"];
n0_4_while -> n0_4_while_0;
n0_4_while_0 [label="Assignment"];
n0_4_while_0 -> n0_4_while_0_lhs;
n0_4_while_0_lhs [shape=box,label="Identifier: fi"];
n0_4_while_0 -> n0_4_while_0_rhs;
n0_4_while_0_rhs [label="PLUS"];
n0_4_while_0_rhs -> n0_4_while_0_rhs_lhs;
n0_4_while_0_rhs_lhs [shape=box,label="Identifier: f0"];
n0_4_while_0_rhs -> n0_4_while_0_rhs_rhs;
n0_4_while_0_rhs_rhs [shape=box,label="Identifier: f1"];
n0_4_while -> n0_4_while_1;
n0_4_while_1 [label="Assignment"];
n0_4_while_1 -> n0_4_while_1_lhs;
n0_4_while_1_lhs [shape=box,label="Identifier: f0"];
n0_4_while_1 -> n0_4_while_1_rhs;
n0_4_while_1_rhs [shape=box,label="Identifier: f1"];
n0_4_while -> n0_4_while_2;
n0_4_while_2 [label="Assignment"];
n0_4_while_2 -> n0_4_while_2_lhs;
n0_4_while_2_lhs [shape=box,label="Identifier: f1"];
n0_4_while_2 -> n0_4_while_2_rhs;
n0_4_while_2_rhs [shape=box,label="Identifier: fi"];
n0_4_while -> n0_4_while_3;
n0_4_while_3 [label="Assignment"];
n0_4_while_3 -> n0_4_while_3_lhs;
n0_4_while_3_lhs [shape=box,label="Identifier: i"];
n0_4_while_3 -> n0_4_while_3_rhs;
n0_4_while_3_rhs [label="PLUS"];
n0_4_while_3_rhs -> n0_4_while_3_rhs_lhs;
n0_4_while_3_rhs_lhs [shape=box,label="Identifier: i"];
n0_4_while_3_rhs -> n0_4_while_3_rhs_rhs;
n0_4_while_3_rhs_rhs [shape=box,label="Integer: 1"];
n0_4_while -> n0_4_while_4;
n0_4_while_4 [label="If"];
n0_4_while_4 -> n0_4_while_4_cond;
n0_4_while_4_cond [label="GTE"];
n0_4_while_4_cond -> n0_4_while_4_cond_lhs;
n0_4_while_4_cond_lhs [shape=box,label="Identifier: i"];
n0_4_while_4_cond -> n0_4_while_4_cond_rhs;
n0_4_while_4_cond_rhs [shape=box,label="Identifier: n"];
n0_4_while_4 -> n0_4_while_4_if;
n0_4_while_4_if [label="Block"];
n0_4_while_4_if -> n0_4_while_4_if_0;
n0_4_while_4_if_0 [label="Break"];
n0 -> n0_5;
n0_5 [label="Assignment"];
n0_5 -> n0_5_lhs;
n0_5_lhs [shape=box,label="Identifier: f"];
n0_5 -> n0_5_rhs;
n0_5_rhs [shape=box,label="Identifier: f0"];
}

BIN
example_output/p3.png Normal file

Binary file not shown.

Size: 98 KiB

BIN
example_output/tree.png Normal file

Binary file not shown.

Size: 27 KiB

39
main.cpp Normal file

@ -0,0 +1,39 @@
#include <iostream>
#include <set>
#include "parser.hpp"
extern int yylex();
extern std::string* target_program;
extern std::set<std::string> symbols;
int main() {
if (!yylex()) {
// Write initial C++ stuff.
std::cout << "#include <iostream>" << std::endl;
std::cout << "int main() {" << std::endl;
// Write declarations for all variables.
std::set<std::string>::iterator it;
for (it = symbols.begin(); it != symbols.end(); it++) {
std::cout << "double " << *it << ";" << std::endl;
}
// Write the program itself.
std::cout << std::endl << "/* Begin program */" << std::endl << std::endl;
std::cout << *target_program << std::endl;
std::cout << "/* End program */" << std::endl << std::endl;
// Write print statements for all symbols.
for (it = symbols.begin(); it != symbols.end(); it++) {
std::cout << "std::cout << \"" << *it << ": \" << " << *it << " << std::endl;" << std::endl;
}
// Write terminating brace.
std::cout << "}" << std::endl;
delete target_program;
target_program = NULL;
}
}

166
parser.y Normal file

@ -0,0 +1,166 @@
%{
#include <iostream>
#include <set>
#include "parser.hpp"
extern int yylex();
void yyerror(YYLTYPE* loc, const char* err);
std::string* translate_boolean_str(std::string* boolean_str);
/*
* Here, target_program is a string that will hold the target program being
* generated, and symbols is a simple symbol table.
*/
std::string* target_program;
std::set<std::string> symbols;
%}
/* Enable location tracking. */
%locations
/*
* All program constructs will be represented as strings, specifically as
* their corresponding C/C++ translation.
*/
%define api.value.type { std::string* }
/*
* Because the lexer can generate more than one token at a time (i.e. DEDENT
* tokens), we'll use a push parser.
*/
%define api.pure full
%define api.push-pull push
/*
* These are all of the terminals in our grammar, i.e. the syntactic
* categories that can be recognized by the lexer.
*/
%token IDENTIFIER
%token FLOAT INTEGER BOOLEAN
%token INDENT DEDENT NEWLINE
%token AND BREAK DEF ELIF ELSE FOR IF NOT OR RETURN WHILE
%token ASSIGN PLUS MINUS TIMES DIVIDEDBY
%token EQ NEQ GT GTE LT LTE
%token LPAREN RPAREN COMMA COLON
/*
* Here, we're defining the precedence of the operators. The ones that appear
* later have higher precedence. All of the operators are left-associative
* except the "not" operator, which is right-associative.
*/
%left OR
%left AND
%left PLUS MINUS
%left TIMES DIVIDEDBY
%left EQ NEQ GT GTE LT LTE
%right NOT
/* This is our goal/start symbol. */
%start program
%%
/*
* Each of the CFG rules below recognizes a particular program construct in
* Python and creates a new string containing the corresponding C/C++
* translation. Since we're allocating strings as we go, we also free them
* as we no longer need them. Specifically, each string is freed after it is
* combined into a larger string.
*/
program
: statements { target_program = $1; }
;
statements
: statement { $$ = $1; }
| statements statement { $$ = new std::string(*$1 + *$2); delete $1; delete $2; }
;
statement
: assign_statement { $$ = $1; }
| if_statement { $$ = $1; }
| while_statement { $$ = $1; }
| break_statement { $$ = $1; }
;
primary_expression
: IDENTIFIER { $$ = $1; }
| FLOAT { $$ = $1; }
| INTEGER { $$ = $1; }
| BOOLEAN { $$ = translate_boolean_str($1); delete $1; }
| LPAREN expression RPAREN { $$ = new std::string("(" + *$2 + ")"); delete $2; }
;
negated_expression
: NOT primary_expression { $$ = new std::string("!" + *$2); delete $2; }
;
expression
: primary_expression { $$ = $1; }
| negated_expression { $$ = $1; }
| expression PLUS expression { $$ = new std::string(*$1 + " + " + *$3); delete $1; delete $3; }
| expression MINUS expression { $$ = new std::string(*$1 + " - " + *$3); delete $1; delete $3; }
| expression TIMES expression { $$ = new std::string(*$1 + " * " + *$3); delete $1; delete $3; }
| expression DIVIDEDBY expression { $$ = new std::string(*$1 + " / " + *$3); delete $1; delete $3; }
| expression EQ expression { $$ = new std::string(*$1 + " == " + *$3); delete $1; delete $3; }
| expression NEQ expression { $$ = new std::string(*$1 + " != " + *$3); delete $1; delete $3; }
| expression GT expression { $$ = new std::string(*$1 + " > " + *$3); delete $1; delete $3; }
| expression GTE expression { $$ = new std::string(*$1 + " >= " + *$3); delete $1; delete $3; }
| expression LT expression { $$ = new std::string(*$1 + " < " + *$3); delete $1; delete $3; }
| expression LTE expression { $$ = new std::string(*$1 + " <= " + *$3); delete $1; delete $3; }
;
assign_statement
: IDENTIFIER ASSIGN expression NEWLINE { symbols.insert(*$1); $$ = new std::string(*$1 + " = " + *$3 + ";\n"); delete $1; delete $3; }
;
block
: INDENT statements DEDENT { $$ = new std::string("{\n" + *$2 + "}"); delete $2; }
;
condition
: expression { $$ = $1; }
| condition AND condition { $$ = new std::string(*$1 + " && " + *$3); delete $1; delete $3; }
| condition OR condition { $$ = new std::string(*$1 + " || " + *$3); delete $1; delete $3; }
;
if_statement
: IF condition COLON NEWLINE block elif_blocks else_block { $$ = new std::string("if (" + *$2 + ") " + *$5 + *$6 + *$7 + "\n"); delete $2; delete $5; delete $6; delete $7; }
;
elif_blocks
: %empty { $$ = new std::string(""); }
| elif_blocks ELIF condition COLON NEWLINE block { $$ = new std::string(*$1 + " else if (" + *$3 + ") " + *$6); delete $1; delete $3; delete $6; }
;
else_block
: %empty { $$ = new std::string(""); }
| ELSE COLON NEWLINE block { $$ = new std::string(" else " + *$4); delete $4; }
;
while_statement
: WHILE condition COLON NEWLINE block { $$ = new std::string("while (" + *$2 + ") " + *$5 + "\n"); delete $2; delete $5; }
;
break_statement
: BREAK NEWLINE { $$ = new std::string("break;\n"); }
;
%%
void yyerror(YYLTYPE* loc, const char* err) {
std::cerr << "Error (line " << loc->first_line << "): " << err << std::endl;
}
/*
* This function translates a Python boolean value into the corresponding
* C++ boolean value.
*/
std::string* translate_boolean_str(std::string* boolean_str) {
if (*boolean_str == "True") {
return new std::string("true");
} else {
return new std::string("false");
}
}

191
scanner.l Normal file

@ -0,0 +1,191 @@
/*
* Lexer definition for simplified Python syntax.
*/
%{
#include <iostream>
#include <stack>
#include <cstdlib>
#include "parser.hpp"
/*
* We'll use this stack to keep track of indentation level, as described in
* the Python docs:
*
* https://docs.python.org/3/reference/lexical_analysis.html#indentation
*/
std::stack<int> _indent_stack;
%}
%option noyywrap
%option yylineno
%%
%{
/*
* These lines go at the top of the lexing function. We only want to
* initialize the indentation level stack once by pushing a 0 onto it (the
* indentation stack should never be empty, except immediately after it is
* created).
*/
if (_indent_stack.empty()) {
_indent_stack.push(0);
}
/*
* We also want to initialize a parser state to be sent to the parser on
* each push parse call.
*/
yypstate* pstate = yypstate_new();
YYSTYPE yylval;
YYLTYPE loc;
#define PUSH_TOKEN(token, text) do { \
yylval = text ? new std::string(text) : NULL; \
loc.first_line = loc.last_line = yylineno; \
int status = yypush_parse(pstate, token, &yylval, &loc); \
if (status != YYPUSH_MORE) { \
yypstate_delete(pstate); \
return status; \
} \
} while (0)
%}
^[ \t]*\r?\n { /* Skip blank lines */ }
^[ \t]*#.*\r?\n { /* Skip whole-line comments. */ }
#.*$ { /* Skip comments on the same line as a statement. */ }
^[ \t]+ {
/*
* Handle indentation as described in Python docs linked above.
* Note that this pattern treats leading spaces and leading tabs
* equivalently, which could cause some unexpected behavior if
* they're combined in a single line. For the purposes of this
* project, that's OK.
*/
if (_indent_stack.top() < yyleng) {
/*
* If the current indentation level is greater than the
* previous indentation level (stored at the top of the stack),
* then emit an INDENT and push the new indentation level onto
* the stack.
*/
_indent_stack.push(yyleng);
/* std::cout << "INDENT" << std::endl; */
PUSH_TOKEN(INDENT, NULL);
} else {
/*
* If the current indentation level is less than or equal to
* the previous indentation level, pop indentation levels off
* the stack until the top is equal to the current indentation
* level. Emit a DEDENT for each element popped from the stack.
*/
while (!_indent_stack.empty() && _indent_stack.top() != yyleng) {
_indent_stack.pop();
/* std::cout << "DEDENT" << std::endl; */
PUSH_TOKEN(DEDENT, NULL);
}
/*
* If we popped everything off the stack, that means the
* current indentation level didn't match any on the stack,
* which is an indentation error.
*/
if (_indent_stack.empty()) {
std::cerr << "Error: Incorrect indentation on line "
<< yylineno << std::endl;
return 1;
}
}
}
^[^ \t\n]+ {
/*
* If we find a line that's not indented, pop all indentation
* levels off the stack, and emit a DEDENT for each one. Then,
* call REJECT, so the next rule matching this token is also
* applied.
*/
while(_indent_stack.top() != 0) {
_indent_stack.pop();
/* std::cout << "DEDENT" << std::endl; */
PUSH_TOKEN(DEDENT, NULL);
}
REJECT;
}
\r?\n {
/* std::cout << "NEWLINE" << std::endl; */
PUSH_TOKEN(NEWLINE, NULL);
}
<<EOF>> {
/*
* If we reach the end of the file, pop all indentation levels
* off the stack, and emit a DEDENT for each one.
*/
while(_indent_stack.top() != 0) {
_indent_stack.pop();
/* std::cout << "DEDENT" << std::endl; */
PUSH_TOKEN(DEDENT, "");
}
int status = yypush_parse(pstate, 0, NULL, NULL);
yypstate_delete(pstate);
return status;
/* yyterminate(); */
}
[ \t] { /* Ignore spaces that haven't been handled above. */ }
"and" { PUSH_TOKEN(AND, NULL); }
"break" { PUSH_TOKEN(BREAK, NULL); }
"def" { PUSH_TOKEN(DEF, NULL); }
"elif" { PUSH_TOKEN(ELIF, NULL); }
"else" { PUSH_TOKEN(ELSE, NULL); }
"for" { PUSH_TOKEN(FOR, NULL); }
"if" { PUSH_TOKEN(IF, NULL); }
"not" { PUSH_TOKEN(NOT, NULL); }
"or" { PUSH_TOKEN(OR, NULL); }
"return" { PUSH_TOKEN(RETURN, NULL); }
"while" { PUSH_TOKEN(WHILE, NULL); }
"True" { PUSH_TOKEN(BOOLEAN, yytext); }
"False" { PUSH_TOKEN(BOOLEAN, yytext); }
[a-zA-Z_][a-zA-Z0-9_]* { PUSH_TOKEN(IDENTIFIER, yytext); }
-?[0-9]*"."[0-9]+ { PUSH_TOKEN(FLOAT, yytext); }
-?[0-9]+ { PUSH_TOKEN(INTEGER, yytext); }
"=" { PUSH_TOKEN(ASSIGN, NULL); }
"+" { PUSH_TOKEN(PLUS, NULL); }
"-" { PUSH_TOKEN(MINUS, NULL); }
"*" { PUSH_TOKEN(TIMES, NULL); }
"/" { PUSH_TOKEN(DIVIDEDBY, NULL); }
"==" { PUSH_TOKEN(EQ, NULL); }
"!=" { PUSH_TOKEN(NEQ, NULL); }
">" { PUSH_TOKEN(GT, NULL); }
">=" { PUSH_TOKEN(GTE, NULL); }
"<" { PUSH_TOKEN(LT, NULL); }
"<=" { PUSH_TOKEN(LTE, NULL); }
"(" { PUSH_TOKEN(LPAREN, NULL); }
")" { PUSH_TOKEN(RPAREN, NULL); }
"," { PUSH_TOKEN(COMMA, NULL); }
":" { PUSH_TOKEN(COLON, NULL); }
. {
std::cerr << "Unrecognized token on line " << yylineno << ": "
<< yytext << std::endl;
PUSH_TOKEN(yytext[0], NULL);
}
%%

6
testing_code/p1.py Normal file

@ -0,0 +1,6 @@
pi = 3.1415
r = 8.0
circle_area = pi * r * r
circle_circum = pi * 2 * r
sphere_vol = (4.0 / 3.0) * pi * r * r * r
sphere_surf_area = 4 * pi * r * r

14
testing_code/p2.py Normal file

@ -0,0 +1,14 @@
a = True
b = False
x = 7
if a:
x = 5
if b:
y = 4
else:
y = 2
z = (x * 3 * 7) / y
if z > 10:
y = 5

14
testing_code/p3.py Normal file

@ -0,0 +1,14 @@
# This program computes and returns the n'th Fibonacci number.
n = 6
f0 = 0
f1 = 1
i = 0
while True:
fi = f0 + f1
f0 = f1
f1 = fi
i = i + 1
if i >= n:
break
f = f0