Fix some typos
parent 654239e29f
commit 2cce2859bb
|
@@ -7,7 +7,7 @@ draft: true
|
||||||
During my last academic term, I was enrolled in a compilers course.
|
During my last academic term, I was enrolled in a compilers course.
|
||||||
We had a final project - develop a compiler for a basic Python subset,
|
We had a final project - develop a compiler for a basic Python subset,
|
||||||
using LLVM. It was a little boring - virtually nothing about the compiler
|
using LLVM. It was a little boring - virtually nothing about the compiler
|
||||||
was __not__ covered in class, and it felt more like putting two puzzles
|
was __not__ covered in class, and it felt more like putting two puzzle
|
||||||
pieces together than building a real project.
|
pieces together than building a real project.
|
||||||
|
|
||||||
Instead, I chose to implement a compiler for a functional programming language,
|
Instead, I chose to implement a compiler for a functional programming language,
|
||||||
|
|
|
@@ -48,7 +48,7 @@ are fairly simple - one or more digits is an integer, a few letters together
|
||||||
are a variable name. In order to be able to efficiently break text up into
|
are a variable name. In order to be able to efficiently break text up into
|
||||||
such tokens, we restrict ourselves to __regular languages__. A language
|
such tokens, we restrict ourselves to __regular languages__. A language
|
||||||
is defined as a set of strings (potentially infinite), and a regular
|
is defined as a set of strings (potentially infinite), and a regular
|
||||||
language for which we can write a __regular expression__ to check if
|
language is one for which we can write a __regular expression__ to check if
|
||||||
a string is in the set. Regular expressions are a way of representing
|
a string is in the set. Regular expressions are a way of representing
|
||||||
patterns that a string has to match. We define regular expressions
|
patterns that a string has to match. We define regular expressions
|
||||||
as follows:
|
as follows:
|
||||||
|
@@ -77,7 +77,7 @@ Let's see some examples. An integer, such as 326, can be represented with \\([0-
|
||||||
This means one or more characters between 0 and 9. Some (most) regex implementations
|
This means one or more characters between 0 and 9. Some (most) regex implementations
|
||||||
have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable,
|
have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable,
|
||||||
starting with a lowercase letter and containing lowercase or uppercase letters after it,
|
starting with a lowercase letter and containing lowercase or uppercase letters after it,
|
||||||
can be written as \\(\[a-z\]([a-z]+)?\\). Again, most regex implementations provide
|
can be written as \\(\[a-z\]([a-zA-Z]+)?\\). Again, most regex implementations provide
|
||||||
a special operator for \\((r_1+)?\\), written as \\(r_1*\\).
|
a special operator for \\((r_1+)?\\), written as \\(r_1*\\).
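
The integer and variable patterns above can be tried out directly. This is a minimal sketch that assumes Python's `re` module rather than the notation used in the post; the names `INTEGER`, `VARIABLE`, `VARIABLE_STAR`, and `matches` are made up for illustration. It also checks that \\((r_1+)?\\) and \\(r_1*\\) accept the same strings on a few samples.

```python
import re

# Patterns from the text, written in Python's regex syntax (an assumption:
# the post itself does not use Python).
INTEGER = re.compile(r"[0-9]+")                 # one or more digits
VARIABLE = re.compile(r"[a-z]([a-zA-Z]+)?")     # lowercase letter, then optional letters
VARIABLE_STAR = re.compile(r"[a-z][a-zA-Z]*")   # same pattern, with * replacing (r+)?


def matches(pattern, text):
    """Return True if the whole string belongs to the pattern's language."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["326", "x", "fooBar", "Foo", "x1", ""]:
        print(repr(sample), matches(INTEGER, sample), matches(VARIABLE, sample))
```

`fullmatch` is used instead of `search` because membership in a language is about the entire string, not a substring of it.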
|
||||||
|
|
||||||
So how does one go about checking if a regular expression matches a string? An efficient way is to
|
So how does one go about checking if a regular expression matches a string? An efficient way is to
|
||||||
|
@@ -115,8 +115,8 @@ represent numbers directly into numbers, and do other small tasks.
|
||||||
|
|
||||||
So, what tokens do we have? From our arithmetic definition, we see that we have integers.
|
So, what tokens do we have? From our arithmetic definition, we see that we have integers.
|
||||||
Let's use the regex `[0-9]+` for those. We also have the operators `+`, `-`, `*`, and `/`.
|
Let's use the regex `[0-9]+` for those. We also have the operators `+`, `-`, `*`, and `/`.
|
||||||
`-` is simple enough: the corresponding regex is `-`. We need to
|
The regex for `-` is simple enough: it's just `-`. However, we need to
|
||||||
preface our `/`, `+` and `*` with a backslash, though, since they happen to also be modifiers
|
preface our `/`, `+` and `*` with a backslash, since they happen to also be modifiers
|
||||||
in flex's regular expressions: `\/`, `\+`, `\*`.
|
in flex's regular expressions: `\/`, `\+`, `\*`.
|
||||||
|
|
||||||
Let's also represent some reserved keywords. We'll say that `defn`, `data`, `case`, and `of`
|
Let's also represent some reserved keywords. We'll say that `defn`, `data`, `case`, and `of`
|
||||||
|
|
|
@@ -38,7 +38,7 @@ $$
|
||||||
In practice, there are many ways of using a CFG to parse a programming language. Various parsing algorithms support various subsets
|
In practice, there are many ways of using a CFG to parse a programming language. Various parsing algorithms support various subsets
|
||||||
of context free languages. For instance, top down parsers follow nearly exactly the structure that we had. They try to parse
|
of context free languages. For instance, top down parsers follow nearly exactly the structure that we had. They try to parse
|
||||||
a nonterminal by trying to match each symbol in its body. In the rule \\(S \\rightarrow \\alpha \\beta \\gamma\\), it will
|
a nonterminal by trying to match each symbol in its body. In the rule \\(S \\rightarrow \\alpha \\beta \\gamma\\), it will
|
||||||
first try to match \\(alpha\\), then \\(beta\\), and so on. If one of the three contains a nonterminal, it will attempt to parse
|
first try to match \\(\\alpha\\), then \\(\\beta\\), and so on. If one of the three contains a nonterminal, it will attempt to parse
|
||||||
that nonterminal following the same strategy. However, this approach has a flaw. For instance, consider the grammar
|
that nonterminal following the same strategy. However, this approach has a flaw. For instance, consider the grammar
|
||||||
$$
|
$$
|
||||||
\\begin{align}
|
\\begin{align}
|
||||||
|
@@ -105,7 +105,7 @@ A\_{add} & \\rightarrow A\_{add}-A\_{mult} \\\\\\
|
||||||
A\_{add} & \\rightarrow A\_{mult}
|
A\_{add} & \\rightarrow A\_{mult}
|
||||||
\\end{align}
|
\\end{align}
|
||||||
$$
|
$$
|
||||||
The first rule matches another addition, added to the result of another addition. We use the addition in the body
|
The first rule matches another addition, added to the result of a multiplication. Similarly, the second rule matches another addition, from which the result of a multiplication is then subtracted. We use the \\(A\_{add}\\) on the left side of \\(+\\) and \\(-\\) in the body
|
||||||
because we want to be able to parse strings like `1+2+3+4`, which we want to view as `((1+2)+3)+4` (mostly because
|
because we want to be able to parse strings like `1+2+3+4`, which we want to view as `((1+2)+3)+4` (mostly because
|
||||||
subtraction is [left-associative](https://en.wikipedia.org/wiki/Operator_associativity)). So, we want the top level
|
subtraction is [left-associative](https://en.wikipedia.org/wiki/Operator_associativity)). So, we want the top level
|
||||||
of the tree to be the rightmost `+` or `-`, since that means it will be the "last" operation. You may be asking,
|
of the tree to be the rightmost `+` or `-`, since that means it will be the "last" operation. You may be asking,
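
To make the left-associative grouping concrete, here is a small hand-written sketch in Python. It is not the parser the post builds: the helper names (`tokenize`, `parse_mult`, `parse_add`, `parse`) are hypothetical, \\(A\_{mult}\\) is assumed to be a chain of integers joined by `*` and `/`, and the left-recursive rules are replaced by loops that produce the same tree shape.

```python
import re


def tokenize(text):
    """Split an arithmetic string into integer and operator tokens."""
    return re.findall(r"[0-9]+|[-+*/]", text)


def parse_mult(tokens, pos):
    """Parse integers joined by * or /, grouping to the left (assumed A_mult)."""
    left = int(tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right = int(tokens[pos + 1])
        left = (op, left, right)  # the tree built so far becomes the left child
        pos += 2
    return left, pos


def parse_add(tokens, pos=0):
    """Parse A_mult terms joined by + or -, grouping to the left (A_add)."""
    left, pos = parse_mult(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_mult(tokens, pos + 1)
        left = (op, left, right)  # the rightmost operator ends up at the root
    return left, pos


def parse(text):
    """Return the tree for a whole expression as nested (op, left, right) tuples."""
    tree, _ = parse_add(tokenize(text))
    return tree


if __name__ == "__main__":
    print(parse("1+2+3+4"))
```

Each time through the loop the tree built so far becomes the left child of the new node, so the rightmost `+` or `-` is the root, which is the `((1+2)+3)+4` shape described above.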
|
||||||
|
@@ -150,7 +150,7 @@ What's the last \\(C\\)? We also want a "thing" to be a case expression. Here ar
|
||||||
$$
|
$$
|
||||||
\\begin{align}
|
\\begin{align}
|
||||||
C & \\rightarrow \\text{case} \\; A\_{add} \\; \\text{of} \\; \\{ L\_B\\} \\\\\\
|
C & \\rightarrow \\text{case} \\; A\_{add} \\; \\text{of} \\; \\{ L\_B\\} \\\\\\
|
||||||
L\_B & \\rightarrow R \\; , \\; L\_B \\\\\\
|
L\_B & \\rightarrow R \\; L\_B \\\\\\
|
||||||
L\_B & \\rightarrow R \\\\\\
|
L\_B & \\rightarrow R \\\\\\
|
||||||
R & \\rightarrow N \\; \\text{arrow} \\; \\{ A\_{add} \\} \\\\\\
|
R & \\rightarrow N \\; \\text{arrow} \\; \\{ A\_{add} \\} \\\\\\
|
||||||
N & \\rightarrow \\text{lowerVar} \\\\\\
|
N & \\rightarrow \\text{lowerVar} \\\\\\
|
||||||
|
|