Fix some typos
parent 654239e29f
commit 2cce2859bb
@@ -7,7 +7,7 @@ draft: true
 During my last academic term, I was enrolled in a compilers course.
 We had a final project - develop a compiler for a basic Python subset,
 using LLVM. It was a little boring - virtually nothing about the compiler
-was __not__ covered in class, and it felt more like putting two puzzles
+was __not__ covered in class, and it felt more like putting two puzzle
 pieces together than building a real project.
 
 Instead, I chose to implement a compiler for a functional programming language,

@@ -48,7 +48,7 @@ are fairly simple - one or more digits is an integer, a few letters together
 are a variable name. In order to be able to efficiently break text up into
 such tokens, we restrict ourselves to __regular languages__. A language
 is defined as a set of strings (potentially infinite), and a regular
-language for which we can write a __regular expression__ to check if
+language is one for which we can write a __regular expression__ to check if
 a string is in the set. Regular expressions are a way of representing
 patterns that a string has to match. We define regular expressions
 as follows:
@@ -77,7 +77,7 @@ Let's see some examples. An integer, such as 326, can be represented with \\([0-
 This means, one or more characters between 0 or 9. Some (most) regex implementations
 have a special symbol for \\([0-9]\\), written as \\(\\setminus d\\). A variable,
 starting with a lowercase letter and containing lowercase or uppercase letters after it,
-can be written as \\(\[a-z\]([a-z]+)?\\). Again, most regex implementations provide
+can be written as \\(\[a-z\]([a-zA-Z]+)?\\). Again, most regex implementations provide
 a special operator for \\((r_1+)?\\), written as \\(r_1*\\).
 
 So how does one go about checking if a regular expression matches a string? An efficient way is to
@@ -115,8 +115,8 @@ represent numbers directly into numbers, and do other small tasks.
 
 So, what tokens do we have? From our arithmetic definition, we see that we have integers.
 Let's use the regex `[0-9]+` for those. We also have the operators `+`, `-`, `*`, and `/`.
-`-` is simple enough: the corresponding regex is `-`. We need to
-preface our `/`, `+` and `*` with a backslash, though, since they happen to also be modifiers
+The regex for `-` is simple enough: it's just `-`. However, we need to
+preface our `/`, `+` and `*` with a backslash, since they happen to also be modifiers
 in flex's regular expressions: `\/`, `\+`, `\*`.
 
 Let's also represent some reserved keywords. We'll say that `defn`, `data`, `case`, and `of`

@@ -38,7 +38,7 @@ $$
 In practice, there are many ways of using a CFG to parse a programming language. Various parsing algorithms support various subsets
 of context free languages. For instance, top down parsers follow nearly exactly the structure that we had. They try to parse
 a nonterminal by trying to match each symbol in its body. In the rule \\(S \\rightarrow \\alpha \\beta \\gamma\\), it will
-first try to match \\(alpha\\), then \\(beta\\), and so on. If one of the three contains a nonterminal, it will attempt to parse
+first try to match \\(\\alpha\\), then \\(\\beta\\), and so on. If one of the three contains a nonterminal, it will attempt to parse
 that nonterminal following the same strategy. However, this leaves a flaw - For instance, consider the grammar
 $$
 \\begin{align}
@@ -105,7 +105,7 @@ A\_{add} & \\rightarrow A\_{add}-A\_{mult} \\\\\\
 A\_{add} & \\rightarrow A\_{mult}
 \\end{align}
 $$
-The first rule matches another addition, added to the result of another addition. We use the addition in the body
+The first rule matches another addition, added to the result of a multiplication. Similarly, the second rule matches another addition, from which the result of a multiplication is then subtracted. We use the \\(A\_{add}\\) on the left side of \\(+\\) and \\(-\\) in the body
 because we want to be able to parse strings like `1+2+3+4`, which we want to view as `((1+2)+3)+4` (mostly because
 subtraction is [left-associative](https://en.wikipedia.org/wiki/Operator_associativity)). So, we want the top level
 of the tree to be the rightmost `+` or `-`, since that means it will be the "last" operation. You may be asking,
@@ -150,7 +150,7 @@ What's the last \\(C\\)? We also want a "thing" to be a case expression. Here ar
 $$
 \\begin{align}
 C & \\rightarrow \\text{case} \\; A\_{add} \\; \\text{of} \\; \\{ L\_B\\} \\\\\\
-L\_B & \\rightarrow R \\; , \\; L\_B \\\\\\
+L\_B & \\rightarrow R \\; L\_B \\\\\\
 L\_B & \\rightarrow R \\\\\\
 R & \\rightarrow N \\; \\text{arrow} \\; \\{ A\_{add} \\} \\\\\\
 N & \\rightarrow \\text{lowerVar} \\\\\\