Compare commits
No commits in common. "697f08323724c0773c55954c372bdf1ec154f5b1" and "40007c427eac9616609b936ba29f9afd5c8fac24" have entirely different histories.
697f083237
...
40007c427e
|
@ -88,12 +88,7 @@ I'd like to cover the following major topics, spending a couple of posts on each
|
||||||
which is not at all compiler-centric. We need to connect these two, showing
|
which is not at all compiler-centric. We need to connect these two, showing
|
||||||
that the CFGs we produce "make sense" for our language, and that given
|
that the CFGs we produce "make sense" for our language, and that given
|
||||||
CFGs that make sense, our analysis produces results that match the language's
|
CFGs that make sense, our analysis produces results that match the language's
|
||||||
execution. To do so, I write about the language and its semantics
|
execution.
|
||||||
in {{< draftlink "Part 5: Our Programming Language" "05_spa_agda_semantics" >}},
|
|
||||||
then about building control flow graphs for the language in
|
|
||||||
{{< draftlink "Part 6: Control Flow Graphs" "06_spa_agda_cfg" >}}.
|
|
||||||
I then write about combining these two representations in
|
|
||||||
{{< draftlink "Part 7: Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
|
|
||||||
|
|
||||||
|
|
||||||
### Navigation
|
### Navigation
|
||||||
|
@ -103,6 +98,3 @@ Here are the posts that I’ve written so far for this series:
|
||||||
* {{< draftlink "Combining Lattices" "02_spa_agda_combining_lattices" >}}
|
* {{< draftlink "Combining Lattices" "02_spa_agda_combining_lattices" >}}
|
||||||
* {{< draftlink "Lattices of Finite Height" "03_spa_agda_fixed_height" >}}
|
* {{< draftlink "Lattices of Finite Height" "03_spa_agda_fixed_height" >}}
|
||||||
* {{< draftlink "The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}
|
* {{< draftlink "The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}
|
||||||
* {{< draftlink "Our Programming Language" "05_spa_agda_semantics" >}}
|
|
||||||
* {{< draftlink "Control Flow Graphs" "06_spa_agda_cfg" >}}.
|
|
||||||
* {{< draftlink "Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
|
|
||||||
|
|
|
@ -1,199 +0,0 @@
|
||||||
---
|
|
||||||
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
|
|
||||||
series: "Static Program Analysis in Agda"
|
|
||||||
description: "In this post, I define the language that well serve as the object of our vartious analyses"
|
|
||||||
date: 2024-08-10T17:37:43-07:00
|
|
||||||
tags: ["Agda", "Programming Languages"]
|
|
||||||
draft: true
|
|
||||||
---
|
|
||||||
|
|
||||||
In the previous several posts, I've formalized the notion of lattices, which
|
|
||||||
are an essential ingredient to formalizing the analyses in Anders Møller's
|
|
||||||
lecutre notes. However, there can be no program analysis without a program
|
|
||||||
to analyze! In this post, I will define the (very simple) language that we
|
|
||||||
will be analyzing. An essential aspect of the language is its
|
|
||||||
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science), which
|
|
||||||
simply speaking explains what each feature of the language does. At the end
|
|
||||||
of the previous article, I gave the following _inference rule_ which defined
|
|
||||||
(partially) how the `if`-`else` statement in the language works.
|
|
||||||
|
|
||||||
{{< latex >}}
|
|
||||||
\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
|
|
||||||
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
|
||||||
{{< /latex >}}
|
|
||||||
|
|
||||||
Like I mentioned then, this rule reads as follows:
|
|
||||||
|
|
||||||
> If the condition of an `if`-`else` statement evaluates to a nonzero value,
|
|
||||||
> then to evaluate the statement, you evaluate its `then` branch.
|
|
||||||
|
|
||||||
Another similar --- but crucially, not equivlalent -- rule is the following:
|
|
||||||
|
|
||||||
{{< latex >}}
|
|
||||||
\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
|
|
||||||
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
|
||||||
{{< /latex >}}
|
|
||||||
|
|
||||||
This time, the English interpretation of the rule is as follows:
|
|
||||||
|
|
||||||
> If the condition of an `if`-`else` statement evaluates to one,
|
|
||||||
> then to evaluate the statement, you evaluate its `then` branch.
|
|
||||||
|
|
||||||
These rules are certainly not equivalent. For instance, the former allows
|
|
||||||
the "then" branch to be executed when the condition is `2` two; however, in
|
|
||||||
the latter, the value of the conditional must be `1`. If our analysis were
|
|
||||||
_flow-sensitive_ (our first few will not be), then this difference would change
|
|
||||||
its output when determining the signs of the following program:
|
|
||||||
|
|
||||||
```
|
|
||||||
x = 2
|
|
||||||
if x {
|
|
||||||
y = - 1
|
|
||||||
} else {
|
|
||||||
y = 1
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
Using the first, more "relaxed" rule, the condition would be considered "true",
|
|
||||||
and the sign of `y` would be `-`. On the other hand, using the second,
|
|
||||||
"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
|
|
||||||
I am showing a flow-sensitive analysis (one that can understand control flow
|
|
||||||
and make more speciifc predictions); for our simplest analyses, we will not
|
|
||||||
be aiming for flow-sensitivity. There is plenty of work to do even then.
|
|
||||||
|
|
||||||
The point of showing these two distinct rules is that we need to be very precise
|
|
||||||
about how the language will behave, because our analyses depend on that behavior.
|
|
||||||
|
|
||||||
Let's not get ahead of ourselves, though. I've motivated the need for semantics,
|
|
||||||
but there is much groundwork to be laid before we delve into the precise rules
|
|
||||||
of our language. After all, to define the language's semantics, we need to
|
|
||||||
have a language.
|
|
||||||
|
|
||||||
### The Syntax of Our Simple Language
|
|
||||||
|
|
||||||
I've shown a couple of examples our our language now, and there won't be that
|
|
||||||
much more to it. We can start with _expressions_: things that evaluate to
|
|
||||||
something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
|
|
||||||
specific language, the precise set of possible expressions can be given
|
|
||||||
by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
|
|
||||||
|
|
||||||
{{< latex >}}
|
|
||||||
\begin{array}{rcll}
|
|
||||||
e & ::= & x & \text{(variables)} \\
|
|
||||||
& | & z & \text{(integer literals)} \\
|
|
||||||
& | & e + e & \text{(addition)} \\
|
|
||||||
& | & e - e & \text{(subtraction)}
|
|
||||||
\end{array}
|
|
||||||
{{< /latex >}}
|
|
||||||
|
|
||||||
The above can be read as follows:
|
|
||||||
|
|
||||||
> An expression \(e\) is one of the following things:
|
|
||||||
> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
|
|
||||||
> which could be `x` or `y` in our program code; specifically, \(x\) is
|
|
||||||
> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
|
|
||||||
> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
|
|
||||||
> 3. The addition of two other expressions [which could themselves be additions etc.].
|
|
||||||
> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].
|
|
||||||
|
|
||||||
Since expressions can be nested within other expressions --- which is necessary
|
|
||||||
to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
|
|
||||||
is one of the elements of the grammar above (variable, addition, etc.). If
|
|
||||||
a node contains sub-expressions (like addition and subtraction do), then
|
|
||||||
these sub-expressions form sub-trees of the given node. This data structure
|
|
||||||
is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
|
|
||||||
|
|
||||||
Notably, though `2-(x+y)` has parentheses, our grammar above does not include
|
|
||||||
include them as a case. The reason for this is that the structure of an
|
|
||||||
abstract syntax tree is sufficient to encode the order in which the operations
|
|
||||||
should be evaluated.
|
|
||||||
|
|
||||||
{{< todo >}}
|
|
||||||
Probably two drawings of differently-associated ASTs here.
|
|
||||||
{{< /todo >}}
|
|
||||||
|
|
||||||
To an Agda programmer, the one-of-four-things definition above should read
|
|
||||||
quite similarly to the definition of an algebraic data type. Indeed, this
|
|
||||||
is how we can encode the abstract syntax tree of expressions:
|
|
||||||
|
|
||||||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
|
|
||||||
|
|
||||||
The only departure from the grammar above is that I had to invent constructors
|
|
||||||
for the variable and integer cases, since Agda doesn't support implicit coercions.
|
|
||||||
This adds a little bit of extra overhead, requiring, for example, that we write
|
|
||||||
numbers as `# 42` instead of `42`.
|
|
||||||
|
|
||||||
Having defined expressions, the next thing on the menu is _statements_. Unlike
|
|
||||||
expressions, which just produce values, statements "do something"; an example
|
|
||||||
of a statement might be the following Python line:
|
|
||||||
|
|
||||||
```Python
|
|
||||||
print("Hello, world!")
|
|
||||||
```
|
|
||||||
|
|
||||||
The `print` function doesn't produce any value, but it does perform an action;
|
|
||||||
it prints its argument to the console!
|
|
||||||
|
|
||||||
For the formalization, it turns out to be convenient to separate "simple"
|
|
||||||
statements from "complex" ones. Pragmatically speaking, the difference is that
|
|
||||||
between the "simple" and the "complex" is control flow; simple statements
|
|
||||||
will be guaranteed to always execute one-by-one, without any decisions or jumps.
|
|
||||||
The reason for this will become clearer in subsequent posts; I will foreshadow
|
|
||||||
a bit by saying that consecutive simple statements can be placed into a single
|
|
||||||
[basic block](https://en.wikipedia.org/wiki/Basic_block).
|
|
||||||
|
|
||||||
The following is a group of three simple statements:
|
|
||||||
|
|
||||||
```
|
|
||||||
x = 1
|
|
||||||
y = x + 2
|
|
||||||
noop
|
|
||||||
```
|
|
||||||
|
|
||||||
These will always be executed in the same order, exactly once. Here, `noop`
|
|
||||||
is a conveneint type of statement that simply does nothing.
|
|
||||||
|
|
||||||
On the other hand, the following statement is not simple:
|
|
||||||
|
|
||||||
```
|
|
||||||
while x {
|
|
||||||
x = x - 1
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
it's not simple because it makes decisions about how the code should be executed;
|
|
||||||
if `x` is nonzero, it will try executing the statement in the body of the loop
|
|
||||||
(`x = x - 1`). Otherwise, it would skip evaluating that statement, and carry on
|
|
||||||
with subsequent code.
|
|
||||||
|
|
||||||
I first define simple statements using the `BasicStmt` type:
|
|
||||||
|
|
||||||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
|
|
||||||
|
|
||||||
Complex statements are just called `Stmt`; they include loops, conditionals and
|
|
||||||
sequences (
|
|
||||||
{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
|
|
||||||
The standard notation for sequencing in imperative languages is
|
|
||||||
\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
|
|
||||||
and I couldn't find any passable symbolic alternatives.
|
|
||||||
{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\)).
|
|
||||||
Complex statements subsume simple statements, which I model using the constructor
|
|
||||||
`⟨_⟩`.
|
|
||||||
|
|
||||||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
|
|
||||||
|
|
||||||
For an example of using this encoding, take the following simple program:
|
|
||||||
|
|
||||||
```
|
|
||||||
var = 1
|
|
||||||
if var {
|
|
||||||
x = 1
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
The Agda version is:
|
|
||||||
|
|
||||||
{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
|
|
||||||
|
|
||||||
Notice how we used `noop` to express the fact that the `else` branch of the
|
|
||||||
conditional does nothing.
|
|
Loading…
Reference in New Issue
Block a user