200 lines
8.1 KiB
Markdown
200 lines
8.1 KiB
Markdown
|
---
|
||
|
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
|
||
|
series: "Static Program Analysis in Agda"
|
||
|
description: "In this post, I define the language that well serve as the object of our vartious analyses"
|
||
|
date: 2024-08-10T17:37:43-07:00
|
||
|
tags: ["Agda", "Programming Languages"]
|
||
|
draft: true
|
||
|
---
|
||
|
|
||
|
In the previous several posts, I've formalized the notion of lattices, which
|
||
|
are an essential ingredient to formalizing the analyses in Anders Møller's
|
||
|
lecutre notes. However, there can be no program analysis without a program
|
||
|
to analyze! In this post, I will define the (very simple) language that we
|
||
|
will be analyzing. An essential aspect of the language is its
|
||
|
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science), which
|
||
|
simply speaking explains what each feature of the language does. At the end
|
||
|
of the previous article, I gave the following _inference rule_ which defined
|
||
|
(partially) how the `if`-`else` statement in the language works.
|
||
|
|
||
|
{{< latex >}}
|
||
|
\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
|
||
|
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
||
|
{{< /latex >}}
|
||
|
|
||
|
Like I mentioned then, this rule reads as follows:
|
||
|
|
||
|
> If the condition of an `if`-`else` statement evaluates to a nonzero value,
|
||
|
> then to evaluate the statement, you evaluate its `then` branch.
|
||
|
|
||
|
Another similar --- but crucially, not equivlalent -- rule is the following:
|
||
|
|
||
|
{{< latex >}}
|
||
|
\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
|
||
|
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
||
|
{{< /latex >}}
|
||
|
|
||
|
This time, the English interpretation of the rule is as follows:
|
||
|
|
||
|
> If the condition of an `if`-`else` statement evaluates to one,
|
||
|
> then to evaluate the statement, you evaluate its `then` branch.
|
||
|
|
||
|
These rules are certainly not equivalent. For instance, the former allows
|
||
|
the "then" branch to be executed when the condition is `2` two; however, in
|
||
|
the latter, the value of the conditional must be `1`. If our analysis were
|
||
|
_flow-sensitive_ (our first few will not be), then this difference would change
|
||
|
its output when determining the signs of the following program:
|
||
|
|
||
|
```
|
||
|
x = 2
|
||
|
if x {
|
||
|
y = - 1
|
||
|
} else {
|
||
|
y = 1
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Using the first, more "relaxed" rule, the condition would be considered "true",
|
||
|
and the sign of `y` would be `-`. On the other hand, using the second,
|
||
|
"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
|
||
|
I am showing a flow-sensitive analysis (one that can understand control flow
|
||
|
and make more speciifc predictions); for our simplest analyses, we will not
|
||
|
be aiming for flow-sensitivity. There is plenty of work to do even then.
|
||
|
|
||
|
The point of showing these two distinct rules is that we need to be very precise
|
||
|
about how the language will behave, because our analyses depend on that behavior.
|
||
|
|
||
|
Let's not get ahead of ourselves, though. I've motivated the need for semantics,
|
||
|
but there is much groundwork to be laid before we delve into the precise rules
|
||
|
of our language. After all, to define the language's semantics, we need to
|
||
|
have a language.
|
||
|
|
||
|
### The Syntax of Our Simple Language
|
||
|
|
||
|
I've shown a couple of examples our our language now, and there won't be that
|
||
|
much more to it. We can start with _expressions_: things that evaluate to
|
||
|
something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
|
||
|
specific language, the precise set of possible expressions can be given
|
||
|
by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
|
||
|
|
||
|
{{< latex >}}
|
||
|
\begin{array}{rcll}
|
||
|
e & ::= & x & \text{(variables)} \\
|
||
|
& | & z & \text{(integer literals)} \\
|
||
|
& | & e + e & \text{(addition)} \\
|
||
|
& | & e - e & \text{(subtraction)}
|
||
|
\end{array}
|
||
|
{{< /latex >}}
|
||
|
|
||
|
The above can be read as follows:
|
||
|
|
||
|
> An expression \(e\) is one of the following things:
|
||
|
> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
|
||
|
> which could be `x` or `y` in our program code; specifically, \(x\) is
|
||
|
> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
|
||
|
> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
|
||
|
> 3. The addition of two other expressions [which could themselves be additions etc.].
|
||
|
> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].
|
||
|
|
||
|
Since expressions can be nested within other expressions --- which is necessary
|
||
|
to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
|
||
|
is one of the elements of the grammar above (variable, addition, etc.). If
|
||
|
a node contains sub-expressions (like addition and subtraction do), then
|
||
|
these sub-expressions form sub-trees of the given node. This data structure
|
||
|
is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
|
||
|
|
||
|
Notably, though `2-(x+y)` has parentheses, our grammar above does not include
|
||
|
include them as a case. The reason for this is that the structure of an
|
||
|
abstract syntax tree is sufficient to encode the order in which the operations
|
||
|
should be evaluated.
|
||
|
|
||
|
{{< todo >}}
|
||
|
Probably two drawings of differently-associated ASTs here.
|
||
|
{{< /todo >}}
|
||
|
|
||
|
To an Agda programmer, the one-of-four-things definition above should read
|
||
|
quite similarly to the definition of an algebraic data type. Indeed, this
|
||
|
is how we can encode the abstract syntax tree of expressions:
|
||
|
|
||
|
{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
|
||
|
|
||
|
The only departure from the grammar above is that I had to invent constructors
|
||
|
for the variable and integer cases, since Agda doesn't support implicit coercions.
|
||
|
This adds a little bit of extra overhead, requiring, for example, that we write
|
||
|
numbers as `# 42` instead of `42`.
|
||
|
|
||
|
Having defined expressions, the next thing on the menu is _statements_. Unlike
|
||
|
expressions, which just produce values, statements "do something"; an example
|
||
|
of a statement might be the following Python line:
|
||
|
|
||
|
```Python
|
||
|
print("Hello, world!")
|
||
|
```
|
||
|
|
||
|
The `print` function doesn't produce any value, but it does perform an action;
|
||
|
it prints its argument to the console!
|
||
|
|
||
|
For the formalization, it turns out to be convenient to separate "simple"
|
||
|
statements from "complex" ones. Pragmatically speaking, the difference is that
|
||
|
between the "simple" and the "complex" is control flow; simple statements
|
||
|
will be guaranteed to always execute one-by-one, without any decisions or jumps.
|
||
|
The reason for this will become clearer in subsequent posts; I will foreshadow
|
||
|
a bit by saying that consecutive simple statements can be placed into a single
|
||
|
[basic block](https://en.wikipedia.org/wiki/Basic_block).
|
||
|
|
||
|
The following is a group of three simple statements:
|
||
|
|
||
|
```
|
||
|
x = 1
|
||
|
y = x + 2
|
||
|
noop
|
||
|
```
|
||
|
|
||
|
These will always be executed in the same order, exactly once. Here, `noop`
|
||
|
is a conveneint type of statement that simply does nothing.
|
||
|
|
||
|
On the other hand, the following statement is not simple:
|
||
|
|
||
|
```
|
||
|
while x {
|
||
|
x = x - 1
|
||
|
}
|
||
|
```
|
||
|
|
||
|
it's not simple because it makes decisions about how the code should be executed;
|
||
|
if `x` is nonzero, it will try executing the statement in the body of the loop
|
||
|
(`x = x - 1`). Otherwise, it would skip evaluating that statement, and carry on
|
||
|
with subsequent code.
|
||
|
|
||
|
I first define simple statements using the `BasicStmt` type:
|
||
|
|
||
|
{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
|
||
|
|
||
|
Complex statements are just called `Stmt`; they include loops, conditionals and
|
||
|
sequences (
|
||
|
{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
|
||
|
The standard notation for sequencing in imperative languages is
|
||
|
\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
|
||
|
and I couldn't find any passable symbolic alternatives.
|
||
|
{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\)).
|
||
|
Complex statements subsume simple statements, which I model using the constructor
|
||
|
`⟨_⟩`.
|
||
|
|
||
|
{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
|
||
|
|
||
|
For an example of using this encoding, take the following simple program:
|
||
|
|
||
|
```
|
||
|
var = 1
|
||
|
if var {
|
||
|
x = 1
|
||
|
}
|
||
|
```
|
||
|
|
||
|
The Agda version is:
|
||
|
|
||
|
{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
|
||
|
|
||
|
Notice how we used `noop` to express the fact that the `else` branch of the
|
||
|
conditional does nothing.
|