2024-08-18 18:05:39 -07:00
|
|
|
|
---
|
|
|
|
|
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
|
|
|
|
|
series: "Static Program Analysis in Agda"
|
|
|
|
|
description: "In this post, I define the language that well serve as the object of our vartious analyses"
|
|
|
|
|
date: 2024-08-10T17:37:43-07:00
|
|
|
|
|
tags: ["Agda", "Programming Languages"]
|
2024-10-13 13:31:47 -07:00
|
|
|
|
custom_js: ["parser.js"]
|
|
|
|
|
bergamot:
|
|
|
|
|
render_presets:
|
|
|
|
|
default: "bergamot/rendering/imp.bergamot"
|
|
|
|
|
input_modes:
|
|
|
|
|
- name: "Expression"
|
|
|
|
|
fn: "window.parseExpr"
|
|
|
|
|
- name: "Basic Statement"
|
|
|
|
|
fn: "window.parseBasicStmt"
|
|
|
|
|
- name: "Statement"
|
|
|
|
|
fn: "window.parseStmt"
|
2024-08-18 18:05:39 -07:00
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
In the previous several posts, I've formalized the notion of lattices, which
|
|
|
|
|
are an essential ingredient to formalizing the analyses in Anders Møller's
|
2024-08-31 11:37:28 -07:00
|
|
|
|
lecture notes. However, there can be no program analysis without a program
|
2024-08-18 18:05:39 -07:00
|
|
|
|
to analyze! In this post, I will define the (very simple) language that we
|
|
|
|
|
will be analyzing. An essential aspect of the language is its
|
2024-09-01 15:19:47 -07:00
|
|
|
|
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)), which
|
2024-08-18 18:05:39 -07:00
|
|
|
|
simply speaking explains what each feature of the language does. At the end
|
|
|
|
|
of the previous article, I gave the following _inference rule_ which defined
|
|
|
|
|
(partially) how the `if`-`else` statement in the language works.
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
|
|
|
|
|
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
Like I mentioned then, this rule reads as follows:
|
|
|
|
|
|
|
|
|
|
> If the condition of an `if`-`else` statement evaluates to a nonzero value,
|
|
|
|
|
> then to evaluate the statement, you evaluate its `then` branch.
|
|
|
|
|
|
|
|
|
|
Another similar --- but crucially, not equivlalent -- rule is the following:
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
|
|
|
|
|
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
This time, the English interpretation of the rule is as follows:
|
|
|
|
|
|
|
|
|
|
> If the condition of an `if`-`else` statement evaluates to one,
|
|
|
|
|
> then to evaluate the statement, you evaluate its `then` branch.
|
|
|
|
|
|
|
|
|
|
These rules are certainly not equivalent. For instance, the former allows
|
2024-08-31 11:37:28 -07:00
|
|
|
|
the "then" branch to be executed when the condition is `2`; however, in
|
2024-08-18 18:05:39 -07:00
|
|
|
|
the latter, the value of the conditional must be `1`. If our analysis were
|
2024-08-31 11:37:28 -07:00
|
|
|
|
intelligent (our first few will not be), then this difference would change
|
2024-08-18 18:05:39 -07:00
|
|
|
|
its output when determining the signs of the following program:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
x = 2
|
|
|
|
|
if x {
|
|
|
|
|
y = - 1
|
|
|
|
|
} else {
|
|
|
|
|
y = 1
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Using the first, more "relaxed" rule, the condition would be considered "true",
|
|
|
|
|
and the sign of `y` would be `-`. On the other hand, using the second,
|
|
|
|
|
"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
|
|
|
|
|
I am showing a flow-sensitive analysis (one that can understand control flow
|
2024-08-31 11:37:28 -07:00
|
|
|
|
and make more specific predictions); for our simplest analyses, we will not
|
2024-08-18 18:05:39 -07:00
|
|
|
|
be aiming for flow-sensitivity. There is plenty of work to do even then.
|
|
|
|
|
|
|
|
|
|
The point of showing these two distinct rules is that we need to be very precise
|
|
|
|
|
about how the language will behave, because our analyses depend on that behavior.
|
|
|
|
|
|
|
|
|
|
Let's not get ahead of ourselves, though. I've motivated the need for semantics,
|
|
|
|
|
but there is much groundwork to be laid before we delve into the precise rules
|
|
|
|
|
of our language. After all, to define the language's semantics, we need to
|
|
|
|
|
have a language.
|
|
|
|
|
|
|
|
|
|
### The Syntax of Our Simple Language
|
|
|
|
|
|
|
|
|
|
I've shown a couple of examples our our language now, and there won't be that
|
|
|
|
|
much more to it. We can start with _expressions_: things that evaluate to
|
|
|
|
|
something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
|
|
|
|
|
specific language, the precise set of possible expressions can be given
|
|
|
|
|
by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\begin{array}{rcll}
|
|
|
|
|
e & ::= & x & \text{(variables)} \\
|
|
|
|
|
& | & z & \text{(integer literals)} \\
|
|
|
|
|
& | & e + e & \text{(addition)} \\
|
|
|
|
|
& | & e - e & \text{(subtraction)}
|
|
|
|
|
\end{array}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
The above can be read as follows:
|
|
|
|
|
|
|
|
|
|
> An expression \(e\) is one of the following things:
|
|
|
|
|
> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
|
|
|
|
|
> which could be `x` or `y` in our program code; specifically, \(x\) is
|
|
|
|
|
> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
|
|
|
|
|
> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
|
|
|
|
|
> 3. The addition of two other expressions [which could themselves be additions etc.].
|
|
|
|
|
> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].
|
|
|
|
|
|
|
|
|
|
Since expressions can be nested within other expressions --- which is necessary
|
|
|
|
|
to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
|
|
|
|
|
is one of the elements of the grammar above (variable, addition, etc.). If
|
|
|
|
|
a node contains sub-expressions (like addition and subtraction do), then
|
|
|
|
|
these sub-expressions form sub-trees of the given node. This data structure
|
|
|
|
|
is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
|
|
|
|
|
|
|
|
|
|
Notably, though `2-(x+y)` has parentheses, our grammar above does not include
|
|
|
|
|
include them as a case. The reason for this is that the structure of an
|
|
|
|
|
abstract syntax tree is sufficient to encode the order in which the operations
|
2024-11-03 17:06:19 -08:00
|
|
|
|
should be evaluated. Since I lack a nice way of drawing ASTs, I will use
|
|
|
|
|
an ASCII drawing to show an example.
|
2024-08-18 18:05:39 -07:00
|
|
|
|
|
2024-11-03 17:06:19 -08:00
|
|
|
|
```
|
|
|
|
|
Expression: 2 - (x+y)
|
|
|
|
|
(-)
|
|
|
|
|
/ \
|
|
|
|
|
2 (+)
|
|
|
|
|
/ \
|
|
|
|
|
x y
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Expression: (2-x) + y
|
|
|
|
|
(+)
|
|
|
|
|
/ \
|
|
|
|
|
(-) y
|
|
|
|
|
/ \
|
|
|
|
|
2 x
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Above, in the first AST, `(+)` is a child of the `(-)` node, which means
|
|
|
|
|
that it's a sub-expression. As a result, that subexpression is evaluated first,
|
|
|
|
|
before evaluating `(-)`, and so, the AST expresents `2-(x+y)`. In the other
|
|
|
|
|
example, `(-)` is a child of `(+)`, and is therefore evaluated first. The resulting
|
|
|
|
|
association encoded by that AST is `(2-x)+y`.
|
2024-08-18 18:05:39 -07:00
|
|
|
|
|
|
|
|
|
To an Agda programmer, the one-of-four-things definition above should read
|
|
|
|
|
quite similarly to the definition of an algebraic data type. Indeed, this
|
|
|
|
|
is how we can encode the abstract syntax tree of expressions:
|
|
|
|
|
|
|
|
|
|
{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
|
|
|
|
|
|
|
|
|
|
The only departure from the grammar above is that I had to invent constructors
|
|
|
|
|
for the variable and integer cases, since Agda doesn't support implicit coercions.
|
|
|
|
|
This adds a little bit of extra overhead, requiring, for example, that we write
|
|
|
|
|
numbers as `# 42` instead of `42`.
|
|
|
|
|
|
|
|
|
|
Having defined expressions, the next thing on the menu is _statements_. Unlike
|
|
|
|
|
expressions, which just produce values, statements "do something"; an example
|
|
|
|
|
of a statement might be the following Python line:
|
|
|
|
|
|
|
|
|
|
```Python
|
|
|
|
|
print("Hello, world!")
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The `print` function doesn't produce any value, but it does perform an action;
|
|
|
|
|
it prints its argument to the console!
|
|
|
|
|
|
|
|
|
|
For the formalization, it turns out to be convenient to separate "simple"
|
|
|
|
|
statements from "complex" ones. Pragmatically speaking, the difference is that
|
|
|
|
|
between the "simple" and the "complex" is control flow; simple statements
|
2024-08-31 11:37:28 -07:00
|
|
|
|
will be guaranteed to always execute without any decisions or jumps.
|
2024-08-18 18:05:39 -07:00
|
|
|
|
The reason for this will become clearer in subsequent posts; I will foreshadow
|
|
|
|
|
a bit by saying that consecutive simple statements can be placed into a single
|
|
|
|
|
[basic block](https://en.wikipedia.org/wiki/Basic_block).
|
|
|
|
|
|
|
|
|
|
The following is a group of three simple statements:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
x = 1
|
|
|
|
|
y = x + 2
|
|
|
|
|
noop
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
These will always be executed in the same order, exactly once. Here, `noop`
|
2024-08-31 11:37:28 -07:00
|
|
|
|
is a convenient type of statement that simply does nothing.
|
2024-08-18 18:05:39 -07:00
|
|
|
|
|
|
|
|
|
On the other hand, the following statement is not simple:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
while x {
|
|
|
|
|
x = x - 1
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
2024-08-31 11:37:28 -07:00
|
|
|
|
It's not simple because it makes decisions about how the code should be executed;
|
2024-08-18 18:05:39 -07:00
|
|
|
|
if `x` is nonzero, it will try executing the statement in the body of the loop
|
|
|
|
|
(`x = x - 1`). Otherwise, it would skip evaluating that statement, and carry on
|
|
|
|
|
with subsequent code.
|
|
|
|
|
|
|
|
|
|
I first define simple statements using the `BasicStmt` type:
|
|
|
|
|
|
|
|
|
|
{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
|
|
|
|
|
|
|
|
|
|
Complex statements are just called `Stmt`; they include loops, conditionals and
|
2024-08-31 11:37:28 -07:00
|
|
|
|
sequences ---
|
2024-08-18 18:05:39 -07:00
|
|
|
|
{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
|
|
|
|
|
The standard notation for sequencing in imperative languages is
|
|
|
|
|
\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
|
|
|
|
|
and I couldn't find any passable symbolic alternatives.
|
2024-08-31 11:37:28 -07:00
|
|
|
|
{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\).
|
2024-08-18 18:05:39 -07:00
|
|
|
|
Complex statements subsume simple statements, which I model using the constructor
|
|
|
|
|
`⟨_⟩`.
|
|
|
|
|
|
|
|
|
|
{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
|
|
|
|
|
|
|
|
|
|
For an example of using this encoding, take the following simple program:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
var = 1
|
|
|
|
|
if var {
|
|
|
|
|
x = 1
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The Agda version is:
|
|
|
|
|
|
|
|
|
|
{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
|
|
|
|
|
|
|
|
|
|
Notice how we used `noop` to express the fact that the `else` branch of the
|
|
|
|
|
conditional does nothing.
|
2024-10-13 13:31:47 -07:00
|
|
|
|
|
|
|
|
|
### The Semantics of Our Language
|
|
|
|
|
|
|
|
|
|
We now have all the language constructs that I'll be showing off --- because
|
|
|
|
|
those are all the concepts that I've formalized. What's left is to define
|
|
|
|
|
how they behave. We will do this using a logical tool called
|
|
|
|
|
[_inference rules_](https://en.wikipedia.org/wiki/Rule_of_inference). I've
|
|
|
|
|
written about them a number of times; they're ubiquitous, particularly in the
|
|
|
|
|
sorts of things I like explore on this site. The [section on inference rules]({{< relref "01_aoc_coq#inference-rules" >}})
|
|
|
|
|
from my Advent of Code series is pretty relevant, and [the notation section from
|
|
|
|
|
a post in my compiler series]({{< relref "03_compiler_typechecking#some-notation" >}}) says
|
|
|
|
|
much the same thing; I won't be re-describing them here.
|
|
|
|
|
|
|
|
|
|
There are three pieces which demand semantics: expressions, simple statements,
|
|
|
|
|
and non-simple statements. The semantics of each of the three requires the
|
|
|
|
|
semantics of the items that precede it. We will therefore start with expressions.
|
|
|
|
|
|
|
|
|
|
#### Expressions
|
|
|
|
|
|
|
|
|
|
The trickiest thing about expression is that the value of an expression depends
|
|
|
|
|
on the "context": `x+1` can evaluate to `43` if `x` is `42`, or it can evaluate
|
|
|
|
|
to `0` if `x` is `-1`. To evaluate an expression, we will therefore need to
|
|
|
|
|
assign values to all of the variables in that expression. A mapping that
|
|
|
|
|
assigns values to variables is typically called an _environment_. We will write
|
|
|
|
|
\(\varnothing\) for "empty environment", and \(\{\texttt{x} \mapsto 42, \texttt{y} \mapsto -1 \}\) for
|
|
|
|
|
an environment that maps the variable \(\texttt{x}\) to 42, and the variable \(\texttt{y}\) to -1.
|
|
|
|
|
|
|
|
|
|
Now, a bit of notation. We will use the letter \(\rho\) to represent environments
|
|
|
|
|
(and if several environments are involved, we will occasionally number them
|
|
|
|
|
as \(\rho_1\), \(\rho_2\), etc.) We will use the letter \(e\) to stand for
|
|
|
|
|
expressions, and the letter \(v\) to stand for values. Finally, we'll write
|
|
|
|
|
\(\rho, e \Downarrow v\) to say that "in an environment \(\rho\), expression \(e\)
|
|
|
|
|
evaluates to value \(v\)". Our two previous examples of evaluating `x+1` can
|
|
|
|
|
thus be written as follows:
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\{ \texttt{x} \mapsto 42 \}, \texttt{x}+1 \Downarrow 43 \\
|
|
|
|
|
\{ \texttt{x} \mapsto -1 \}, \texttt{x}+1 \Downarrow 0 \\
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
Now, on to the actual rules for how to evaluate expressions. Most simply,
|
2024-11-03 17:50:11 -08:00
|
|
|
|
integer literals like `1` just evaluate to themselves.
|
2024-10-13 13:31:47 -07:00
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac{n \in \text{Int}}{\rho, n \Downarrow n}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
Note that the letter \(\rho\) is completely unused in the above rule. That's
|
|
|
|
|
because no matter what values _variables_ have, a number still evaluates to
|
|
|
|
|
the same value. As we've already established, the same is not true for a
|
|
|
|
|
variable like \(\texttt{x}\). To evaluate such a variable, we need to retrieve
|
|
|
|
|
the value it's mapped to in the current environment, which we will write as
|
|
|
|
|
\(\rho(\texttt{x})\). This gives the following inference rule:
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac{\rho(x) = v}{\rho, x \Downarrow v}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
All that's left is to define addition and subtraction. For an expression in the
|
|
|
|
|
form \(e_1+e_2\), we first need to evaluate the two subexpressions \(e_1\)
|
|
|
|
|
and \(e_2\), and then add the two resulting numbers. As a result, the addition
|
|
|
|
|
rule includes two additional premises, one for evaluating each summand.
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac
|
|
|
|
|
{\rho, e_1 \Downarrow v_1 \quad \rho, e_2 \Downarrow v_2 \quad v_1 + v_2 = v}
|
|
|
|
|
{\rho, e_1+e_2 \Downarrow v}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
The subtraction rule is similar. Below, I've configured an instance of
|
|
|
|
|
[Bergamot]({{< relref "bergamot" >}}) to interpret these exact rules. Try
|
|
|
|
|
typing various expressions like `1`, `1+1`, etc. into the input box below
|
|
|
|
|
to see them evaluate. If you click the "Full Proof Tree" button, you can also view
|
|
|
|
|
the exact rules that were used in computing a particular value. The variables
|
|
|
|
|
`x`, `y`, and `z` are pre-defined for your convenience.
|
|
|
|
|
|
|
|
|
|
{{< bergamot_widget id="expr-widget" query="" prompt="eval(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?v)" modes="Expression:Expression" >}}
|
2024-11-03 16:05:04 -08:00
|
|
|
|
hidden section "" {
|
|
|
|
|
Eq @ eq(?x, ?x) <-;
|
|
|
|
|
}
|
2024-10-13 13:31:47 -07:00
|
|
|
|
section "" {
|
|
|
|
|
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
|
|
|
|
|
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
|
|
|
|
|
}
|
|
|
|
|
section "" {
|
|
|
|
|
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
|
|
|
|
|
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
|
2024-11-03 16:05:04 -08:00
|
|
|
|
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
|
2024-10-13 13:31:47 -07:00
|
|
|
|
}
|
|
|
|
|
{{< /bergamot_widget >}}
|
|
|
|
|
|
|
|
|
|
The Agda equivalent of this looks very similar to the rules themselves. I use
|
|
|
|
|
`⇒ᵉ` instead of \(\Downarrow\), and there's a little bit of tedium with
|
|
|
|
|
wrapping integers into a new `Value` type. I also used a (partial) relation
|
|
|
|
|
`(x, v) ∈ ρ` instead of explicitly defining accessing an environment, since
|
|
|
|
|
it is conceivable for a user to attempt accessing a variable that has not
|
|
|
|
|
been assigned to. Aside from these notational changes, the structure of each
|
|
|
|
|
of the constructors of the evaluation data type matches the inference rules
|
|
|
|
|
I showed above.
|
|
|
|
|
|
|
|
|
|
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 27 35 >}}
|
|
|
|
|
|
|
|
|
|
#### Simple Statements
|
|
|
|
|
The main difference between formalizing (simple and "normal") statements is
|
|
|
|
|
that they modify the environment. If `x` has one value, writing `x = x + 1` will
|
2024-11-03 17:50:11 -08:00
|
|
|
|
certainly change that value. On the other hand, statements don't produce values.
|
2024-10-13 13:31:47 -07:00
|
|
|
|
So, we will be writing claims like \(\rho_1 , \textit{bs} \Rightarrow \rho_2\)
|
|
|
|
|
to say that the basic statement \(\textit{bs}\), when starting in environment
|
|
|
|
|
\(\rho_1\), will produce environment \(\rho_2\). Here's an example:
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\{ \texttt{x} \mapsto 42, \texttt{y} \mapsto 17 \}, \ \texttt{x = x - \text{1}} \Rightarrow \{ \texttt{x} \mapsto 41, \texttt{y} \mapsto 17 \}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
Here, we subtracted one from a variable with value `42`, leaving it with a new
|
|
|
|
|
value of `41`.
|
|
|
|
|
|
|
|
|
|
There are two basic statements, and one of them quite literally does nothing.
|
|
|
|
|
The inference rule for `noop` is very simple:
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\rho,\ \texttt{noop} \Rightarrow \rho
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
For the assignment rule, we need to know how to evaluate the expression on the
|
|
|
|
|
right side of the equal sign. This is why we needed to define the semantics
|
|
|
|
|
of expressions first. Given those, the evaluation rule for assignment is as
|
|
|
|
|
follows, with \(\rho[x \mapsto v]\) meaning "the environment \(\rho\) but
|
|
|
|
|
mapping the variable \(x\) to value \(v\)".
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac
|
|
|
|
|
{\rho, e \Downarrow v}
|
|
|
|
|
{\rho, x = e \Rightarrow \rho[x \mapsto v]}
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
Those are actually all the rules we need, and below, I am once again configuring
|
|
|
|
|
a Bergamot instance, this time with simple statements. Try out `noop` or some
|
|
|
|
|
sort of variable assignment, like `x = x + 1`.
|
|
|
|
|
|
|
|
|
|
{{< bergamot_widget id="basic-stmt-widget" query="" prompt="stepbasic(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Basic Statement:Basic Statement" >}}
|
2024-11-03 16:05:04 -08:00
|
|
|
|
hidden section "" {
|
|
|
|
|
Eq @ eq(?x, ?x) <-;
|
|
|
|
|
}
|
2024-10-13 13:31:47 -07:00
|
|
|
|
hidden section "" {
|
|
|
|
|
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
|
|
|
|
|
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
|
|
|
|
|
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
|
|
|
|
|
}
|
|
|
|
|
section "" {
|
|
|
|
|
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
|
|
|
|
|
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
|
2024-11-03 16:05:04 -08:00
|
|
|
|
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
|
2024-10-13 13:31:47 -07:00
|
|
|
|
}
|
|
|
|
|
{{< /bergamot_widget >}}
|
|
|
|
|
|
|
|
|
|
The Agda implementation is once again just a data type with constructors-for-rules.
|
|
|
|
|
This time they also look quite similar to the rules I've shown up until now,
|
|
|
|
|
though I continue to explicitly quantify over variables like `ρ`.
|
|
|
|
|
|
|
|
|
|
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 37 40 >}}
|
|
|
|
|
|
|
|
|
|
#### Statements
|
|
|
|
|
|
|
|
|
|
Let's work on non-simple statements next. The easiest rule to define is probably
|
|
|
|
|
sequencing. When we use `then` (or `;`) to combine two statements, what we
|
|
|
|
|
actually want is to execute the first statement, which may change variables,
|
|
|
|
|
and then execute the second statement while keeping the changes from the first.
|
|
|
|
|
This means there are three environments: \(\rho_1\) for the initial state before
|
|
|
|
|
either statement is executed, \(\rho_2\) for the state between executing the
|
|
|
|
|
first and second statement, and \(\rho_3\) for the final state after both
|
|
|
|
|
are done executing. This leads to the following rule:
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac
|
|
|
|
|
{ \rho_1, s_1 \Rightarrow \rho_2 \quad \rho_2, s_2 \Rightarrow \rho_3 }
|
|
|
|
|
{ \rho_1, s_1; s_2 \Rightarrow \rho_3 }
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
We will actually need two rules to evaluate the conditional statement: one
|
|
|
|
|
for when the condition evaluates to "true", and one for when the condition
|
|
|
|
|
evaluates to "false". Only, I never specified booleans as being part of
|
|
|
|
|
the language, which means that we will need to come up what "false" and "true"
|
|
|
|
|
are. I will take my cue from C++ and use zero as "false", and any other number
|
|
|
|
|
as "true".
|
|
|
|
|
|
|
|
|
|
If the condition of an `if`-`else` statement is "true" (nonzero), then the
|
|
|
|
|
effect of executing the `if`-`else` should be the same as executing the "then"
|
|
|
|
|
part of the statement, while completely ignoring the "else" part.
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac
|
|
|
|
|
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1, s_1 \Rightarrow \rho_2}
|
|
|
|
|
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
Notice that in the above rule, we used the evaluation judgement \(\rho_1, e \Downarrow v\)
|
|
|
|
|
to evaluate the _expression_ that serves as the condition. We then had an
|
|
|
|
|
additional premise that requires the truthiness of the resulting value \(v\).
|
|
|
|
|
The rule for evaluating a conditional with a "false" branch is very similar.
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac
|
|
|
|
|
{ \rho_1 , e \Downarrow v \quad v = 0 \quad \rho_1, s_2 \Rightarrow \rho_2}
|
|
|
|
|
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
2024-11-03 16:58:06 -08:00
|
|
|
|
Now that we have rules for conditional statements, it will be surprisingly easy
|
|
|
|
|
to define the rules for `while` loops. A `while` loop will also have two rules,
|
|
|
|
|
one for when its condition is truthy and one for when it's falsey. However,
|
|
|
|
|
unlike the "false" case, a while loop will do nothing, leaving the environment
|
|
|
|
|
unchanged:
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac
|
|
|
|
|
{ \rho_1 , e \Downarrow v \quad v = 0 }
|
|
|
|
|
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_1 }
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
The trickiest rule is for when the condition of a `while` loop is true.
|
|
|
|
|
We evaluate the body once, starting in environment \(\rho_1\) and finishing
|
|
|
|
|
in \(\rho_2\), but then we're not done. In fact, we have to go back to the top,
|
|
|
|
|
and check the condition again, starting over. As a result, we include another
|
|
|
|
|
premise, that tells us that evaluating the loop starting at \(\rho_2\), we
|
|
|
|
|
eventually end in state \(\rho_3\). This encodes the "rest of the iterations"
|
|
|
|
|
in addition to the one we just performed. The environment \(\rho_3\) is our
|
|
|
|
|
final state, so that's what we use in the rule's conclusion.
|
|
|
|
|
|
|
|
|
|
{{< latex >}}
|
|
|
|
|
\frac
|
|
|
|
|
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1 , s \Rightarrow \rho_2 \quad \rho_2 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
|
|
|
|
|
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
|
|
|
|
|
{{< /latex >}}
|
|
|
|
|
|
|
|
|
|
And that's it! We have now seen every rule that defines the little object language
|
2024-11-03 17:50:11 -08:00
|
|
|
|
I've been using for my Agda work. Below is a Bergamot widget that implements
|
|
|
|
|
these rules. Try the following program, which computes the `x`th power of two,
|
|
|
|
|
and stores it in `y`:
|
2024-11-03 16:58:06 -08:00
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
x = 5; y = 1; while (x) { y = y + y; x = x - 1 }
|
|
|
|
|
```
|
|
|
|
|
|
2024-10-13 13:31:47 -07:00
|
|
|
|
{{< bergamot_widget id="stmt-widget" query="" prompt="step(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Statement:Statement" >}}
|
2024-11-03 16:05:04 -08:00
|
|
|
|
hidden section "" {
|
|
|
|
|
Eq @ eq(?x, ?x) <-;
|
|
|
|
|
}
|
2024-10-13 13:31:47 -07:00
|
|
|
|
hidden section "" {
|
|
|
|
|
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
|
|
|
|
|
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
|
|
|
|
|
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
|
|
|
|
|
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
|
|
|
|
|
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
StepLiftBasic @ step(?rho_1, ?s, ?rho_2) <- stepbasic(?rho_1, ?s, ?rho_2);
|
|
|
|
|
}
|
|
|
|
|
section "" {
|
2024-11-03 16:05:04 -08:00
|
|
|
|
StepIfTrue @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s_1, ?rho_2);
|
|
|
|
|
StepIfFalse @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), eq(?v, 0), step(?rho_1, ?s_2, ?rho_2);
|
|
|
|
|
StepWhileTrue @ step(?rho_1, while(?e, ?s), ?rho_3) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s, ?rho_2), step(?rho_2, while(?e, ?s), ?rho_3);
|
|
|
|
|
StepWhileFalse @ step(?rho_1, while(?e, ?s), ?rho_1) <- eval(?rho_1, ?e, ?v), eq(?v, 0);
|
2024-10-13 13:31:47 -07:00
|
|
|
|
StepSeq @ step(?rho_1, seq(?s_1, ?s_2), ?rho_3) <- step(?rho_1, ?s_1, ?rho_2), step(?rho_2, ?s_2, ?rho_3);
|
|
|
|
|
}
|
|
|
|
|
hidden section "" {
|
|
|
|
|
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
|
2024-11-03 16:05:04 -08:00
|
|
|
|
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
|
2024-10-13 13:31:47 -07:00
|
|
|
|
}
|
|
|
|
|
{{< /bergamot_widget >}}
|
2024-11-03 16:58:06 -08:00
|
|
|
|
|
2024-11-03 17:50:11 -08:00
|
|
|
|
As with all the other rules we've seen, the mathematical notation above can
|
|
|
|
|
be directly translated into Agda:
|
|
|
|
|
|
|
|
|
|
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 47 64 >}}
|
|
|
|
|
|
2024-11-03 16:58:06 -08:00
|
|
|
|
### Semantics as Ground Truth
|
|
|
|
|
|
|
|
|
|
Prior to this post, we had been talking about using lattices and monotone
|
|
|
|
|
functions for program analysis. The key problem with using this framework to
|
|
|
|
|
define analyses is that there are many monotone functions that produce complete
|
|
|
|
|
nonsese; their output is, at best, unrelated to the program they're supposed
|
|
|
|
|
to analyze. We don't want to write such functions, since having incorrect
|
|
|
|
|
information about the programs in question is unhelpful.
|
|
|
|
|
|
|
|
|
|
What does it mean for a function to produce correct information, though?
|
|
|
|
|
In the context of sign analysis, it would mean that if we say a variable `x` is `+`,
|
|
|
|
|
then evaluating the program will leave us in a state in which `x` is posive.
|
|
|
|
|
The semantics we defined in this post give us the "evaluating the program piece".
|
|
|
|
|
They establish what the programs _actually_ do, and we can use this ground
|
|
|
|
|
truth when checking that our analyses are correct. In subsequent posts, I will
|
|
|
|
|
prove the exact property I informally stated above: __for the program analyses
|
|
|
|
|
we define, things they "claim" about our program will match what actually happens
|
|
|
|
|
when executing the program using our semantics__.
|
|
|
|
|
|
|
|
|
|
A piece of the puzzle still remains: how are we going to use the monotone
|
|
|
|
|
functions we've been talking so much about? We need to figure out what to feed
|
|
|
|
|
to our analyses before we can prove their correctness.
|
|
|
|
|
|
|
|
|
|
I have an answer to that question: we will be using _control flow graphs_ (CFGs).
|
|
|
|
|
These are another program representation, one that's more commonly found in
|
|
|
|
|
compilers. I will show what they look like in the next post. I hope to see you
|
|
|
|
|
there!
|