550 lines
24 KiB
Markdown
550 lines
24 KiB
Markdown
---
|
||
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 5: Our Programming Language"
|
||
series: "Static Program Analysis in Agda"
|
||
description: "In this post, I define the language that well serve as the object of our vartious analyses"
|
||
date: 2024-11-03T17:50:27-08:00
|
||
tags: ["Agda", "Programming Languages"]
|
||
custom_js: ["parser.js"]
|
||
bergamot:
|
||
render_presets:
|
||
default: "bergamot/rendering/imp.bergamot"
|
||
input_modes:
|
||
- name: "Expression"
|
||
fn: "window.parseExpr"
|
||
- name: "Basic Statement"
|
||
fn: "window.parseBasicStmt"
|
||
- name: "Statement"
|
||
fn: "window.parseStmt"
|
||
---
|
||
|
||
In the previous several posts, I've formalized the notion of lattices, which
|
||
are an essential ingredient to formalizing the analyses in Anders Møller's
|
||
lecture notes. However, there can be no program analysis without a program
|
||
to analyze! In this post, I will define the (very simple) language that we
|
||
will be analyzing. An essential aspect of the language is its
|
||
[semantics](https://en.wikipedia.org/wiki/Semantics_(computer_science)), which
|
||
simply speaking explains what each feature of the language does. At the end
|
||
of the previous article, I gave the following _inference rule_ which defined
|
||
(partially) how the `if`-`else` statement in the language works.
|
||
|
||
{{< latex >}}
|
||
\frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2}
|
||
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
||
{{< /latex >}}
|
||
|
||
Like I mentioned then, this rule reads as follows:
|
||
|
||
> If the condition of an `if`-`else` statement evaluates to a nonzero value,
|
||
> then to evaluate the statement, you evaluate its `then` branch.
|
||
|
||
Another similar --- but crucially, not equivlalent -- rule is the following:
|
||
|
||
{{< latex >}}
|
||
\frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2}
|
||
{\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2}
|
||
{{< /latex >}}
|
||
|
||
This time, the English interpretation of the rule is as follows:
|
||
|
||
> If the condition of an `if`-`else` statement evaluates to one,
|
||
> then to evaluate the statement, you evaluate its `then` branch.
|
||
|
||
These rules are certainly not equivalent. For instance, the former allows
|
||
the "then" branch to be executed when the condition is `2`; however, in
|
||
the latter, the value of the conditional must be `1`. If our analysis were
|
||
intelligent (our first few will not be), then this difference would change
|
||
its output when determining the signs of the following program:
|
||
|
||
```
|
||
x = 2
|
||
if x {
|
||
y = - 1
|
||
} else {
|
||
y = 1
|
||
}
|
||
```
|
||
|
||
Using the first, more "relaxed" rule, the condition would be considered "true",
|
||
and the sign of `y` would be `-`. On the other hand, using the second,
|
||
"stricter" rule, the sign of `y` would be `+`. I stress that in this case,
|
||
I am showing a flow-sensitive analysis (one that can understand control flow
|
||
and make more specific predictions); for our simplest analyses, we will not
|
||
be aiming for flow-sensitivity. There is plenty of work to do even then.
|
||
|
||
The point of showing these two distinct rules is that we need to be very precise
|
||
about how the language will behave, because our analyses depend on that behavior.
|
||
|
||
Let's not get ahead of ourselves, though. I've motivated the need for semantics,
|
||
but there is much groundwork to be laid before we delve into the precise rules
|
||
of our language. After all, to define the language's semantics, we need to
|
||
have a language.
|
||
|
||
### The Syntax of Our Simple Language
|
||
|
||
I've shown a couple of examples our our language now, and there won't be that
|
||
much more to it. We can start with _expressions_: things that evaluate to
|
||
something. Some examples of expressions are `1`, `x`, and `2-(x+y)`. For our
|
||
specific language, the precise set of possible expressions can be given
|
||
by the following [Context-Free Grammar](https://en.wikipedia.org/wiki/Context-free_grammar):
|
||
|
||
{{< latex >}}
|
||
\begin{array}{rcll}
|
||
e & ::= & x & \text{(variables)} \\
|
||
& | & z & \text{(integer literals)} \\
|
||
& | & e + e & \text{(addition)} \\
|
||
& | & e - e & \text{(subtraction)}
|
||
\end{array}
|
||
{{< /latex >}}
|
||
|
||
The above can be read as follows:
|
||
|
||
> An expression \(e\) is one of the following things:
|
||
> 1. Some variable \(x\) [importantly \(x\) is a placeholder for _any_ variable,
|
||
> which could be `x` or `y` in our program code; specifically, \(x\) is
|
||
> a [_metavariable_](https://en.wikipedia.org/wiki/Metavariable).]
|
||
> 2. Some integer \(z\) [once again, \(z\) can be any integer, like 1, -42, etc.].
|
||
> 3. The addition of two other expressions [which could themselves be additions etc.].
|
||
> 4. The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].
|
||
|
||
Since expressions can be nested within other expressions --- which is necessary
|
||
to allow complicated code like `2-(x+y)` above --- they form a tree. Each node
|
||
is one of the elements of the grammar above (variable, addition, etc.). If
|
||
a node contains sub-expressions (like addition and subtraction do), then
|
||
these sub-expressions form sub-trees of the given node. This data structure
|
||
is called an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
|
||
|
||
Notably, though `2-(x+y)` has parentheses, our grammar above does not include
|
||
include them as a case. The reason for this is that the structure of an
|
||
abstract syntax tree is sufficient to encode the order in which the operations
|
||
should be evaluated. Since I lack a nice way of drawing ASTs, I will use
|
||
an ASCII drawing to show an example.
|
||
|
||
```
|
||
Expression: 2 - (x+y)
|
||
(-)
|
||
/ \
|
||
2 (+)
|
||
/ \
|
||
x y
|
||
|
||
|
||
Expression: (2-x) + y
|
||
(+)
|
||
/ \
|
||
(-) y
|
||
/ \
|
||
2 x
|
||
```
|
||
|
||
Above, in the first AST, `(+)` is a child of the `(-)` node, which means
|
||
that it's a sub-expression. As a result, that subexpression is evaluated first,
|
||
before evaluating `(-)`, and so, the AST expresents `2-(x+y)`. In the other
|
||
example, `(-)` is a child of `(+)`, and is therefore evaluated first. The resulting
|
||
association encoded by that AST is `(2-x)+y`.
|
||
|
||
To an Agda programmer, the one-of-four-things definition above should read
|
||
quite similarly to the definition of an algebraic data type. Indeed, this
|
||
is how we can encode the abstract syntax tree of expressions:
|
||
|
||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 12 16 >}}
|
||
|
||
The only departure from the grammar above is that I had to invent constructors
|
||
for the variable and integer cases, since Agda doesn't support implicit coercions.
|
||
This adds a little bit of extra overhead, requiring, for example, that we write
|
||
numbers as `# 42` instead of `42`.
|
||
|
||
Having defined expressions, the next thing on the menu is _statements_. Unlike
|
||
expressions, which just produce values, statements "do something"; an example
|
||
of a statement might be the following Python line:
|
||
|
||
```Python
|
||
print("Hello, world!")
|
||
```
|
||
|
||
The `print` function doesn't produce any value, but it does perform an action;
|
||
it prints its argument to the console!
|
||
|
||
For the formalization, it turns out to be convenient to separate "simple"
|
||
statements from "complex" ones. Pragmatically speaking, the difference is that
|
||
between the "simple" and the "complex" is control flow; simple statements
|
||
will be guaranteed to always execute without any decisions or jumps.
|
||
The reason for this will become clearer in subsequent posts; I will foreshadow
|
||
a bit by saying that consecutive simple statements can be placed into a single
|
||
[basic block](https://en.wikipedia.org/wiki/Basic_block).
|
||
|
||
The following is a group of three simple statements:
|
||
|
||
```
|
||
x = 1
|
||
y = x + 2
|
||
noop
|
||
```
|
||
|
||
These will always be executed in the same order, exactly once. Here, `noop`
|
||
is a convenient type of statement that simply does nothing.
|
||
|
||
On the other hand, the following statement is not simple:
|
||
|
||
```
|
||
while x {
|
||
x = x - 1
|
||
}
|
||
```
|
||
|
||
It's not simple because it makes decisions about how the code should be executed;
|
||
if `x` is nonzero, it will try executing the statement in the body of the loop
|
||
(`x = x - 1`). Otherwise, it would skip evaluating that statement, and carry on
|
||
with subsequent code.
|
||
|
||
I first define simple statements using the `BasicStmt` type:
|
||
|
||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 18 20 >}}
|
||
|
||
Complex statements are just called `Stmt`; they include loops, conditionals and
|
||
sequences ---
|
||
{{< sidenote "right" "then-note" "\(s_1\ \text{then}\ s_2\)" >}}
|
||
The standard notation for sequencing in imperative languages is
|
||
\(s_1; s_2\). However, Agda gives special meaning to the semicolon,
|
||
and I couldn't find any passable symbolic alternatives.
|
||
{{< /sidenote >}} is a sequence where \(s_2\) is evaluated after \(s_1\).
|
||
Complex statements subsume simple statements, which I model using the constructor
|
||
`⟨_⟩`.
|
||
|
||
{{< codelines "Agda" "agda-spa/Language/Base.agda" 25 29 >}}
|
||
|
||
For an example of using this encoding, take the following simple program:
|
||
|
||
```
|
||
var = 1
|
||
if var {
|
||
x = 1
|
||
}
|
||
```
|
||
|
||
The Agda version is:
|
||
|
||
{{< codelines "Agda" "agda-spa/Main.agda" 27 34 >}}
|
||
|
||
Notice how we used `noop` to express the fact that the `else` branch of the
|
||
conditional does nothing.
|
||
|
||
### The Semantics of Our Language
|
||
|
||
We now have all the language constructs that I'll be showing off --- because
|
||
those are all the concepts that I've formalized. What's left is to define
|
||
how they behave. We will do this using a logical tool called
|
||
[_inference rules_](https://en.wikipedia.org/wiki/Rule_of_inference). I've
|
||
written about them a number of times; they're ubiquitous, particularly in the
|
||
sorts of things I like explore on this site. The [section on inference rules]({{< relref "01_aoc_coq#inference-rules" >}})
|
||
from my Advent of Code series is pretty relevant, and [the notation section from
|
||
a post in my compiler series]({{< relref "03_compiler_typechecking#some-notation" >}}) says
|
||
much the same thing; I won't be re-describing them here.
|
||
|
||
There are three pieces which demand semantics: expressions, simple statements,
|
||
and non-simple statements. The semantics of each of the three requires the
|
||
semantics of the items that precede it. We will therefore start with expressions.
|
||
|
||
#### Expressions
|
||
|
||
The trickiest thing about expression is that the value of an expression depends
|
||
on the "context": `x+1` can evaluate to `43` if `x` is `42`, or it can evaluate
|
||
to `0` if `x` is `-1`. To evaluate an expression, we will therefore need to
|
||
assign values to all of the variables in that expression. A mapping that
|
||
assigns values to variables is typically called an _environment_. We will write
|
||
\(\varnothing\) for "empty environment", and \(\{\texttt{x} \mapsto 42, \texttt{y} \mapsto -1 \}\) for
|
||
an environment that maps the variable \(\texttt{x}\) to 42, and the variable \(\texttt{y}\) to -1.
|
||
|
||
Now, a bit of notation. We will use the letter \(\rho\) to represent environments
|
||
(and if several environments are involved, we will occasionally number them
|
||
as \(\rho_1\), \(\rho_2\), etc.) We will use the letter \(e\) to stand for
|
||
expressions, and the letter \(v\) to stand for values. Finally, we'll write
|
||
\(\rho, e \Downarrow v\) to say that "in an environment \(\rho\), expression \(e\)
|
||
evaluates to value \(v\)". Our two previous examples of evaluating `x+1` can
|
||
thus be written as follows:
|
||
|
||
{{< latex >}}
|
||
\{ \texttt{x} \mapsto 42 \}, \texttt{x}+1 \Downarrow 43 \\
|
||
\{ \texttt{x} \mapsto -1 \}, \texttt{x}+1 \Downarrow 0 \\
|
||
{{< /latex >}}
|
||
|
||
Now, on to the actual rules for how to evaluate expressions. Most simply,
|
||
integer literals like `1` just evaluate to themselves.
|
||
|
||
{{< latex >}}
|
||
\frac{n \in \text{Int}}{\rho, n \Downarrow n}
|
||
{{< /latex >}}
|
||
|
||
Note that the letter \(\rho\) is completely unused in the above rule. That's
|
||
because no matter what values _variables_ have, a number still evaluates to
|
||
the same value. As we've already established, the same is not true for a
|
||
variable like \(\texttt{x}\). To evaluate such a variable, we need to retrieve
|
||
the value it's mapped to in the current environment, which we will write as
|
||
\(\rho(\texttt{x})\). This gives the following inference rule:
|
||
|
||
{{< latex >}}
|
||
\frac{\rho(x) = v}{\rho, x \Downarrow v}
|
||
{{< /latex >}}
|
||
|
||
All that's left is to define addition and subtraction. For an expression in the
|
||
form \(e_1+e_2\), we first need to evaluate the two subexpressions \(e_1\)
|
||
and \(e_2\), and then add the two resulting numbers. As a result, the addition
|
||
rule includes two additional premises, one for evaluating each summand.
|
||
|
||
{{< latex >}}
|
||
\frac
|
||
{\rho, e_1 \Downarrow v_1 \quad \rho, e_2 \Downarrow v_2 \quad v_1 + v_2 = v}
|
||
{\rho, e_1+e_2 \Downarrow v}
|
||
{{< /latex >}}
|
||
|
||
The subtraction rule is similar. Below, I've configured an instance of
|
||
[Bergamot]({{< relref "bergamot" >}}) to interpret these exact rules. Try
|
||
typing various expressions like `1`, `1+1`, etc. into the input box below
|
||
to see them evaluate. If you click the "Full Proof Tree" button, you can also view
|
||
the exact rules that were used in computing a particular value. The variables
|
||
`x`, `y`, and `z` are pre-defined for your convenience.
|
||
|
||
{{< bergamot_widget id="expr-widget" query="" prompt="eval(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?v)" modes="Expression:Expression" >}}
|
||
hidden section "" {
|
||
Eq @ eq(?x, ?x) <-;
|
||
}
|
||
section "" {
|
||
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
|
||
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
|
||
}
|
||
section "" {
|
||
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
|
||
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
|
||
}
|
||
hidden section "" {
|
||
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
|
||
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
|
||
}
|
||
{{< /bergamot_widget >}}
|
||
|
||
The Agda equivalent of this looks very similar to the rules themselves. I use
|
||
`⇒ᵉ` instead of \(\Downarrow\), and there's a little bit of tedium with
|
||
wrapping integers into a new `Value` type. I also used a (partial) relation
|
||
`(x, v) ∈ ρ` instead of explicitly defining accessing an environment, since
|
||
it is conceivable for a user to attempt accessing a variable that has not
|
||
been assigned to. Aside from these notational changes, the structure of each
|
||
of the constructors of the evaluation data type matches the inference rules
|
||
I showed above.
|
||
|
||
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 27 35 >}}
|
||
|
||
#### Simple Statements
|
||
The main difference between formalizing (simple and "normal") statements is
|
||
that they modify the environment. If `x` has one value, writing `x = x + 1` will
|
||
certainly change that value. On the other hand, statements don't produce values.
|
||
So, we will be writing claims like \(\rho_1 , \textit{bs} \Rightarrow \rho_2\)
|
||
to say that the basic statement \(\textit{bs}\), when starting in environment
|
||
\(\rho_1\), will produce environment \(\rho_2\). Here's an example:
|
||
|
||
{{< latex >}}
|
||
\{ \texttt{x} \mapsto 42, \texttt{y} \mapsto 17 \}, \ \texttt{x = x - \text{1}} \Rightarrow \{ \texttt{x} \mapsto 41, \texttt{y} \mapsto 17 \}
|
||
{{< /latex >}}
|
||
|
||
Here, we subtracted one from a variable with value `42`, leaving it with a new
|
||
value of `41`.
|
||
|
||
There are two basic statements, and one of them quite literally does nothing.
|
||
The inference rule for `noop` is very simple:
|
||
|
||
{{< latex >}}
|
||
\rho,\ \texttt{noop} \Rightarrow \rho
|
||
{{< /latex >}}
|
||
|
||
For the assignment rule, we need to know how to evaluate the expression on the
|
||
right side of the equal sign. This is why we needed to define the semantics
|
||
of expressions first. Given those, the evaluation rule for assignment is as
|
||
follows, with \(\rho[x \mapsto v]\) meaning "the environment \(\rho\) but
|
||
mapping the variable \(x\) to value \(v\)".
|
||
|
||
{{< latex >}}
|
||
\frac
|
||
{\rho, e \Downarrow v}
|
||
{\rho, x = e \Rightarrow \rho[x \mapsto v]}
|
||
{{< /latex >}}
|
||
|
||
Those are actually all the rules we need, and below, I am once again configuring
|
||
a Bergamot instance, this time with simple statements. Try out `noop` or some
|
||
sort of variable assignment, like `x = x + 1`.
|
||
|
||
{{< bergamot_widget id="basic-stmt-widget" query="" prompt="stepbasic(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Basic Statement:Basic Statement" >}}
|
||
hidden section "" {
|
||
Eq @ eq(?x, ?x) <-;
|
||
}
|
||
hidden section "" {
|
||
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
|
||
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
|
||
}
|
||
hidden section "" {
|
||
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
|
||
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
|
||
}
|
||
section "" {
|
||
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
|
||
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
|
||
}
|
||
hidden section "" {
|
||
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
|
||
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
|
||
}
|
||
{{< /bergamot_widget >}}
|
||
|
||
The Agda implementation is once again just a data type with constructors-for-rules.
|
||
This time they also look quite similar to the rules I've shown up until now,
|
||
though I continue to explicitly quantify over variables like `ρ`.
|
||
|
||
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 37 40 >}}
|
||
|
||
#### Statements
|
||
|
||
Let's work on non-simple statements next. The easiest rule to define is probably
|
||
sequencing. When we use `then` (or `;`) to combine two statements, what we
|
||
actually want is to execute the first statement, which may change variables,
|
||
and then execute the second statement while keeping the changes from the first.
|
||
This means there are three environments: \(\rho_1\) for the initial state before
|
||
either statement is executed, \(\rho_2\) for the state between executing the
|
||
first and second statement, and \(\rho_3\) for the final state after both
|
||
are done executing. This leads to the following rule:
|
||
|
||
{{< latex >}}
|
||
\frac
|
||
{ \rho_1, s_1 \Rightarrow \rho_2 \quad \rho_2, s_2 \Rightarrow \rho_3 }
|
||
{ \rho_1, s_1; s_2 \Rightarrow \rho_3 }
|
||
{{< /latex >}}
|
||
|
||
We will actually need two rules to evaluate the conditional statement: one
|
||
for when the condition evaluates to "true", and one for when the condition
|
||
evaluates to "false". Only, I never specified booleans as being part of
|
||
the language, which means that we will need to come up what "false" and "true"
|
||
are. I will take my cue from C++ and use zero as "false", and any other number
|
||
as "true".
|
||
|
||
If the condition of an `if`-`else` statement is "true" (nonzero), then the
|
||
effect of executing the `if`-`else` should be the same as executing the "then"
|
||
part of the statement, while completely ignoring the "else" part.
|
||
|
||
{{< latex >}}
|
||
\frac
|
||
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1, s_1 \Rightarrow \rho_2}
|
||
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
|
||
{{< /latex >}}
|
||
|
||
Notice that in the above rule, we used the evaluation judgement \(\rho_1, e \Downarrow v\)
|
||
to evaluate the _expression_ that serves as the condition. We then had an
|
||
additional premise that requires the truthiness of the resulting value \(v\).
|
||
The rule for evaluating a conditional with a "false" branch is very similar.
|
||
|
||
{{< latex >}}
|
||
\frac
|
||
{ \rho_1 , e \Downarrow v \quad v = 0 \quad \rho_1, s_2 \Rightarrow \rho_2}
|
||
{ \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 }
|
||
{{< /latex >}}
|
||
|
||
Now that we have rules for conditional statements, it will be surprisingly easy
|
||
to define the rules for `while` loops. A `while` loop will also have two rules,
|
||
one for when its condition is truthy and one for when it's falsey. However,
|
||
unlike the "false" case, a while loop will do nothing, leaving the environment
|
||
unchanged:
|
||
|
||
{{< latex >}}
|
||
\frac
|
||
{ \rho_1 , e \Downarrow v \quad v = 0 }
|
||
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_1 }
|
||
{{< /latex >}}
|
||
|
||
The trickiest rule is for when the condition of a `while` loop is true.
|
||
We evaluate the body once, starting in environment \(\rho_1\) and finishing
|
||
in \(\rho_2\), but then we're not done. In fact, we have to go back to the top,
|
||
and check the condition again, starting over. As a result, we include another
|
||
premise, that tells us that evaluating the loop starting at \(\rho_2\), we
|
||
eventually end in state \(\rho_3\). This encodes the "rest of the iterations"
|
||
in addition to the one we just performed. The environment \(\rho_3\) is our
|
||
final state, so that's what we use in the rule's conclusion.
|
||
|
||
{{< latex >}}
|
||
\frac
|
||
{ \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1 , s \Rightarrow \rho_2 \quad \rho_2 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
|
||
{ \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 }
|
||
{{< /latex >}}
|
||
|
||
And that's it! We have now seen every rule that defines the little object language
|
||
I've been using for my Agda work. Below is a Bergamot widget that implements
|
||
these rules. Try the following program, which computes the `x`th power of two,
|
||
and stores it in `y`:
|
||
|
||
```
|
||
x = 5; y = 1; while (x) { y = y + y; x = x - 1 }
|
||
```
|
||
|
||
{{< bergamot_widget id="stmt-widget" query="" prompt="step(extend(extend(extend(empty, x, 17), y, 42), z, 0), TERM, ?env)" modes="Statement:Statement" >}}
|
||
hidden section "" {
|
||
Eq @ eq(?x, ?x) <-;
|
||
}
|
||
hidden section "" {
|
||
EvalNum @ eval(?rho, lit(?n), ?n) <- int(?n);
|
||
EvalVar @ eval(?rho, var(?x), ?v) <- inenv(?x, ?v, ?rho);
|
||
}
|
||
hidden section "" {
|
||
EvalPlus @ eval(?rho, plus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), add(?v_1, ?v_2, ?v);
|
||
EvalMinus @ eval(?rho, minus(?e_1, ?e_2), ?v) <- eval(?rho, ?e_1, ?v_1), eval(?rho, ?e_2, ?v_2), subtract(?v_1, ?v_2, ?v);
|
||
}
|
||
hidden section "" {
|
||
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
|
||
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
|
||
}
|
||
hidden section "" {
|
||
StepNoop @ stepbasic(?rho, noop, ?rho) <-;
|
||
StepAssign @ stepbasic(?rho, assign(?x, ?e), extend(?rho, ?x, ?v)) <- eval(?rho, ?e, ?v);
|
||
}
|
||
hidden section "" {
|
||
StepLiftBasic @ step(?rho_1, ?s, ?rho_2) <- stepbasic(?rho_1, ?s, ?rho_2);
|
||
}
|
||
section "" {
|
||
StepIfTrue @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s_1, ?rho_2);
|
||
StepIfFalse @ step(?rho_1, if(?e, ?s_1, ?s_2), ?rho_2) <- eval(?rho_1, ?e, ?v), eq(?v, 0), step(?rho_1, ?s_2, ?rho_2);
|
||
StepWhileTrue @ step(?rho_1, while(?e, ?s), ?rho_3) <- eval(?rho_1, ?e, ?v), not(eq(?v, 0)), step(?rho_1, ?s, ?rho_2), step(?rho_2, while(?e, ?s), ?rho_3);
|
||
StepWhileFalse @ step(?rho_1, while(?e, ?s), ?rho_1) <- eval(?rho_1, ?e, ?v), eq(?v, 0);
|
||
StepSeq @ step(?rho_1, seq(?s_1, ?s_2), ?rho_3) <- step(?rho_1, ?s_1, ?rho_2), step(?rho_2, ?s_2, ?rho_3);
|
||
}
|
||
hidden section "" {
|
||
EnvTake @ inenv(?x, ?v, extend(?rho, ?x, ?v)) <-;
|
||
EnvSkip @ inenv(?x, ?v_1, extend(?rho, ?y, ?v_2)) <- inenv(?x, ?v_1, ?rho), not(eq(?x, ?y));
|
||
}
|
||
{{< /bergamot_widget >}}
|
||
|
||
As with all the other rules we've seen, the mathematical notation above can
|
||
be directly translated into Agda:
|
||
|
||
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 47 64 >}}
|
||
|
||
### Semantics as Ground Truth
|
||
|
||
Prior to this post, we had been talking about using lattices and monotone
|
||
functions for program analysis. The key problem with using this framework to
|
||
define analyses is that there are many monotone functions that produce complete
|
||
nonsese; their output is, at best, unrelated to the program they're supposed
|
||
to analyze. We don't want to write such functions, since having incorrect
|
||
information about the programs in question is unhelpful.
|
||
|
||
What does it mean for a function to produce correct information, though?
|
||
In the context of sign analysis, it would mean that if we say a variable `x` is `+`,
|
||
then evaluating the program will leave us in a state in which `x` is posive.
|
||
The semantics we defined in this post give us the "evaluating the program piece".
|
||
They establish what the programs _actually_ do, and we can use this ground
|
||
truth when checking that our analyses are correct. In subsequent posts, I will
|
||
prove the exact property I informally stated above: __for the program analyses
|
||
we define, things they "claim" about our program will match what actually happens
|
||
when executing the program using our semantics__.
|
||
|
||
A piece of the puzzle still remains: how are we going to use the monotone
|
||
functions we've been talking so much about? We need to figure out what to feed
|
||
to our analyses before we can prove their correctness.
|
||
|
||
I have an answer to that question: we will be using _control flow graphs_ (CFGs).
|
||
These are another program representation, one that's more commonly found in
|
||
compilers. I will show what they look like in the next post. I hope to see you
|
||
there!
|