Write up the "verified" portion of the forward analysis

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
Danila Fedorin 2024-12-25 19:03:51 -08:00
parent fa180ee24e
commit 3b9c2edcdd
4 changed files with 569 additions and 1 deletions

View File

@ -55,6 +55,7 @@ Now, let's start with the least element of our lattice, denoted \(\bot\).
A lattice of finite height is guaranteed to have such an element. If it didn't, A lattice of finite height is guaranteed to have such an element. If it didn't,
we could always extend chains by tacking on a smaller element to their bottom, we could always extend chains by tacking on a smaller element to their bottom,
and then the lattice wouldn't have a finite height anymore. and then the lattice wouldn't have a finite height anymore.
{#start-least}
Now, apply \(f\) to \(\bot\) to get \(f(\bot)\). Since \(\bot\) is the least Now, apply \(f\) to \(\bot\) to get \(f(\bot)\). Since \(\bot\) is the least
element, it must be true that \(\bot \le f(\bot)\). Now, if it's "less than or equal", element, it must be true that \(\bot \le f(\bot)\). Now, if it's "less than or equal",

View File

@ -262,6 +262,7 @@ expressions, and the letter \(v\) to stand for values. Finally, we'll write
\(\rho, e \Downarrow v\) to say that "in an environment \(\rho\), expression \(e\) \(\rho, e \Downarrow v\) to say that "in an environment \(\rho\), expression \(e\)
evaluates to value \(v\)". Our two previous examples of evaluating `x+1` can evaluates to value \(v\)". Our two previous examples of evaluating `x+1` can
thus be written as follows: thus be written as follows:
{#notation-for-environments}
{{< latex >}} {{< latex >}}
\{ \texttt{x} \mapsto 42 \}, \texttt{x}+1 \Downarrow 43 \\ \{ \texttt{x} \mapsto 42 \}, \texttt{x}+1 \Downarrow 43 \\

View File

@ -10,7 +10,21 @@ draft: true
In the previous post, I showed that the Control Flow graphs we built of our In the previous post, I showed that the Control Flow graphs we built of our
programs match how they are really executed. This means that we can rely programs match how they are really executed. This means that we can rely
on these graphs to compute program information. In this post, we finally on these graphs to compute program information. In this post, we finally
get to compute that information. Let's jump right into it! get to compute that information. Here's a quick bit paraphrasing from last time
that provides a summary of our approach:
1. We will construct a finite-height lattice. Every single element of this
lattice will contain information about each variable at each node in the
Control Flow Graph.
2. We will then define a monotonic function that update this information using
the structure encoded in the CFGs edges and nodes.
3. Then, using the fixed-point algorithm, we will find the least element of the
lattice, which will give us a precise description of all program variables at
all points in the program.
4. Because we have just validated our CFGs to be faithful to the languages
semantics, well be able to prove that our algorithm produces accurate results.
Let's jump right into it!
### Choosing a Lattice ### Choosing a Lattice
A lot of this time, we have been [talking about lattices]({{< relref "01_spa_agda_lattices" >}}), A lot of this time, we have been [talking about lattices]({{< relref "01_spa_agda_lattices" >}}),
@ -59,6 +73,7 @@ split our programs into sequences of statements that are guaranteed to execute
together --- basic blocks. For our analysis, we'll keep per-variable for together --- basic blocks. For our analysis, we'll keep per-variable for
each basic block in the program. Since basic blocks are nodes in the Control Flow each basic block in the program. Since basic blocks are nodes in the Control Flow
Graph of our program, our whole lattice will be as follows: Graph of our program, our whole lattice will be as follows:
{#whole-lattice}
{{< latex >}} {{< latex >}}
\text{Info} \triangleq \text{NodeId} \to (\text{Variable} \to \text{Sign}) \text{Info} \triangleq \text{NodeId} \to (\text{Variable} \to \text{Sign})
@ -112,6 +127,7 @@ exact values of `x`, `y`, and `z`, leaving only their signs). The exact details
of how this partial evaluation is done are analysis-specific; in general, we of how this partial evaluation is done are analysis-specific; in general, we
simply require an analysis to provide an evaluator. We will define simply require an analysis to provide an evaluator. We will define
[an evaluator for the sign lattice below](#instantiating-with-the-sign-lattice). [an evaluator for the sign lattice below](#instantiating-with-the-sign-lattice).
{#general-evaluator}
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 166 167 >}} {{< codelines "agda" "agda-spa/Analysis/Forward.agda" 166 167 >}}
@ -136,6 +152,7 @@ current block. Early on, we saw that [the \((\sqcup)\) operator models disjuncti
\((\sqcup)\) to the variable-sign maps of all predecessors. The \((\sqcup)\) to the variable-sign maps of all predecessors. The
[reference _Static Program Analysis_ text](https://cs.au.dk/~amoeller/spa/) [reference _Static Program Analysis_ text](https://cs.au.dk/~amoeller/spa/)
calls this operation \(\text{JOIN}\): calls this operation \(\text{JOIN}\):
{#join-preds}
{{< latex >}} {{< latex >}}
\textit{JOIN}(v) = \bigsqcup_{w \in \textit{pred}(v)} \llbracket w \rrbracket \textit{JOIN}(v) = \bigsqcup_{w \in \textit{pred}(v)} \llbracket w \rrbracket
@ -272,6 +289,7 @@ Actually, we haven't yet seen that `updateVariablesFromStmt`. This is
a function that we can define using the user-provided abtract interpretation a function that we can define using the user-provided abtract interpretation
`eval`. Specifically, it handles the job of updating the sign of a variable `eval`. Specifically, it handles the job of updating the sign of a variable
once it has been assigned to (or doing nothing if the statement is a no-op). once it has been assigned to (or doing nothing if the statement is a no-op).
{#define-updateVariablesFromStmt}
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 191 193 >}} {{< codelines "agda" "agda-spa/Analysis/Forward.agda" 191 193 >}}

View File

@ -0,0 +1,548 @@
---
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 9: Verifying the Forward Analysis"
series: "Static Program Analysis in Agda"
description: "In this post, I prove that the sign analysis from the previous is correct"
date: 2024-12-17T19:51:00-08:00
tags: ["Agda", "Programming Languages"]
draft: true
left_align_code: true
---
In the previous post, we put together a number of powerful pieces of machinery
to construct a sign analysis. However, we still haven't verified that this
analysis produces correct results. For the most part, we already have the
tools required to demonstrate correctness; the most important one
is the [validity of our CFGs]({{< relref "07_spa_agda_semantics_and_cfg" >}})
relative to [the semantics of the little language]({{< relref "05_spa_agda_semantics" >}}).
### High-Level Algorithm
We'll keep working with the sign lattice as an example, keeping in mind
how what we do generalizes to a any lattice \(L\) describing a variable's
state. The general shape of our argument will be as follows, where I've underlined and
numbered assumptions or aspects that we have yet to provide.
1. Our fixed-point analysis from the previous section gave us a result \(r\)
that satisfies the following equation:
{{< latex >}}
r = \text{update}(\text{join}(r))
{{< /latex >}}
Above \(\text{join}\) applies the [predecessor-combining function]({{< relref "08_spa_agda_forward#join-preds" >}})
from the previous post to each state (corresponding to `joinAll` in Agda)
and \(\text{update}\) performs one round of abstract interpretation.
2. Because of the [correspondence of our semantics and CFGs]({{< relref "07_spa_agda_semantics_and_cfg" >}}),
each program evaluation in the form \(\rho, s \Rightarrow \rho'\)
corresponds to a path through the Control Flow Graph. Along the path,
each node contains simple statements, which correspond to intermediate steps
in evaluating the program. These will also be in the form
\(\rho_1, b \Rightarrow \rho_2\).
3. We will proceed iteratively, stepping through the trace one basic block at
a time. At each node in the graph:
* We will assume that the beginning state (the variables in \(\rho_1\)) are
{{< internal "correctly-described" >}}
correctly described
{{< /internal >}}
by one of the predecessors of the current node. Since
{{< internal "disjunction" >}}
joining represents "or"
{{< /internal >}},
that is the same
as saying that \(\text{join}(r)\)
contains an accurate description of \(\rho_1\).
* Because
{{< internal "abstract-interpretation" >}}
the abstract interpretation function preserves accurate descriptions
{{< /internal >}},
if \(\text{join}(r)\) contains an accurate description \(\rho_1\), then applying our
abstract interpretation function via \(\text{update}\) should result in
a map that contains an accurate-described \(\rho_2\). In other words, \(\text{update}(\text{join}(r))\)
describes \(\rho_2\) at the current block.
{{< internal "equivalence" >}}
By the equation above
{{< /internal >}}, that's the same as saying
\(r\) describes \(\rho_2\) at the current block.
* Since the trace is a path through a graph, there must be an edge from
the current basic block to the next. This means that the current basic
block is a predecessor of the next one. From the previous point, we know
that \(\rho_2\) is accurately described by this predecessor, fulfilling
our earlier assumption and allowing us to continue iteration.
So, what are the missing pieces?
1. We need to define what it means for a lattice (like our sign lattice)
to "correctly describe" what happens when evaluating a program for real.
For example, the \(+\) in sign analysis describes values that are bigger than zero,
and a map like `{x:+}` states that `x` can only take on positive values.
2. We've seen before [the \((\sqcup)\) operator models disjunction
("A or B")]({{< relref "01_spa_agda_lattices#lub-glub-or-and" >}}), but
that was only an informal observation; we'll need to specify it preceisely.
3. Each analysis [provides an abstract interpretation `eval` function]({{< relref "08_spa_agda_forward#general-evaluator" >}}).
However, until now, nothing has formally constrained this function; we could
return \(+\) in every case, even though that would not be accurate. We will
need, for each analysis, a proof that its `eval` preserves accurate descriptions.
4. The equalities between our lattice elements [are actually equivalences]({{< relref "01_spa_agda_lattices#definitional-equality" >}}),
which helps us use simpler representations of data structures. Thus, even
in statements of the fixed point algorithm, our final result is a value \(a\)
such that \(a \approx f(a)\). We need to prove that our notion of equivalent
lattice elements plays nicely with correctness.
Let's start with the first bullet point.
### A Formal Definition of Correctness
When a variable is mapped to a particular sign (like `{ "x": + }`),
what that really says is that the value of `x` is greater than zero. Recalling
from [the post about our language's semantics]({{< relref "05_spa_agda_semantics#notation-for-environments" >}})
that we use the symbol \(\rho\) to represent mappings of variables to
their values, we might write this claim as:
{{< latex >}}
\rho(\texttt{x}) > 0
{{< /latex >}}
This is a good start, but it's a little awkward defining the meaning of "plus"
by referring to the context in which it's used (the `{ "x": ... }` portion
of the expression above). Instead, let's associate with each sign (like \(+\)) a
predicate: a function that takes a value, and makes a claim about that value
("this is positive"):
{{< latex >}}
\llbracket + \rrbracket\ v = v > 0
{{< /latex >}}
The notation above is a little weird unless you, like me, have a background in
Programming Language Theory (❤️). This comes from [denotational semantics](https://en.wikipedia.org/wiki/Denotational_semantics);
generally, one writes:
{{< latex >}}
\llbracket \text{thing} \rrbracket = \text{the meaning of the thing}
{{< /latex >}}
Where \(\llbracket \cdot \rrbracket\) is really a function (we call it
the _semantic function_) that maps things to
their meaning. Then, the above equation is similar to the more familiar
\(f(x) = x+1\): function and arguments on the left, definition on the right. When
the "meaning of the thing" is itself a function, we could write it explicitly
using lambda-notation:
{{< latex >}}
\llbracket \text{thing} \rrbracket = \lambda x.\ \text{body of the function}
{{< /latex >}}
Or, we could use the Haskell style and write the new variable on the left of
the equality:
{{< latex >}}
\llbracket \text{thing} \rrbracket\ x = \text{body of the function}
{{< /latex >}}
That is precisely what I'm doing above with \(\llbracket + \rrbracket\).
With this in mind, we could define the entire semantic function for the
sign lattice as follows:
{{< latex >}}
\llbracket + \rrbracket\ v = v\ \texttt{>}\ 0 \\
\llbracket 0 \rrbracket\ v = v\ \texttt{=}\ 0 \\
\llbracket - \rrbracket\ v = v\ \texttt{<}\ 0 \\
\llbracket \top \rrbracket\ v = \text{true} \\
\llbracket \bot \rrbracket\ v = \text{false}
{{< /latex >}}
In Agda, the integer type already distinguishes between "negative natural" or
"positive natural" cases, which made it possible to define the semantic function
{{< sidenote "right" "without-note" "without using inequalities." >}}
Reasoning about inequalities is painful, sometimes requiring a number of
lemmas to arrive at a result that is intuitively obvious. Coq has a powerful
tactic called <a href="https://coq.inria.fr/doc/v8.11/refman/addendum/micromega.html#coq:tacn.lia"><code>lia</code></a>
that automatically solves systems of inequalities, and I use it liberally.
However, lacking such a tactic in Agda, I would like to avoid inequalities
if they are not needed.
{{< /sidenote >}}
{{< codelines "agda" "agda-spa/Analysis/Sign.agda" 114 119 >}}
Notably, \(\llbracket \top \rrbracket\ v\) always holds, and
\(\llbracket \bot \rrbracket\ v\) never does. __In general__, we will always
need to define a semantic function for whatever lattice we are choosing for
our analysis.
It's important to remember from the previous post that the sign lattice
(or, more generally, our lattice \(L\)) is only a component of the
[lattice we use to instantiate the analysis]({{< relref "08_spa_agda_forward#whole-lattice" >}}).
We at least need to define what it means for the \(\text{Variable} \to \text{Sign}\)
portion of that lattice to be correct. This way, we'll have correctness
criteria for each key (CFG node) in the top-level \(\text{Info}\) lattice.
Since a map from variables to their sign characterizes not a single value \(v\)
but a whole environment \(\rho\), something like this is a good start:
{{< latex >}}
\llbracket \texttt{\{} x_1: s_1, ..., x_n: s_n \texttt{\}} \rrbracket\ \rho = \llbracket s_1 \rrbracket\ \rho(x_1)\ \text{and}\ ...\ \text{and}\ \llbracket s_n \rrbracket\ \rho(x_n)
{{< /latex >}}
As a concrete example, we might get:
{{< latex >}}
\llbracket \texttt{\{} \texttt{x}: +, \texttt{y}: - \texttt{\}} \rrbracket\ \rho = \rho(\texttt{x})\ \texttt{>}\ 0\ \text{and}\ \rho(\texttt{y})\ \texttt{<}\ 0
{{< /latex >}}
This is pretty good, but not quite right. For instance, the initial state of
the program --- before running the analysis --- assigns \(\bot\) to each
element. This is true because our fixed-point algorithm [starts with the least
element of the lattice]({{< relref "04_spa_agda_fixedpoint#start-least" >}}).
But even for a single-variable map `{x: ⊥ }`, the semantic function above would
give:
{{< latex >}}
\llbracket \texttt{\{} \texttt{x}: \bot \texttt{\}} \rrbracket\ \rho = \text{false}
{{< /latex >}}
That's clearly not right: our initial state should be possible, lest
the entire proof be just a convoluted [_ex falso_](https://en.wikipedia.org/wiki/Principle_of_explosion)!
There is another tricky aspect of our analysis, which is primarily defined
[using the join (\(\sqcup\)) operator]({{< relref "08_spa_agda_forward#join-preds" >}}).
Observe the following example:
```C
// initial state: { x: ⊥ }
if b {
x = 1; // state: { x: + }
} else {
// state unchanged: { x: ⊥ }
}
// state: { x: + } ⊔ { x: ⊥ } = { x: + }
```
Notice that in the final state, the sign of `x` is `+`, even though when
`b` is `false`, the variable is never set. In a simple language like ours,
without variable declaration points, this is probably the best we could hope
for. The crucial observation, though, is that the oddness only comes into
play with variables that are not set. In the "initial state" case, none
of the variables have been modified; in the `else` case of the conditional,
`x` was never assigned to. We can thus relax our condition to an if-then:
if a variable is in our environment \(\rho\), then the variable-sign lattice's
interpretation accurately describes it.
{{< latex >}}
\begin{array}{ccc}
\llbracket \texttt{\{} x_1: s_1, ..., x_n: s_n \texttt{\}} \rrbracket\ \rho & = & & \textbf{if}\ x_1 \in \rho\ \textbf{then}\ \llbracket s_1 \rrbracket\ \rho(x_1)\ \\ & & \text{and} & ... \\ & & \text{and} & \textbf{if}\ x_n \in \rho\ \textbf{then}\ \llbracket s_n \rrbracket\ \rho(x_n)
\end{array}
{{< /latex >}}
The first "weird" case now results in the following:
{{< latex >}}
\llbracket \texttt{\{} \texttt{x}: \bot \texttt{\}} \rrbracket\ \rho = \textbf{if}\ \texttt{x} \in \rho\ \textbf{then}\ \text{false}
{{< /latex >}}
Which is just another way of saying:
{{< latex >}}
\llbracket \texttt{\{} \texttt{x}: \bot \texttt{\}} \rrbracket\ \rho = \texttt{x} \notin \rho
{{< /latex >}}
In the second case, the interpretation also results in a true statement:
{{< latex >}}
\llbracket \texttt{\{} \texttt{x}: + \texttt{\}} \rrbracket\ \rho = \textbf{if}\ \texttt{x} \in \rho\ \textbf{then}\ \texttt{x} > 0
{{< /latex >}}
In Agda, I encode the fact that a verified analysis needs a semantic function
\(\llbracket\cdot\rrbracket\) for its element lattice \(L\) by taking such
a function as an argument called `⟦_⟧ˡ`:
{{< codelines "Agda" "agda-spa/Analysis/Forward.agda" 246 253 "hl_lines=5" >}}
I then define the semantic function for the variable-sign lattice in the following
way, which eschews the "..." notation in favor of a more Agda-compatible (and
equivalent) form:
{{< codelines "Agda" "agda-spa/Analysis/Forward.agda" 255 256 >}}
The above reads roughly as follows:
> For every variable `k` and sign [or, more generally, lattice element] `l` in
> the variable map lattice, if `k` is in the environment `ρ`, then it satisfies
> the predicate given by the semantic function applied to `l`.
Let's recap: we have defined a semantic function for our sign lattice, and
noted that to define a verified analysis, we always need such a semantic function.
We then showed how to construct a semantic function for a whole variable map
(of type \(\text{Variable} \to \text{Sign}\), or \(\text{Variable}\to L\)
in general). We also wrote some Agda code doing all this. As a result, we
have filled in the missing piece for {{< internalref "correctly-described" >}}property{{< /internalref >}}.
However, the way that we brought in the semantic function in the Agda code
above hints that there's more to be discussed. What's `latticeInterpretationˡ`?
In answering that question, we'll provide evidence for
{{< internalref "disjunction" >}}property{{< /internalref >}}
and
{{< internalref "equivalence" >}}property{{< /internalref >}}.
### Properties of the Semantic Function
As we briefly saw earlier, we loosened the notion of equality to that equivalences,
which made it possible to ignore things like the ordering of key-value pairs
in maps. That's great and all, but nothing is stopping us from defining semantic functions that violate our equivalence!
Supposing \(a \approx f(a)\), as far
as Agda is concerned, even though \(a\) and \(f(a)\) are "equivalent",
\(\llbracket a \rrbracket\) and \(\llbracket f(a) \rrbracket\) may be
totally different. For a semantic function to be correct, it must produce
the same predicate for equivalent elements of lattice \(L\). That's
{{< internalref "equivalence" >}}missing piece{{< /internalref >}}.
Another property of semantic functions that we will need to formalize
is that \((\sqcup)\) represents disjunction.
This comes into play when we reason about the correctness of predecessors in
a Control Flow Graph. Recall that during the last step of processing a given node,
when we are trying to move on to the next node in the trace, we have knowledge
that the current node's variable map accurately describes the intermediate
environment. In other words, \(\llbracket \textit{vs}_i \rrbracket\ \rho_2\) holds, where
\(\textit{vs}_i\) is the variable map for the current node. We can generalize this
kowledge a little, and get:
{{< latex >}}
\llbracket \textit{vs}_1 \rrbracket\ \rho_2\ \text{or}\ ...\ \text{or}\ \llbracket \textit{vs}_n \rrbracket\ \rho_2
{{< /latex >}}
However, the assumption that we _need_ to hold when moving on to a new node
is in terms of \(\textit{JOIN}\), which combines all the predecessors' maps
\(\textit{vs}_1, ..., \textit{vs}_n\) using \((\sqcup)\). Thus, we will need to be in a world where
the following claim is true:
{{< latex >}}
\llbracket \textit{vs}_1 \sqcup ... \sqcup \textit{vs}_n \rrbracket\ \rho
{{< /latex >}}
To get from one to the other, we will need to rely explicitly on the fact
that \((\sqcup)\) encodes "or". It's not necessary for the forward analysis,
but a similar property ought to hold for \((\sqcap)\) and "and". This
constraint provides {{< internalref "disjunction" >}}missing piece{{< /internalref >}}.
I defined a new data type that bundles a semantic function with proofs of
the properties in this section; that's precisely what `latticeInterpretationˡ`
is:
{{< codelines "Agda" "agda-spa/Language/Semantics.agda" 66 73 >}}
In short, to leverage the framework for verified analysis, you would need to
provide a semantic function that interacts properly with `≈` and ``.
### Correctness of the Evaluator
All that's left is {{< internalref "abstract-interpretation" >}}the last missing piece, {{< /internalref >}},
which requires that `eval` matches the semantics of our language. Recall
the signature of `eval`:
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 166 166 >}}
It operates on expressions and variable maps, which themselves associate a
sign (or, generally, an element of lattice \(L\)), with each variable. The
"real" evaluation judgement, on the other hand, is in the form
\(\rho, e \Downarrow v\), and reads "expression \(e\) in environment \(\rho\)
evaluates to value \(v\)". In Agda:
{{< codelines "agda" "agda-spa/Language/Semantics.agda" 27 27 >}}
Let's line up the types of `eval` and the judgement. I'll swap the order of arguments
for `eval` to make the correspondence easier to see:
{{< latex >}}
\begin{array}{ccccccc}
\text{eval} & : & (\text{Variable} \to \text{Sign}) & \to & \text{Expr} & \to & \text{Sign} \\
\cdot,\cdot\Downarrow\cdot & : & (\text{Variable} \to \text{Value}) & \to & \text{Expr} & \to & \text{Value} & \to & \text{Set} \\
& & \underbrace{\phantom{(\text{Variable} \to \text{Value})}}_{\text{environment-like inputs}} & & & & \underbrace{\phantom{Value}}_{\text{value-like outputs}}
\end{array}
{{< /latex >}}
Squinting a little, it's almost like the signature of `eval` is the signature
for the evaluation judgement, but it forgets a few details (the exact values
of the variables) in favor of abstractions (their signs). To show that `eval`
behaves correctly, we'll want to prove that this forgetful correspondence holds.
Concretely, for any expression \(e\), take some environment \(\rho\), and "forget"
the exact values, getting a sign map \(\textit{vs}\). Now, evaluate the expression
to some value \(v\) using the semantics, and also, compute the expression's
expected sign \(s\) using `eval`. The sign should be the same as forgetting
\(v\)'s exact value. Mathematically,
{{< latex >}}
\forall e, \rho, v, \textit{vs}.\ \textbf{if}\ \llbracket\textit{vs}\rrbracket \rho\ \text{and}\ \rho, e \Downarrow v\ \textbf{then}\ \llbracket \text{eval}\ \textit{vs}\ e\rrbracket v
{{< /latex >}}
In Agda:
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 286 287 >}}
For a concrete analysis, we need to prove the above claim. In the case of
sign analysis, this boils down to a rather cumbersome proof by cases. I will collapse
the proofs to save some space and avoid overwhelming the reader.
{{< codelines "agda" "agda-spa/Analysis/Sign.agda" 237 258 "" "**(Click here to expand the proof of correctness for plus)**" >}}
{{< codelines "agda" "agda-spa/Analysis/Sign.agda" 261 282 "" "**(Click here to expand the proof of correctness for minus)**" >}}
{{< codelines "agda" "agda-spa/Analysis/Sign.agda" 284 294 "" >}}
This completes {{< internalref "abstract-interpretation" >}}our last missing piece,{{< /internalref >}}.
All that's left is to put everything together.
### Proving The Analysis Correct
#### Lifting Expression Evaluation Correctness to Statements
The individual analyses (like the sign analysis) provide only an evaluation
function for _expressions_, and thus only have to prove correctness of
that function. However, our language is made up of statements, with judgements
in the form \(\rho, s \Rightarrow \rho'\). Now that we've shown (or assumed)
that `eval` behaves correctly when evaluating expressions, we should show
that this correctness extends to evaluating statements, which in the
forward analysis implementation is handled by the
[`updateVariablesFromStmt` function]({{< relref "08_spa_agda_forward#define-updateVariablesFromStmt" >}}).
The property we need to show looks very similar to the property for `eval`:
{{< latex >}}
\forall b, \rho, \rho', \textit{vs}.\ \textbf{if}\ \llbracket\textit{vs}\rrbracket \rho\ \text{and}\ \rho, b \Rightarrow \rho'\ \textbf{then}\ \llbracket \text{updateVariablesFromStmt}\ \textit{vs}\ b\rrbracket \rho'
{{< /latex >}}
In Agda:
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 291 291 >}}
The proof is straightforward, and relies on the semantics of the [map update]({{< relref "08_spa_agda_forward#generalized-update" >}}).
Specifically, in the case of an assignment statement \(x \leftarrow e\), all we
do is store the new sign computed from \(e\) into the map at \(x\). To
prove the correctness of the entire final environment \(\rho'\), there are
two cases to consider:
* A variable in question is the newly-updated \(x\). In this case, since
`eval` produces correct signs, the variable clearly has the correct sign.
This is the first highlighted chunk in the below code.
* A variable in question is different from \(x\). In this case, its value
in the environment \(\rho'\) should be the same as it was prior, and
its sign in the updated variable map is the same as it was in the original.
Since the original map correctly described the original environment, we know
the sign is correct. This is the second highlighted chunk in the below
code.
The corresponding Agda proof is as follows:
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 291 305 "hl_lines=5-7 10-15" >}}
From this, it follows with relative ease that each basic block in the lattice,
when evaluated, produces an environment that matches the prediction of our
forward analysis.
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 318 318 >}}
#### Walking the Trace
Finally, we get to the meat of the proof, which follows the [outline](#high-level-algorithm). First,
let's take a look at `stepTrace`, which implements the second bullet in
our iterative procedure. I'll show the code, then we can discuss it
in detail.
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 324 342 >}}
The first `let`-bound variable, `⟦joinAll-result⟧ρ₁` is kind of an intermediate
result, which I was forced to introduced because `rewrite` caused Agda to
allocate ~100GB of memory. It simply makes use of the fact that `joinAll`, the
function that performs predecessor joining for each node in the CFG, sets
every key of the map accordingly.
The second `let`-bound variable, `⟦analyze-result⟧`, steps through a given
node's basic block and leverages our proof of statement-correctness to validate
that the final environment `ρ₂` matches the predication of the analyzer.
The last two `let`-bound variables apply the equation we wrote above:
{{< latex >}}
r = \text{update}(\text{join}(r))
{{< /latex >}}
Recall that `analyze` is the combination of `update` and `join`:
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 226 227 >}}
Finally, the `in` portion of the code uses `⟦⟧ᵛ-respects-≈ᵛ`, a proof
of {{< internalref "equivalence" >}}property{{< /internalref >}}, to produce
the final claim in terms of the `result` map.
Knowing how to step, we can finally walk the entire trace, implementing
the iterative process:
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 344 357 >}}
The first step --- assuming that one of the predecessors of the
current node satisfies the initial environment `ρ₁` --- is captured by
the presence of the argument `⟦joinForKey-s₁⟧ρ₁`. We expect the calling code
to provide a proof of that.
The second step, in both cases, is implemented using `stepTrace`,
as we saw above. That results in a proof that at the end of the current basic
block, the final environment `ρ₂` is accurately described.
From there, we move on to the third iterative step, if necessary. The
sub-expression `edge⇒incoming s₁→s₂` validates that, since we have an edge
from the current node to the next, we are listed as a predecessor. This,
in turn, means that we are included in the list of states-to-join for the
\(\textit{JOIN}\) function. That fact is stored in `s₁∈incomingStates`.
Finally, relying on
{{< internalref "disjunction" >}}property{{< /internalref >}},
we construct an assumption fit for a recursive invocation of `walkTrace`,
and move on to the next CFG node. The `foldr` here is motivated by the fact
that "summation" using \((\sqcup)\) is a fold.
When the function terminates, what we have is a proof that the final program
state is accurately described by the results of our program analysis. All
that's left is to kick off the walk. To do that, observe that the initial state
has no predecessors (how could it, if it's at the beginning of the program?).
That, in turn, means that this state maps every variable to the bottom element.
Such a variable configuration only permits the empty environment \(\rho = \varnothing\).
If the program evaluation starts in an empty environment, we have the assumption
needed to kick off the iteration.
{{< codelines "agda" "agda-spa/Analysis/Forward.agda" 359 366 "hl_lines=7" >}}
Take a look at the highlighted line in the above code block in particular.
It states precisely what we were hoping to see: that, when evaluating
a program, the final state when it terminates is accurately described by
the `result` of our static program analysis at the `finalState` in the CFG.
We have done it!
### Future Work
It took a lot of machinery to get where we are, but there's still lots of
things to do.
1. __Correctness beyond the final state__: the statement we've arrived at
only shows that the final state of the program matches the results of
the analysis. In fact, the property hold for all intermediate states, too.
The only snag is that it's more difficult to _state_ such a claim.
To do something like that, we probably need a notion of "incomplete evaluations"
of our language, which run our program but stop at some point before the end.
A full execution would be a special case of such an "incomplete evaluation"
that stops in the final state. Then, we could restate `analyze-correct`
in terms of partial evaluations, which would strengthen it.
2. __A more robust language and evaluation process__: we noted above that
our join-based analysis is a little bit weird, particularly in the
cases of uninitialized variables. There are ways to adjust our language
(e.g., introducing variable declaration points) and analysis functions
(e.g., only allowing assignment for declared variables) to reduce
the weirdness somewhat. They just lead to a more complicated language.
3. __A more general correctness condition__: converting lattice elements into
predicates on values gets us far. However, some types of analyses make claims
about more than the _current_ values of variables. For instance, _live variable
analysis_ checks if a variable's current value is going to be used in the
future. Such an analysis can help guide register (re)allocation. To
talk about future uses of a variable, the predicate will need to be formulated
in terms of the entire evaluation proof tree. This opens a whole can
of worms that I haven't begun to examine.
Now that I'm done writing up my code so far, I will start exploring these
various avenues of work. In the meantime, though, thanks for reading!