---
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 6: Control Flow Graphs"
series: "Static Program Analysis in Agda"
description: "In this post, I show an Agda definition of Control Flow Graphs and their construction"
date: 2024-11-27T16:26:42-07:00
tags: ["Agda", "Programming Languages"]
---

In the previous post, I gave a formal definition of the programming
language that I've been trying to analyze. This formal definition serves
as the "ground truth" for how our little imperative programs are executed;
however, program analyses (especially in practice) seldom take the formal
semantics as input. Instead, they focus on more pragmatic program
representations from the world of compilers. One such representation is the
_Control Flow Graph (CFG)_. That's what I want to discuss in this post.

Let's start by building some informal intuition. CFGs are pretty much what
their name suggests: they are a type of [graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics));
their edges show how execution might jump from one piece of code to
another (how control might flow).

For example, take the following program.

```
x = ...;
if x {
    x = 1;
} else {
    x = 0;
}
y = x;
```

The CFG might look like this:

{{< figure src="if-cfg.png" label="CFG for simple `if`-`else` code." class="small" >}}

Here, the initialization of `x` with `...`, as well as the `if` condition (just `x`),
are guaranteed to execute one after another, so they occupy a single node. From there,
depending on the condition, the control flow can jump to one of the
branches of the `if` statement: the "then" branch if the condition is truthy,
and the "else" branch if the condition is falsy. As a result, there are two
arrows coming out of the initial node. Once either branch is executed, control
always jumps to the code right after the `if` statement (the `y = x`). Thus,
both the `x = 1` and `x = 0` nodes have a single arrow to the `y = x` node.

As another example, if you had a loop:

```
x = ...;
while x {
    x = x - 1;
}
y = x;
```

The CFG would look like this:

{{< figure src="while-cfg.png" label="CFG for simple `while` code." class="small" >}}

Here, the condition of the loop (`x`) is not always guaranteed to execute together
with the code that initializes `x`. That's because the condition of the loop
is checked after every iteration, whereas the code before the loop is executed
only once. As a result, `x = ...` and `x` occupy distinct CFG nodes. From there,
the control flow can proceed in two different ways, depending on the value
of `x`. If `x` is truthy, the program will proceed to the loop body (decrementing `x`).
If `x` is falsy, the program will skip the loop body altogether, and go to the
code right after the loop (`y = x`). This is indicated by the two arrows
going out of the `x` node. After executing the body, we return to the condition
of the loop to see if we need to run another iteration. Because of this, the
decrementing node has an arrow back to the loop condition.

Now, let's be a bit more precise. Control Flow Graphs are defined as follows:

* __The nodes__ are [_basic blocks_](https://en.wikipedia.org/wiki/Basic_block).
  Paraphrasing Wikipedia's definition, a basic block is a piece of code that
  has only one entry point and one exit point.

  The one-entry-point rule means that it's not possible to jump into the middle
  of the basic block, executing only half of its instructions. The execution of
  a basic block always begins at the top. Symmetrically, the one-exit-point
  rule means that you can't jump away to other code, skipping some instructions.
  The execution of a basic block always ends at the bottom.

  As a result of these constraints, when running a basic block, you are
  guaranteed to execute every instruction in exactly the order they occur in,
  and execute each instruction exactly once.
* __The edges__ are jumps between basic blocks. We've already seen how
  `if` and `while` statements introduce these jumps.

Basic blocks can only be made of code that doesn't jump (otherwise,
we would violate the single-exit-point policy). In the previous post,
we defined exactly this kind of code as [simple statements]({{< relref "05_spa_agda_semantics#introduce-simple-statements" >}}).
So, in our control flow graph, nodes will be sequences of simple statements.
{#list-basic-stmts}

### Control Flow Graphs in Agda

#### Basic Definition

At an abstract level, it's easy to say "it's just a graph where X is Y" about
anything. It's much harder to give a precise definition of such a graph,
particularly if you want to rule out invalid graphs (e.g., ones with edges
pointing nowhere). In Agda, I chose to represent a CFG with two lists: one of nodes,
and one of edges. Each node is simply a list of `BasicStmt`s, as
I described in a preceding paragraph. An edge is simply a pair of numbers,
each number encoding the index of the node connected by the edge.

Here's where it gets a little complicated. I don't want to use plain natural
numbers for indices, because that means you can easily introduce "broken" edges.
For example, what if you have 4 nodes, and you have an edge `(5, 5)`? To avoid
this, I picked the finite natural numbers represented by
[`Fin`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#1154)
as endpoints for edges.

```Agda
-- Here n is an implicitly quantified natural number, as in Data.Fin.Base.
data Fin : ℕ → Set where
  zero : Fin (suc n)
  suc  : (i : Fin n) → Fin (suc n)
```

Specifically, `Fin n` is the type of natural numbers less than `n`. Following
this definition, `Fin 3` represents the numbers `0`, `1` and `2`. These are
represented using the same constructor names as the natural numbers (`ℕ`): `zero` and `suc`. The type
of `zero` is `Fin (suc n)` for any `n`; this makes sense because zero is less
than any number plus one. For `suc`, the bound `n` of the input `i` is incremented
by one, leading to another `suc n` in the final type. This makes sense because if
`i < n`, then `i + 1 < n + 1`. I've previously explained this data type
[in another post on this site]({{< relref "01_aoc_coq#aside-vectors-and-finite-mathbbn" >}}).

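To make this concrete, here is a tiny standalone check. It is a sketch that assumes agda-stdlib 2.0, and the names `i₀`, `i₁` and `i₂` are mine. It lists the three inhabitants of `Fin 3` and uses the standard library's `toℕ`, which forgets the bound, to confirm that the largest one really is the number `2`. The commented-out definition at the end is the kind of out-of-range value the type checker rejects.

```Agda
open import Data.Fin.Base using (Fin; zero; suc; toℕ)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- The three inhabitants of Fin 3: the numbers 0, 1 and 2.
i₀ i₁ i₂ : Fin 3
i₀ = zero
i₁ = suc zero
i₂ = suc (suc zero)

-- toℕ forgets the bound and returns the underlying natural number.
_ : toℕ i₂ ≡ 2
_ = refl

-- A third `suc` would need a bound of at least 4, so this does not type check:
-- i₃ : Fin 3
-- i₃ = suc (suc (suc zero))
```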
Here's my definition of `Graph`s written using `Fin`:

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 24 39 >}}

I explicitly used a `size` field, which determines how many nodes are in the
graph, and serves as the upper bound for the edge indices. From there, an
index `Index` into the node list is
{{< sidenote "right" "size-note" "just a natural number less than `size`," >}}
There are <code>size</code> natural numbers less than <code>size</code>:<br>
<code>0, 1, ..., size - 1</code>.
{{< /sidenote >}}
and an edge is just a pair of indices. The graph then contains a vector
(exact-length list) `nodes` of all the basic blocks, and then a list of
edges `edges`.

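For concreteness, here is a simplified, standalone version of such a record, together with the `if`/`else` CFG from the beginning of this post. This is a sketch, not the definition from `Graphs.agda`. The record name `MiniGraph` is hypothetical, statements are plain `String`s instead of `BasicStmt`s, and the code assumes agda-stdlib 2.0. The `inputs` and `outputs` fields are discussed next; here they just mark the first and the last node. Notice that a broken edge like `(5 , 5)` cannot even be written, because `5` is not a `Fin 4`.

```Agda
open import Data.Nat.Base using (ℕ)
open import Data.Fin.Base using (Fin; zero; suc)
open import Data.Vec.Base using (Vec; []; _∷_)
open import Data.List.Base using (List; []; _∷_; length)
open import Data.Product using (_×_; _,_)
open import Data.String.Base using (String)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Hypothetical, simplified stand-in for the Graph record shown above.
record MiniGraph : Set where
  field
    size    : ℕ
    nodes   : Vec (List String) size      -- exactly `size` basic blocks
    edges   : List (Fin size × Fin size)  -- endpoints are always in range
    inputs  : List (Fin size)
    outputs : List (Fin size)
open MiniGraph

-- The if/else CFG from the start of the post: node 0 is `x = ...`,
-- nodes 1 and 2 are the branches, and node 3 is `y = x`.
ifGraph : MiniGraph
ifGraph = record
  { size    = 4
  ; nodes   = ("x = ..." ∷ []) ∷ ("x = 1" ∷ []) ∷ ("x = 0" ∷ []) ∷ ("y = x" ∷ []) ∷ []
  ; edges   = (n₀ , n₁) ∷ (n₀ , n₂) ∷ (n₁ , n₃) ∷ (n₂ , n₃) ∷ []
  ; inputs  = n₀ ∷ []
  ; outputs = n₃ ∷ []
  }
  where
    n₀ n₁ n₂ n₃ : Fin 4
    n₀ = zero
    n₁ = suc zero
    n₂ = suc (suc zero)
    n₃ = suc (suc (suc zero))

-- Four edges: two out of the first node, one out of each branch.
_ : length (edges ifGraph) ≡ 4
_ = refl
```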
There are two fields here that I have not yet said anything about: `inputs`
and `outputs`. When we have a complete CFG for our programs, these fields are
totally unnecessary. However, as we are _building_ the CFG, these will come
in handy, by telling us how to stitch together smaller sub-graphs that we've
already built. Let's talk about that next.

#### Combining Graphs

Suppose you're building a CFG for a program in the following form:

```
code1;
code2;
```

Where `code1` and `code2` are arbitrary pieces of code, which could include
statements, loops, and pretty much anything else. Besides the fact that they
occur one after another, these pieces of code are unrelated, and we can
build CFGs for each one of them independently. However, the fact that `code1` and
`code2` are in sequence means that the full control flow graph for the above
program should have edges going from the nodes in `code1` to the nodes in `code2`.
Of course, not _every_ node in `code1` should have such edges: that would
mean that after executing any "basic" sequence of instructions, you could suddenly
decide to skip the rest of `code1` and move on to executing `code2`.

Thus, we need to be more precise about what edges we need to insert; we want
to insert edges between the "final" nodes in `code1` (where control ends up
after `code1` is finished executing) and the "initial" nodes in `code2` (where
control would begin once we started executing `code2`). Those are the `outputs`
and `inputs`, respectively. When stitching together sequenced control flow graphs,
we will connect each of the outputs of one to each of the inputs of the other.

This is defined by the operation `g₁ ↦ g₂`, which sequences two graphs `g₁` and `g₂`:

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 72 83 >}}

The definition starts out pretty innocuous, but gets a bit complicated by the
end. The sum of the numbers of nodes in the two operands becomes the new graph
size, and the nodes from the two graphs are all included in the result. Then,
the definitions start making use of various operators like `↑ˡᵉ` and `↑ʳᵉ`;
these deserve an explanation.

The tricky thing is that when we're concatenating lists of nodes, we are changing
some of the indices of the elements within. For instance, in the lists
`[x]` and `[y]`, the indices of both `x` and `y` are `0`; however, in the
concatenated list `[x, y]`, the index of `x` is still `0`, but the index of `y`
is `1`. More generally, when we concatenate two lists `l1` and `l2`, the indices
into `l1` remain unchanged, whereas the indices into `l2` are shifted by `length l1`.
{#fin-reindexing}

Actually, that's not all there is to it. The _values_ of the indices into
the left list don't change, but their types do! They start as `Fin (length l1)`,
but for the whole list, these same indices will have type `Fin (length l1 + length l2)`.

To help deal with this, Agda's standard library provides the operators
[`↑ˡ`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#2355)
and [`↑ʳ`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#2522)
that implement this re-indexing and re-typing. The former implements "re-indexing
on the left" -- given an index into the left list `l1`, it changes its type
by adding the other list's length to it, but keeps the index value itself
unchanged. The latter implements "re-indexing on the right" -- given an index
into the right list `l2`, it adds the length of the first list to it (shifting it),
and does the same to its type.

The definition leads to the following equations:

```Agda
-- Schematically, xs [ i ] is the element of xs at index i (lookup xs i).
l1 : Vec A n
l2 : Vec A m

idx1 : Fin n -- index into l1
idx2 : Fin m -- index into l2

l1 [ idx1 ] ≡ (l1 ++ l2) [ idx1 ↑ˡ m ]
l2 [ idx2 ] ≡ (l1 ++ l2) [ n ↑ʳ idx2 ]
```

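The equations above are schematic. Here is a concrete check, a sketch assuming agda-stdlib 2.0, where `xs [ i ]` is written `lookup xs i`; the vectors and indices are made up for illustration. For specific vectors, Agda accepts both equations by computation alone, so `refl` suffices.

```Agda
open import Data.Nat.Base using (ℕ)
open import Data.Fin.Base using (Fin; zero; suc; _↑ˡ_; _↑ʳ_)
open import Data.Vec.Base using (Vec; []; _∷_; _++_; lookup)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

l1 : Vec ℕ 2
l1 = 10 ∷ 11 ∷ []

l2 : Vec ℕ 3
l2 = 20 ∷ 21 ∷ 22 ∷ []

i1 : Fin 2   -- index 1 into l1
i1 = suc zero

i2 : Fin 3   -- index 1 into l2
i2 = suc zero

-- ↑ˡ keeps the value (still 1) but retypes it from Fin 2 to Fin (2 + 3).
_ : lookup (l1 ++ l2) (i1 ↑ˡ 3) ≡ lookup l1 i1
_ = refl

-- ↑ʳ shifts the value by the length of l1 (1 becomes 3), and retypes it.
_ : lookup (l1 ++ l2) (2 ↑ʳ i2) ≡ lookup l2 i2
_ = refl
```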
The operators used in the definition above are just versions of the same
re-indexing operators, lifted to collections. The `↑ˡᵉ` operator applies `↑ˡ` to all the (__e__)dges
in a graph, and the `↑ˡⁱ` operator applies it to all the (__i__)ndices in a list
(like `inputs` and `outputs`).

Given these definitions, hopefully the intent with the rest of the definition
is not too hard to see. The edges in the new graph come from three places:
the graphs `g₁` and `g₂`, and from creating a new edge from each of the outputs
of `g₁` to each of the inputs of `g₂`. We keep the inputs of `g₁` as the
inputs of the whole graph (since `g₁` comes first), and symmetrically we keep
the outputs of `g₂`. Of course, we do have to re-index them to keep them
pointing at the right nodes.

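To see the three sources of edges side by side, here is a standalone sketch of sequencing over a simplified graph record. It assumes agda-stdlib 2.0 and uses `String`s in place of `BasicStmt`s. The names `MiniGraph`, `retypeˡ`, `shiftʳ`, `block` and `seq` are hypothetical and only loosely modeled on `Graph`, `↑ˡᵉ`, `↑ʳᵉ`, `singleton` and `_↦_`. The code follows the description above rather than reproducing `Graphs.agda`.

```Agda
open import Data.Nat.Base using (ℕ; _+_)
open import Data.Fin.Base using (Fin; zero; suc; _↑ˡ_; _↑ʳ_)
open import Data.Vec.Base using (Vec; []; _∷_) renaming (_++_ to _++ᵛ_)
open import Data.List.Base using (List; []; _∷_; _++_; map; concatMap)
open import Data.Product using (_×_; _,_)
open import Data.String.Base using (String)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Hypothetical, simplified stand-in for the post's Graph record.
record MiniGraph : Set where
  field
    size    : ℕ
    nodes   : Vec (List String) size
    edges   : List (Fin size × Fin size)
    inputs  : List (Fin size)
    outputs : List (Fin size)
open MiniGraph

-- Edges of the left operand: same index values, bound grows by m (↑ˡ).
retypeˡ : ∀ {n} → (m : ℕ) → List (Fin n × Fin n) → List (Fin (n + m) × Fin (n + m))
retypeˡ m []             = []
retypeˡ m ((i , j) ∷ es) = (i ↑ˡ m , j ↑ˡ m) ∷ retypeˡ m es

-- Edges of the right operand: every index is shifted by n (↑ʳ).
shiftʳ : ∀ {m} → (n : ℕ) → List (Fin m × Fin m) → List (Fin (n + m) × Fin (n + m))
shiftʳ n []             = []
shiftʳ n ((i , j) ∷ es) = (n ↑ʳ i , n ↑ʳ j) ∷ shiftʳ n es

-- A graph made of a single basic block, which is both input and output.
block : List String → MiniGraph
block ss = record
  { size = 1 ; nodes = ss ∷ [] ; edges = [] ; inputs = zero ∷ [] ; outputs = zero ∷ [] }

-- Sequencing: keep both edge sets and add an edge from every output of g₁
-- to every input of g₂. Inputs come from g₁, outputs from g₂.
seq : MiniGraph → MiniGraph → MiniGraph
seq g₁ g₂ = record
  { size    = n + m
  ; nodes   = nodes g₁ ++ᵛ nodes g₂
  ; edges   = retypeˡ m (edges g₁) ++ shiftʳ n (edges g₂) ++ bridge
  ; inputs  = map (λ (i : Fin n) → i ↑ˡ m) (inputs g₁)
  ; outputs = map (λ (o : Fin m) → n ↑ʳ o) (outputs g₂)
  }
  where
    n m : ℕ
    n = size g₁
    m = size g₂
    bridge : List (Fin (n + m) × Fin (n + m))
    bridge = concatMap (λ (o : Fin n) → map (λ (i : Fin m) → (o ↑ˡ m , n ↑ʳ i)) (inputs g₂))
                       (outputs g₁)

-- Usage: two single-block graphs in sequence.
twoBlocks : MiniGraph
twoBlocks = seq (block ("x = 1" ∷ [])) (block ("y = x" ∷ []))

-- The only edge is the new one, from node 0 (first block) to node 1 (second).
_ : edges twoBlocks ≡ (zero , suc zero) ∷ []
_ = refl
```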
Another operation we will need is "overlaying" two graphs: this will be like
placing them in parallel, without adding jumps between the two. We use this
operation when combining the sub-CFGs of the "if" and "else" branches of an
`if`/`else`, which both follow the condition, and both proceed to the code after
the conditional.

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 59 70 >}}

Everything here is just concatenation; we pool together the nodes, edges,
inputs, and outputs, and the main source of complexity is the re-indexing.

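Here is overlaying in the same simplified setting as before. It uses a hypothetical `MiniGraph` record with `String` statements and assumes agda-stdlib 2.0; `overlay`, `left` and `right` are my names, not the post's. The operation is just concatenation plus re-indexing, with no edges added between the two operands.

```Agda
open import Data.Nat.Base using (ℕ; _+_)
open import Data.Fin.Base using (Fin; zero; suc; _↑ˡ_; _↑ʳ_)
open import Data.Vec.Base using (Vec; []; _∷_) renaming (_++_ to _++ᵛ_)
open import Data.List.Base using (List; []; _∷_; _++_; map)
open import Data.Product using (_×_; _,_)
open import Data.String.Base using (String)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Hypothetical, simplified stand-in for the post's Graph record.
record MiniGraph : Set where
  field
    size    : ℕ
    nodes   : Vec (List String) size
    edges   : List (Fin size × Fin size)
    inputs  : List (Fin size)
    outputs : List (Fin size)
open MiniGraph

retypeˡ : ∀ {n} → (m : ℕ) → List (Fin n × Fin n) → List (Fin (n + m) × Fin (n + m))
retypeˡ m []             = []
retypeˡ m ((i , j) ∷ es) = (i ↑ˡ m , j ↑ˡ m) ∷ retypeˡ m es

shiftʳ : ∀ {m} → (n : ℕ) → List (Fin m × Fin m) → List (Fin (n + m) × Fin (n + m))
shiftʳ n []             = []
shiftʳ n ((i , j) ∷ es) = (n ↑ʳ i , n ↑ʳ j) ∷ shiftʳ n es

block : List String → MiniGraph
block ss = record
  { size = 1 ; nodes = ss ∷ [] ; edges = [] ; inputs = zero ∷ [] ; outputs = zero ∷ [] }

-- Place g₁ and g₂ side by side: pool nodes, edges, inputs and outputs,
-- re-indexing g₂'s indices past g₁'s nodes. No edges connect the two.
overlay : MiniGraph → MiniGraph → MiniGraph
overlay g₁ g₂ = record
  { size    = n + m
  ; nodes   = nodes g₁ ++ᵛ nodes g₂
  ; edges   = retypeˡ m (edges g₁) ++ shiftʳ n (edges g₂)
  ; inputs  = left (inputs g₁) ++ right (inputs g₂)
  ; outputs = left (outputs g₁) ++ right (outputs g₂)
  }
  where
    n m : ℕ
    n = size g₁
    m = size g₂
    left : List (Fin n) → List (Fin (n + m))
    left = map (λ (i : Fin n) → i ↑ˡ m)
    right : List (Fin m) → List (Fin (n + m))
    right = map (λ (i : Fin m) → n ↑ʳ i)

-- The two branches of an if/else, overlaid.
branches : MiniGraph
branches = overlay (block ("x = 1" ∷ [])) (block ("x = 0" ∷ []))

-- Both branches are entry points of the combined graph.
_ : inputs branches ≡ zero ∷ suc zero ∷ []
_ = refl
```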
The last operation, which we will use for `while` loops, is looping. This
operation simply connects the outputs of a graph back to its inputs (allowing
looping), and also allows the body to be skipped. This is slightly different
from the graph for `while` loops I showed above, because
I currently don't include the conditional expressions in my CFG. This is a
limitation that I will address in future work.

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 85 95 >}}

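The prose above does not pin down exactly how the body is made skippable. Here is one way to get both effects, sketched in the same simplified setting as before (a hypothetical `MiniGraph` record, `String` statements, agda-stdlib 2.0). The idea is to append two fresh, empty nodes, `entry` and `exit`, and then add four kinds of edges:

* from `entry` to the body's inputs;
* from the body's outputs back to its inputs;
* from the outputs to `exit`;
* from `entry` straight to `exit`.

This construction is my own illustration, not necessarily the one in `Graphs.agda`.

```Agda
open import Data.Nat.Base using (ℕ; _+_)
open import Data.Fin.Base using (Fin; zero; suc; _↑ˡ_; _↑ʳ_)
open import Data.Vec.Base using (Vec; []; _∷_) renaming (_++_ to _++ᵛ_)
open import Data.List.Base using (List; []; _∷_; _++_; map; concatMap)
open import Data.Product using (_×_; _,_)
open import Data.String.Base using (String)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Hypothetical, simplified stand-in for the post's Graph record.
record MiniGraph : Set where
  field
    size    : ℕ
    nodes   : Vec (List String) size
    edges   : List (Fin size × Fin size)
    inputs  : List (Fin size)
    outputs : List (Fin size)
open MiniGraph

retypeˡ : ∀ {n} → (m : ℕ) → List (Fin n × Fin n) → List (Fin (n + m) × Fin (n + m))
retypeˡ m []             = []
retypeˡ m ((i , j) ∷ es) = (i ↑ˡ m , j ↑ˡ m) ∷ retypeˡ m es

block : List String → MiniGraph
block ss = record
  { size = 1 ; nodes = ss ∷ [] ; edges = [] ; inputs = zero ∷ [] ; outputs = zero ∷ [] }

-- Hypothetical loop construction: the body's nodes come first, followed by
-- an empty `entry` node and an empty `exit` node.
loop : MiniGraph → MiniGraph
loop g = record
  { size    = n + 2
  ; nodes   = nodes g ++ᵛ emptyBlocks
  ; edges   = retypeˡ 2 (edges g)
           ++ concatMap (λ (o : Fin n) → map (λ (i : Fin n) → (o ↑ˡ 2 , i ↑ˡ 2)) (inputs g))
                        (outputs g)                                -- loop again
           ++ map (λ (i : Fin n) → (entry , i ↑ˡ 2)) (inputs g)    -- run the body
           ++ map (λ (o : Fin n) → (o ↑ˡ 2 , exit)) (outputs g)    -- leave
           ++ (entry , exit) ∷ []                                   -- skip the body
  ; inputs  = entry ∷ []
  ; outputs = exit ∷ []
  }
  where
    n : ℕ
    n = size g
    z₀ z₁ : Fin 2
    z₀ = zero
    z₁ = suc zero
    entry exit : Fin (n + 2)
    entry = n ↑ʳ z₀
    exit  = n ↑ʳ z₁
    emptyBlocks : Vec (List String) 2
    emptyBlocks = [] ∷ [] ∷ []

-- A loop whose body is a single block.
loopy : MiniGraph
loopy = loop (block ("x = x - 1" ∷ []))

-- Node 0 is the body, node 1 is `entry`, node 2 is `exit`.
_ : edges loopy ≡ (zero , zero) ∷ (suc zero , zero) ∷ (zero , suc (suc zero))
                ∷ (suc zero , suc (suc zero)) ∷ []
_ = refl
```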
Given these three operations, I construct Control Flow Graphs as follows, where
`singleton` creates a new CFG node with the given list of simple statements:

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 122 126 >}}

Throughout this, I've been liberal about including empty CFG nodes wherever convenient.
This is a departure from the formal definition I gave above, but it makes
things much simpler.

### Additional Functions

To integrate Control Flow Graphs into our lattice-based program analyses, we'll
need to do a couple of things. First, upon reading the
[reference _Static Program Analysis_ text](https://cs.au.dk/~amoeller/spa/),
one sees a lot of quantification over the predecessors or successors of a
given CFG node. For example, the following equation is from Chapter 5:

{{< latex >}}
\textit{JOIN}(v) = \bigsqcup_{w \in \textit{pred}(v)} \llbracket w \rrbracket
{{< /latex >}}

To compute the \(\textit{JOIN}\) function (which we have not covered yet) for
a given CFG node, we need to iterate over all of its predecessors, and
combine their static information using \(\sqcup\), which I first
[explained several posts ago]({{< relref "01_spa_agda_lattices#least-upper-bound" >}}).
To be able to iterate over them, we need to be able to retrieve the predecessors
of a node from a graph!

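For instance, consider the `if`/`else` CFG from the beginning of this post. There, the node `y = x` has exactly two predecessors, the `x = 1` and `x = 0` branches, so for that node the equation unfolds to:

{{< latex >}}
\textit{JOIN}(\texttt{y = x}) = \llbracket \texttt{x = 1} \rrbracket \sqcup \llbracket \texttt{x = 0} \rrbracket
{{< /latex >}}
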
Our encoding does not make computing the predecessors particularly easy; to
check if two nodes are connected, we need to check if an `Index`-`Index` pair
corresponding to the nodes is present in the `edges` list. To this end, we need
to be able to compare edges for equality. Fortunately, it's relatively
straightforward to show that our edges can be compared in such a way;
after all, they are just pairs of `Fin`s, and both `Fin`s and products support
these comparisons.

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 149 152 >}}

Next, if we can compare edges for equality, we can check if an edge is in
a list. Agda's standard library provides a function for this:

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 154 154 >}}

To find the predecessors of a particular node, we go through all the nodes
in the graph and check whether there's an edge from each of them to the
current one. This is preferable to simply iterating over the edges because
we may have duplicates in that list (why not?).

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 165 166 >}}

Above, `indices` is a list of all the node identifiers in the graph. Since the
graph has `size` nodes, the indices of all these nodes are simply the values
`0`, `1`, ..., `size - 1`. I defined a special function `finValues` to compute
this list, together with a proof that this list contains no duplicates.

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 127 143 >}}

Another important property of `finValues` is that each node identifier is
present in the list, so that computations that traverse the node
list do not "miss" any nodes.

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 145 147 >}}

We can specialize these definitions for a particular graph `g`:

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 156 163 >}}

To recap, we now have:
* A way to build control flow graphs from programs
* A complete, duplicate-free list of all nodes in the control flow graph so that
  we can iterate over them when the algorithm demands.
* A `predecessors` function, which will be used by our static program analyses,
  implemented as an iteration over the list of nodes.

All that's left is to connect our `predecessors` function to edges in the graph.
The following definitions say that when an edge is in the graph, the starting
node is listed as a predecessor of the ending node, and vice versa.

{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 168 177 >}}

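Here is a standalone sketch of how these pieces fit together: a predecessor computation over a bare list of edges. It assumes agda-stdlib 2.0; `hasEdge`, `keep` and `preds` are hypothetical names, and this is not the code from `Graphs.agda`. It compares edges using the standard library's decidable equality on `Fin`. It then walks over `toList (allFin n)`, the standard library's list of all indices (standing in for `finValues`), keeping each `w` that has an edge `(w , v)`. As a result, a duplicated edge does not produce a duplicated predecessor. The final check is an instance of the property stated above: both branches of the `if`/`else` graph are listed as predecessors of `y = x`.

```Agda
open import Data.Bool.Base using (Bool; false; _∧_; _∨_; if_then_else_)
open import Data.Fin.Base using (Fin; zero; suc)
open import Data.Fin.Properties using (_≟_)
open import Data.List.Base using (List; []; _∷_)
open import Data.Vec.Base using (allFin; toList)
open import Data.Product using (_×_; _,_)
open import Relation.Nullary.Decidable using (⌊_⌋)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Is the edge (a , b) in the list? Compares Fins with decidable equality.
hasEdge : ∀ {n} → List (Fin n × Fin n) → Fin n → Fin n → Bool
hasEdge []             a b = false
hasEdge ((c , d) ∷ es) a b = (⌊ a ≟ c ⌋ ∧ ⌊ b ≟ d ⌋) ∨ hasEdge es a b

-- Keep the elements of a list that satisfy a Boolean test.
keep : {A : Set} → (A → Bool) → List A → List A
keep p []       = []
keep p (x ∷ xs) = if p x then x ∷ keep p xs else keep p xs

-- Predecessors of v: each node w of the graph, listed once, with an edge w → v.
preds : ∀ {n} → List (Fin n × Fin n) → Fin n → List (Fin n)
preds {n} es v = keep (λ w → hasEdge es w v) (toList (allFin n))

-- The if/else CFG again, with a deliberately duplicated edge 1 → 3.
n₀ n₁ n₂ n₃ : Fin 4
n₀ = zero
n₁ = suc zero
n₂ = suc (suc zero)
n₃ = suc (suc (suc zero))

ifEdges : List (Fin 4 × Fin 4)
ifEdges = (n₀ , n₁) ∷ (n₀ , n₂) ∷ (n₁ , n₃) ∷ (n₂ , n₃) ∷ (n₁ , n₃) ∷ []

-- `y = x` (node 3) is reached from both branches; the duplicate is harmless.
_ : preds ifEdges n₃ ≡ n₁ ∷ n₂ ∷ []
_ = refl
```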
### Connecting Two Distinct Representations

I've described Control Flow Graphs as a compiler-centric representation of the
program. Unlike the formal semantics from the previous post, CFGs do not reason
about the dynamic behavior of the code. Instead, they capture the possible
paths that execution can take through the instructions. In that
sense, they are more of an approximation of what the program will do. This is
good: because of [Rice's theorem](https://en.wikipedia.org/wiki/Rice%27s_theorem),
we can't do anything other than approximating without running the program.

However, an incorrect approximation is of no use at all. Since the CFGs we build
will be the core data type used by our program analyses, it's important that they
are an accurate, if incomplete, representation. Specifically, because most
of our analyses reason about possible outcomes --- we report what sign each
variable __could__ have, for instance --- it's important that we don't accidentally
omit cases that can happen in practice from our CFGs. Formally, this means
that for each possible execution of a program according to its semantics,
{{< sidenote "right" "converse-note" "there exists a corresponding path through the graph." >}}
The converse is desirable too: that the graph has only paths that correspond
to possible executions of the program. One graph that violates this property is
the fully-connected graph of all basic blocks in a program. Analyzing
such a graph would give us an overly-conservative estimation; since anything
can happen, most of our answers will likely be too general to be of any use. If,
on the other hand, only the necessary graph connections exist, we can be more
precise.<br>
<br>
However, proving this converse property (or even stating it precisely) is much
harder, because our graphs are somewhat conservative already. There exist
programs in which the condition of an <code>if</code>-statement is always
evaluated to <code>false</code>, but our graphs always have edges for both
the "then" and "else" cases. Determining whether, e.g., a condition is always false
is undecidable thanks to Rice's theorem (again), so we can't rule it out.
Instead, we could broaden "all possible executions"
to "all possible executions where branching conditions can produce arbitrary
results", but this is something else entirely.<br>
<br>
For the time being, I will leave this converse property aside. As a result,
our approximations might be "too careful". However, they will at the very least
be sound.
{{< /sidenote >}}

In the next post, I will prove that this property holds for the graphs shown
here and the formal semantics I defined earlier. I hope to see you there!