Start working on the control flow graphs post
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
parent
8a471c6b45
commit
f2e424944e
@ -104,5 +104,5 @@ Here are the posts that I’ve written so far for this series:
|
||||
* {{< draftlink "Lattices of Finite Height" "03_spa_agda_fixed_height" >}}
|
||||
* {{< draftlink "The Fixed-Point Algorithm" "04_spa_agda_fixedpoint" >}}
|
||||
* {{< draftlink "Our Programming Language" "05_spa_agda_semantics" >}}
|
||||
* {{< draftlink "Control Flow Graphs" "06_spa_agda_cfg" >}}.
|
||||
* {{< draftlink "Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}.
|
||||
* {{< draftlink "Control Flow Graphs" "06_spa_agda_cfg" >}}
|
||||
* {{< draftlink "Connecting Semantics and Control Flow Graphs" "07_spa_agda_semantics_and_cfg" >}}
|
||||
|
@ -171,6 +171,7 @@ will be guaranteed to always execute without any decisions or jumps.
|
||||
The reason for this will become clearer in subsequent posts; I will foreshadow
|
||||
a bit by saying that consecutive simple statements can be placed into a single
|
||||
[basic block](https://en.wikipedia.org/wiki/Basic_block).
|
||||
{#introduce-simple-statements}
|
||||
|
||||
The following is a group of three simple statements:
|
||||
|
||||
|
15
content/blog/06_spa_agda_cfg/if-cfg.dot
Normal file
15
content/blog/06_spa_agda_cfg/if-cfg.dot
Normal file
@ -0,0 +1,15 @@
|
||||
digraph G {
|
||||
graph[dpi=300 fontsize=14 fontname="Courier New"];
|
||||
node[shape=rectangle style="filled" fillcolor="#fafafa" penwidth=0.5 color="#aaaaaa"];
|
||||
edge[arrowsize=0.3 color="#444444"]
|
||||
|
||||
node_begin [label="x = ...;\lx\l"]
|
||||
node_then [label="x = 1\l"]
|
||||
node_else [label="x = 0\l"]
|
||||
node_end [label="y = x\l"]
|
||||
|
||||
node_begin -> node_then
|
||||
node_begin -> node_else
|
||||
node_then -> node_end
|
||||
node_else -> node_end
|
||||
}
|
BIN
content/blog/06_spa_agda_cfg/if-cfg.png
Normal file
BIN
content/blog/06_spa_agda_cfg/if-cfg.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 21 KiB |
179
content/blog/06_spa_agda_cfg/index.md
Normal file
179
content/blog/06_spa_agda_cfg/index.md
Normal file
@ -0,0 +1,179 @@
|
||||
---
|
||||
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 6: Control Flow Graphs"
|
||||
series: "Static Program Analysis in Agda"
|
||||
description: "In this post, I show how I show an Agda definition of control flow graph building"
|
||||
date: 2024-11-13T17:32:42-07:00
|
||||
tags: ["Agda", "Programming Languages"]
|
||||
draft: true
|
||||
---
|
||||
|
||||
In the previous section, I've given a formal definition of the programming
|
||||
language that I've been trying to analyze. This formal definition serves
|
||||
as the "ground truth" for how our little imperative programs are executed;
|
||||
however, program analyses (especially in practice) seldom use the formal semantics
|
||||
as their subject matter. Instead, they focus on more pragmatic program
|
||||
representations from the world of compilers. One such representation are
|
||||
_Control Flow Graphs (CFGs)_.
|
||||
|
||||
Let's start by building some informal intuition. CFGs are pretty much what
|
||||
their name suggests. They are a type of [graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)).
|
||||
Edges in CFGs represent how execution might jump from one piece of code to
|
||||
another (how control might flow).
|
||||
|
||||
For example, take the below program.
|
||||
|
||||
|
||||
```
|
||||
x = ...;
|
||||
if x {
|
||||
x = 1;
|
||||
} else {
|
||||
x = 0;
|
||||
}
|
||||
y = x;
|
||||
```
|
||||
|
||||
The CFG might look like this:
|
||||
|
||||
{{< figure src="if-cfg.png" label="CFG for simple `if`-`else` code." class="small" >}}
|
||||
|
||||
Here, the initialization of `x` with `...`, as well as the `if` condition (just `x`),
|
||||
are guaranteed to execute one after another, so they occupy a single node. From there,
|
||||
depending on the condition, the control flow can jump to one of the
|
||||
branches of the `if` statement: the "then" branch if the condition is true,
|
||||
and the "else" branch if the condition is false. As a result, there are two
|
||||
arrows coming out of the initial node. Once either branch is executed, control
|
||||
always jumps to the code right after the `if` statement (the `y = x`). Thus,
|
||||
both the `x = 1` and `x = 0` nodes have a single arrow to the `y = x` node.
|
||||
|
||||
As another example, if you had a loop:
|
||||
|
||||
```
|
||||
x = ...;
|
||||
while x {
|
||||
x = x - 1;
|
||||
}
|
||||
y = x;
|
||||
```
|
||||
|
||||
The CFG would look like this:
|
||||
|
||||
{{< figure src="while-cfg.png" label="CFG for simple `while` code." class="small" >}}
|
||||
|
||||
Here, condition of the loop (`x`) is not always guaranteed to execute together
|
||||
with the code that initializes `x`. That's because the condition of the loop
|
||||
is checked after every iteration, whereas the code before the loop is executed
|
||||
only once. As a result, `x = ...` and `x` occupy distinct CFG nodes. From there,
|
||||
the control flow can proceed in two different ways, depending on the value
|
||||
of `x`. If `x` is truthy, the program will proceed to the loop body (decrementing `x`).
|
||||
If `x` is falsy, the program will skip the loop body altogether, and go to the
|
||||
code right after the loop (`y = x`). This is indicated by the two arrows
|
||||
going out of the `x` node. After executing the body, we return to the condition
|
||||
of the loop to see if we need to run another iteration. Because of this, the
|
||||
decrementing node has an arrow back to the loop condition.
|
||||
|
||||
Now, let's be a bit more precise. Control Flow Graphs are defined as follows:
|
||||
|
||||
* __The nodes__ are [_basic blocks_](https://en.wikipedia.org/wiki/Basic_block).
|
||||
Paraphrasing Wikipedia's definition, a basic block is a piece of code that
|
||||
has only one entry point and one exit point.
|
||||
|
||||
The one-entry-point rule means that it's not possible to jump into the middle
|
||||
of the basic block, executing only half of its instructions. The execution of
|
||||
a basic block always begins at the top. Symmetrically, the one-exit-point
|
||||
rule means that you can't jump away to other code (even within the same block),
|
||||
skipping some instructions. The execution of a basic block always ends at
|
||||
the bottom.
|
||||
|
||||
As a result of these constraints, when running a basic block, you are
|
||||
guaranteed to execute every instruction in exactly the order they occur in,
|
||||
and execute each instruction exactly once.
|
||||
* __The edges__ are jumps between basic blocks. We've already seen how
|
||||
`if` and `while` statements introduce these jumps.
|
||||
|
||||
Basic blocks can only be made of code that doen't jump (otherwise,
|
||||
we violate the single-exit-point policy). In the previous post,
|
||||
we defined exactly this kind of code as [simple statements]({{< relref "05_spa_agda_semantics#introduce-simple-statements" >}}).
|
||||
So, in our control flow graph, nodes will be sequences of simple statements.
|
||||
{#list-basic-stmts}
|
||||
|
||||
### Control Flow Graphs in Agda
|
||||
|
||||
#### Basic Definition
|
||||
At an abstract level, it's easy to say "it's just a graph where X is Y" about
|
||||
anything. It's much harder to give a precise definition of such a graph,
|
||||
particularly if you want to rule out invalid graphs (e.g., ones with edges
|
||||
pointing nowhere). In Agda, I chose the represent a two lists: one of nodes,
|
||||
and one of edges. Each node is simply a list of `BasicStmt`s, as
|
||||
I described in a preceding paragraph. An edge is simply a pair of numbers,
|
||||
each number encoding the index of the node connected by the edge.
|
||||
|
||||
Here's where it gets a little complicated. I don't want to use plain natural
|
||||
numbers for indices, because that means you can easily introduce "broken" edge.
|
||||
For example, what if you have 4 nodes, and you have an edge `(5, 5)`? Therefore,
|
||||
I picked the finite natural numbers represented by [`Fin`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#1154).
|
||||
|
||||
```Agda
|
||||
data Fin : ℕ → Set where
|
||||
zero : Fin (suc n)
|
||||
suc : (i : Fin n) → Fin (suc n)
|
||||
```
|
||||
|
||||
Specifically, `Fin n` is the type of natural numbers less than `n`. Following
|
||||
this definition, `Fin 3` represents the numbers `0`, `1` and `2`. These are
|
||||
represented using the same constructors as `Nat`: `zero` and `suc`. The type
|
||||
of `zero` is `Fin (suc n)` for any `n`; this makes sense because zero is less
|
||||
than any number plus one. For `suc,` the bound `n` of the input `i` is incremented
|
||||
by one, leading to another `suc n` in the final type. This makes sense because if
|
||||
`i < n`, then `i + 1 < n + 1`. I've previously explained this data type
|
||||
[in another post on this site]({{< relref "01_aoc_coq#aside-vectors-and-finite-mathbbn" >}}).
|
||||
|
||||
Here's my definition of `Graph`s written using `Fin`:
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 24 39 >}}
|
||||
|
||||
I explicitly used a `size` field, which determines how many nodes are in the
|
||||
graph, and serves both as the upper bound the edge indices as well as the
|
||||
size `nodes` field. From there, an index `Index` into the node list is
|
||||
{{< sidenote "right" "size-note" "just a natural number less than `size`," >}}
|
||||
Ther are <code>size</code> natural numbers less than <code>size</code>:<br>
|
||||
<code>0, 1, ..., size - 1</code>.
|
||||
{{< /sidenote >}}
|
||||
and an edge is just a pair of indices. The graph then contains a vector
|
||||
(exact-length list) `nodes` of all the basic blocks, and then a list of
|
||||
edges `edges`.
|
||||
|
||||
There are two fields here that I have not yet said anything about: `inputs`
|
||||
and `outputs`. When we have a complete CFG for our programs, these fields are
|
||||
totally unnecessary. However, as we are _building_ the CFG, these will come
|
||||
in handy, by telling us how to stitch together smaller sub-graphs that we've
|
||||
already built. Let's talk about that next.
|
||||
|
||||
#### Combining Graphs
|
||||
Suppose you're building a CFG for a program in the following form:
|
||||
|
||||
```
|
||||
code1;
|
||||
code2;
|
||||
```
|
||||
|
||||
Where `code1` and `code2` are arbitrary pieces of code, which could include
|
||||
statements, loops, and pretty much anything else. Besides the fact that they
|
||||
occur one after another, these pieces of code are unrelated, and we can
|
||||
build CFGs for each one them independently. However, the fact that `code1` and
|
||||
`code2` are in sequence means that the full control flow graph for the above
|
||||
program should have edges going from the nodes in `code1` to the nodes in `code2`.
|
||||
Of course, not _every_ node in `code1` should have such edges: that would
|
||||
mean that after executing any "basic" sequence of instructions, you could suddenly
|
||||
decide to skip the rest of `code1` and move on to executing `code2`.
|
||||
|
||||
Thus, we need to be more precise about what edges we need to insert; we want
|
||||
to insert edges between the "final" nodes in `code1` (where control ends up
|
||||
after `code1` is finished executing) and the "initial" nodes in `code2` (where
|
||||
control would begin once we started executing `code2`). Those are the `outputs`
|
||||
and `inputs`, respectively. When stitching together sequenced control graphs,
|
||||
we will connect each of the outputs of one to each of the inputs of the other.
|
||||
|
||||
This is defined by the operation `_↦_`:
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 72 83 >}}
|
15
content/blog/06_spa_agda_cfg/while-cfg.dot
Normal file
15
content/blog/06_spa_agda_cfg/while-cfg.dot
Normal file
@ -0,0 +1,15 @@
|
||||
digraph G {
|
||||
graph[dpi=300 fontsize=14 fontname="Courier New"];
|
||||
node[shape=rectangle style="filled" fillcolor="#fafafa" penwidth=0.5 color="#aaaaaa"];
|
||||
edge[arrowsize=0.3 color="#444444"]
|
||||
|
||||
node_begin [label="x = ...;\l"]
|
||||
node_cond [label="x\l"]
|
||||
node_body [label="x = x - 1\l"]
|
||||
node_end [label="y = x\l"]
|
||||
|
||||
node_begin -> node_cond
|
||||
node_cond -> node_body
|
||||
node_cond -> node_end
|
||||
node_body -> node_cond
|
||||
}
|
BIN
content/blog/06_spa_agda_cfg/while-cfg.png
Normal file
BIN
content/blog/06_spa_agda_cfg/while-cfg.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 14 KiB |
Loading…
Reference in New Issue
Block a user