Edit and publish post on control flow graphs
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
parent
5de2ae1203
commit
19be7eb1f5
@ -2,7 +2,7 @@
|
||||
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 6: Control Flow Graphs"
|
||||
series: "Static Program Analysis in Agda"
|
||||
description: "In this post, I show how I show an Agda definition of control flow graph building"
|
||||
date: 2024-11-13T17:32:42-07:00
|
||||
date: 2024-11-27T16:26:42-07:00
|
||||
tags: ["Agda", "Programming Languages"]
|
||||
draft: true
|
||||
---
|
||||
@ -109,8 +109,10 @@ each number encoding the index of the node connected by the edge.
|
||||
|
||||
Here's where it gets a little complicated. I don't want to use plain natural
|
||||
numbers for indices, because that means you can easily introduce "broken" edge.
|
||||
For example, what if you have 4 nodes, and you have an edge `(5, 5)`? Therefore,
|
||||
I picked the finite natural numbers represented by [`Fin`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#1154).
|
||||
For example, what if you have 4 nodes, and you have an edge `(5, 5)`? To avoid
|
||||
this, I picked the finite natural numbers represented by
|
||||
[`Fin`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#1154)
|
||||
as endpoints for edges.
|
||||
|
||||
```Agda
|
||||
data Fin : ℕ → Set where
|
||||
@ -188,7 +190,7 @@ some of the indices of the elements within. For instance, in the lists
|
||||
`[x]` and `[y]`, the indices of both `x` and `y` are `0`; however, in the
|
||||
concatenated list `[x, y]`, the index of `x` is still `0`, but the index of `y`
|
||||
is `1`. More generally, when we concatenate two lists `l1` and `l2`, the indices
|
||||
into `l1` remain unchanged, whereas the indices `l2` are shifted `length l2`.
|
||||
into `l1` remain unchanged, whereas the indices `l2` are shifted by `length l1`.
|
||||
|
||||
Actually, that's not all there is to it. The _values_ of the indices into
|
||||
the left list don't change, but their types do! They start as `Fin (length l1)`,
|
||||
@ -219,7 +221,7 @@ l2 [ idx2 ] ≡ (l1 ++ l2) [ n ↑ʳ idx2 ]
|
||||
|
||||
The operators used in the definition above are just versions of the same
|
||||
re-indexing operators. The `↑ˡᵉ` operator applies `↑ˡ` to all the (__e__)dges
|
||||
in a graph, and the `↑ˡi` applies it to all the (__i__)ndices in a list
|
||||
in a graph, and the `↑ˡⁱ` applies it to all the (__i__)ndices in a list
|
||||
(like `inputs` and `outputs`).
|
||||
|
||||
Given these definitions, hopefully the intent with the rest of the definition
|
||||
@ -300,6 +302,76 @@ we may have duplicates in that list (why not?).
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 165 166 >}}
|
||||
|
||||
{{< todo >}}
|
||||
the rest
|
||||
{{< /todo >}}
|
||||
Above, `indices` is a list of all the node identifiers in the graph. Since the
|
||||
graph has `size` nodes, the indices of all these nodes are simply the values
|
||||
`0`, `1`, ..., `size - 1`. I defined a special function `finValues` to compute
|
||||
this list, together with a proof that this list is unique.
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 127 143 >}}
|
||||
|
||||
Another important property of `finValues` is that each node identifier is
|
||||
present in the list, so that our computation written by traversing the node
|
||||
list do not "miss" nodes.
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 145 147 >}}
|
||||
|
||||
We can specialize these definitions for a particular graph `g`:
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 156 163 >}}
|
||||
|
||||
To recap, we now have:
|
||||
* A way to build control flow graphs from programs
|
||||
* A list (unique'd and complete) of all nodes in the control flow graph so that
|
||||
we can iterate over them when the algorithm demands.
|
||||
* A 'predecessors' function, which will be used by our static program analyses,
|
||||
implemented as an iteration over the list of nodes.
|
||||
|
||||
All that's left is to connect our `predecessors` function to edges in the graph.
|
||||
The following definitions say that when an edge is in the graph, the starting
|
||||
node is listed as a predecessor of the ending node, and vise versa.
|
||||
|
||||
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 168 177 >}}
|
||||
|
||||
### Connecting Two Distinct Representations
|
||||
|
||||
I've described Control Flow Graphs as a compiler-centric representation of the
|
||||
program. Unlike the formal semantics from the previous post, CFGs do not reason
|
||||
about the dynamic behavior of the code. Instead, they capture the possible
|
||||
paths that execution can take through the instructions. In that
|
||||
sense, they are more of an approximation of what the program will do. This is
|
||||
good: because of [Rice's theorem](https://en.wikipedia.org/wiki/Rice%27s_theorem),
|
||||
we can't do anything other than approximating without running the program.
|
||||
|
||||
However, an incorrect approximation is of no use at all. Since the CFGs we build
|
||||
will be the core data type used by our program analyses, it's important that they
|
||||
are an accurate, if incomplete, representation. Specifically, because most
|
||||
of our analyses reason about possible outcomes --- we report what sign each
|
||||
variable __could__ have, for instance --- it's important that we don't accidentally
|
||||
omit cases that can happen in practice from our CFGs. Formally, this means
|
||||
that for each possible execution of a program according to its semantics,
|
||||
{{< sidenote "right" "converse-note" "there exists a corresponding path through the graph." >}}
|
||||
The converse is desirable too: that the graph has only paths that correspond
|
||||
to possible executions of the program. One graph that violates this property is
|
||||
the strongly-connected graph of all basic blocks in a program. Analyzing
|
||||
such a graph would give us an overly-conservative estimation; since anything
|
||||
can happen, most of our answers will likely be too general to be of any use. If,
|
||||
on the other hand, only the necessary graph connections exist, we can be more
|
||||
precise.<br>
|
||||
<br>
|
||||
However, proving this converse property (or even stating it precisely) is much
|
||||
harder, because our graphs are somewhat conservative already. There exist
|
||||
programs in which the condition of an <code>if</code>-statement is always
|
||||
evaluated to <code>false</code>, but our graphs always have edges for both
|
||||
the "then" and "else" cases. Determining whether a condition is always false
|
||||
(e.g.) is undecidable thanks to Rice's theorem (again), so we can't rule it out.
|
||||
Instead, we could broaden "all possible executions"
|
||||
to "all possible executions where branching conditions can produce arbitrary
|
||||
results", but this is something else entirely.<br>
|
||||
<br>
|
||||
For the time being, I will leave this converse property aside. As a result,
|
||||
our approximations might be "too careful". However, they will at the very least
|
||||
be sound.
|
||||
{{< /sidenote >}}
|
||||
|
||||
In the next post, I will prove that this property holds for the graphs shown
|
||||
here and the formal semantics I defined earlier. I hope to see you there!
|
||||
|
Loading…
Reference in New Issue
Block a user