Edit and publish post on control flow graphs

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
Danila Fedorin 2024-11-27 16:27:26 -08:00
parent 5de2ae1203
commit 19be7eb1f5


@@ -2,7 +2,7 @@
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 6: Control Flow Graphs"
series: "Static Program Analysis in Agda"
description: "In this post, I show an Agda definition of control flow graph building"
date: 2024-11-13T17:32:42-07:00
date: 2024-11-27T16:26:42-07:00
tags: ["Agda", "Programming Languages"]
draft: true
---
@@ -109,8 +109,10 @@ each number encoding the index of the node connected by the edge.
Here's where it gets a little complicated. I don't want to use plain natural
numbers for indices, because that means you can easily introduce "broken" edges.
For example, what if you have 4 nodes, and you have an edge `(5, 5)`? Therefore,
I picked the finite natural numbers represented by [`Fin`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#1154).
For example, what if you have 4 nodes, and you have an edge `(5, 5)`? To avoid
this, I picked the finite natural numbers represented by
[`Fin`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#1154)
as endpoints for edges.
```Agda
data Fin : ℕ → Set where
  zero : ∀ {n} → Fin (suc n)
  suc  : ∀ {n} → Fin n → Fin (suc n)
```
@@ -188,7 +190,7 @@ some of the indices of the elements within. For instance, in the lists
`[x]` and `[y]`, the indices of both `x` and `y` are `0`; however, in the
concatenated list `[x, y]`, the index of `x` is still `0`, but the index of `y`
is `1`. More generally, when we concatenate two lists `l1` and `l2`, the indices
into `l1` remain unchanged, whereas the indices `l2` are shifted `length l2`.
into `l1` remain unchanged, whereas the indices into `l2` are shifted by `length l1`.
Actually, that's not all there is to it. The _values_ of the indices into
the left list don't change, but their types do! They start as `Fin (length l1)`,
@@ -219,7 +221,7 @@ l2 [ idx2 ] ≡ (l1 ++ l2) [ n ↑ʳ idx2 ]
The operators used in the definition above are just versions of the same
re-indexing operators. The `↑ˡᵉ` operator applies `↑ˡ` to all the (__e__)dges
in a graph, and the `↑ˡi` applies it to all the (__i__)ndices in a list
in a graph, and the `↑ˡⁱ` operator applies it to all the (__i__)ndices in a list
(like `inputs` and `outputs`).
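For intuition, here's a small standalone example of the underlying `Fin` operators from the standard library; the graph- and list-level variants above are just these, mapped over edges and index lists:

```Agda
open import Data.Fin using (Fin; zero; suc; _↑ˡ_; _↑ʳ_)

-- An index into a 2-element prefix, weakened into the 5-element
-- concatenation: the value stays 1, only the type grows.
left : Fin 5
left = suc zero ↑ˡ 3

-- An index into a 3-element suffix, shifted past a 2-element prefix:
-- the value becomes 2 + 1 = 3.
right : Fin 5
right = 2 ↑ʳ suc zero
```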
Given these definitions, hopefully the intent with the rest of the definition
@@ -300,6 +302,76 @@ we may have duplicates in that list (why not?).
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 165 166 >}}
{{< todo >}}
the rest
{{< /todo >}}
Above, `indices` is a list of all the node identifiers in the graph. Since the
graph has `size` nodes, the indices of all these nodes are simply the values
`0`, `1`, ..., `size - 1`. I defined a special function `finValues` to compute
this list, together with a proof that this list is unique.
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 127 143 >}}
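To get a feel for what this computes, here is a minimal standalone sketch (with a hypothetical name, `allFins`, and without the uniqueness proof that the real `finValues` carries alongside the list):

```Agda
open import Data.Nat using (ℕ; zero; suc)
open import Data.Fin using (Fin; zero; suc)
open import Data.List using (List; []; _∷_; map)

-- List every element of Fin n in increasing order: for n = 3,
-- this produces 0 ∷ 1 ∷ 2 ∷ [].
allFins : (n : ℕ) → List (Fin n)
allFins zero    = []
allFins (suc n) = zero ∷ map suc (allFins n)
```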
Another important property of `finValues` is that each node identifier is
present in the list, so that computations written by traversing the node
list do not "miss" any nodes.
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 145 147 >}}
We can specialize these definitions for a particular graph `g`:
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 156 163 >}}
To recap, we now have:
* A way to build control flow graphs from programs
* A list (unique'd and complete) of all nodes in the control flow graph so that
we can iterate over them when the algorithm demands.
* A 'predecessors' function, which will be used by our static program analyses,
  implemented as an iteration over the list of nodes.
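As a rough sketch, using a hypothetical, stripped-down `Graph` record (the real one also stores the basic blocks and the `inputs` and `outputs` lists), such a 'predecessors' computation might look like this:

```Agda
open import Data.Nat using (ℕ)
open import Data.Fin using (Fin)
open import Data.Fin.Properties using (_≟_)
open import Data.List using (List; map; filter)
open import Data.Product using (_×_; proj₁; proj₂)

-- Hypothetical cut-down graph record: just a node count and edges.
record Graph : Set where
  field
    size  : ℕ
    edges : List (Fin size × Fin size)

-- A node's predecessors are the sources of all edges that target it.
predecessors : (g : Graph) → Fin (Graph.size g) → List (Fin (Graph.size g))
predecessors g idx =
  map proj₁ (filter (λ e → proj₂ e ≟ idx) (Graph.edges g))
```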
All that's left is to connect our `predecessors` function to edges in the graph.
The following definitions say that when an edge is in the graph, the starting
node is listed as a predecessor of the ending node, and vice versa.
{{< codelines "Agda" "agda-spa/Language/Graphs.agda" 168 177 >}}
### Connecting Two Distinct Representations
I've described Control Flow Graphs as a compiler-centric representation of the
program. Unlike the formal semantics from the previous post, CFGs do not reason
about the dynamic behavior of the code. Instead, they capture the possible
paths that execution can take through the instructions. In that
sense, they are more of an approximation of what the program will do. This is
good: because of [Rice's theorem](https://en.wikipedia.org/wiki/Rice%27s_theorem),
we can't do better than approximate without running the program.
However, an incorrect approximation is of no use at all. Since the CFGs we build
will be the core data type used by our program analyses, it's important that they
are an accurate, if incomplete, representation. Specifically, because most
of our analyses reason about possible outcomes --- we report what sign each
variable __could__ have, for instance --- it's important that we don't accidentally
omit cases that can happen in practice from our CFGs. Formally, this means
that for each possible execution of a program according to its semantics,
{{< sidenote "right" "converse-note" "there exists a corresponding path through the graph." >}}
The converse is desirable too: that the graph has only paths that correspond
to possible executions of the program. One graph that violates this property is
the strongly-connected graph of all basic blocks in a program. Analyzing
such a graph would give us an overly-conservative estimation; since anything
can happen, most of our answers will likely be too general to be of any use. If,
on the other hand, only the necessary graph connections exist, we can be more
precise.<br>
<br>
However, proving this converse property (or even stating it precisely) is much
harder, because our graphs are somewhat conservative already. There exist
programs in which the condition of an <code>if</code>-statement is always
evaluated to <code>false</code>, but our graphs always have edges for both
the "then" and "else" cases. Determining whether a condition is always false
(e.g.) is undecidable thanks to Rice's theorem (again), so we can't rule it out.
Instead, we could broaden "all possible executions"
to "all possible executions where branching conditions can produce arbitrary
results", but this is something else entirely.<br>
<br>
For the time being, I will leave this converse property aside. As a result,
our approximations might be "too careful". However, they will at the very least
be sound.
{{< /sidenote >}}
In the next post, I will prove that this property holds for the graphs shown
here and the formal semantics I defined earlier. I hope to see you there!