Write more

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
Update Agda SPA imp
2024-11-16 15:25:26 -08:00 · 2024-11-16 15:16:57 -08:00 · 2024-11-16 14:44:02 -08:00
4 changed files with 147 additions and 19 deletions
--- a/code/agda-spa
+++ b/code/agda-spa
@ -1 +1 @@
-Subproject commit 828b652d3b9266e27ef7cf5a8a7fb82e3fd3133f
+Subproject commit 913121488069a20cdfd40777a8777eb3744c415e
--- a/content/blog/01_spa_agda_lattices.md
+++ b/content/blog/01_spa_agda_lattices.md
@ -203,6 +203,7 @@ element that's greater (less specific) than either `s1` or `s2`". Conventionally
 this function is written as \(a \sqcup b\) (or in our case, \(s_1 \sqcup s_2\)).
 The \((\sqcup)\) symbol is also called the _join_ of \(a\) and \(b\).
 We can define it for our signs so far using the following [Cayley table](https://en.wikipedia.org/wiki/Cayley_table).
 {#least-upper-bound}
 {{< latex >}}
 \begin{array}{c|cccc}
--- a/content/blog/03_spa_agda_fixed_height.md
+++ b/content/blog/03_spa_agda_fixed_height.md
@ -97,7 +97,8 @@ ordering relation `R`/`<` are expected to play together nicely (if `a < b`, and
 {{< codelines "agda" "agda-spa/Chain.agda" 3 7 >}}
 From there, the definition of the `Chain` data type is much like the definition
-of a vector, but indexed by the endpoints, and containing witnesses of `R`/`<`
+of [a vector from  `Data.Vec`](https://agda.github.io/agda-stdlib/v2.0/Data.Vec.Base.html#1111),
 but indexed by the endpoints, and containing witnesses of `R`/`<`
 between its elements. The indexing allows for representing
 the type of chains between particular lattice elements, and serves to ensure
 concatenation and other operations don't merge disparate chains.
--- a/content/blog/06_spa_agda_cfg/index.md
+++ b/content/blog/06_spa_agda_cfg/index.md
@ -10,14 +10,14 @@ draft: true
 In the previous section, I've given a formal definition of the programming
 language that I've been trying to analyze. This formal definition serves
 as the "ground truth" for how our little imperative programs are executed;
-however, program analyses (especially in practice) seldom use the formal semantics
+however, program analyses (especially in practice) seldom take the formal
-as their subject matter. Instead, they focus on more pragmatic program
+semantics as input. Instead, they focus on more pragmatic program
 representations from the world of compilers. One such representation are
-_Control Flow Graphs (CFGs)_.
+_Control Flow Graphs (CFGs)_. That's what I want to discuss in this post.
 Let's start by building some informal intuition. CFGs are pretty much what
-their name suggests. They are a type of [graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)).
+their name suggests: they are a type of [graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics));
-Edges in CFGs represent how execution might jump from one piece of code to
+their edges show how execution might jump from one piece of code to
 another (how control might flow).
 For example, take the below program.
@ -40,8 +40,8 @@ The CFG might look like this:
 Here, the initialization of `x` with `...`, as well as the `if` condition (just `x`),
 are guaranteed to execute one after another, so they occupy a single node. From there,
 depending on the condition, the control flow can jump to one of the
-branches of the `if` statement: the "then" branch if the condition is true,
+branches of the `if` statement: the "then" branch if the condition is truthy,
-and the "else" branch if the condition is false. As a result, there are two
+and the "else" branch if the condition is falsy. As a result, there are two
 arrows coming out of the initial node. Once either branch is executed, control
 always jumps to the code right after the `if` statement (the `y = x`). Thus,
 both the `x = 1` and `x = 0` nodes have a single arrow to the `y = x` node.
@ -60,7 +60,7 @@ The CFG would look like this:
 {{< figure src="while-cfg.png" label="CFG for simple `while` code." class="small" >}}
-Here, condition of the loop (`x`) is not always guaranteed to execute together
+Here, the condition of the loop (`x`) is not always guaranteed to execute together
 with the code that initializes `x`. That's because the condition of the loop
 is checked after every iteration, whereas the code before the loop is executed
 only once. As a result, `x = ...` and `x` occupy distinct CFG nodes. From there,
@ -81,9 +81,8 @@ Now, let's be a bit more precise. Control Flow Graphs are defined as follows:
  The one-entry-point rule means that it's not possible to jump into the middle
  of the basic block, executing only half of its instructions. The execution of
  a basic block always begins at the top. Symmetrically, the one-exit-point
-  rule means that you can't jump away to other code (even within the same block),
+  rule means that you can't jump away to other code, skipping some instructions.
-  skipping some instructions. The execution of a basic block always ends at
+  The execution of a basic block always ends at the bottom.
  the bottom.
  As a result of these constraints, when running a basic block, you are
  guaranteed to execute every instruction in exactly the order they occur in,
@ -91,7 +90,7 @@ Now, let's be a bit more precise. Control Flow Graphs are defined as follows:
 * __The edges__ are jumps between basic blocks. We've already seen how
  `if` and `while` statements introduce these jumps.
-Basic blocks can only be made of code that doen't jump (otherwise,
+Basic blocks can only be made of code that doesn't jump (otherwise,
 we violate the single-exit-point policy). In the previous post,
 we defined exactly this kind of code as [simple statements]({{< relref "05_spa_agda_semantics#introduce-simple-statements" >}}).
 So, in our control flow graph, nodes will be sequences of simple statements.
@ -103,7 +102,7 @@ So, in our control flow graph, nodes will be sequences of simple statements.
 At an abstract level, it's easy to say "it's just a graph where X is Y" about
 anything. It's much harder to give a precise definition of such a graph,
 particularly if you want to rule out invalid graphs (e.g., ones with edges
-pointing nowhere). In Agda, I chose the represent a two lists: one of nodes,
+pointing nowhere). In Agda, I chose the represent a CFG with two lists: one of nodes,
 and one of edges. Each node is simply a list of `BasicStmt`s, as
 I described in a preceding paragraph. An edge is simply a pair of numbers,
 each number encoding the index of the node connected by the edge.
@ -123,7 +122,7 @@ Specifically, `Fin n` is the type of natural numbers less than `n`. Following
 this definition, `Fin 3` represents the numbers `0`, `1` and `2`. These are
 represented using the same constructors as `Nat`: `zero` and `suc`. The type
 of `zero` is `Fin (suc n)` for any `n`; this makes sense because zero is less
-than any number plus one. For `suc,` the bound `n` of the input `i` is incremented
+than any number plus one. For `suc`, the bound `n` of the input `i` is incremented
 by one, leading to another `suc n` in the final type. This makes sense because if
 `i < n`, then `i + 1 < n + 1`. I've previously explained this data type
 [in another post on this site]({{< relref "01_aoc_coq#aside-vectors-and-finite-mathbbn" >}}).
@ -133,8 +132,8 @@ Here's my definition of `Graph`s written using `Fin`:
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 24 39 >}}
 I explicitly used a `size` field, which determines how many nodes are in the
-graph, and serves both as the upper bound the edge indices as well as the
+graph, and serves as the upper bound for the edge indices. From there, an
-size `nodes` field. From there, an index `Index` into the node list is
+index `Index` into the node list is
 {{< sidenote "right" "size-note" "just a natural number less than `size`," >}}
 Ther are <code>size</code> natural numbers less than <code>size</code>:<br>
 <code>0, 1, ..., size - 1</code>.
@ -174,6 +173,133 @@ control would begin once we started executing `code2`). Those are the `outputs`
 and `inputs`, respectively. When stitching together sequenced control graphs,
 we will connect each of the outputs of one to each of the inputs of the other.
-This is defined by the operation `_↦_`:
+This is defined by the operation `g₁ ↦ g₂`, which sequences two graphs `g₁` and `g₂`:
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 72 83 >}}
 The definition starts out pretty innocuous, but gets a bit complicated by the
 end. The sum of the numbers of nodes in the two operands becomes the new graph
 size, and the nodes from the two graphs are all included in the result. Then,
 the definitions start making use of various operators like `↑ˡᵉ` and `↑ʳᵉ`;
 these deserve an explanation.
 The tricky thing is that when we're concatenating lists of nodes, we are changing
 some of the indices of the elements within. For instance, in the lists
 `[x]` and `[y]`, the indices of both `x` and `y` are `0`; however, in the
 concatenated list `[x, y]`, the index of `x` is still `0`, but the index of `y`
 is `1`. More generally, when we concatenate two lists `l1` and `l2`, the indices
 into `l1` remain unchanged, whereas the indices `l2` are shifted `length l2`.
 Actually, that's not all there is to it. The _values_ of the indices into
 the left list don't change, but their types do! They start as `Fin (length l1)`,
 but for the whole list, these same indices will have type `Fin (length l1 + length l2))`.
 To help deal with this, Agda provides the operators
 [`↑ˡ`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#2355)
 and [`↑ʳ`](https://agda.github.io/agda-stdlib/v2.0/Data.Fin.Base.html#2522)
 that implement this re-indexing and re-typing. The former implements "re-indexing
 on the left" -- given an index into the left list `l1`, it changes its type
 by adding the other list's length to it, but keeps the index value itself
 unchanged. The latter implements "re-indexing on the right" -- given an index
 into the right list `l2`, it adds the length of the first list to it (shifting it),
 and does the same to its type.
 The definition leads to the following equations:
 ```Agda
 l1 : Vec A n
 l2 : Vec A m
 idx1 : Fin n -- index into l1
 idx2 : Fin m -- index into l2
 l1 [ idx1 ] ≡ (l1 ++ l2) [ idx1 ↑ˡ m ]
 l2 [ idx2 ] ≡ (l1 ++ l2) [ n ↑ʳ idx2 ]
 ```
 The operators used in the definition above are just versions of the same
 re-indexing operators. The `↑ˡᵉ` operator applies `↑ˡ` to all the (__e__)dges
 in a graph, and the `↑ˡi` applies it to all the (__i__)ndices in a list
 (like `inputs` and `outputs`).
 Given these definitions, hopefully the intent with the rest of the definition
 is not too hard to see. The edges in the new graph come from three places:
 the graph `g₁` and `g₂`, and from creating a new edge from each of the outputs
 of `g₁` to each of the inputs of  `g₂`. We keep the inputs of `g₁` as the
 inputs of the whole graph (since `g₁` comes first), and symmetrically we keep
 the outputs of `g₂`. Of course, we do have to re-index them to keep them
 pointing at the right nodes.
 Another operation we will need is "overlaying" two graphs: this will be like
 placing them in parallel, without adding jumps between the two. We use this
 operation when combining the sub-CFGs of the "if" and "else" branches of an
 `if`/`else`, which both follow the condition, and both proceed to the code after
 the conditional.
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 59 70 >}}
 Everything here is just concatenation; we pool together the nodes, edges,
 inputs, and outputs, and the main source of complexity is the re-indexing.
 The one last operation, which we will use for `while` loops, is looping. This
 operation simply connects the outputs of a graph back to its inputs (allowing
 looping), and also allows the body to be skipped. This is slightly different
 from the graph for `while` loops I showed above; the reason for that is that
 I currently don't include the conditional expressions in my CFG. This is a
 limitation that I will address in future work.
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 85 95 >}}
 Given these thee operations, I construct Control Flow Graphs as follows, where
 `singleton` creates a new CFG node with the given list of simple statements:
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 122 126 >}}
 Throughout this, I've been liberal to include empty CFG nodes as was convenient.
 This is a departure from the formal definition I gave above, but it makes
 things much simpler.
 ### Additional Functions
 To integrate Control Flow Graphs into our lattice-based program analyses, we'll
 need to do a couple of things. First, upon reading the
 [reference _Static Program Analysis_ text](https://cs.au.dk/~amoeller/spa/),
 one sees a lot of quantification over the predecessors or successors of a
 given CFG node. For example, the following equation is from Chapter 5:
 {{< latex >}}
 \textit{JOIN}(v) = \bigsqcup_{w \in \textit{pred}(v)} \llbracket w \rrbracket
 {{< /latex >}}
 To compute the \(\textit{JOIN}\) function (which we have not covered yet) for
 a given CFG node, we need to iterate over all of its predecessors, and
 combine their static information using \(\sqcup\), which I first
 [explained several posts ago]({{< relref "01_spa_agda_lattices#least-upper-bound" >}}).
 To be able to iterate over them, we need to be able to retrieve the predecessors
 of a node from a graph!
 Our encoding does not make computing the predecessors particularly easy; to
 check if two nodes are connected, we need to check if an `Index`-`Index` pair
 corresponding to the nodes is present in the `edges` list. To this end, we need
 to be able to compare edges for equality. Fortunately, it's relatively
 straightforward to show that our edges can be compared in such a way;
 after all, they are just pairs of `Fin`s, and `Fin`s and products support
 these comparisons.
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 149 152 >}}
 Next, if we can compare edges for equality, we can check if an edge is in
 a list. Agda provides a built-in function for this:
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 154 154 >}}
 To find the predecessors of a particular node, we go through all other nodes
 in the graph and see if there's an edge there between those nodes and the
 current one. This is preferable to simply iterating over the edges because
 we may have duplicates in that list (why not?).
 {{< codelines "Agda" "agda-spa/Language/Graphs.agda" 165 166 >}}
 {{< todo >}}
 the rest
 {{< /todo >}}
Author	SHA1	Message	Date
Danila Fedorin	e5fb0a2929	Write more Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>	2024-11-16 15:25:26 -08:00
Danila Fedorin	3bd1f0c8a0	Update Agda SPA imp	2024-11-16 15:16:57 -08:00
Danila Fedorin	948759b7d4	Update the SPA repo Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>	2024-11-16 14:44:02 -08:00
		`@ -1 +1 @@`
			`Subproject commit 828b652d3b9266e27ef7cf5a8a7fb82e3fd3133f`				`Subproject commit 913121488069a20cdfd40777a8777eb3744c415e`