diff --git a/content/blog/00_spa_agda_intro.md b/content/blog/00_spa_agda_intro.md index b1205d5..e8e38a0 100644 --- a/content/blog/00_spa_agda_intro.md +++ b/content/blog/00_spa_agda_intro.md @@ -1,6 +1,7 @@ --- title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 0: Intro" series: "Static Program Analysis in Agda" +description: "In this post, I give a top-level overview of my work on formally verified static analyses" date: 2024-04-12T14:23:03-07:00 draft: true --- @@ -86,81 +87,3 @@ for a post or two: {{< todo >}} Once the posts are ready, link them here to add some kind of navigation. {{< /todo >}} - -### Monotone Frameworks - -I'll start out as abstractly and vaguely as possible. In general, the sorts of -analyses I'll be formalizing are based on _monotone frameworks_. -The idea with monotone frameworks is to rank information about program state -using some kind of _order_. Intuitively, given two pieces of "information", -one is less than another if it's more specific. Thus, "`x` has a positive sign" -is less than "`x` has any sign", since the former is more specific than the latter. -The sort of information that you are comparing depends on the analysis. In all -cases, the analysis itself is implemented as a function that takes the "information -so far", and updates it based on the program, producing "updated information so far". -Not all such functions are acceptable; it's possible to write an "updater function" -that keeps slightly adjusting its answer. Such a function could keep running -forever, which is a little too long for a program analyzer. We need something -to ensure the analysis ends. - -There are two secret ingredients to ensure that an analysis terminates. -The first is a property called _monotonicity_; a function is monotonic if -it preserves the order between its inputs. That is, if you have two pieces of -information `x1` and `x2`, with `x1 <= x2`, then `f(x1) <= f(x2)`. The second -property is that our "information" has a _finite height_. Roughly, this means -that if you tried to arrange pieces of information in a line, from least to -greatest, your line could only get so long. Combined, this leads to the -following property (I'm being reductive here while I give an overview): - -_With a monotoninc function and a finite-height order, if you start at the -bottom, each invocation of the function moves you up some line. Since the -line can only be so long, you're guaranteed to reach the end eventually._ - -The above three-paragraph explanation omits a lot of details, but it's a start. -To get more precise, we must drill down into several aspects of what I've -said so far. The first of them is, __how can we compare program states using -an order?__ - -### Lattices - -The "information" we'll be talking about will form an algebraic structure -called a [lattice](https://en.wikipedia.org/wiki/Lattice_(order)). Algebraically, -a lattice is simply a set with two binary operations on it. Unlike the familiar -`+`, `-`, and `*` and `/`, the binary operations on a lattice are called -"join" and "meet", and are written as `⊔` and `⊓`. Intuitively, they correspond -to "take the maximum of two values" and "take the minimum of two values". That -may not be all that surprising, since it's the order of values that we care about. -Continuing the analogy, let's talk some properties of "minimum" and "maximum", - -* \\(\\max(a, a) = \\min(a, a) = a\\). The minimum and maximum of one number is - just that number. Mathematically, this property is called _idempotence_. -* \\(\\max(a, b) = \\max(b, a)\\). If you're taking the maximum of two numbers, - it doesn't matter much one you consider first. This property is called - _commutativity_. -* \\(\\max(a, \\max(b, c)) = \\max(\\max(a, b), c)\\). When you have three numbers, - and you're determining the maximum value, it doesn't matter which pair of - numbers you compare first. This property is called _associativity_. - -All of the properties of \\(\\max\\) also hold for \\(\\min\\). There are also -a couple of facts about how \\(\\max\\) and \\(\\min\\) interact _with each other_. -They are usually called the _absorption laws_: - -* \\(\\max(a, \\min(a, b)) = a\\). This one is a little less obvious; \\(a\\) - is either less than or bigger than \\(b\\); so if you try to find the maximum - __and__ the minimum of \\(a\\) and \\(b\\), one of the operations will return - \\(a\\). -* \\(\\min(a, \\max(a, b)) = a\\). The reason for this one is the same as - the reason above. - -Lattices model a specific kind of order; their operations are meant to -generalize \\(\\min\\) and \\(\\max\\). Thus, to make the operations behave -as expected (i.e., the way that \\(\\min\\) and \\(\\max\\) do), they are -required to have all of the properties we've listed so far. We can summarize -the properties in table. - -| Property Name | Definition | -|---------------|:----------------------------------------------------:| -| Idempotence | {{< latex >}}\forall x. x \sqcup x = x{{< /latex >}}
{{< latex >}}\forall x. x \sqcap x = x{{< /latex >}} | -| Commutativity | {{< latex >}}\forall x, y. x \sqcup y = y \sqcup x{{< /latex >}}
{{< latex >}}\forall x, y. x \sqcap y = y \sqcap x{{< /latex >}} | -| Associativity | {{< latex >}}\forall x, y, z. x \sqcup (y \sqcup z) = (x \sqcup y) \sqcup z{{< /latex >}}
{{< latex >}}\forall x, y, z. x \sqcap (y \sqcap z) = (x \sqcap y) \sqcap z{{< /latex >}} | -| Absorption Laws | {{< latex >}}\forall x, y. x \sqcup (x \sqcap y) = x{{< /latex >}}
{{< latex >}}\forall x, y. x \sqcap (x \sqcup y) = x{{< /latex >}} diff --git a/content/blog/01_spa_agda_lattices.md b/content/blog/01_spa_agda_lattices.md new file mode 100644 index 0000000..fd28033 --- /dev/null +++ b/content/blog/01_spa_agda_lattices.md @@ -0,0 +1,324 @@ +--- +title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 1: Lattices" +series: "Static Program Analysis in Agda" +date: 2024-04-12T14:23:03-07:00 +draft: true +--- + +This is the first post in a series on +[static program analysis in Agda]({{< relref "static-program-analysis-in-agda" >}}). +See the [introduction]({{< relref "00_spa_agda_intro" >}}) for a little bit +more context. + +The goal of this post is to motivate the algebraic structure called a +[lattice](https://en.wikipedia.org/wiki/Lattice_(order)). Lattices have +{{< sidenote "right" "crdt-note" "broad applications" >}} +See, for instance, Lars Hupel's excellent +introduction to CRDTs +which uses lattices for Conflict-Free Replicated Data Types. CRDTs can be +used to implement peer-to-peer distributed systems. +{{< /sidenote >}} beyond static program analysis, so the work in this post is +interesting in its own right. However, for the purposes of this series, I'm +most interested in lattices as an encoding of program information when performing +analysis. To start motivating lattices in that context, I'll need to start +with _monotone frameworks_. + +### Monotone Frameworks + +The key notion for monotone frameworks is the "specificity" of information. +Take, for instance, an analyzer that tries to figure out if a variable is +positive, negative, or equal to zero (this is called a _sign analysis_, and +we'll be using this example a lot). Of course, the variable could be "none +of the above" -- perhaps if it was initialized from user input, which would +allow both positive and negative numbers. Such an analyzer might return +`+`, `-`, `0`, or `unknown` for any given variable. These outputs are not +created equal: if a variable has sign `+`, we know more about it than if +the sign is `unknown`: we've ruled out negative numbers as possible values! + +Specificity is important to us because we want our analyses to be as precise +as possible. It would be valid for a program analysis to just return +`unknown` for everything, but it wouldn't be very useful. Thus, we want to +rank possible outputs, and try pick the most specific one. The +{{< sidenote "right" "convention-note" "convention" -12 >}} +I say convention, because it doesn't actually matter if we represent more +specific values as "larger" or "smaller". Given a lattice with a particular +order written as <, we can flip the sign in all relations +(turning a < b into a > b), and get back another +lattice. This lattice will have the same properties (more precisely, +the properties will be +dual). So +we shouldn't fret about picking a direction for "what's less than what". +{{< /sidenote >}} +seems to be to make +{{< sidenote "right" "order-note" "more specific things \"smaller\"" 1 >}} +Admittedly, it's a little bit odd to say that something which is "more" than +something else is actually smaller. The intuition that I favor is that +something that's more specific describes fewer objects: there are less +white horses than horses, so "white horse" is more specific than "horse". +The direction of < can be thought of as comparing the number +of objects.
+
+Note that this is only an intuition; there are equally many positive and +negative numbers, but we will not group them together +in our order. +{{< /sidenote >}}, +and less specific things "larger". Coming back to our previous example, we'd +write `+ < unknown`, since `+` is more specific. Of course, the exact +things we're trying to rank depend on the sort of analysis we're trying to +perform. Since I introduced sign analysis, we're ranking signs like `+` and `-`. +For other analyses, the elements will be different. The _comparison_, however, +will be a permanent fixture. + +Suppose now that we have some program analysis, and we're feeding it some input +information. Perhaps we're giving it the signs of variables `x` and `y`, and +hoping for it to give us the sign of a third variable `z`. It would be very +unfortunate if, when given more specific information, the analysis would return +a less specific output! The more you know going in, the more you should know +coming out. Similarly, when given less specific / vaguer information, the +analysis shouldn't produce a more specific answer -- how could it do that? +This leads us to come up with the following rule: + +{{< latex >}} +\textbf{if}\ \text{input}_1 \le \text{input}_2, +\textbf{then}\ \text{analyze}(\text{input}_1) \le \text{analyze}(\text{input}_2) +{{< /latex >}} + +In mathematics, such a property is called _monotonicity_. We say that +"analyze" is a [monotonic function](https://en.wikipedia.org/wiki/Monotonic_function). +This property gives its name to monotone frameworks. For our purposes, this +property means that being more specific "pays off": better information in +means better information out. In Agda, we can encode monotonicity as follows: + +{{< codelines "Agda" "agda-spa/Lattice.agda" 17 21 >}} + +Note that above, I defined `Monotonic` on an arbitrary function, whose +outputs might be of a different type than its inputs. This will come in handy +later. + +The order `<` of our elements and the monotonicity of our analysis are useful +to us for another reason: they help gauge and limit, in a roundabout way, how much +work might be left for our analysis to do. This matters because we don't want +to allow analyses that can take forever to finish -- that's a little too long +for a pragmatic tool used by people. + +The key observation -- which I will describe in detail in a later post -- +is that a monotonic analysis, in a way, "climbs upwards" through an +order. As we continue using this analysis to refine information over and over, +its results get +{{< sidenote "right" "less-specific-note" "less and less specific." >}} +It is not a bad thing for our results to get less specific over time, because +our initial information is probably incomplete. If you've only seen German +shepherds in your life, that might be your picture of what a dog is like. +If you then come across a chihuahua, your initial definition of "dog" would +certainly not accommodate it. To allow for both German shepherds and chihuahuas, +you'd have to loosen the definition of "dog". This new definition would be less +specific, but it would be more accurate. +{{< /sidenote >}} +If we add an additional ingredient, and say that the order has a _fixed height_, +we can deduce that the analysis will eventually stop producing additional +information: either it will keep "climbing", and reach the top (thus having +to stop), or it will stop on its own before reaching the top. This is +the essence of the fixed-point algorithm, which in Agda-like pseudocode can +be stated as follows: + +```Agda +module _ (IsFiniteHeight A ≺) + (f : A → A) + (Monotonicᶠ : Monotonic _≼_ _≼_ f) where + -- There exists a point... + aᶠ : A + + -- Such that applying the monotonic function doesn't change the result. + aᶠ≈faᶠ : aᶠ ≈ f aᶠ +``` + +Moreover, the value we'll get out of the fixed point algorithm will be +the _least fixed point_. For us, this means that the result will be +"the most specific result possible". + +{{< codelines "Agda" "agda-spa/Fixedpoint.agda" 80 80 >}} + +The above explanation omits a lot of details, but it's a start. To get more +precise, we must drill down into several aspects of what I've said so far. +The first of them is, __how can we compare program information using an order?__ + +### Lattices + +Let's start with a question: when it comes to our specificity-based order, +is `-` less than, greater than, or equal to `+`? Surely it's not less specific; +knowing that a number is negative doesn't give you less information than +knowing if that number is positive. Similarly, it's not any more specific, for +the same reason. You could consider it equally specific, but that doesn't +seem quite right either; the information is different, so comparing specificity +feels apples-to-oranges. On the other hand, both `+` and `-` are clearly +more specific than `unknown`. + +The solution to this conundrum is to simply refuse to compare certain elements: +`+` is neither less than, greater than, nor equal to `-`, but `+ < unknown` and +`- < unknown`. Such an ordering is called a [partial order](https://en.wikipedia.org/wiki/Partially_ordered_set). + +Next, another question. Suppose that the user writes code like this: + +``` +if someCondition { + x = exprA; +} else { + x = exprB; +} +y = x; +``` + +If `exprA` has sign `s1`, and `exprB` has sign `s2`, what's the sign of `y`? +It's not necessarily `s1` nor `s2`, since they might not match: `s1` could be `+`, +and `s2` could be `-`, and using either `+` or `-` for `y` would be incorrect. +We're looking for something that can encompass _both_ `s1` and `s2`. +Necessarily, it would be either equally specific or less specific than +either `s1` or `s2`: there isn't any new information coming in about `x`, +and since we don't know which branch is taken, we stand to lose a little +bit of info. However, our goal is always to maximize specificity, since +more specific signs give us more information about our program. + +This gives us the following constraints. Since the combined sign `s` has to +be equally or less specific than either `s1` and `s2`, we have `s1 <= s` and +`s2 <= s`. However, we want to pick `s` such that it's more specific than +any other "combined sign" candidate. Thus, if there's another sign `t`, +with `s1 <= t` and `s2 <= t`, then it must be less specific than `s`: `s <= t`. + +At first, the above constraints might seem quite complicated. We can interpret +them in more familiar territory by looking at numbers instead of signs. +If we have two numbers `n1` and `n2`, what number is the smallest number +that's bigger than either `n1` or `n2`? Why, the maximum of the two, of course! + +There is a reason why I used the constraints above instead of just saying +"maximum". For numbers, `max(a,b)` is either `a` or `b`. However, we saw earlier +that neither `+` nor `-` works as the sign for `y` in our program. Moreover, +we agreed above that our order is _partial_: how can we pick "the bigger of two +elements" if neither is bigger than the other? `max` itself doesn't quite work, +but what we're looking for is something similar. Instead, we simply require a +similar function for our signs. We call this function "[least upper bound](https://en.wikipedia.org/wiki/Least-upper-bound_property)", +since it is the "least (most specific) +element that's greater (less specific) than either `s1` or `s2`". Conventionally, +this function is written as \\(a \\sqcup b\\) (or in our case, \\(s_1 \\sqcup s_2\\)). +We can define it for our signs so far using the following [Cayley table](https://en.wikipedia.org/wiki/Cayley_table). + +{{< latex >}} +\begin{array}{c|cccc} + \sqcup & - & 0 & + & ? \\ + \hline + - & - & ? & ? & ? \\ + 0 & ? & 0 & ? & ? \\ + + & ? & ? & + & ? \\ + ? & ? & ? & ? & ? \\ +\end{array} +{{< /latex >}} + +By using the above table, we can see that \\((+\ \\sqcup\ -)\ =\ ?\\) (aka `unknown`). +This is correct; given the four signs we're working with, that's the most we can say. +Let's explore the analogy to the `max` function a little bit more, by observing +that this function has certain properties: + +* `max(a, a) = a`. The maximum of one number is just that number. + Mathematically, this property is called _idempotence_. Note that + by inspecting the diagonal of the above table, we can confirm that our + \\((\\sqcup)\\) function is idempotent. +* `max(a, b) = max(b, a)`. If you're taking the maximum of two numbers, + it doesn't matter which one you consider first. This property is called + _commutativity_. Note that if you mirror the table along the diagonal, + it doesn't change; this shows that our \\((\\sqcup)\\) function is + commutative. +* `max(a, max(b, c)) = max(max(a, b), c)`. When you have three numbers, + and you're determining the maximum value, it doesn't matter which pair of + numbers you compare first. This property is called _associativity_. You + can use the table above to verify the \\((\\sqcup)\\) is associative, too. + +A set that has a binary operation (like `max` or \\((\\sqcup)\\)) that +satisfies the above properties is called a [semilattice](https://en.wikipedia.org/wiki/Semilattice). In Agda, we can write this definition roughly as follows: + +```Agda +record IsSemilattice {a} (A : Set a) (_⊔_ : A → A → A) : Set a where + field + ⊔-assoc : (x y z : A) → ((x ⊔ y) ⊔ z) ≡ (x ⊔ (y ⊔ z)) + ⊔-comm : (x y : A) → (x ⊔ y) ≡ (y ⊔ x) + ⊔-idemp : (x : A) → (x ⊔ x) ≡ x +``` + +It turns out to be convenient, however, to not require definitional equality +(`≡`). For instance, we might model sets as lists. Definitional equality +would force us to consider lists with the same elements but a different +order to be unequal. Instead, we parameterize our definition of `IsSemilattice` +by a binary relation `_≈_`, which we ask to be an [equivalence relation](https://en.wikipedia.org/wiki/Equivalence_relation). + +{{< codelines "Agda" "agda-spa/Lattice.agda" 23 39 >}} + +Notice that the above code also provides -- but doesn't require -- `_≼_` and +`_≺_`. That's because a least-upper-bound operation encodes an order: +intuitively, if `max(a, b) = b`, then `b` must be larger than `a`. +Lars Hupel's CRDT series includes [an explanation](https://lars.hupel.info/topics/crdt/03-lattices/#there-) of how the ordering operator +and the "least upper bound" function can be constructed from one another. + +As it turns out, the `min` function has very similar properties to `max`: +it's idempotent, commutative, and associative. For a partial order like +ours, the analog to `min` is "greatest lower bound", or "the largest value +that's smaller than both inputs". Such a function is denoted as \\(a\\sqcap b\\). +Intuitively, where \\(s_1 \\sqcup s_2\\) means "combine two signs where +you don't know which one will be used" (like in an `if`/`else`), +\\(s_1 \\sqcap s_2\\) means "combine two signs where you know both of +them to be true". For example, \\((+\ \\sqcap\ ?)\ =\ +\\), because a variable +that's both "any sign" and "positive" must be positive. + +There's just one hiccup: what's the greatest lower bound of `+` and `-`? +it needs to be a value that's less than both of them, but so far, we don't have +such a value. Intuitively, this value should be called something like `impossible`, +because a number that's both positive and negative doesn't exist. So, let's +extend our analyzer to have a new `impossible` value. In fact, it turns +out that this "impossible" value is the least element of our set (we added +it to be the lower bound of `+` and co., which in turn are less than `unknown`). +Similarly, `unknown` is the largest element of our set, since it's greater +than `+` and co, and transitively greater than `impossible`. In mathematics, +it's not uncommon to define the least element as \\(\\bot\\) (read "bottom"), and the +greatest element as \\(\\top\\) (read "top"). With that in mind, the +following are the updated Cayley tables for our operations. + +{{< latex >}} +\begin{array}{c|ccccc} + \sqcup & - & 0 & + & \top & \bot \\ + \hline + - & - & \top & \top & \top & - \\ + 0 & \top & 0 & \top & \top & 0 \\ + + & \top & \top & + & \top & + \\ + \top & \top & \top & \top & \top & \top \\ + \bot & - & 0 & + & \top & \bot \\ +\end{array} + +\qquad + +\begin{array}{c|ccccc} + \sqcap & - & 0 & + & \top & \bot \\ + \hline + - & - & \bot & \bot & - & \bot \\ + 0 & \bot & 0 & \bot & 0 & \bot \\ + + & \bot & \bot & + & + & \bot \\ + \top & - & 0 & + & \top & \bot \\ + \bot & \bot & \bot & \bot & \bot & \bot \\ +\end{array} +{{< /latex >}} + +So, it turns out that our set of possible signs is a semilattice in two +ways. And if "semi" means "half", does two "semi"s make a whole? Indeed it does! + +A lattice is made up of two semilattices. The operations of these two lattices, +however, must satisfy some additional properties. Let's examine the properties +in the context of `min` and `max` as we have before. They are usually called +the _absorption laws_: + +* `max(a, min(a, b)) = a`. `a` is either less than or bigger than `b`; + so if you try to find the maximum __and__ the minimum of `a` and + `b`, one of the operations will return `a`. +* `min(a, max(a, b)) = a`. The reason for this one is the same as + the reason above. + +In Agda, we can therefore write a lattice as follows: + +{{< codelines "Agda" "agda-spa/Lattice.agda" 153 163 >}} + +### Concrete Example: