Move original 'monotone function' text into new post and heavily rework it

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-05-12 18:28:44 -07:00
parent 2b5dcf12d7
commit a20fe07a56
2 changed files with 325 additions and 78 deletions
--- a/content/blog/00_spa_agda_intro.md
+++ b/content/blog/00_spa_agda_intro.md
@@ -1,6 +1,7 @@
 ---
 title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 0: Intro"
 series: "Static Program Analysis in Agda"
 description: "In this post, I give a top-level overview of my work on formally verified static analyses"
 date: 2024-04-12T14:23:03-07:00
 draft: true
 ---
@@ -86,81 +87,3 @@ for a post or two:
 {{< todo >}}
 Once the posts are ready, link them here to add some kind of navigation.
 {{< /todo >}}
 ### Monotone Frameworks
 I'll start out as abstractly and vaguely as possible. In general, the sorts of
 analyses I'll be formalizing are based on _monotone frameworks_.
 The idea with monotone frameworks is to rank information about program state
 using some kind of _order_. Intuitively, given two pieces of "information",
 one is less than another if it's more specific. Thus, "`x` has a positive sign"
 is less than "`x` has any sign", since the former is more specific than the latter.
 The sort of information that you are comparing depends on the analysis. In all
 cases, the analysis itself is implemented as a function that takes the "information
 so far", and updates it based on the program, producing "updated information so far".
 Not all such functions are acceptable; it's possible to write an "updater function"
 that keeps slightly adjusting its answer. Such a function could keep running
 forever, which is a little too long for a program analyzer. We need something
 to ensure the analysis ends.
 There are two secret ingredients to ensure that an analysis terminates.
 The first is a property called _monotonicity_; a function is monotonic if
 it preserves the order between its inputs. That is, if you have two pieces of
 information `x1` and `x2`, with `x1 <= x2`, then `f(x1) <= f(x2)`. The second
 property is that our "information" has a _finite height_. Roughly, this means
 that if you tried to arrange pieces of information in a line, from least to
 greatest, your line could only get so long. Combined, this leads to the
 following property (I'm being reductive here while I give an overview):
 _With a monotoninc function and a finite-height order, if you start at the
 bottom, each invocation of the function moves you up some line. Since the
 line can only be so long, you're guaranteed to reach the end eventually._
 The above three-paragraph explanation omits a lot of details, but it's a start.
 To get more precise, we must drill down into several aspects of what I've
 said so far. The first of them is, __how can we compare program states using
 an order?__
 ### Lattices
 The "information" we'll be talking about will form an algebraic structure
 called a [lattice](https://en.wikipedia.org/wiki/Lattice_(order)). Algebraically,
 a lattice is simply a set with two binary operations on it. Unlike the familiar
 `+`, `-`, and `*` and `/`, the binary operations on a lattice are called
 "join" and "meet", and are written as `⊔` and `⊓`. Intuitively, they correspond
 to "take the maximum of two values" and "take the minimum of two values". That
 may not be all that surprising, since it's the order of values that we care about.
 Continuing the analogy, let's talk some properties of "minimum" and "maximum", 
 * \\(\\max(a, a) = \\min(a, a) = a\\). The minimum and maximum of one number is
  just that number. Mathematically, this property is called _idempotence_.
 * \\(\\max(a, b) = \\max(b, a)\\). If you're taking the maximum of two numbers,
  it doesn't matter much one you consider first. This property is called
  _commutativity_.
 * \\(\\max(a, \\max(b, c)) = \\max(\\max(a, b), c)\\). When you have three numbers,
  and you're determining the maximum value, it doesn't matter which pair of
  numbers you compare first. This property is called _associativity_.
 All of the properties of \\(\\max\\) also hold for \\(\\min\\). There are also
 a couple of facts about how \\(\\max\\) and \\(\\min\\) interact _with each other_.
 They are usually called the _absorption laws_:
 * \\(\\max(a, \\min(a, b)) = a\\). This one is a little less obvious; \\(a\\)
  is either less than or bigger than \\(b\\); so if you try to find the maximum
  __and__ the minimum of \\(a\\) and \\(b\\), one of the operations will return
  \\(a\\).
 * \\(\\min(a, \\max(a, b)) = a\\). The reason for this one is the same as
  the reason above.
 Lattices model a specific kind of order; their operations are meant to
 generalize \\(\\min\\) and \\(\\max\\). Thus, to make the operations behave
 as expected (i.e., the way that \\(\\min\\) and \\(\\max\\) do), they are
 required to have all of the properties we've listed so far. We can summarize
 the properties in table.
 | Property Name | Definition                                           |
 |---------------|:----------------------------------------------------:|
 | Idempotence   | {{< latex >}}\forall x. x \sqcup x = x{{< /latex >}}<br>{{< latex >}}\forall x. x \sqcap x = x{{< /latex >}} |
 | Commutativity | {{< latex >}}\forall x, y. x \sqcup y = y \sqcup x{{< /latex >}}<br>{{< latex >}}\forall x, y. x \sqcap y = y \sqcap x{{< /latex >}} |
 | Associativity | {{< latex >}}\forall x, y, z. x \sqcup (y \sqcup z) = (x \sqcup y) \sqcup z{{< /latex >}}<br>{{< latex >}}\forall x, y, z. x \sqcap (y \sqcap z) = (x \sqcap y) \sqcap z{{< /latex >}} |
 | Absorption Laws | {{< latex >}}\forall x, y. x \sqcup (x \sqcap y) = x{{< /latex >}}<br>{{< latex >}}\forall x, y. x \sqcap (x \sqcup y) = x{{< /latex >}}
--- a/content/blog/01_spa_agda_lattices.md
+++ b/content/blog/01_spa_agda_lattices.md
@@ -0,0 +1,324 @@
 ---
 title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 1: Lattices"
 series: "Static Program Analysis in Agda"
 date: 2024-04-12T14:23:03-07:00
 draft: true
 ---
 This is the first post in a series on
 [static program analysis in Agda]({{< relref "static-program-analysis-in-agda" >}}).
 See the [introduction]({{< relref "00_spa_agda_intro" >}}) for a little bit
 more context.
 The goal of this post is to motivate the algebraic structure called a
 [lattice](https://en.wikipedia.org/wiki/Lattice_(order)). Lattices have
 {{< sidenote "right" "crdt-note" "broad applications" >}}
 See, for instance, Lars Hupel's excellent
 <a href="https://lars.hupel.info/topics/crdt/01-intro/">introduction to CRDTs</a>
 which uses lattices for Conflict-Free Replicated Data Types. CRDTs can be
 used to implement peer-to-peer distributed systems.
 {{< /sidenote >}} beyond static program analysis, so the work in this post is
 interesting in its own right. However, for the purposes of this series, I'm
 most interested in lattices as an encoding of program information when performing
 analysis. To start motivating lattices in that context, I'll need to start
 with _monotone frameworks_.
 ### Monotone Frameworks
 The key notion for monotone frameworks is the "specificity" of information.
 Take, for instance, an analyzer that tries to figure out if a variable is
 positive, negative, or equal to zero (this is called a _sign analysis_, and
 we'll be using this example a lot). Of course, the variable could be "none
 of the above" -- perhaps if it was initialized from user input, which would
 allow both positive and negative numbers. Such an analyzer might return
 `+`, `-`, `0`, or `unknown` for any given variable. These outputs are not
 created equal: if a variable has sign `+`, we know more about it than if
 the sign is `unknown`: we've ruled out negative numbers as possible values!
 Specificity is important to us because we want our analyses to be as precise
 as possible. It would be valid for a program analysis to just return
 `unknown` for everything, but it wouldn't be very useful. Thus, we want to
 rank possible outputs, and try pick the most specific one. The
 {{< sidenote "right" "convention-note" "convention" -12 >}}
 I say convention, because it doesn't actually matter if we represent more
 specific values as "larger" or "smaller". Given a lattice with a particular
 order written as <code>&lt;</code>, we can flip the sign in all relations
 (turning <code>a &lt; b</code> into <code>a &gt; b</code>), and get back another
 lattice. This lattice will have the same properties (more precisely,
 the properties will be
 <a href="https://en.wikipedia.org/wiki/Duality_(mathematics)">dual</a>). So
 we shouldn't fret about picking a direction for "what's less than what".
 {{< /sidenote >}}
 seems to be to make
 {{< sidenote "right" "order-note" "more specific things \"smaller\"" 1 >}}
 Admittedly, it's a little bit odd to say that something which is "more" than
 something else is actually smaller. The intuition that I favor is that
 something that's more specific describes fewer objects: there are less
 white horses than horses, so "white horse" is more specific than "horse".
 The direction of <code>&lt</code> can be thought of as comparing the number
 of objects.<br>
 <br>
 Note that this is only an intuition; there are equally many positive and
 negative numbers, but we will <em>not</em> group them together
 in our order.
 {{< /sidenote >}},
 and less specific things "larger". Coming back to our previous example, we'd
 write `+ < unknown`, since `+` is more specific. Of course, the exact
 things we're trying to rank depend on the sort of analysis we're trying to
 perform. Since I introduced sign analysis, we're ranking signs like `+` and `-`.
 For other analyses, the elements will be different. The _comparison_, however,
 will be a permanent fixture.
 Suppose now that we have some program analysis, and we're feeding it some input
 information. Perhaps we're giving it the signs of variables `x` and `y`, and
 hoping for it to give us the sign of a third variable `z`. It would be very
 unfortunate if, when given more specific information, the analysis would return
 a less specific output! The more you know going in, the more you should know
 coming out. Similarly, when given less specific / vaguer information, the
 analysis shouldn't produce a more specific answer -- how could it do that?
 This leads us to come up with the following rule:
 {{< latex >}}
 \textbf{if}\ \text{input}_1 \le \text{input}_2,
 \textbf{then}\ \text{analyze}(\text{input}_1) \le \text{analyze}(\text{input}_2)
 {{< /latex >}}
 In mathematics, such a property is called _monotonicity_. We say that
 "analyze" is a [monotonic function](https://en.wikipedia.org/wiki/Monotonic_function).
 This property gives its name to monotone frameworks. For our purposes, this
 property means that being more specific "pays off": better information in
 means better information out. In Agda, we can encode monotonicity as follows:
 {{< codelines "Agda" "agda-spa/Lattice.agda" 17 21 >}}
 Note that above, I defined `Monotonic` on an arbitrary function, whose
 outputs might be of a different type than its inputs. This will come in handy
 later.
 The order `<` of our elements and the monotonicity of our analysis are useful
 to us for another reason: they help gauge and limit, in a roundabout way, how much
 work might be left for our analysis to do. This matters because we don't want
 to allow analyses that can take forever to finish -- that's a little too long
 for a pragmatic tool used by people.
 The key observation -- which I will describe in detail in a later post --
 is that a monotonic analysis, in a way, "climbs upwards" through an
 order. As we continue using this analysis to refine information over and over,
 its results get
 {{< sidenote "right" "less-specific-note" "less and less specific." >}}
 It is not a bad thing for our results to get less specific over time, because
 our initial information is probably incomplete. If you've only seen German
 shepherds in your life, that might be your picture of what a dog is like.
 If you then come across a chihuahua, your initial definition of "dog" would
 certainly not accommodate it. To allow for both German shepherds and chihuahuas,
 you'd have to loosen the definition of "dog". This new definition would be less
 specific, but it would be more accurate.
 {{< /sidenote >}}
 If we add an additional ingredient, and say that the order has a _fixed height_,
 we can deduce that the analysis will eventually stop producing additional
 information: either it will keep "climbing", and reach the top (thus having
 to stop), or it will stop on its own before reaching the top. This is
 the essence of the fixed-point algorithm, which in Agda-like pseudocode can
 be stated as follows:
 ```Agda
 module _ (IsFiniteHeight A ≺)
         (f : A → A)
         (Monotonicᶠ : Monotonic _≼_ _≼_ f) where
    -- There exists a point...
    aᶠ : A
    -- Such that applying the monotonic function doesn't change the result.
    aᶠ≈faᶠ : aᶠ ≈ f aᶠ
 ```
 Moreover, the value we'll get out of the fixed point algorithm will be
 the _least fixed point_. For us, this means that the result will be
 "the most specific result possible".
 {{< codelines "Agda" "agda-spa/Fixedpoint.agda" 80 80 >}}
 The above explanation omits a lot of details, but it's a start. To get more
 precise, we must drill down into several aspects of what I've said so far.
 The first of them is, __how can we compare program information using an order?__
 ### Lattices
 Let's start with a question: when it comes to our specificity-based order,
 is `-` less than, greater than, or equal to `+`? Surely it's not less specific;
 knowing that a number is negative doesn't give you less information than
 knowing if that number is positive. Similarly, it's not any more specific, for
 the same reason. You could consider it equally specific, but that doesn't
 seem quite right either; the information is different, so comparing specificity
 feels apples-to-oranges. On the other hand, both `+` and `-` are clearly
 more specific than `unknown`.
 The solution to this conundrum is to simply refuse to compare certain elements:
 `+` is neither less than, greater than, nor equal to `-`, but `+ < unknown` and
 `- < unknown`. Such an ordering is called a [partial order](https://en.wikipedia.org/wiki/Partially_ordered_set).
 Next, another question. Suppose that the user writes code like this:
 ```
 if someCondition {
  x = exprA;
 } else {
  x = exprB;
 }
 y = x;
 ```
 If `exprA` has sign `s1`, and `exprB` has sign `s2`, what's the sign of `y`?
 It's not necessarily `s1` nor `s2`, since they might not match: `s1` could be `+`,
 and `s2` could be `-`, and using either `+` or `-` for `y` would be incorrect.
 We're looking for something that can encompass _both_ `s1` and `s2`.
 Necessarily, it would be either equally specific or less specific than
 either `s1` or `s2`: there isn't any new information coming in about `x`,
 and since we don't know which branch is taken, we stand to lose a little
 bit of info. However, our goal is always to maximize specificity, since
 more specific signs give us more information about our program.
 This gives us the following constraints. Since the combined sign `s` has to
 be equally or less specific than either `s1` and `s2`, we have `s1 <= s` and
 `s2 <= s`. However, we want to pick `s` such that it's more specific than
 any other "combined sign" candidate. Thus, if there's another sign `t`,
 with `s1 <= t` and `s2 <= t`, then it must be less specific than `s`: `s <= t`.
 At first, the above constraints might seem quite complicated. We can interpret
 them in more familiar territory by looking at numbers instead of signs.
 If we have two numbers `n1` and `n2`, what number is the smallest number
 that's bigger than either `n1` or `n2`? Why, the maximum of the two, of course!
 There is a reason why I used the constraints above instead of just saying
 "maximum". For numbers, `max(a,b)` is either `a` or `b`. However, we saw earlier
 that neither `+` nor `-` works as the sign for `y` in our program. Moreover,
 we agreed above that our order is _partial_: how can we pick "the bigger of two
 elements" if neither is bigger than the other? `max` itself doesn't quite work,
 but what we're looking for is something similar. Instead, we simply require a
 similar function for our signs. We call this function "[least upper bound](https://en.wikipedia.org/wiki/Least-upper-bound_property)",
 since it is the "least (most specific)
 element that's greater (less specific) than either `s1` or `s2`". Conventionally,
 this function is written as \\(a \\sqcup b\\) (or in our case, \\(s_1 \\sqcup s_2\\)).
 We can define it for our signs so far using the following [Cayley table](https://en.wikipedia.org/wiki/Cayley_table).
 {{< latex >}}
 \begin{array}{c|cccc}
 \sqcup & - & 0 & + & ? \\
 \hline
 - & - & ? & ? & ? \\
 0 & ? & 0 & ? & ? \\
 + & ? & ? & + & ? \\
 ? & ? & ? & ? & ? \\
 \end{array}
 {{< /latex >}}
 By using the above table, we can see that \\((+\ \\sqcup\ -)\ =\ ?\\) (aka `unknown`).
 This is correct; given the four signs we're working with, that's the most we can say.
 Let's explore the analogy to the `max` function a little bit more, by observing
 that this function has certain properties:
 * `max(a, a) = a`. The maximum of one number is just that number.
  Mathematically, this property is called _idempotence_. Note that
  by inspecting the diagonal of the above table, we can confirm that our
  \\((\\sqcup)\\) function is idempotent.
 * `max(a, b) = max(b, a)`. If you're taking the maximum of two numbers,
  it doesn't matter which one you consider first. This property is called
  _commutativity_. Note that if you mirror the table along the diagonal,
  it doesn't change; this shows that our \\((\\sqcup)\\) function is
  commutative.
 * `max(a, max(b, c)) = max(max(a, b), c)`. When you have three numbers,
  and you're determining the maximum value, it doesn't matter which pair of
  numbers you compare first. This property is called _associativity_. You
  can use the table above to verify the \\((\\sqcup)\\) is associative, too.
 A set that has a binary operation (like `max` or \\((\\sqcup)\\)) that
 satisfies the above properties is called a [semilattice](https://en.wikipedia.org/wiki/Semilattice). In Agda, we can write this definition roughly as follows:
 ```Agda
 record IsSemilattice {a} (A : Set a) (_⊔_ : A → A → A) : Set a where
    field
        ⊔-assoc : (x y z : A) → ((x ⊔ y) ⊔ z) ≡ (x ⊔ (y ⊔ z))
        ⊔-comm : (x y : A) → (x ⊔ y) ≡ (y ⊔ x)
        ⊔-idemp : (x : A) → (x ⊔ x) ≡ x
 ```
 It turns out to be convenient, however, to not require definitional equality
 (`≡`). For instance, we might model sets as lists. Definitional equality
 would force us to consider lists with the same elements but a different
 order to be unequal. Instead, we parameterize our definition of `IsSemilattice`
 by a binary relation `_≈_`, which we ask to be an [equivalence relation](https://en.wikipedia.org/wiki/Equivalence_relation).
 {{< codelines "Agda" "agda-spa/Lattice.agda" 23 39 >}}
 Notice that the above code also provides -- but doesn't require -- `_≼_` and
 `_≺_`. That's because a least-upper-bound operation encodes an order:
 intuitively, if `max(a, b) = b`, then `b` must be larger than `a`.
 Lars Hupel's CRDT series includes [an explanation](https://lars.hupel.info/topics/crdt/03-lattices/#there-) of how the ordering operator
 and the "least upper bound" function can be constructed from one another.
 As it turns out, the `min` function has very similar properties to `max`:
 it's idempotent, commutative, and associative. For a partial order like
 ours, the analog to `min` is "greatest lower bound", or "the largest value
 that's smaller than both inputs". Such a function is denoted as \\(a\\sqcap b\\).
 Intuitively, where \\(s_1 \\sqcup s_2\\) means "combine two signs where
 you don't know which one will be used" (like in an `if`/`else`), 
 \\(s_1 \\sqcap s_2\\) means "combine two signs where you know both of
 them to be true". For example, \\((+\ \\sqcap\ ?)\ =\ +\\), because a variable
 that's both "any sign" and "positive" must be positive.
 There's just one hiccup: what's the greatest lower bound of `+` and `-`?
 it needs to be a value that's less than both of them, but so far, we don't have
 such a value. Intuitively, this value should be called something like `impossible`,
 because a number that's both positive and negative doesn't exist. So, let's
 extend our analyzer to have a new `impossible` value. In fact, it turns
 out that this "impossible" value is the least element of our set (we added
 it to be the lower bound of `+` and co., which in turn are less than `unknown`).
 Similarly, `unknown` is the largest element of our set, since it's greater
 than `+` and co, and transitively greater than `impossible`. In mathematics,
 it's not uncommon to define the least element as \\(\\bot\\) (read "bottom"), and the
 greatest element as \\(\\top\\) (read "top"). With that in mind, the
 following are the updated Cayley tables for our operations.
 {{< latex >}}
 \begin{array}{c|ccccc}
 \sqcup & - & 0 & + & \top & \bot \\
 \hline
 - & - & \top & \top & \top & - \\
 0 & \top & 0 & \top & \top & 0 \\
 + & \top & \top & + & \top & + \\
 \top & \top & \top & \top & \top & \top \\
 \bot & - & 0 & + & \top & \bot \\
 \end{array}
 \qquad
 \begin{array}{c|ccccc}
 \sqcap & - & 0 & + & \top & \bot \\
 \hline
 - & - & \bot & \bot & - & \bot \\
 0 & \bot & 0 & \bot & 0 & \bot \\
 + & \bot & \bot & + & + & \bot \\
 \top & - & 0 & + & \top & \bot \\
 \bot & \bot & \bot & \bot & \bot & \bot \\
 \end{array}
 {{< /latex >}}
 So, it turns out that our set of possible signs is a semilattice in two
 ways. And if "semi" means "half", does two "semi"s make a whole? Indeed it does!
 A lattice is made up of two semilattices. The operations of these two lattices,
 however, must satisfy some additional properties. Let's examine the properties
 in the context of `min` and `max` as we have before. They are usually called 
 the _absorption laws_:
 * `max(a, min(a, b)) = a`. `a` is either less than or bigger than `b`;
  so if you try to find the maximum __and__ the minimum of `a` and
  `b`, one of the operations will return `a`.
 * `min(a, max(a, b)) = a`. The reason for this one is the same as
  the reason above.
 In Agda, we can therefore write a lattice as follows:
 {{< codelines "Agda" "agda-spa/Lattice.agda" 153 163 >}}
 ### Concrete Example: