24 KiB
title | series | description | date | tags | ||
---|---|---|---|---|---|---|
Implementing and Verifying "Static Program Analysis" in Agda, Part 1: Lattices | Static Program Analysis in Agda | In this post, I introduce an algebraic structure called a lattice, which underpins certain program analyses | 2024-07-06T17:37:43-07:00 |
|
This is the first post in a series on [static program analysis in Agda]({{< relref "static-program-analysis-in-agda" >}}). See the [introduction]({{< relref "00_spa_agda_intro" >}}) for a little bit more context.
The goal of this post is to motivate the algebraic structure called a lattice. Lattices have {{< sidenote "right" "crdt-note" "broad applications" >}} See, for instance, Lars Hupel's excellent introduction to CRDTs which uses lattices for Conflict-Free Replicated Data Types. CRDTs can be used to implement peer-to-peer distributed systems. {{< /sidenote >}} beyond static program analysis, so the work in this post is interesting in its own right. However, for the purposes of this series, I'm most interested in lattices as an encoding of program information when performing analysis. To start motivating lattices in that context, I'll need to start with monotone frameworks.
Monotone Frameworks
The key notion for monotone frameworks is the "specificity" of information.
Take, for instance, an analyzer that tries to figure out if a variable is
positive, negative, or equal to zero (this is called a sign analysis, and
we'll be using this example a lot). Of course, the variable could be "none
of the above" -- perhaps if it was initialized from user input, which would
allow both positive and negative numbers. Such an analyzer might return
+
, -
, 0
, or unknown
for any given variable. These outputs are not
created equal: if a variable has sign +
, we know more about it than if
the sign is unknown
: we've ruled out negative numbers as possible values!
Specificity is important to us because we want our analyses to be as precise
as possible. It would be valid for a program analysis to just return
unknown
for everything, but it wouldn't be very useful. Thus, we want to
rank possible outputs, and try pick the most specific one. The
{{< sidenote "right" "convention-note" "convention" -12 >}}
I say convention, because it doesn't actually matter if we represent more
specific values as "larger" or "smaller". Given a lattice with a particular
order written as <
, we can flip the sign in all relations
(turning a < b
into a > b
), and get back another
lattice. This lattice will have the same properties (more precisely,
the properties will be
dual). So
we shouldn't fret about picking a direction for "what's less than what".
{{< /sidenote >}}
seems to be to make
{{< sidenote "right" "order-note" "more specific things "smaller"" 1 >}}
Admittedly, it's a little bit odd to say that something which is "more" than
something else is actually smaller. The intuition that I favor is that
something that's more specific describes fewer objects: there are less
white horses than horses, so "white horse" is more specific than "horse".
The direction of <
can be thought of as comparing the number
of objects.
Note that this is only an intuition; there are equally many positive and
negative numbers, but we will not group them together
in our order.
{{< /sidenote >}},
and less specific things "larger". Coming back to our previous example, we'd
write + < unknown
, since +
is more specific. Of course, the exact
things we're trying to rank depend on the sort of analysis we're trying to
perform. Since I introduced sign analysis, we're ranking signs like +
and -
.
For other analyses, the elements will be different. The comparison, however,
will be a permanent fixture.
{#specificity}
Suppose now that we have some program analysis, and we're feeding it some input
information. Perhaps we're giving it the signs of variables x
and y
, and
hoping for it to give us the sign of a third variable z
. It would be very
unfortunate if, when given more specific information, the analysis would return
a less specific output! The more you know going in, the more you should know
coming out. Similarly, when given less specific / vaguer information, the
analysis shouldn't produce a more specific answer -- how could it do that?
This leads us to come up with the following rule:
{{< latex >}} \textbf{if}\ \text{input}_1 \le \text{input}_2, \textbf{then}\ \text{analyze}(\text{input}_1) \le \text{analyze}(\text{input}_2) {{< /latex >}}
In mathematics, such a property is called monotonicity. We say that "analyze" is a monotonic function. This property gives its name to monotone frameworks. For our purposes, this property means that being more specific "pays off": better information in means better information out. In Agda, we can encode monotonicity as follows:
{{< codelines "Agda" "agda-spa/Lattice.agda" 17 21 >}}
Note that above, I defined Monotonic
on an arbitrary function, whose
outputs might be of a different type than its inputs. This will come in handy
later.
The order <
of our elements and the monotonicity of our analysis are useful
to us for another reason: they help gauge and limit, in a roundabout way, how much
work might be left for our analysis to do. This matters because we don't want
to allow analyses that can take forever to finish -- that's a little too long
for a pragmatic tool used by people.
The key observation -- which I will describe in detail in a later post -- is that a monotonic analysis, in a way, "climbs upwards" through an order. As we continue using this analysis to refine information over and over, its results get {{< sidenote "right" "less-specific-note" "less and less specific." >}} It is not a bad thing for our results to get less specific over time, because our initial information is probably incomplete. If you've only seen German shepherds in your life, that might be your picture of what a dog is like. If you then come across a chihuahua, your initial definition of "dog" would certainly not accommodate it. To allow for both German shepherds and chihuahuas, you'd have to loosen the definition of "dog". This new definition would be less specific, but it would be more accurate. {{< /sidenote >}} If we add an additional ingredient, and say that the order has a fixed height, we can deduce that the analysis will eventually stop producing additional information: either it will keep "climbing", and reach the top (thus having to stop), or it will stop on its own before reaching the top. This is the essence of the fixed-point algorithm, which in Agda-like pseudocode can be stated as follows:
module _ (IsFiniteHeight A ≺)
(f : A → A)
(Monotonicᶠ : Monotonic _≼_ _≼_ f) where
-- There exists a point...
aᶠ : A
-- Such that applying the monotonic function doesn't change the result.
aᶠ≈faᶠ : aᶠ ≈ f aᶠ
Moreover, the value we'll get out of the fixed point algorithm will be the least fixed point. For us, this means that the result will be "the most specific result possible".
{{< codelines "Agda" "agda-spa/Fixedpoint.agda" 80 80 >}}
The above explanation omits a lot of details, but it's a start. To get more precise, we must drill down into several aspects of what I've said so far. The first of them is, how can we compare program information using an order?
Lattices
Let's start with a question: when it comes to our specificity-based order,
is -
less than, greater than, or equal to +
? Surely it's not less specific;
knowing that a number is negative doesn't give you less information than
knowing if that number is positive. Similarly, it's not any more specific, for
the same reason. You could consider it equally specific, but that doesn't
seem quite right either; the information is different, so comparing specificity
feels apples-to-oranges. On the other hand, both +
and -
are clearly
more specific than unknown
.
The solution to this conundrum is to simply refuse to compare certain elements:
+
is neither less than, greater than, nor equal to -
, but + < unknown
and
- < unknown
. Such an ordering is called a partial order.
Next, another question. Suppose that the user writes code like this:
if someCondition {
x = exprA;
} else {
x = exprB;
}
y = x;
If exprA
has sign s1
, and exprB
has sign s2
, what's the sign of y
?
It's not necessarily s1
nor s2
, since they might not match: s1
could be +
,
and s2
could be -
, and using either +
or -
for y
would be incorrect.
We're looking for something that can encompass both s1
and s2
.
Necessarily, it would be either equally specific or less specific than
either s1
or s2
: there isn't any new information coming in about x
,
and since we don't know which branch is taken, we stand to lose a little
bit of info. However, our goal is always to maximize specificity, since
more specific signs give us more information about our program.
This gives us the following constraints. Since the combined sign s
has to
be equally or less specific than either s1
and s2
, we have s1 <= s
and
s2 <= s
. However, we want to pick s
such that it's more specific than
any other "combined sign" candidate. Thus, if there's another sign t
,
with s1 <= t
and s2 <= t
, then it must be less specific than s
: s <= t
.
At first, the above constraints might seem quite complicated. We can interpret
them in more familiar territory by looking at numbers instead of signs.
If we have two numbers n1
and n2
, what number is the smallest number
that's bigger than either n1
or n2
? Why, the maximum of the two, of course!
There is a reason why I used the constraints above instead of just saying
"maximum". For numbers, max(a,b)
is either a
or b
. However, we saw earlier
that neither +
nor -
works as the sign for y
in our program. Moreover,
we agreed above that our order is partial: how can we pick "the bigger of two
elements" if neither is bigger than the other? max
itself doesn't quite work,
but what we're looking for is something similar. Instead, we simply require a
similar function for our signs. We call this function "least upper bound",
since it is the "least (most specific)
element that's greater (less specific) than either s1
or s2
". Conventionally,
this function is written as a \sqcup b
(or in our case, s_1 \sqcup s_2
).
The (\sqcup)
symbol is also called the join of a
and b
.
We can define it for our signs so far using the following Cayley table.
{{< latex >}} \begin{array}{c|cccc} \sqcup & - & 0 & + & ? \ \hline
- & - & ? & ? & ? \ 0 & ? & 0 & ? & ? \
- & ? & ? & + & ? \ ? & ? & ? & ? & ? \ \end{array} {{< /latex >}}
By using the above table, we can see that (+\ \sqcup\ -)\ =\ ?
(aka unknown
).
This is correct; given the four signs we're working with, that's the most we can say.
Let's explore the analogy to the max
function a little bit more, by observing
that this function has certain properties:
max(a, a) = a
. The maximum of one number is just that number. Mathematically, this property is called idempotence. Note that by inspecting the diagonal of the above table, we can confirm that our(\sqcup)
function is idempotent.max(a, b) = max(b, a)
. If you're taking the maximum of two numbers, it doesn't matter which one you consider first. This property is called commutativity. Note that if you mirror the table along the diagonal, it doesn't change; this shows that our(\sqcup)
function is commutative.max(a, max(b, c)) = max(max(a, b), c)
. When you have three numbers, and you're determining the maximum value, it doesn't matter which pair of numbers you compare first. This property is called associativity. You can use the table above to verify the(\sqcup)
is associative, too.
A set that has a binary operation (like max
or (\sqcup)
) that
satisfies the above properties is called a semilattice. In Agda, we can write this definition roughly as follows:
record IsSemilattice {a} (A : Set a) (_⊔_ : A → A → A) : Set a where
field
⊔-assoc : (x y z : A) → ((x ⊔ y) ⊔ z) ≡ (x ⊔ (y ⊔ z))
⊔-comm : (x y : A) → (x ⊔ y) ≡ (y ⊔ x)
⊔-idemp : (x : A) → (x ⊔ x) ≡ x
Note that this is an example of the ["Is Something" pattern]({{< relref "agda_is_pattern" >}}).
It turns out to be convenient, however, to not require definitional equality
(≡
). For instance, we might model sets as lists. Definitional equality
would force us to consider lists with the same elements but a different
order to be unequal. Instead, we parameterize our definition of IsSemilattice
by a binary relation _≈_
, which we ask to be an equivalence relation.
{#definitional-equality}
{{< codelines "Agda" "agda-spa/Lattice.agda" 23 39 >}}
Notice that the above code also provides -- but doesn't require -- _≼_
and
_≺_
. That's because a least-upper-bound operation encodes an order:
intuitively, if max(a, b) = b
, then b
must be larger than a
.
Lars Hupel's CRDT series includes an explanation of how the ordering operator
and the "least upper bound" function can be constructed from one another.
As it turns out, the min
function has very similar properties to max
:
it's idempotent, commutative, and associative. For a partial order like
ours, the analog to min
is "greatest lower bound", or "the largest value
that's smaller than both inputs". Such a function is denoted as a\sqcap b
,
and often called the "meet" of a
and b
.
As for what it means, where s_1 \sqcup s_2
means "combine two signs where
you don't know which one will be used" (like in an if
/else
),
s_1 \sqcap s_2
means "combine two signs where you know
{{< sidenote "right" "or-join-note" "both of them to be true" -7 >}}
If you're familiar with
Boolean algebra, this might look a little bit familiar to you. In fact,
the symbol for "and" on booleans is \land
. Similarly, the symbol
for "or" is \lor
. So, s_1 \sqcup s_2
means "the sign is s_1
or (s_2)",
or "(the sign is s_1
) \lor
(the sign is s_2
)". Similarly,
s_1 \sqcap s_2
means "(the sign is s_1
) \land
(the sign is s_2
)".
Don't these symbols look similar?
In fact, booleans with (\lor)
and (\land)
satisfy the semilattice
laws we've been discussing, and together form a lattice (to which I'm building
to in the main body of the text). The same is true for the set union and
intersection operations, (\cup)
and (\cap)
.
{{< /sidenote >}}". For example, (+\ \sqcap\ ?)\ =\ +
, because a variable
that's both "any sign" and "positive" must be positive.
{#lub-glub-or-and}
There's just one hiccup: what's the greatest lower bound of +
and -
?
it needs to be a value that's less than both of them, but so far, we don't have
such a value. Intuitively, this value should be called something like impossible
,
because a number that's both positive and negative doesn't exist. So, let's
extend our analyzer to have a new impossible
value. In fact, it turns
out that this "impossible" value is the least element of our set (we added
it to be the lower bound of +
and co., which in turn are less than unknown
).
Similarly, unknown
is the largest element of our set, since it's greater
than +
and co, and transitively greater than impossible
. In mathematics,
it's not uncommon to define the least element as \bot
(read "bottom"), and the
greatest element as \top
(read "top"). With that in mind, the
following are the updated Cayley tables for our operations.
{{< latex >}} \begin{array}{c|ccccc} \sqcup & - & 0 & + & \top & \bot \ \hline
- & - & \top & \top & \top & - \ 0 & \top & 0 & \top & \top & 0 \
- & \top & \top & + & \top & + \ \top & \top & \top & \top & \top & \top \ \bot & - & 0 & + & \top & \bot \ \end{array}
\qquad
\begin{array}{c|ccccc} \sqcap & - & 0 & + & \top & \bot \ \hline
- & - & \bot & \bot & - & \bot \ 0 & \bot & 0 & \bot & 0 & \bot \
- & \bot & \bot & + & + & \bot \ \top & - & 0 & + & \top & \bot \ \bot & \bot & \bot & \bot & \bot & \bot \ \end{array} {{< /latex >}} {#sign-lattice}
So, it turns out that our set of possible signs is a semilattice in two ways. And if "semi" means "half", does two "semi"s make a whole? Indeed it does!
A lattice is made up of two semilattices. The operations of these two lattices,
however, must satisfy some additional properties. Let's examine the properties
in the context of min
and max
as we have before. They are usually called
the absorption laws:
{#absorption-laws}
max(a, min(a, b)) = a
.a
is either less than or bigger thanb
; so if you try to find the maximum and the minimum ofa
andb
, one of the operations will returna
.min(a, max(a, b)) = a
. The reason for this one is the same as the reason above.
In Agda, we can therefore write a lattice as follows:
{{< codelines "Agda" "agda-spa/Lattice.agda" 153 163 >}}
Concrete Examples
Natural Numbers
Since we've been talking about min
and max
as motivators for properties
of (\sqcap)
and (\sqcup)
, it might not be all that surprising
that natural numbers form a lattice with min
and max
as the two binary
operations. In fact, the Agda standard library writes min
as _⊓_
and
max
as _⊔_
! We can make use of the already-proven properties of these
operators to easily define IsLattice
for natural numbers. Notice that
since we're not doing anything clever, like considering lists up to reordering,
there's no reason not to use definitional equality ≡
for our equivalence relation.
{{< codelines "Agda" "agda-spa/Lattice/Nat.agda" 1 45 >}}
The definition for the lattice instance itself is pretty similar; I'll omit
it here to avoid taking up a lot of vertical space, but you can find it
on lines 47 through 83 of [my Lattice.Nat
module]({{< codeurl "agda-spa/Lattice/Nat.agda" >}}).
The "Above-Below" Lattice
It's not too hard to implement our sign lattice in Agda. However, we can
do it in a somewhat general way. As it turns out, extending an existing set,
such as \{+, -, 0\}
, with a "bottom" and "top" element (to be used
when taking the least upper bound and greatest lower bound) is quite common
and useful. For instance, if we were to do constant propagation (simplifying
7+4
to 11
), we would probably do something similar, using the set
of integers \mathbb{Z}
instead of the plus-zero-minus set.
The general definition is as follows. Take some original set S
(like our 3-element
set of signs), and extend it with new "top" and "bottom" elements (\top
and
\bot
). Then, define (\sqcup)
as follows:
{{< latex >}} x_1 \sqcup x_2 = \begin{cases} \top & x_1 = \top\ \text{or}\ x_2 = \top \ \top & x_1, x_2 \in S, x_1 \neq x_2 \ x_1 = x_2 & x_1, x_2 \in S, x_1 = x_2 \ x_1 & x_2 = \bot \ x_2 & x_1 = \bot \end{cases} {{< /latex >}}
In other words, \top
overrules anything that it's combined with. In
math terms, it's the absorbing element of the lattice. On the other hand,
\bot
gets overruled by anything it's combined with. In math terms, that's
an identity element. Finally, when combining two elements that aren't
\top
or \bot
(which would otherwise be covered by the prior sentences),
combining an element with itself leaves it unchanged (upholding idempotence),
while combining two unequal element results in \top
. That last part
matches the way we defined "least upper bound" earlier.
The intuition is as follows: the (\sqcup)
operator is like an "or". Then,
"anything or positive" means "anything"; same with "anything or negative", etc.
On the other hand, "impossible or positive" means positive, since one of those
cases will never happen. Finally, in the absense of additional elements, the
most we can say about "positive or negative" is "any sign"; of course,
"positive or positive" is the same as "positive".
The "greatest lower bound" operator is defined by effectively swapping top and bottom.
{{< latex >}} x_1 \sqcup x_2 = \begin{cases} \bot & x_1 = \bot\ \text{or}\ x_2 = \bot \ \bot & x_1, x_2 \in S, x_1 \neq x_2 \ x_1 = x_2 & x_1, x_2 \in S, x_1 = x_2 \ x_1 & x_2 = \top \ x_2 & x_1 = \top \end{cases} {{< /latex >}}
For this operator, \bot
is the absorbing element, and \top
is the
identity element. The intuition here is not too different: if
(\sqcap)
is like an "and", then "impossible and positive" can't happen;
same with "impossible and negative", and so on. On the other hand,
"anything and positive" clearly means positive. Finally, "negative and positive"
can't happen (again, there is no number that's both positive and negative),
and "positive and positive" is just "positive".
What properties of the underlying set did we use to get this to work? The only thing we needed is to be able to check and see if two elements are equal or not; this is called decidable equality. Since that's the only thing we used, this means that we can define an "above/below" lattice like this for any type for which we can check if two elements are equal. In Agda, I encoded this using a parameterized module:
{{< codelines "Agda" "agda-spa/Lattice/AboveBelow.agda" 5 8 >}}
From there, I defined the actual data type as follows:
{{< codelines "Agda" "agda-spa/Lattice/AboveBelow.agda" 23 26 >}}
From there, I defined the (\sqcup)
and (\sqcap)
operations almost
exactly to the mathematical equation above (the cases were re-ordered to
improve Agda's reduction behavior). Here's the former:
{{< codelines "Agda" "agda-spa/Lattice/AboveBelow.agda" 86 93 >}}
And here's the latter:
{{< codelines "Agda" "agda-spa/Lattice/AboveBelow.agda" 181 188 >}}
The proofs of the lattice properties are straightforward and proceed by simple case analysis. Unfortunately, Agda doesn't quite seem to evaluate the binary operator in every context that I would expect it to, which has led me to define some helper lemmas such as the following:
{{< codelines "Agda" "agda-spa/Lattice/AboveBelow.agda" 95 96 >}}
As a sample, here's a proof of commutativity of (\sqcup)
:
{{< codelines "Agda" "agda-spa/Lattice/AboveBelow.agda" 158 165 >}}
The details of the rest of the proofs can be found in the
[AboveBelow.agda
file]({{< codeurl "agda-spa/Lattice/AboveBelow.agda" >}}).
To recover the sign lattice we've been talking about all along, it's sufficient to define a sign data type:
{{< codelines "Agda" "agda-spa/Analysis/Sign.agda" 19 22 >}}
Then, prove decidable equality on it (effecitly defining a comparison function),
and instantiate the AboveBelow
module:
{{< codelines "Agda" "agda-spa/Analysis/Sign.agda" 34 47 >}}
From Simple Lattices to Complex Ones
Natural numbers and signs alone are cool enough, but they will not be sufficient to write program analyzers. That's because when we're writing an analyzer, we don't just care about one variable: we care about all of them! An initial guess might be to say that when analyzing a program, we really need several signs: one for each variable. This might be reminiscent of a map. So, when we compare specificity, we'll really be comparing the specificity of maps. Even that, though, is not enough. The reason is that variables might have different signs at different points in the program! A single map would not be able to capture that sort of nuance, so what we really need is a map associating states with another map, which in turn associates variables with their signs.
Mathematically, we might write this as:
{{< latex >}} \text{Info} \triangleq \text{ProgramStates} \to (\text{Variables} \to \text{Sign}) {{< /latex >}}
That's a big step up in complexity. We now have a doubly-nested map structure
instead of just a sign. and we need to compare such maps in order to gaugage
their specificity and advance our analyses. But where do we even start with
maps, and how do we define the (\sqcup)
and (\sqcap)
operations?
The solution turns out to be to define ways in which simpler lattices (like our sign) can be combined and transformed to define more complex lattices. We'll move on to that in the next post of this series.