240 lines
12 KiB
Markdown
240 lines
12 KiB
Markdown
---
|
||
title: "Implementing and Verifying \"Static Program Analysis\" in Agda, Part 2: Combining Lattices"
|
||
series: "Static Program Analysis in Agda"
|
||
description: "In this post, I describe how lattices can be combined to create other, more complex lattices"
|
||
date: 2024-05-30T19:37:00-07:00
|
||
draft: true
|
||
tags: ["Agda", "Programming Languages"]
|
||
---
|
||
|
||
In the previous post, I wrote about how lattices arise when tracking, comparing
|
||
and combining static information about programs. I then showed two simple lattices:
|
||
the natural numbers, and the (parameterized) "above-below" lattice, which
|
||
modified an arbitrary set with "bottom" and "top" elements (\(\bot\) and \(\top\)
|
||
respectively). One instance of the "above-below" lattice was the sign lattice,
|
||
which could be used to reason about the signs (positive, negative, or zero)
|
||
of variables in a program.
|
||
|
||
At the end of that post, I introduced a source of complexity: the "full"
|
||
lattices that we want to use for the program analysis aren't signs or numbers,
|
||
but maps of states and variables to lattice-based descriptions. The full lattice
|
||
for sign analysis might something in the form:
|
||
|
||
{{< latex >}}
|
||
\text{Info} \triangleq \text{ProgramStates} \to (\text{Variables} \to \text{Sign})
|
||
{{< /latex >}}
|
||
|
||
Thus, we have to compare and find least upper bounds (e.g.) of not just
|
||
signs, but maps! Proving the various lattice laws for signs was not too
|
||
challenging, but for for a two-level map like \(\text{Info}\) above, we'd
|
||
need to do a lot more work. We need tools to build up such complicated lattices.
|
||
|
||
The way to do this, it turns out, is by using simpler lattices as building blocks.
|
||
To start with, let's take a look at a very simple way of combining lattices:
|
||
taking the [Cartesian product](https://mathworld.wolfram.com/CartesianProduct.html).
|
||
|
||
### The Cartesian Product Lattice
|
||
|
||
Suppose you have two lattices \(L_1\) and \(L_2\). As I covered in the previous
|
||
post, each lattice comes equipped with a "least upper bound" operator \((\sqcup)\)
|
||
and a "greatest lower bound" operator \((\sqcap)\). Since we now have two lattices,
|
||
let's use numerical suffixes to disambiguate between the operators
|
||
of the first and second lattice: \((\sqcup_1)\) will be the LUB operator of
|
||
the first lattice \(L_1\), and \((\sqcup_2)\) of the second lattice \(L_2\),
|
||
and so on.
|
||
|
||
Then, let's take the Cartesian product of the elements of \(L_1\) and \(L_2\);
|
||
mathematically, we'll write this as \(L_1 \times L_2\), and in Agda, we can
|
||
just use the standard [`Data.Product`](https://agda.github.io/agda-stdlib/master/Data.Product.html)
|
||
module. Then, I'll define the lattice as another [parameterized module](https://agda.readthedocs.io/en/latest/language/module-system.html#parameterised-modules). Since both \(L_1\) and \(L_2\)
|
||
are lattices, this parameterized module will require `IsLattice` instances
|
||
for both types:
|
||
|
||
{{< codelines "Agda" "agda-spa/Lattice/Prod.agda" 1 7 >}}
|
||
|
||
Elements of \(L_1 \times L_2\) are in the form \((l_1, l_2)\), where
|
||
\(l_1 \in L_1\) and \(l_2 \in L_2\). The first thing we can get out of the
|
||
way is defining what it means for two such elements to be equal. Recall that
|
||
we opted for a [custom equivalence relation]({{< relref "01_spa_agda_lattices#definitional-equality" >}})
|
||
instead of definitional equality to allow similar elements to be considered
|
||
equal; we'll have to define a similar relation for our new product lattice.
|
||
That's easy enough: we have an equality predicate `_≈₁_` that checks if an element
|
||
of \(L_1\) is equal to another, and we have `_≈₂_` that does the same for \(L_2\).
|
||
It's reasonable to say that _pairs_ of elements are equal if their respective
|
||
first and second elements are equal:
|
||
|
||
{{< latex >}}
|
||
(l_1, l_2) \approx (j_1, j_2) \iff l_1 \approx_1 j_1 \land l_2 \approx_2 j_2
|
||
{{< /latex >}}
|
||
|
||
In Agda:
|
||
|
||
{{< codelines "Agda" "agda-spa/Lattice/Prod.agda" 39 40 >}}
|
||
|
||
Verifying that this relation has the properties of an equivalence relation
|
||
boils down to the fact that `_≈₁_` and `_≈₂_` are themselves equivalence
|
||
relations.
|
||
|
||
{{< codelines "Agda" "agda-spa/Lattice/Prod.agda" 42 48 >}}
|
||
|
||
Defining \((\sqcup)\) and \((\sqcap)\) by simply applying the
|
||
corresponding operators from \(L_1\) and \(L_2\) seems quite natural as well.
|
||
|
||
{{< latex >}}
|
||
(l_1, l_2) \sqcup (j_1, j_2) \triangleq (l_1 \sqcup_1 j_1, l_2 \sqcup_2 j_2) \\
|
||
(l_1, l_2) \sqcap (j_1, j_2) \triangleq (l_1 \sqcap_1 j_1, l_2 \sqcap_2 j_2)
|
||
{{< /latex >}}
|
||
|
||
As an example, consider the product lattice \(\text{Sign}\times\text{Sign}\),
|
||
which is made up of pairs of signs that we talked about in the previous
|
||
post. Two elements of this lattice are \((+, +)\) and \((+, -)\). Here's
|
||
how the \((\sqcup)\) operation is evaluated on them:
|
||
|
||
{{< latex >}}
|
||
(+, +) \sqcup (+, -) = (+ \sqcup + , + \sqcup -) = (+ , \top)
|
||
{{< /latex >}}
|
||
|
||
In Agda, the definition is written very similarly to its mathematical form:
|
||
|
||
{{< codelines "Agda" "agda-spa/Lattice/Prod.agda" 50 54 >}}
|
||
|
||
All that's left is to prove the various (semi)lattice properties. Intuitively,
|
||
we can see that since the "combined" operator `_⊔_` just independently applies
|
||
the element operators `_⊔₁_` and `_⊔₂_`, as long as they are idempotent,
|
||
commutative, and associative, so is the "combined" operator itself.
|
||
Moreover, the proofs that `_⊔_` and `_⊓_` form semilattices are identical
|
||
up to replacing \((\sqcup)\) with \((\sqcap)\). Thus, in Agda, we can write
|
||
the code once, parameterizing it by the binary operators involved (and proofs
|
||
that these operators obey the semilattice laws).
|
||
|
||
{{< codelines "Agda" "agda-spa/Lattice/Prod.agda" 56 82 >}}
|
||
|
||
Above, I used `f₁` to stand for "either `_⊔₁_` or `_⊓₁_`", and similarly
|
||
`f₂` for "either `_⊔₂_` or `_⊓₂_`". Much like the semilattice properties,
|
||
proving lattice properties boils down to applying the lattice properties of
|
||
\(L_1\) and \(L_2\) to individual components.
|
||
|
||
{{< codelines "Agda" "agda-spa/Lattice/Prod.agda" 84 96 >}}
|
||
|
||
This concludes the definition of the product lattice, which is made up of
|
||
two other lattices. If we have a type of analysis that can be expressed as
|
||
{{< sidenote "right" "pair-note" "a pair of two signs," >}}
|
||
Perhaps the signs are the smallest and largest possible values of a variable.
|
||
{{< /sidenote >}} for example, we won't have to do all the work of
|
||
proving the (semi)lattice properties of those pairs. In fact, we can build up
|
||
even bigger data structures. By taking a product a product twice, like
|
||
\(L_1 \times (L_2 \times L_3)\), we can construct a lattice of 3-tuples. Any
|
||
of the lattices involved in that product can itself be a product; we can
|
||
therefore create lattices out of arbitrary bundles of data, so long as
|
||
the smallest pieces that make up the bundles are themselves lattices.
|
||
|
||
Products will come very handy a bit later in this series. For now though,
|
||
our goal is to create another type of lattice: the map lattice. We will
|
||
take the same approach we did with products: assuming the elements of the
|
||
map are lattices, we'll prove that the map itself is a lattice. Then, just
|
||
like we could put products inside products when building up lattices, we'll
|
||
be able to put a map inside a map. This will allow us to represent the
|
||
\(\text{Info}\) lattice, which is a map of maps.
|
||
|
||
### The Map Lattice
|
||
|
||
When I say "map", what I really means is something that associates keys with
|
||
values, like [dictionaries in Python](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).
|
||
This data structure need not have a value for every possible key; a very precise
|
||
author might call such a map a "partial map". We might have a map
|
||
whose value (in Python-ish notation) is `{ "x": +, "y": - }`. Such a map states
|
||
that the sign of the variable `x` is `+`, and the sign of variable `y` is
|
||
`-`. Another possible map is `{ "y": +, "z": - }`; this one states that
|
||
the sign of `y` is `+`, and the sign of another variable `z` is `-`.
|
||
|
||
Let's start thinking about what sorts of lattices our maps will be.
|
||
The thing that [motivated our introduction]({{< relref "01_spa_agda_lattices#specificity" >}})
|
||
of lattices was comparing them by "specificity", so let's try figure out how
|
||
to compare maps. For that, we can begin small, by looking at singleton maps.
|
||
If we have `{"x": +}` and `{"x": ⊤}`, which one of them is smaller? Well, we have
|
||
previously established that `+` is more specific (and thus less than) `⊤`. Thus,
|
||
it shouldn't be too much of a stretch to say that for singleton maps of the same
|
||
key, the one with the smaller value is smaller.
|
||
|
||
Now, what about a pair of singleton maps like `{"x": +}` and `{"y": ⊤}`? Among
|
||
these two, each contains some information that the other does not. Although the
|
||
value of `y` is larger than the value of `x`, it describes a different key, so
|
||
it seems wrong to use that to call the `y`-singleton "larger". Let's call
|
||
these maps incompatible, then. More generally, if we have two maps and each one
|
||
has a key that the other doesn't, we can't compare them.
|
||
|
||
If only one map has a unique key, though, things are different. Take for
|
||
instance `{"x": +}` and `{"x": +, "y": +}`. Are they really incomparable?
|
||
The keys that the two maps do share can be compared (`+ <= +`, because they're
|
||
equal).
|
||
|
||
|
||
All of the above leads to the following conventional definition, which I find
|
||
easier to further motivate using \((\sqcup)\) and \((\sqcap)\)
|
||
(and [do so below]({{< relref "#union-as-or" >}})).
|
||
|
||
> A map `m1` is less than or equal to another map `m2` (`m1 <= m2`) if for
|
||
> every key `k` that has a value in `m1`, the key also has a value in `m2`, and
|
||
> `m1[k] <= m2[k]`.
|
||
|
||
That definitions matches our intuitions so far. The only key in `{"x": +}` is `x`;
|
||
this key is also in `{"x": ⊤}` (check) and `+ < ⊤` (check). On the other hand,
|
||
both `{"x": +}` and `{"y": ⊤}` have a key that the other doesn't, so the
|
||
definition above is not satisfied. Finally, for `{"x": +}` and
|
||
`{"x": +, "y": +}`, the only key in the former is also present in the latter,
|
||
and `+ <= +`; the definition is satisfied.
|
||
|
||
Next, we need to define the \((\sqcup)\) and \((\sqcap)\) operators that match
|
||
our definition of "less than or equal". Let's start with \((\sqcup)\). For two
|
||
maps \(m_1\) and \(m_2\), the join of those two maps, \(m_1 \sqcup m_2\) should
|
||
be greater than or equal to both; in other words, both sub-maps should be less
|
||
than or equal to the join.
|
||
|
||
Our newly-introduced condition for "less than or equal"
|
||
requires that each key in the smaller map be present in the larger one; as
|
||
a result, \(m_1 \sqcup m_2\) should contain all the keys in \(m_1\) __and__
|
||
all the keys in \(m_2\). So, we could just take the union of the two maps:
|
||
copy values from both into the result. Only, what happens if both \(m_1\)
|
||
and \(m_2\) have a value mapped to a particular key? The values in the two
|
||
maps could be distinct, and they might even be incomparable. This is where the
|
||
second part of the condition kicks in: the value in the combination of the
|
||
maps needs to be bigger than the value in either sub-map. We already know how
|
||
to get a value that's bigger than two other values: we use a join on the
|
||
values!
|
||
|
||
Thus, define \(m_1 \sqcup m_2\) as a map that has all the keys
|
||
from \(m_1\) and \(m_2\), where the value at a particular key is given
|
||
as follows:
|
||
|
||
{{< latex >}}
|
||
(m_1 \sqcup m_2)[k] =
|
||
\begin{cases}
|
||
m_1[k] \sqcup m_2[k] & k \in m_1, k \in m_2 \\
|
||
m_1[k] & k \in m_1, k \notin m_2 \\
|
||
m_2[k] & k \notin m_1, k \in m_2
|
||
\end{cases}
|
||
{{< /latex >}}
|
||
|
||
If you're familiar with set theory, this operation is like
|
||
{{< sidenote "right" "map-union-note" "an extension of the union operator \((\cup)\)" >}}
|
||
There are, of course, other ways to extend the "union" operation to maps.
|
||
Haskell, for instance, defines it in a "left-biased" way (preferring the
|
||
elements from the left operand of the operation when duplicates are encountered).<br>
|
||
<br>
|
||
However, with a "join" operation \((\sqcup)\) that's defined on the values
|
||
stored in the map gives us an extra tool to work with. As a result, I would
|
||
argue that our extension, given such an operator, is the most natural.
|
||
{{< /sidenote >}} to maps. In fact, this begins to motivate
|
||
the choice to use \((\sqcup)\) to denote this operation. A further bit of
|
||
motivation is this:
|
||
[we've already seen]({{< relref "01_spa_agda_lattices#lub-glub-or-and" >}})
|
||
that the \((\sqcup)\) and \((\sqcap)\) operators correspond to "or"
|
||
and "and". The elements in the union of two sets are precisely
|
||
those that are in one set __or__ the other. Thus, using union here fits our
|
||
notion of how the \((\sqcup)\) operator behaves.
|
||
{#union-as-or}
|
||
|
||
{{< todo >}}
|
||
I started using 'join' but haven't introduced it before.
|
||
{{< /todo >}}
|