Finish draft of part 2, combining lattices
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
parent
df75d6e017
commit
1fb7e5ff85
|
@ -137,6 +137,7 @@ be able to put a map inside a map. This will allow us to represent the
|
||||||
\(\text{Info}\) lattice, which is a map of maps.
|
\(\text{Info}\) lattice, which is a map of maps.
|
||||||
|
|
||||||
### The Map Lattice
|
### The Map Lattice
|
||||||
|
#### The Theory
|
||||||
|
|
||||||
When I say "map", what I really means is something that associates keys with
|
When I say "map", what I really means is something that associates keys with
|
||||||
values, like [dictionaries in Python](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).
|
values, like [dictionaries in Python](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).
|
||||||
|
@ -234,6 +235,278 @@ those that are in one set __or__ the other. Thus, using union here fits our
|
||||||
notion of how the \((\sqcup)\) operator behaves.
|
notion of how the \((\sqcup)\) operator behaves.
|
||||||
{#union-as-or}
|
{#union-as-or}
|
||||||
|
|
||||||
|
Now, let's take a look at the \((\sqcap)\) operator. For two maps \(m_1\) and
|
||||||
|
\(m_2\), the meet of those two maps, \(m_1 \sqcap m_2\) should be less than
|
||||||
|
or equal to both. Our definition above requires that each key of the smaller
|
||||||
|
map is present in the larger map; for the combination of two maps to be
|
||||||
|
smaller than both, we must ensure that it only has keys present in both maps.
|
||||||
|
To combine the elements from the two maps, we can use the \((\sqcap)\) operator
|
||||||
|
on values.
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
(m_1 \sqcap m_2)[k] = m_1[k] \sqcap m_2[k]
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
Turning once again to set theory, we can think of this operation like the
|
||||||
|
extension of the intersection operator \((\cup)\) to maps. This can be
|
||||||
|
motivated in the same way as the union operation above; the \((\sqcap)\)
|
||||||
|
operator combines lattice elements in such away that the result represents
|
||||||
|
both of them, and intersections of sets contain elements that are in __both__
|
||||||
|
sets.
|
||||||
|
|
||||||
|
Now we have the the two binary operators and the comparison function in hand.
|
||||||
|
There's just one detail we're missing: what it means for two maps to be
|
||||||
|
equivalent. Here, once again we take our cue from set theory: two sets are
|
||||||
|
said to be equal when each one is a subset of the other. Mathematically, we can
|
||||||
|
write this as follows:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
m_1 \approx m_2 \triangleq m_1 \subseteq m_2 \land m_1 \supseteq m_2
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
I might as well show you the Agda definition of this, since it's a word-for-word
|
||||||
|
transliteration:
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 530 531 >}}
|
||||||
|
|
||||||
|
Okay, but we haven't actually defined what it means for one map to be a subset
|
||||||
|
of another. My definition is as follows: if \(m_1 \subseteq m_2\), that is,
|
||||||
|
if \(m_1\) is a subset of \(m_2\), then every key in \(m_1\) is also present
|
||||||
|
in \(m_2\), and they are mapped to the same value. My first stab at
|
||||||
|
a mathematical definition of this is the following:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow (k, v) \in m_2
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
Only there's a slight complication; remember that our values themselves come
|
||||||
|
from a lattice, and that this lattice might use its own equivalence operator
|
||||||
|
\((\approx)\) to group similar elements. One example where this is important
|
||||||
|
is our now-familiar "map of maps" scenario: the values store in the "outer"
|
||||||
|
map are themselves maps, and we don't want the order of the keys or other
|
||||||
|
menial details of the inner maps to influence whether the outer maps are equal.
|
||||||
|
Thus, we settle for a more robust definition of \(m_1 \subseteq m_2\)
|
||||||
|
that allows \(m_1\) to have different-but-equivalent values from those
|
||||||
|
in \(m_2\).
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow \exists v'.\ v \approx v' \land (k, v') \in m_2
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
In Agda, the core of my definition is once again very close:
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 98 99 >}}
|
||||||
|
|
||||||
|
#### The Implementation
|
||||||
|
|
||||||
|
Now it's time to show you how I implemented the Map lattice. I chose
|
||||||
|
represent maps using a list of key-value pairs, along with a condition
|
||||||
|
that the keys are unique (non-repeating). I chose this definition because
|
||||||
|
it was simple to implement, and because it makes it possible to iterate
|
||||||
|
over the keys of a map. That last property is useful if we use the maps
|
||||||
|
to later represent sets (which I did). Moreover, lists of key-value pairs are
|
||||||
|
easy to serialize and write to disk. This isn't hugely important for my
|
||||||
|
immediate static program analysis needs, but it might be nice in the future.
|
||||||
|
The requirement that the keys are unique prevents the map from being a multi-map
|
||||||
|
(which might have several values associated with a particular key).
|
||||||
|
|
||||||
|
My `Map` module is parameterized by the key and value types (`A` and `B`
|
||||||
|
respectively), and additionally requires some additional properties to
|
||||||
|
be satisfied by these types.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 6 10 >}}
|
||||||
|
|
||||||
|
For `A`, the key property is the
|
||||||
|
[decidability](https://en.wikipedia.org/wiki/Decidability_(logic)) of
|
||||||
|
equality: there should be a way to compare keys for equality. This is
|
||||||
|
important for all sorts of map operations. For example, when inserting a new
|
||||||
|
value into a map, we need to decide if the value is already present (so that
|
||||||
|
we know to override it), but if we can't check if two values are equal, we
|
||||||
|
can't see if it's already there.
|
||||||
|
|
||||||
|
The values of the map (represented by `B`) we expected to be lattices, so
|
||||||
|
we require them to provide the lattice operations \((\sqcup)\) and \((\sqcap)\),
|
||||||
|
as well as the equivalence relation \((\approx)\) and the proof of the lattice
|
||||||
|
properties in `isLattice`. To distinguish the lattice operations on `B`
|
||||||
|
from the ones we'll be defining on the map itself -- you might've
|
||||||
|
noticed that there's a bit of overleading going on in this post -- I've
|
||||||
|
suffixed them with the subscript `2`. My convention is to use the subscript
|
||||||
|
corresponding to the number of the type parameter. Here, `A` is "first" and `B`
|
||||||
|
is "second", so the operators on `B` get `2`.
|
||||||
|
|
||||||
|
From there, I define the map as a pair; the first component is the list of
|
||||||
|
key-value pairs, and the second is the proof that all the keys in the
|
||||||
|
list occur only once.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 480 481 >}}
|
||||||
|
|
||||||
|
Now, to implement union and intersection; for the most part, the proofs deal
|
||||||
|
just with the first component of the map -- the key-value pairs. For union,
|
||||||
|
the key operation is "insert-or-combine". We can think of merging two maps
|
||||||
|
as inserting all the keys from one map (arbitrary, the "left") into the
|
||||||
|
other. If a key is not in the "left" map, insertion won't do anything to its
|
||||||
|
prior value in the right map; similarly, if a key is not in the "right" map,
|
||||||
|
then it should appear unchanged in the final result after insertion. Finally,
|
||||||
|
if a key is inserted into the "right" map, but already has a value there, then
|
||||||
|
the two values need to be combined using `_⊔₂_`. This leads to the following
|
||||||
|
definition of `insert` on key-value pair lists:
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 114 118 >}}
|
||||||
|
|
||||||
|
Above, `f` is just a stand-in for `_⊓₂_` (making the definition a tiny bit more general).
|
||||||
|
For each element in the "right" key-value list, we check if its key matches
|
||||||
|
the one we're inserting; if it does, we have to combine the values, and
|
||||||
|
there's no need to recurse into the rest of the list. If on the other hand
|
||||||
|
the key doesn't match, we move on to the next element of the list. If we
|
||||||
|
run out of elements, we know that the key we're inserting wasn't in the "right"
|
||||||
|
map, so we insert it as-is.
|
||||||
|
|
||||||
|
The union operation is just about inserting every pair from one map into another.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 120 121 >}}
|
||||||
|
|
||||||
|
Here, I defined my own version of `foldr` which unpacks the pairs, for
|
||||||
|
convenience:
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 110 112 "" "**(Click here to see the definition of my `foldr`)**" >}}
|
||||||
|
|
||||||
|
For intersection, we do something similar; however, since only elements in
|
||||||
|
_both_ maps should be in the final output, if our "insertion" doesn't find
|
||||||
|
an existing key, it should just fall through; this can be achieved by defining
|
||||||
|
a version of `insert` whose base case simply throws away the input. Of course,
|
||||||
|
this function should also use `_⊓₂_` instead of `_⊔₂_`; below, though, I again
|
||||||
|
use a general function `f` to provide a more general definition. I called this
|
||||||
|
version of the function `update`.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 295 299 >}}
|
||||||
|
|
||||||
|
Just changing `insert` to `update` is not enough. It's true that calling
|
||||||
|
`update` with all keys from `m1` on `m2` would forget all keys unique to `m1`,
|
||||||
|
it would still leave behind the only-in-`m2` keys. To get rid of these, I
|
||||||
|
defined another function, `restrict`, that drops all keys in its second
|
||||||
|
argument that aren't present in its first argument.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 304 308 >}}
|
||||||
|
|
||||||
|
Altogether, intesection is defined as follows, where `updates` just
|
||||||
|
calls `update` for every key-value pair in its first argument.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 310 311 >}}
|
||||||
|
|
||||||
|
The next hurdle is all the proofs about these implementations. I will
|
||||||
|
leave the details of the proofs either as appendices or as links to
|
||||||
|
other posts on this site.
|
||||||
|
|
||||||
|
The first key property is that the insertion, union, update, and intersection operations all
|
||||||
|
preserve uniqueness of keys; the [proofs for this are here](#appendix-proof-of-uniqueness-of-keys).
|
||||||
|
The set of properties are the lattice laws for union and intersection.
|
||||||
|
The proofs of those proceed by cases; to prove that \((\sqcup)\) is
|
||||||
|
commutative, we reason that if \((k , v) \in m_1 \sqcup m_2\), then it must be
|
||||||
|
either in \(m_1\), in \(m_2\), or in both; for each of these three possible
|
||||||
|
cases, we can show that \((k , v)\) must be the same in \(m_2 \sqcup m_1\).
|
||||||
|
Things get even more tedious for proofs of associativity, since there are
|
||||||
|
7 cases to consider; I describe the strategy I used for such proofs
|
||||||
|
in my [article about the "Expression" pattern]({{< relref "agda_expr_pattern" >}})
|
||||||
|
in Agda.
|
||||||
|
|
||||||
|
### Additional Properties of Lattices
|
||||||
|
|
||||||
|
The product and map lattices are the two pulling the most weight in my
|
||||||
|
implementation of program analyses. However, there's an additional property
|
||||||
|
that they have: if the lattices they are made of have a _finite height_,
|
||||||
|
then so do products and map lattices themselves. A lattice having a finite
|
||||||
|
height means that we can only line up so many elements using the less-than
|
||||||
|
operator `<`. For instance, the natural numbers are _not_ a finite-height lattice;
|
||||||
|
we can create the infinite chain:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
0 < 1 < 2 < ...
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
On the other hand, our sign lattice _is_ of finite height; the longest chains
|
||||||
|
we can make have three elements and two `<` signs. Here's one:
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
\bot < + < \top
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
As a result of this, _pairs_ of signs also have a finite height; the longest
|
||||||
|
chains we can make have five elements and four `<` signs.
|
||||||
|
{{< sidenote "right" "example-note" "An example:" >}}
|
||||||
|
Notice that the elements in the example progress the same way as the ones
|
||||||
|
in the single-sign chain. This is no accident; the longest chains in the
|
||||||
|
pair lattice can be constructed from longest chains of its element
|
||||||
|
lattices. The length of the product lattice chain, counted by the number of
|
||||||
|
"less than" signs, is the sum of the lengths of the element chains.
|
||||||
|
{{< /sidenote >}}
|
||||||
|
|
||||||
|
{{< latex >}}
|
||||||
|
(\bot, \bot) < (\bot, +) < (\bot, \top) < (+, \top) < (\top, \top)
|
||||||
|
{{< /latex >}}
|
||||||
|
|
||||||
|
The same is true for maps, under certain conditions.
|
||||||
|
|
||||||
|
The finite-height property is crucial to lattice-based static program analysis;
|
||||||
|
we'll talk about it in more detail in the next post of this series.
|
||||||
|
|
||||||
{{< todo >}}
|
{{< todo >}}
|
||||||
I started using 'join' but haven't introduced it before.
|
I started using 'join' but haven't introduced it before.
|
||||||
{{< /todo >}}
|
{{< /todo >}}
|
||||||
|
|
||||||
|
### Appendix: Proof of Uniqueness of Keys
|
||||||
|
|
||||||
|
I will provide sketches of the proofs here, and omit the implementations
|
||||||
|
of my lemmas. Click on the link in the code block headers to jump to their
|
||||||
|
implementation on my Git server.
|
||||||
|
|
||||||
|
First, note that if we're inserting a key that's already in a list, then the keys of that list are unchanged.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 123 124 >}}
|
||||||
|
|
||||||
|
On the other hand, if we're inserting a new key, it ends up at the end, and
|
||||||
|
the rest of the keys are unchanged.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 134 135 >}}
|
||||||
|
|
||||||
|
Then, for any given key-value pair, the key either is or isn't in the list we're
|
||||||
|
inserting it into. If it is, then the list ends up unchanged, and remains
|
||||||
|
unique if it was already unique. On the other hand, if it's not in the list,
|
||||||
|
then it ends up at the end; adding a new element to the end of a unique
|
||||||
|
list produces another unique list. Thus, in either case, the final keys
|
||||||
|
are unique.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 143 148 >}}
|
||||||
|
|
||||||
|
By induction, we can then prove that calling `insert` many times as we do
|
||||||
|
in `union` preserves uniqueness too. Here, `insert-preserves-Unique` serves
|
||||||
|
as the inductive step.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 164 168 >}}
|
||||||
|
|
||||||
|
For `update`, things are simple; it doesn't change the keys of the argument
|
||||||
|
list at all, since it only modifies, and doesn't add new pairs. This
|
||||||
|
is captured by the `update-keys` property:
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 313 314 >}}
|
||||||
|
|
||||||
|
If the keys don't change, they obviously remain unique.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 328 330 >}}
|
||||||
|
|
||||||
|
For `restrict`, we note that it only ever removes keys; as a result, if
|
||||||
|
a key was not in the input to `restrict`, then it won't be in its output,
|
||||||
|
either.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 337 338 >}}
|
||||||
|
|
||||||
|
As a result, for each key of the list being restricted, we either drop it
|
||||||
|
(which does not damage uniqueness) or we keep it; since we only remove
|
||||||
|
keys, and since the keys were originally unique, the key we kept won't
|
||||||
|
conflict with any of the other final keys.
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 345 351 >}}
|
||||||
|
|
||||||
|
Since both `update` and `restrict` preserve uniqueness, then so does
|
||||||
|
`intersect`:
|
||||||
|
|
||||||
|
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 353 355 >}}
|
||||||
|
|
Loading…
Reference in New Issue
Block a user