diff --git a/content/blog/02_spa_agda_combining_lattices.md b/content/blog/02_spa_agda_combining_lattices.md index ba6193c..0be1311 100644 --- a/content/blog/02_spa_agda_combining_lattices.md +++ b/content/blog/02_spa_agda_combining_lattices.md @@ -137,6 +137,7 @@ be able to put a map inside a map. This will allow us to represent the \(\text{Info}\) lattice, which is a map of maps. ### The Map Lattice +#### The Theory When I say "map", what I really means is something that associates keys with values, like [dictionaries in Python](https://docs.python.org/3/tutorial/datastructures.html#dictionaries). @@ -234,6 +235,278 @@ those that are in one set __or__ the other. Thus, using union here fits our notion of how the \((\sqcup)\) operator behaves. {#union-as-or} +Now, let's take a look at the \((\sqcap)\) operator. For two maps \(m_1\) and +\(m_2\), the meet of those two maps, \(m_1 \sqcap m_2\) should be less than +or equal to both. Our definition above requires that each key of the smaller +map is present in the larger map; for the combination of two maps to be +smaller than both, we must ensure that it only has keys present in both maps. +To combine the elements from the two maps, we can use the \((\sqcap)\) operator +on values. + +{{< latex >}} +(m_1 \sqcap m_2)[k] = m_1[k] \sqcap m_2[k] +{{< /latex >}} + +Turning once again to set theory, we can think of this operation like the +extension of the intersection operator \((\cup)\) to maps. This can be +motivated in the same way as the union operation above; the \((\sqcap)\) +operator combines lattice elements in such away that the result represents +both of them, and intersections of sets contain elements that are in __both__ +sets. + +Now we have the the two binary operators and the comparison function in hand. +There's just one detail we're missing: what it means for two maps to be +equivalent. Here, once again we take our cue from set theory: two sets are +said to be equal when each one is a subset of the other. Mathematically, we can +write this as follows: + +{{< latex >}} +m_1 \approx m_2 \triangleq m_1 \subseteq m_2 \land m_1 \supseteq m_2 +{{< /latex >}} + +I might as well show you the Agda definition of this, since it's a word-for-word +transliteration: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 530 531 >}} + +Okay, but we haven't actually defined what it means for one map to be a subset +of another. My definition is as follows: if \(m_1 \subseteq m_2\), that is, +if \(m_1\) is a subset of \(m_2\), then every key in \(m_1\) is also present +in \(m_2\), and they are mapped to the same value. My first stab at +a mathematical definition of this is the following: + +{{< latex >}} +m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow (k, v) \in m_2 +{{< /latex >}} + +Only there's a slight complication; remember that our values themselves come +from a lattice, and that this lattice might use its own equivalence operator +\((\approx)\) to group similar elements. One example where this is important +is our now-familiar "map of maps" scenario: the values store in the "outer" +map are themselves maps, and we don't want the order of the keys or other +menial details of the inner maps to influence whether the outer maps are equal. +Thus, we settle for a more robust definition of \(m_1 \subseteq m_2\) +that allows \(m_1\) to have different-but-equivalent values from those +in \(m_2\). + +{{< latex >}} +m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow \exists v'.\ v \approx v' \land (k, v') \in m_2 +{{< /latex >}} + +In Agda, the core of my definition is once again very close: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 98 99 >}} + +#### The Implementation + +Now it's time to show you how I implemented the Map lattice. I chose +represent maps using a list of key-value pairs, along with a condition +that the keys are unique (non-repeating). I chose this definition because +it was simple to implement, and because it makes it possible to iterate +over the keys of a map. That last property is useful if we use the maps +to later represent sets (which I did). Moreover, lists of key-value pairs are +easy to serialize and write to disk. This isn't hugely important for my +immediate static program analysis needs, but it might be nice in the future. +The requirement that the keys are unique prevents the map from being a multi-map +(which might have several values associated with a particular key). + +My `Map` module is parameterized by the key and value types (`A` and `B` +respectively), and additionally requires some additional properties to +be satisfied by these types. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 6 10 >}} + +For `A`, the key property is the +[decidability](https://en.wikipedia.org/wiki/Decidability_(logic)) of +equality: there should be a way to compare keys for equality. This is +important for all sorts of map operations. For example, when inserting a new +value into a map, we need to decide if the value is already present (so that +we know to override it), but if we can't check if two values are equal, we +can't see if it's already there. + +The values of the map (represented by `B`) we expected to be lattices, so +we require them to provide the lattice operations \((\sqcup)\) and \((\sqcap)\), +as well as the equivalence relation \((\approx)\) and the proof of the lattice +properties in `isLattice`. To distinguish the lattice operations on `B` +from the ones we'll be defining on the map itself -- you might've +noticed that there's a bit of overleading going on in this post -- I've +suffixed them with the subscript `2`. My convention is to use the subscript +corresponding to the number of the type parameter. Here, `A` is "first" and `B` +is "second", so the operators on `B` get `2`. + +From there, I define the map as a pair; the first component is the list of +key-value pairs, and the second is the proof that all the keys in the +list occur only once. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 480 481 >}} + +Now, to implement union and intersection; for the most part, the proofs deal +just with the first component of the map -- the key-value pairs. For union, +the key operation is "insert-or-combine". We can think of merging two maps +as inserting all the keys from one map (arbitrary, the "left") into the +other. If a key is not in the "left" map, insertion won't do anything to its +prior value in the right map; similarly, if a key is not in the "right" map, +then it should appear unchanged in the final result after insertion. Finally, +if a key is inserted into the "right" map, but already has a value there, then +the two values need to be combined using `_⊔₂_`. This leads to the following +definition of `insert` on key-value pair lists: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 114 118 >}} + +Above, `f` is just a stand-in for `_⊓₂_` (making the definition a tiny bit more general). +For each element in the "right" key-value list, we check if its key matches +the one we're inserting; if it does, we have to combine the values, and +there's no need to recurse into the rest of the list. If on the other hand +the key doesn't match, we move on to the next element of the list. If we +run out of elements, we know that the key we're inserting wasn't in the "right" +map, so we insert it as-is. + +The union operation is just about inserting every pair from one map into another. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 120 121 >}} + +Here, I defined my own version of `foldr` which unpacks the pairs, for +convenience: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 110 112 "" "**(Click here to see the definition of my `foldr`)**" >}} + +For intersection, we do something similar; however, since only elements in +_both_ maps should be in the final output, if our "insertion" doesn't find +an existing key, it should just fall through; this can be achieved by defining +a version of `insert` whose base case simply throws away the input. Of course, +this function should also use `_⊓₂_` instead of `_⊔₂_`; below, though, I again +use a general function `f` to provide a more general definition. I called this +version of the function `update`. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 295 299 >}} + +Just changing `insert` to `update` is not enough. It's true that calling +`update` with all keys from `m1` on `m2` would forget all keys unique to `m1`, +it would still leave behind the only-in-`m2` keys. To get rid of these, I +defined another function, `restrict`, that drops all keys in its second +argument that aren't present in its first argument. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 304 308 >}} + +Altogether, intesection is defined as follows, where `updates` just +calls `update` for every key-value pair in its first argument. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 310 311 >}} + +The next hurdle is all the proofs about these implementations. I will +leave the details of the proofs either as appendices or as links to +other posts on this site. + +The first key property is that the insertion, union, update, and intersection operations all +preserve uniqueness of keys; the [proofs for this are here](#appendix-proof-of-uniqueness-of-keys). +The set of properties are the lattice laws for union and intersection. +The proofs of those proceed by cases; to prove that \((\sqcup)\) is +commutative, we reason that if \((k , v) \in m_1 \sqcup m_2\), then it must be +either in \(m_1\), in \(m_2\), or in both; for each of these three possible +cases, we can show that \((k , v)\) must be the same in \(m_2 \sqcup m_1\). +Things get even more tedious for proofs of associativity, since there are +7 cases to consider; I describe the strategy I used for such proofs +in my [article about the "Expression" pattern]({{< relref "agda_expr_pattern" >}}) +in Agda. + +### Additional Properties of Lattices + +The product and map lattices are the two pulling the most weight in my +implementation of program analyses. However, there's an additional property +that they have: if the lattices they are made of have a _finite height_, +then so do products and map lattices themselves. A lattice having a finite +height means that we can only line up so many elements using the less-than +operator `<`. For instance, the natural numbers are _not_ a finite-height lattice; +we can create the infinite chain: + +{{< latex >}} +0 < 1 < 2 < ... +{{< /latex >}} + +On the other hand, our sign lattice _is_ of finite height; the longest chains +we can make have three elements and two `<` signs. Here's one: + +{{< latex >}} +\bot < + < \top +{{< /latex >}} + +As a result of this, _pairs_ of signs also have a finite height; the longest +chains we can make have five elements and four `<` signs. +{{< sidenote "right" "example-note" "An example:" >}} +Notice that the elements in the example progress the same way as the ones +in the single-sign chain. This is no accident; the longest chains in the +pair lattice can be constructed from longest chains of its element +lattices. The length of the product lattice chain, counted by the number of +"less than" signs, is the sum of the lengths of the element chains. +{{< /sidenote >}} + +{{< latex >}} +(\bot, \bot) < (\bot, +) < (\bot, \top) < (+, \top) < (\top, \top) +{{< /latex >}} + +The same is true for maps, under certain conditions. + +The finite-height property is crucial to lattice-based static program analysis; +we'll talk about it in more detail in the next post of this series. + {{< todo >}} I started using 'join' but haven't introduced it before. {{< /todo >}} + +### Appendix: Proof of Uniqueness of Keys + +I will provide sketches of the proofs here, and omit the implementations +of my lemmas. Click on the link in the code block headers to jump to their +implementation on my Git server. + +First, note that if we're inserting a key that's already in a list, then the keys of that list are unchanged. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 123 124 >}} + +On the other hand, if we're inserting a new key, it ends up at the end, and +the rest of the keys are unchanged. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 134 135 >}} + +Then, for any given key-value pair, the key either is or isn't in the list we're +inserting it into. If it is, then the list ends up unchanged, and remains +unique if it was already unique. On the other hand, if it's not in the list, +then it ends up at the end; adding a new element to the end of a unique +list produces another unique list. Thus, in either case, the final keys +are unique. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 143 148 >}} + +By induction, we can then prove that calling `insert` many times as we do +in `union` preserves uniqueness too. Here, `insert-preserves-Unique` serves +as the inductive step. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 164 168 >}} + +For `update`, things are simple; it doesn't change the keys of the argument +list at all, since it only modifies, and doesn't add new pairs. This +is captured by the `update-keys` property: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 313 314 >}} + +If the keys don't change, they obviously remain unique. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 328 330 >}} + +For `restrict`, we note that it only ever removes keys; as a result, if +a key was not in the input to `restrict`, then it won't be in its output, +either. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 337 338 >}} + +As a result, for each key of the list being restricted, we either drop it +(which does not damage uniqueness) or we keep it; since we only remove +keys, and since the keys were originally unique, the key we kept won't +conflict with any of the other final keys. + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 345 351 >}} + +Since both `update` and `restrict` preserve uniqueness, then so does +`intersect`: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 353 355 >}}