Finish draft of part 2, combining lattices

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-06-16 18:09:56 -07:00 · 2024-06-16 18:09:56 -07:00 · 1fb7e5ff85
commit 1fb7e5ff85
parent df75d6e017
1 changed files with 273 additions and 0 deletions
--- a/content/blog/02_spa_agda_combining_lattices.md
+++ b/content/blog/02_spa_agda_combining_lattices.md
@ -137,6 +137,7 @@ be able to put a map inside a map. This will allow us to represent the
 \(\text{Info}\) lattice, which is a map of maps.
 ### The Map Lattice
 #### The Theory
 When I say "map", what I really means is something that associates keys with
 values, like [dictionaries in Python](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).
@ -234,6 +235,278 @@ those that are in one set __or__ the other. Thus, using union here fits our
 notion of how the \((\sqcup)\) operator behaves.
 {#union-as-or}
 Now, let's take a look at the \((\sqcap)\) operator. For two maps \(m_1\) and
 \(m_2\), the meet of those two maps, \(m_1 \sqcap m_2\) should be less than
 or equal to both. Our definition above requires that each key of the smaller
 map is present in the larger map; for the combination of two maps to be
 smaller than both, we must ensure that it only has keys present in both maps.
 To combine the elements from the two maps, we can use the \((\sqcap)\) operator
 on values.
 {{< latex >}}
 (m_1 \sqcap m_2)[k] = m_1[k] \sqcap m_2[k]
 {{< /latex >}}
 Turning once again to set theory, we can think of this operation like the
 extension of the intersection operator \((\cup)\) to maps. This can be
 motivated in the same way as the union operation above; the \((\sqcap)\)
 operator combines lattice elements in such away that the result represents
 both of them, and intersections of sets contain elements that are in __both__
 sets.
 Now we have the the two binary operators and the comparison function in hand.
 There's just one detail we're missing: what it means for two maps to be
 equivalent. Here, once again we take our cue from set theory: two sets are
 said to be equal when each one is a subset of the other. Mathematically, we can
 write this as follows:
 {{< latex >}}
 m_1 \approx m_2 \triangleq m_1 \subseteq m_2 \land m_1 \supseteq m_2
 {{< /latex >}}
 I might as well show you the Agda definition of this, since it's a word-for-word
 transliteration:
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 530 531 >}}
 Okay, but we haven't actually defined what it means for one map to be a subset
 of another. My definition is as follows: if \(m_1 \subseteq m_2\), that is,
 if \(m_1\) is a subset of \(m_2\), then every key in \(m_1\) is also present
 in \(m_2\), and they are mapped to the same value. My first stab at
 a mathematical definition of this is the following:
 {{< latex >}}
 m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow (k, v) \in m_2
 {{< /latex >}}
 Only there's a slight complication; remember that our values themselves come
 from a lattice, and that this lattice might use its own equivalence operator
 \((\approx)\) to group similar elements. One example where this is important
 is our now-familiar "map of maps" scenario: the values store in the "outer"
 map are themselves maps, and we don't want the order of the keys or other
 menial details of the inner maps to influence whether the outer maps are equal.
 Thus, we settle for a more robust definition of \(m_1 \subseteq m_2\) 
 that allows \(m_1\) to have different-but-equivalent values from those
 in \(m_2\).
 {{< latex >}}
 m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow \exists v'.\ v \approx v' \land (k, v') \in m_2
 {{< /latex >}}
 In Agda, the core of my definition is once again very close:
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 98 99 >}}
 #### The Implementation
 Now it's time to show you how I implemented the Map lattice. I chose
 represent maps using a list of key-value pairs, along with a condition
 that the keys are unique (non-repeating). I chose this definition because
 it was simple to implement, and because it makes it possible to iterate
 over the keys of a map. That last property is useful if we use the maps
 to later represent sets (which I did). Moreover, lists of key-value pairs are
 easy to serialize and write to disk. This isn't hugely important for my
 immediate static program analysis needs, but it might be nice in the future.
 The requirement that the keys are unique prevents the map from being a multi-map
 (which might have several values associated with a particular key).
 My `Map` module is parameterized by the key and value types (`A` and `B`
 respectively), and additionally requires some additional properties to
 be satisfied by these types.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 6 10 >}}
 For `A`, the key property is the
 [decidability](https://en.wikipedia.org/wiki/Decidability_(logic)) of
 equality: there should be a way to compare keys for equality. This is
 important for all sorts of map operations. For example, when inserting a new
 value into a map, we need to decide if the value is already present (so that
 we know to override it), but if we can't check if two values are equal, we
 can't see if it's already there.
 The values of the map (represented by `B`) we expected to be lattices, so
 we require them to provide the lattice operations \((\sqcup)\) and \((\sqcap)\),
 as well as the equivalence relation \((\approx)\) and the proof of the lattice
 properties in `isLattice`. To distinguish the lattice operations on `B`
 from the ones we'll be defining on the map itself -- you might've
 noticed that there's a bit of overleading going on in this post -- I've
 suffixed them with the subscript `2`. My convention is to use the subscript
 corresponding to the number of the type parameter. Here, `A` is "first" and `B`
 is "second", so the operators on `B` get `2`.
 From there, I define the map as a pair; the first component is the list of
 key-value pairs, and the second is the proof that all the keys in the
 list occur only once.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 480 481 >}}
 Now, to implement union and intersection; for the most part, the proofs deal
 just with the first component of the map -- the key-value pairs. For union,
 the key operation is "insert-or-combine". We can think of merging two maps
 as inserting all the keys from one map (arbitrary, the "left") into the
 other. If a key is not in the "left" map, insertion won't do anything to its
 prior value in the right map; similarly, if a key is not in the "right" map,
 then it should appear unchanged in the final result after insertion. Finally,
 if a key is inserted into the "right" map, but already has a value there, then
 the two values need to be combined using `_⊔₂_`. This leads to the following
 definition of `insert` on key-value pair lists:
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 114 118 >}}
 Above, `f` is just a stand-in for `_⊓₂_` (making the definition a tiny bit more general).
 For each element in the "right" key-value list, we check if its key matches
 the one we're inserting; if it does, we have to combine the values, and
 there's no need to recurse into the rest of the list. If on the other hand
 the key doesn't match, we move on to the next element of the list. If we
 run out of elements, we know that the key we're inserting wasn't in the "right"
 map, so we insert it as-is.
 The union operation is just about inserting every pair from one map into another.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 120 121 >}}
 Here, I defined my own version of `foldr` which unpacks the pairs, for
 convenience:
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 110 112 "" "**(Click here to see the definition of my `foldr`)**" >}}
 For intersection, we do something similar; however, since only elements in
 _both_ maps should be in the final output, if our "insertion" doesn't find
 an existing key, it should just fall through; this can be achieved by defining
 a version of `insert` whose base case simply throws away the input. Of course,
 this function should also use `_⊓₂_` instead of `_⊔₂_`; below, though, I again
 use a general function `f` to provide a more general definition. I called this
 version of the function `update`.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 295 299 >}}
 Just changing `insert` to `update` is not enough. It's true that calling
 `update` with all keys from `m1` on `m2` would forget all keys unique to `m1`,
 it would still leave behind the only-in-`m2` keys. To get rid of these, I
 defined another function, `restrict`, that drops all keys in its second
 argument that aren't present in its first argument.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 304 308 >}}
 Altogether, intesection is defined as follows, where `updates` just
 calls `update` for every key-value pair in its first argument.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 310 311 >}}
 The next hurdle is all the proofs about these implementations. I will
 leave the details of the proofs either as appendices or as links to
 other posts on this site.
 The first key property is that the insertion, union, update, and intersection operations all
 preserve uniqueness of keys; the [proofs for this are here](#appendix-proof-of-uniqueness-of-keys).
 The set of properties are the lattice laws for union and intersection.
 The proofs of those proceed by cases; to prove that \((\sqcup)\) is
 commutative, we reason that if \((k , v) \in m_1 \sqcup m_2\), then it must be
 either in \(m_1\), in \(m_2\), or in both; for each of these three possible
 cases, we can show that \((k , v)\) must be the same in \(m_2 \sqcup m_1\).
 Things get even more tedious for proofs of associativity, since there are
 7 cases to consider; I describe the strategy I used for such proofs
 in my [article about the "Expression" pattern]({{< relref "agda_expr_pattern" >}})
 in Agda.
 ### Additional Properties of Lattices
 The product and map lattices are the two pulling the most weight in my
 implementation of program analyses. However, there's an additional property
 that they have: if the lattices they are made of have a _finite height_,
 then so do products and map lattices themselves. A lattice having a finite
 height means that we can only line up so many elements using the less-than
 operator `<`. For instance, the natural numbers are _not_ a finite-height lattice;
 we can create the infinite chain:
 {{< latex >}}
 0 < 1 < 2 < ...
 {{< /latex >}}
 On the other hand, our sign lattice _is_ of finite height; the longest chains
 we can make have three elements and two `<` signs. Here's one:
 {{< latex >}}
 \bot < + < \top
 {{< /latex >}}
 As a result of this, _pairs_ of signs also have a finite height; the longest
 chains we can make have five elements and four `<` signs.
 {{< sidenote "right" "example-note" "An example:" >}}
 Notice that the elements in the example progress the same way as the ones
 in the single-sign chain. This is no accident; the longest chains in the
 pair lattice can be constructed from longest chains of its element
 lattices. The length of the product lattice chain, counted by the number of
 "less than" signs, is the sum of the lengths of the element chains.
 {{< /sidenote >}}
 {{< latex >}}
 (\bot, \bot) < (\bot, +) < (\bot, \top) < (+, \top) < (\top, \top)
 {{< /latex >}}
 The same is true for maps, under certain conditions.
 The finite-height property is crucial to lattice-based static program analysis;
 we'll talk about it in more detail in the next post of this series.
 {{< todo >}}
 I started using 'join' but haven't introduced it before.
 {{< /todo >}}
 ### Appendix: Proof of Uniqueness of Keys
 I will provide sketches of the proofs here, and omit the implementations
 of my lemmas. Click on the link in the code block headers to jump to their
 implementation on my Git server.
 First, note that if we're inserting a key that's already in a list, then the keys of that list are unchanged.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 123 124 >}}
 On the other hand, if we're inserting a new key, it ends up at the end, and
 the rest of the keys are unchanged.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 134 135 >}}
 Then, for any given key-value pair, the key either is or isn't in the list we're
 inserting it into. If it is, then the list ends up unchanged, and remains
 unique if it was already unique. On the other hand, if it's not in the list,
 then it ends up at the end; adding a new element to the end of a unique
 list produces another unique list. Thus, in either case, the final keys
 are unique.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 143 148 >}}
 By induction, we can then prove that calling `insert` many times as we do
 in `union` preserves uniqueness too. Here, `insert-preserves-Unique` serves
 as the inductive step.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 164 168 >}}
 For `update`, things are simple; it doesn't change the keys of the argument
 list at all, since it only modifies, and doesn't add new pairs. This
 is captured by the `update-keys` property:
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 313 314 >}}
 If the keys don't change, they obviously remain unique.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 328 330 >}}
 For `restrict`, we note that it only ever removes keys; as a result, if
 a key was not in the input to `restrict`, then it won't be in its output,
 either.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 337 338 >}}
 As a result, for each key of the list being restricted, we either drop it
 (which does not damage uniqueness) or we keep it; since we only remove
 keys, and since the keys were originally unique, the key we kept won't
 conflict with any of the other final keys.
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 345 351 >}}
 Since both `update` and `restrict` preserve uniqueness, then so does
 `intersect`:
 {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 353 355 >}}