Finish draft of part 2, combining lattices

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-06-16 18:09:56 -07:00
parent df75d6e017
commit 1fb7e5ff85
1 changed files with 273 additions and 0 deletions
--- a/content/blog/02_spa_agda_combining_lattices.md
+++ b/content/blog/02_spa_agda_combining_lattices.md
@@ -137,6 +137,7 @@ be able to put a map inside a map. This will allow us to represent the
 \(\text{Info}\) lattice, which is a map of maps.

 ### The Map Lattice
+#### The Theory

 When I say "map", what I really means is something that associates keys with
 values, like [dictionaries in Python](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).
@@ -234,6 +235,278 @@ those that are in one set __or__ the other. Thus, using union here fits our
 notion of how the \((\sqcup)\) operator behaves.
 {#union-as-or}

+Now, let's take a look at the \((\sqcap)\) operator. For two maps \(m_1\) and
+\(m_2\), the meet of those two maps, \(m_1 \sqcap m_2\) should be less than
+or equal to both. Our definition above requires that each key of the smaller
+map is present in the larger map; for the combination of two maps to be
+smaller than both, we must ensure that it only has keys present in both maps.
+To combine the elements from the two maps, we can use the \((\sqcap)\) operator
+on values.
+
+{{< latex >}}
+(m_1 \sqcap m_2)[k] = m_1[k] \sqcap m_2[k]
+{{< /latex >}}
+
+Turning once again to set theory, we can think of this operation like the
+extension of the intersection operator \((\cup)\) to maps. This can be
+motivated in the same way as the union operation above; the \((\sqcap)\)
+operator combines lattice elements in such away that the result represents
+both of them, and intersections of sets contain elements that are in __both__
+sets.
+
+Now we have the the two binary operators and the comparison function in hand.
+There's just one detail we're missing: what it means for two maps to be
+equivalent. Here, once again we take our cue from set theory: two sets are
+said to be equal when each one is a subset of the other. Mathematically, we can
+write this as follows:
+
+{{< latex >}}
+m_1 \approx m_2 \triangleq m_1 \subseteq m_2 \land m_1 \supseteq m_2
+{{< /latex >}}
+
+I might as well show you the Agda definition of this, since it's a word-for-word
+transliteration:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 530 531 >}}
+
+Okay, but we haven't actually defined what it means for one map to be a subset
+of another. My definition is as follows: if \(m_1 \subseteq m_2\), that is,
+if \(m_1\) is a subset of \(m_2\), then every key in \(m_1\) is also present
+in \(m_2\), and they are mapped to the same value. My first stab at
+a mathematical definition of this is the following:
+
+{{< latex >}}
+m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow (k, v) \in m_2
+{{< /latex >}}
+
+Only there's a slight complication; remember that our values themselves come
+from a lattice, and that this lattice might use its own equivalence operator
+\((\approx)\) to group similar elements. One example where this is important
+is our now-familiar "map of maps" scenario: the values store in the "outer"
+map are themselves maps, and we don't want the order of the keys or other
+menial details of the inner maps to influence whether the outer maps are equal.
+Thus, we settle for a more robust definition of \(m_1 \subseteq m_2\) 
+that allows \(m_1\) to have different-but-equivalent values from those
+in \(m_2\).
+
+{{< latex >}}
+m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow \exists v'.\ v \approx v' \land (k, v') \in m_2
+{{< /latex >}}
+
+In Agda, the core of my definition is once again very close:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 98 99 >}}
+
+#### The Implementation
+
+Now it's time to show you how I implemented the Map lattice. I chose
+represent maps using a list of key-value pairs, along with a condition
+that the keys are unique (non-repeating). I chose this definition because
+it was simple to implement, and because it makes it possible to iterate
+over the keys of a map. That last property is useful if we use the maps
+to later represent sets (which I did). Moreover, lists of key-value pairs are
+easy to serialize and write to disk. This isn't hugely important for my
+immediate static program analysis needs, but it might be nice in the future.
+The requirement that the keys are unique prevents the map from being a multi-map
+(which might have several values associated with a particular key).
+
+My `Map` module is parameterized by the key and value types (`A` and `B`
+respectively), and additionally requires some additional properties to
+be satisfied by these types.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 6 10 >}}
+
+For `A`, the key property is the
+[decidability](https://en.wikipedia.org/wiki/Decidability_(logic)) of
+equality: there should be a way to compare keys for equality. This is
+important for all sorts of map operations. For example, when inserting a new
+value into a map, we need to decide if the value is already present (so that
+we know to override it), but if we can't check if two values are equal, we
+can't see if it's already there.
+
+The values of the map (represented by `B`) we expected to be lattices, so
+we require them to provide the lattice operations \((\sqcup)\) and \((\sqcap)\),
+as well as the equivalence relation \((\approx)\) and the proof of the lattice
+properties in `isLattice`. To distinguish the lattice operations on `B`
+from the ones we'll be defining on the map itself -- you might've
+noticed that there's a bit of overleading going on in this post -- I've
+suffixed them with the subscript `2`. My convention is to use the subscript
+corresponding to the number of the type parameter. Here, `A` is "first" and `B`
+is "second", so the operators on `B` get `2`.
+
+From there, I define the map as a pair; the first component is the list of
+key-value pairs, and the second is the proof that all the keys in the
+list occur only once.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 480 481 >}}
+
+Now, to implement union and intersection; for the most part, the proofs deal
+just with the first component of the map -- the key-value pairs. For union,
+the key operation is "insert-or-combine". We can think of merging two maps
+as inserting all the keys from one map (arbitrary, the "left") into the
+other. If a key is not in the "left" map, insertion won't do anything to its
+prior value in the right map; similarly, if a key is not in the "right" map,
+then it should appear unchanged in the final result after insertion. Finally,
+if a key is inserted into the "right" map, but already has a value there, then
+the two values need to be combined using `_⊔₂_`. This leads to the following
+definition of `insert` on key-value pair lists:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 114 118 >}}
+
+Above, `f` is just a stand-in for `_⊓₂_` (making the definition a tiny bit more general).
+For each element in the "right" key-value list, we check if its key matches
+the one we're inserting; if it does, we have to combine the values, and
+there's no need to recurse into the rest of the list. If on the other hand
+the key doesn't match, we move on to the next element of the list. If we
+run out of elements, we know that the key we're inserting wasn't in the "right"
+map, so we insert it as-is.
+
+The union operation is just about inserting every pair from one map into another.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 120 121 >}}
+
+Here, I defined my own version of `foldr` which unpacks the pairs, for
+convenience:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 110 112 "" "**(Click here to see the definition of my `foldr`)**" >}}
+
+For intersection, we do something similar; however, since only elements in
+_both_ maps should be in the final output, if our "insertion" doesn't find
+an existing key, it should just fall through; this can be achieved by defining
+a version of `insert` whose base case simply throws away the input. Of course,
+this function should also use `_⊓₂_` instead of `_⊔₂_`; below, though, I again
+use a general function `f` to provide a more general definition. I called this
+version of the function `update`.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 295 299 >}}
+
+Just changing `insert` to `update` is not enough. It's true that calling
+`update` with all keys from `m1` on `m2` would forget all keys unique to `m1`,
+it would still leave behind the only-in-`m2` keys. To get rid of these, I
+defined another function, `restrict`, that drops all keys in its second
+argument that aren't present in its first argument.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 304 308 >}}
+
+Altogether, intesection is defined as follows, where `updates` just
+calls `update` for every key-value pair in its first argument.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 310 311 >}}
+
+The next hurdle is all the proofs about these implementations. I will
+leave the details of the proofs either as appendices or as links to
+other posts on this site.
+
+The first key property is that the insertion, union, update, and intersection operations all
+preserve uniqueness of keys; the [proofs for this are here](#appendix-proof-of-uniqueness-of-keys).
+The set of properties are the lattice laws for union and intersection.
+The proofs of those proceed by cases; to prove that \((\sqcup)\) is
+commutative, we reason that if \((k , v) \in m_1 \sqcup m_2\), then it must be
+either in \(m_1\), in \(m_2\), or in both; for each of these three possible
+cases, we can show that \((k , v)\) must be the same in \(m_2 \sqcup m_1\).
+Things get even more tedious for proofs of associativity, since there are
+7 cases to consider; I describe the strategy I used for such proofs
+in my [article about the "Expression" pattern]({{< relref "agda_expr_pattern" >}})
+in Agda.
+
+### Additional Properties of Lattices
+
+The product and map lattices are the two pulling the most weight in my
+implementation of program analyses. However, there's an additional property
+that they have: if the lattices they are made of have a _finite height_,
+then so do products and map lattices themselves. A lattice having a finite
+height means that we can only line up so many elements using the less-than
+operator `<`. For instance, the natural numbers are _not_ a finite-height lattice;
+we can create the infinite chain:
+
+{{< latex >}}
+0 < 1 < 2 < ...
+{{< /latex >}}
+
+On the other hand, our sign lattice _is_ of finite height; the longest chains
+we can make have three elements and two `<` signs. Here's one:
+
+{{< latex >}}
+\bot < + < \top
+{{< /latex >}}
+
+As a result of this, _pairs_ of signs also have a finite height; the longest
+chains we can make have five elements and four `<` signs.
+{{< sidenote "right" "example-note" "An example:" >}}
+Notice that the elements in the example progress the same way as the ones
+in the single-sign chain. This is no accident; the longest chains in the
+pair lattice can be constructed from longest chains of its element
+lattices. The length of the product lattice chain, counted by the number of
+"less than" signs, is the sum of the lengths of the element chains.
+{{< /sidenote >}}
+
+{{< latex >}}
+(\bot, \bot) < (\bot, +) < (\bot, \top) < (+, \top) < (\top, \top)
+{{< /latex >}}
+
+The same is true for maps, under certain conditions.
+
+The finite-height property is crucial to lattice-based static program analysis;
+we'll talk about it in more detail in the next post of this series.
+
 {{< todo >}}
 I started using 'join' but haven't introduced it before.
 {{< /todo >}}
+
+### Appendix: Proof of Uniqueness of Keys
+
+I will provide sketches of the proofs here, and omit the implementations
+of my lemmas. Click on the link in the code block headers to jump to their
+implementation on my Git server.
+
+First, note that if we're inserting a key that's already in a list, then the keys of that list are unchanged.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 123 124 >}}
+
+On the other hand, if we're inserting a new key, it ends up at the end, and
+the rest of the keys are unchanged.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 134 135 >}}
+
+Then, for any given key-value pair, the key either is or isn't in the list we're
+inserting it into. If it is, then the list ends up unchanged, and remains
+unique if it was already unique. On the other hand, if it's not in the list,
+then it ends up at the end; adding a new element to the end of a unique
+list produces another unique list. Thus, in either case, the final keys
+are unique.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 143 148 >}}
+
+By induction, we can then prove that calling `insert` many times as we do
+in `union` preserves uniqueness too. Here, `insert-preserves-Unique` serves
+as the inductive step.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 164 168 >}}
+
+For `update`, things are simple; it doesn't change the keys of the argument
+list at all, since it only modifies, and doesn't add new pairs. This
+is captured by the `update-keys` property:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 313 314 >}}
+
+If the keys don't change, they obviously remain unique.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 328 330 >}}
+
+For `restrict`, we note that it only ever removes keys; as a result, if
+a key was not in the input to `restrict`, then it won't be in its output,
+either.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 337 338 >}}
+
+As a result, for each key of the list being restricted, we either drop it
+(which does not damage uniqueness) or we keep it; since we only remove
+keys, and since the keys were originally unique, the key we kept won't
+conflict with any of the other final keys.
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 345 351 >}}
+
+Since both `update` and `restrict` preserve uniqueness, then so does
+`intersect`:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 353 355 >}}