231 lines
12 KiB
Markdown
231 lines
12 KiB
Markdown
---
|
||
title: "The \"Deeply Embedded Expression\" Trick in Agda"
|
||
date: 2024-03-11T14:25:52-07:00
|
||
tags: ["Agda"]
|
||
description: "In this post, I talk about a trick I developed to simplify certain Agda proofs."
|
||
---
|
||
|
||
I've been working on a relatively large Agda project for a few months now,
|
||
and I'd like to think that I've become quite proficient. Recently, I came
|
||
up with a little trick to help simplify some of my proofs, and it seems like
|
||
this trick might have broader applications.
|
||
|
||
In my head, I call this trick 'Deeply Embedded Expressions'. Before I introduce
|
||
it, let me explain the part of my work that motivated developing the trick.
|
||
|
||
### Proofs about Map Operations
|
||
|
||
A part of my Agda project is the formalization of simple key-value maps.
|
||
I model key-value maps as lists of key-value pairs. On top of this, I implement
|
||
two operations: `join` and `meet`, which in my code are denoted using `⊔` and `⊓`.
|
||
When "joining" two maps, you create a new map that has the keys from both input ones.
|
||
If a key is only present in one of the input maps, then the new "joined" map
|
||
has the same value for that key as the original. On the other hand, if the key
|
||
is present in both maps, then its value in the new map is the result of "joining"
|
||
the original values. The "meet" operation is similar, except instead of taking
|
||
keys from either map, the result only has keys that were present in both maps,
|
||
"meeting" their values. In a way, "join" and "meet" are similar to set union
|
||
and intersection --- but they also operate on the values in the map.
|
||
|
||
Given these operations, I need to prove certain properties of these operation.
|
||
The most inconvenient to prove is probably associativity:
|
||
|
||
{{< codelines "agda" "agda-spa/Lattice/Map.agda" 752 752 >}}
|
||
|
||
This property is, in turn, proven using two 'subset' relations on maps, defined
|
||
in the usual way.
|
||
|
||
{{< codelines "agda" "agda-spa/Lattice/Map.agda" 755 755 >}}
|
||
{{< codelines "agda" "agda-spa/Lattice/Map.agda" 774 774 >}}
|
||
|
||
The reason this property is so inconvenient to prove is that there are a
|
||
lot of cases to consider. That's because your claim, in words, is something
|
||
like:
|
||
|
||
> Suppose a key-value pair `k , v` is present in `(m₁ ⊔ m₂) ⊔ m₃`. Show
|
||
> that `k , v` is also in `m₁ ⊔ (m₂ ⊔ m₃)`.
|
||
|
||
The only thing you can really do with `k , v` is figure out how it got into
|
||
the three-way union map: did it come from `m₁`, `m₂`, or `m₃`, or perhaps
|
||
several of them? The essence of the proof boils down to repeated uses
|
||
of the fact that for a key to be in the union, it must be in at least one
|
||
of the two maps. You end up with witnesses, repeated application of the same
|
||
lemmas, lots of `let`-expressions or `where` clauses. It's relatively tedious
|
||
and, what's more frustrating, __driven entirely by the structure of the
|
||
map operations__. It seems like one shouldn't have to mimic that structure
|
||
using boilerplate lemmas. So I started looking at other ways.
|
||
|
||
### Case Analysis using GADTs
|
||
|
||
A "proof by cases" in a dependently typed language like Agda usually brings
|
||
to mind pattern matching. So, here's an idea: what if for each expression
|
||
involving `⊔` and `⊓`, we had some kind of data type, and that data type
|
||
had exactly as many inhabitants as there are cases to analyze? A data type
|
||
corresponding to `m₁ ⊔ m₂` might have three cases, and the one for
|
||
`(m₁ ⊔ m₂) ⊔ m₃` might have seven. Each case would contain the information
|
||
necessary to perform the proof.
|
||
|
||
A data type whose "shape" depends on an expression in the way I described above
|
||
is said to be _indexed by_ that expression. In Agda, GADTs are used to create
|
||
indexed types. My initial attempt was something like this:
|
||
|
||
```Agda
|
||
data Provenance (k : A) : B → Map → Set (a ⊔ℓ b) where
|
||
single : ∀ {v : B} {m : Map} → (k , v) ∈ m → Provenance k v m
|
||
in₁ : ∀ {v : B} {m₁ m₂ : Expr} → Provenance k v e₁ → ¬ k ∈k m₂ → Provenance k v (e₁ ⊔ e₂)
|
||
in₂ : ∀ {v : B} {m₁ m₂ : Expr} → ¬ k ∈k m₁ → Provenance k v m₂ → Provenance k v (e₁ ⊔ e₂)
|
||
bothᵘ : ∀ {v₁ v₂ : B} {m₁ m₂ : Expr} → Provenance k v₁ m₁ → Provenance k v₂ m₂ → Provenance k (v₁ ⊔ v₂) (e₁ ⊔ e₂)
|
||
bothⁱ : ∀ {v₁ v₂ : B} {m₁ m₂ : Expr} → Provenance k v₁ m₁ → Provenance k v₂ m₂ → Provenance k (v₁ ⊓ v₂) (e₁ ⊓ e₂)
|
||
```
|
||
|
||
I was planning on a proof of associativity (in one direction) that looked
|
||
something like the following --- pattern matching on cases from the new
|
||
`Provenance` type.
|
||
|
||
```Agda
|
||
⊔-assoc₁ : ((m₁ ⊔ m₂) ⊔ m₃) ⊆ (m₁ ⊔ (m₂ ⊔ m₃))
|
||
⊔-assoc₁ k v k,v∈m₁₂m₃
|
||
with get-Provenance k,v∈m₁₂m₃
|
||
... | in₂ k∉km₁₂ (single v∈m₃) = ...
|
||
... | in₁ (in₂ k∉km₁ (single v∈m₂)) k∉km₃ = ...
|
||
... | bothᵘ (in₂ k∉km₁ (single {v₂} v₂∈m₂)) (single {v₃} v₃∈m₃) = ...
|
||
... | in₁ (in₁ (single v₁∈m₁) k∉km₂) k∉km₃ = ...
|
||
... | bothᵘ (in₁ (single {v₁} v₁∈m₁) k∉km₂) (single {v₃} v₃∈m₃) = ...
|
||
... | in₁ (bothᵘ (single {v₁} v₁∈m₁) (single {v₂} v₂∈m₂)) k∉ke₃ = ...
|
||
... | bothᵘ (bothᵘ (single {v₁} v₁∈m₁) (single {v₂} v₂∈m₂)) (single {v₃} v₃∈m₃) = ...
|
||
```
|
||
|
||
However, this doesn't work. Agda has trouble figuring out which cases of
|
||
the `Provenance` GADT are allowed, in which aren't. Is `m₁ ⊔ m` a single map,
|
||
fit for the `single` case, or should it be broken up into more cases like
|
||
`in₁` and `in₂`? In general, is some expression of type `Map` the "bottom"
|
||
of our recursion, or should it be analyzed further?
|
||
|
||
The above hints at what's wrong. The mistake here is requiring Agda to infer
|
||
the shape of our "join" and "meet" expressions from arbitrary terms.
|
||
The set of expressions that we want to reason about is much more restricted --
|
||
each expression will always be of three components: "meet", "join", and base-case maps being
|
||
combined using these operations.
|
||
|
||
### Defining an Expression Data Type
|
||
If you're like me, and have spent years of your life around programming language
|
||
theory and domain specific languages (DSLs), the last sentence of the previous
|
||
section may be ringing a bell. In fact, it's eerily similar to how we describe
|
||
recursive grammars:
|
||
|
||
> An expression of interest is either,
|
||
> * A map
|
||
> * The "join" of two expressions
|
||
> * The "meet" of two expressions
|
||
|
||
Mathematically, we might write this as follows:
|
||
|
||
{{< latex >}}
|
||
\begin{array}{rcll}
|
||
e & ::= & m & \text{(maps)} \\
|
||
& | & e \sqcup e & \text{(join)} \\
|
||
& | & e \sqcap e & \text{(meet)}
|
||
\end{array}
|
||
{{< /latex >}}
|
||
|
||
And in Agda,
|
||
|
||
{{< codelines "agda" "agda-spa/Lattice/Map.agda" 543 546 >}}
|
||
|
||
In the code, I used the set union and intersection operators
|
||
to avoid overloading the `⊔` and `⊓` more than they already are.
|
||
|
||
We have just defined a very small expression language. In computer science,
|
||
a language is called _deeply embedded_ if a data type (or class hierarchy, or
|
||
other 'explicit' representation) is defined for its syntax in the _host_
|
||
language (Agda, in our case). This is in contrast to a _shallow embedding_, in
|
||
which expressions in the (new) language are just expressions in the host
|
||
language.
|
||
|
||
In this sense, our `Expr` is deeply embedded --- we defined new container for it,
|
||
and `_∪_` is a distinct entity from `_⊔_`. Our first attempt was a shallow
|
||
embedding. That fell through because the Agda language is much broader than
|
||
our expression language, which makes case analysis very difficult.
|
||
|
||
An obvious thing to do with an expression is to evaluate it. This will be
|
||
important for our proofs, because it will establish a connection between
|
||
expressions (created via `Expr`) and actual Agda objects that we need to
|
||
reason about at the end of the day. The notation \(\llbracket e \rrbracket\)
|
||
is commonly used in PL circles for evaluation (it comes from
|
||
[Denotational Semantics](https://en.wikipedia.org/wiki/Denotational_semantics)).
|
||
Thus, my Agda evaluation function is written as follows:
|
||
|
||
{{< codelines "agda" "agda-spa/Lattice/Map.agda" 586 589 >}}
|
||
|
||
On top of this, here is my actual implementation of the `Provenance` data type.
|
||
This time, it's indexed by expressions in `Expr`, which makes it much easier to
|
||
pattern match on instances:
|
||
|
||
{{< codelines "agda" "agda-spa/Lattice/Map.agda" 591 596 >}}
|
||
|
||
Note that we have to use the evaluation function to be able to use
|
||
operators such as `∈`. That's because these are still defined on maps,
|
||
and not expressions.
|
||
|
||
With this, I was able to write my proof in the way that I had hoped. It has
|
||
the exact form of my previous sketch-of-proof.
|
||
|
||
{{< codelines "agda" "agda-spa/Lattice/Map.agda" 755 773 "" "**(click here to see the full example, including each case's implementation)**" >}}
|
||
|
||
### The General Trick
|
||
|
||
So far, I've presented a problem I faced in my Agda proof and a solution for
|
||
that problem. However, it may not be clear how useful the trick is beyond
|
||
this narrow case that I've encountered. The way I see it, the "deeply embedded
|
||
expression" trick is applicable whenever you have data that is constructed
|
||
from some fixed set of cases, and when proofs about that data need to follow
|
||
the structure of these cases. Thus, examples include:
|
||
|
||
* **Proofs about the origin of keys in a map (this one):** the "data" is the
|
||
key-value map that is being analyzed. The enumeration of cases for this
|
||
map is driven by the structure of the "join" and "meet" operations used
|
||
to build the map.
|
||
* **Automatic derivation of function properties:** suppose you're interested
|
||
in working with continuous functions. You also know that the addition,
|
||
subtraction, and multiplication of two functions preserves continuity.
|
||
Of course, the constant function \(x \mapsto c\) and the identity function
|
||
\(x \mapsto x\) are continuous too. You may define an expression data type
|
||
that has cases for these operations. Then, your evaluation function could
|
||
transform the expression into a plain function, and a proof on the
|
||
structure of the expression can be used to verify the resulting function's
|
||
continuity.
|
||
* **Proof search for algebraic expressions:** suppose that you wanted to
|
||
automatically find solutions for certain algebraic (in)equalities. Instead
|
||
of using some sort of reflection mechanism to inspect terms and determine
|
||
how constraints should be solved, you might represent the set of operations
|
||
in you equation system as cases in a data type. You can then use regular
|
||
Agda code to manipulate terms; an evaluation function can then be used
|
||
to recover the equations in Agda, together with witnesses justifying the
|
||
solution.
|
||
|
||
There are some pretty clear commonalities about examples above, which
|
||
are the ingredients to this trick:
|
||
|
||
* __The expression:__ you create a new expression data type that encodes all
|
||
the operations (and bases cases) on your data. In my example, this is
|
||
the `Expr` data type.
|
||
* __The evaluation function__: you provide a way to lower the expression
|
||
you've defined back into a regular Agda term. This connects your (abstract)
|
||
operations to their interpretation in Agda. In my example, this is the
|
||
`⟦_⟧` function.
|
||
* __The proofs__: you write proofs that consider only the fixed set of cases
|
||
encoded by the data type (`Expr`), but state properties about the
|
||
_evaluated_ expression. In my example, this is `Provenance` and
|
||
the `Expr-Provenance` function. Specifically, the `Provenance` data type connects
|
||
expressions and the terms they evaluate to, because it is indexed by expressions,
|
||
but contains data in the form `k ∈k ⟦ e₂ ⟧`.
|
||
|
||
### Conclusion
|
||
I'll be the first to admit that this trick is quite situational, and may
|
||
not be as far-reaching as the ["Is Something" pattern]({{< relref "agda_is_pattern" >}})
|
||
I wrote about before, which seems to occur far more in the wild. However, there
|
||
have now been two times when I personally reached for this trick, which seems
|
||
to suggest that it may be useful to someone else.
|
||
|
||
I hope you've found this useful. Happy (dependently typed) programming!
|