Add Coq advent of code post.

2020-12-02 01:14:32 -08:00 · 2020-12-02 01:14:32 -08:00 · 234b795157
commit 234b795157
parent e317c56c99
2 changed files with 450 additions and 0 deletions
--- a/code/aoc-coq/day1.v
+++ b/code/aoc-coq/day1.v
@ -0,0 +1,102 @@
+Require Import Coq.Lists.List.
+Require Import Omega.
+
+Definition has_pair (t : nat) (is : list nat) : Prop :=
+  exists n1 n2 : nat, n1 <> n2 /\ In n1 is /\ In n2 is /\ n1 + n2 = t.
+
+Fixpoint find_matching (is : list nat) (total : nat) (x : nat) : option nat :=
+  match is with
+  | nil => None
+  | cons y ys =>
+      if Nat.eqb (x + y) total
+      then Some y
+      else find_matching ys total x 
+  end.
+
+Fixpoint find_sum (is : list nat) (total : nat) : option (nat * nat) :=
+  match is with
+  | nil => None
+  | cons x xs =>
+      match find_matching xs total x with
+      | None => find_sum xs total (* Was buggy! *)
+      | Some y => Some (x, y)
+      end
+  end.
+
+Lemma find_matching_correct : forall is k x y,
+  find_matching is k x = Some y -> x + y = k.
+Proof.
+  intros is. induction is;
+  intros k x y Hev.
+  - simpl in Hev. inversion Hev.
+  - simpl in Hev. destruct (Nat.eqb (x+a) k) eqn:Heq.
+    + injection Hev as H; subst.
+      apply EqNat.beq_nat_eq. auto.
+    + apply IHis. assumption.
+Qed.
+
+Lemma find_matching_skip : forall k x y i is,
+  find_matching is k x = Some y -> find_matching (cons i is) k x = Some y.
+Proof.
+  intros k x y i is Hsmall.
+  simpl. destruct (Nat.eqb (x+i) k) eqn:Heq.
+  - apply find_matching_correct in Hsmall.
+    symmetry in Heq. apply EqNat.beq_nat_eq in Heq.
+    assert (i = y). { omega. } rewrite H. reflexivity.
+  - assumption.
+Qed.
+
+Lemma find_matching_works : forall is k x y, In y is /\ x + y = k ->
+  find_matching is k x = Some y.
+Proof.
+  intros is. induction is;
+  intros k x y [Hin Heq].
+  - inversion Hin.
+  - inversion Hin.
+    + subst a. simpl. Search Nat.eqb.
+      destruct (Nat.eqb_spec (x+y) k).
+      * reflexivity.
+      * exfalso. apply n. assumption.
+    + apply find_matching_skip. apply IHis.
+      split; assumption.
+Qed.
+
+Theorem find_sum_works :
+  forall k is, has_pair k is ->
+  exists x y, (find_sum is k = Some (x, y) /\ x + y = k).
+Proof.
+  intros k is. generalize dependent k.
+  induction is; intros k [x' [y' [Hneq [Hinx [Hiny Hsum]]]]].
+  - (* is is empty. But x is in is! *)
+    inversion Hinx.
+  - (* is is not empty. *)
+    inversion Hinx. 
+    + (* x is the first element. *)
+      subst a. inversion Hiny.
+      * (* y is also the first element; but this is impossible! *)
+        exfalso. apply Hneq. apply H.
+      * (* y is somewhere in the rest of the list.
+           We've proven that we will find it! *)
+        exists x'. simpl.
+        erewrite find_matching_works.
+        { exists y'. split. reflexivity. assumption. }
+        { split; assumption. }
+    + (* x is not the first element. *)
+      inversion Hiny.
+      * (* y is the first element,
+           so x is somewhere in the rest of the list.
+           Again, we've proven that we can find it. *)
+        subst a. exists y'. simpl.
+        erewrite find_matching_works.
+        { exists x'. split. reflexivity. rewrite plus_comm. assumption. }
+        { split. assumption. rewrite plus_comm. assumption. }
+      * (* y is not the first element, either.
+           Of course, there could be another matching pair
+           starting with a. Otherwise, the inductive hypothesis applies. *)
+        simpl. destruct (find_matching is k a) eqn:Hf.
+        { exists a. exists n. split.
+          reflexivity. 
+          apply find_matching_correct with is. assumption. }
+        { apply IHis. unfold has_pair. exists x'. exists y'.
+          repeat split; assumption. }
+Qed.
--- a/content/blog/00_aoc_coq.md
+++ b/content/blog/00_aoc_coq.md
@ -0,0 +1,348 @@
+---
+title: "Advent of Code in Coq - Day 1"
+date: 2020-12-01T21:50:28-08:00
+tags: ["Advent of Code", "Coq"]
+draft: true
+---
+
+The first puzzle of this year's [Advent of Code](https://adventofcode.com) was quite
+simple, which gave me a thought: "Hey, this feels within reach for me to formally verify!"
+At first, I wanted to formalize and prove the correctness of the [two-pointer solution](https://www.geeksforgeeks.org/two-pointers-technique/).
+However, I didn't have the time to mess around with the various properties of sorted
+lists and their traversals. So, I settled for the brute force solution. Despite
+the simplicity of its implementation, there is plenty to talk about when proving
+its correctness using Coq. Let's get right into it!
+
+Before we start, in the interest of keeping the post self-contained, here's the (paraphrased)
+problem statement:
+
+> Given an unsorted list of numbers, find two distinct numbers that add up to 2020.
+
+With this in mind, we can move on to writing some Coq!
+
+### Defining the Functions
+The first step to proving our code correct is to actually write the code! As a first
+step, let's write a helper function that, given a number `x`, tries to find another number
+`y` such that `x + y = 2020`. The code is quite simple:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 7 14 >}}
+
+Here, `is` is the list of numbers that we want to search, and `total` is the number
+we want to add up to (`2020` in our particular case), and `x` is the same as it was in
+the previous paragraph. We proceed by case analysis: if the list is empty, we can't
+find a match, so we return `None` (the Coq equivalent of Haskell's `Nothing`).
+On the other hand, if the list has at least one element, `y`, we see if it adds
+up to `total`, and return `y` if it does. If it doesn't, we continue our search
+into the rest of the list.
+
+It's somewhat unusual, in my experience, to put the list argument first when writing
+functions in a language with [currying](https://wiki.haskell.org/Currying). However,
+it seems as though Coq's `simpl` tactic, which we will use later, works better
+for our purposes when the argument being case analyzed is given first.
+
+We can now use `find_matching` to define our `find_sum` function, which solves part 1.
+Here's the code:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 16 24 >}}
+
+For every `x` that we encounter in our input list `is`, we want to check if there's
+a matching number in the rest of the list. We only search the remainder of the list
+because we can't use `x` twice: the `x` and `y` we return that add up to `total`
+must be different elements. We use `find_matching` to try find a complementary number
+for `x`. If we don't find it, this `x` isn't it, so we recursively move on to `xs`.
+On the other hand, if we _do_ find a matching `y`, we're done! We return `(x,y)`,
+wrapped in `Some` to indicate that we found something useful.
+
+What about that `(* Was buggy! *)` line? Well, it so happens that my initial
+implementation had a bug on this line, one that came up as I was proving
+the correctness of my function. When I wasn't able to prove a particular
+behavior in one of the cases, I realized something was wrong. In short,
+my proof actually helped me find and fix a bug!
+
+This is all the code we'll need to get our solution. Next, let's talk about some
+properties of our two functions.
+
+### Our First Lemma
+When we call `find_matching`, we want to be sure that if we get a number, 
+it does indeed add up to our expected total. We can state it a little bit more
+formally as follows:
+
+> For any numbers `k` and `x`, and for any list of number `is`,
+> if `find_matching is k x` returns a number `y`, then `x + y = k`.
+
+And this is how we write it in Coq:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 26 27 >}}
+
+The arrow, `->`, reads "implies". Other than that, I think this
+property reads pretty well. The proof, unfortunately, is a little bit more involved.
+Here are the first few lines:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 28 31 >}}
+
+We start with the `intros is` tactic, which is akin to saying
+"consider a particular list of integers `is`". We do this without losing
+generality: by simply examining a concrete list, we've said nothing about
+what that list is like. We then proceed by induction on `is`.
+
+To prove something by induction for a list, we need to prove two things:
+
+* The __base case__. Whatever property we want to hold, it must
+hold for the empty list, which is the simplest possible list.
+In our case, this means `find_matching` searching an empty list.
+* The __inductive case__. Assuming that a property holds for any list
+`[b, c, ...]`, we want to show that the property also holds for 
+the list `[a, b, ...]`. That is, the property must remain true if we
+prepend an element to a list for which this property holds.
+
+These two things combined give us a proof for _all_ lists, which is exactly
+what we want! If you don't belive me, here's how it works. Suppose you want
+to prove that some property `P` holds for `[1,2,3,4]`. Given the base
+case, we know that `P []` holds. Next, by the inductive case, since
+`P []` holds, we can prepend `4` to the list, and the property will
+still hold. Thus, `P [4]`. Now that `P [4]` holds, we can again prepend
+an element to the list, this time a `3`, and conclude that `P [3,4]`.
+Repeating this twice more, we arrive at our desired fact: `P [1,2,3,4]`.
+
+When we write `induction is`, Coq will generate two proof goals for us,
+one for the base case, and one for the inductive case. Since we have
+not yet introduced the variables `k`, `x`, and `y`, they remain
+inside a `forall` quantifier at that time. To be able to refer
+to them, we want to use `intros`. We want to do this in both the
+base and the inductive case. To quickly do this, we use Coq's `;`
+operator. When we write `a; b`, Coq runs the tactic `a`, and then
+runs the tactic `b` in every proof goal generated by `a`. This is
+exactly what we want.
+
+There's one more variable inside our second `intros`: `Hev`.
+This variable refers to the hypothesis of our statement:
+that is, the part on the left of the `->`. To prove that `A`
+implies `B`, we assume that `A` holds, and try to argue `B` from there.
+Here is no different: when we use `intros Hev`, we say, "suppose that you have
+a proof that `find_matching` evaluates to `Some y`, called `Hev`". The thing
+on the right of `->` becomes our proof goal.
+
+Now, it's time to look at the cases. To focus on one case at a time,
+we use `-`. The first case is our base case. Here's what Coq prints
+out at this time:
+
+```
+k, x, y : nat
+Hev : find_matching nil k x = Some y
+
+========================= (1 / 1)
+
+x + y = k
+```
+
+All the stuff above the `===` line are our hypotheses. We know
+that we have some `k`, `x`, and `y`, all of which are numbers.
+We also have the assumption that `find_matching` returns `Some y`.
+In the base case, `is` is just `[]`, and this is reflected in the
+type for `Hev`. To make this more clear, we'll simplify the call to `find_matching`
+in `Hev`, using `simpl in Hev`. Now, here's what Coq has to say about `Hev`:
+
+```
+Hev : None = Some y
+```
+
+Well, this doesn't make any sense. How can something be equal to nothing?
+We ask Coq this question using `inversion Hev`. Effectively, `inversion` asks
+the question: what are the possible ways we could have acquired `Hev`?
+Coq generates a proof goal for each of these possible ways. Alas, there are
+no ways to arrive at this contradictory assumption: the number of proof sub-goals
+is zero. This means we're done with the base case!
+
+The inductive case is the meat of this proof. Here's the corresponding part
+of the proof:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 32 36 >}}
+
+This time, the proof state is more complicated:
+
+```
+a : nat
+is : list nat
+IHis : forall k x y : nat, find_matching is k x = Some y -> x + y = k
+k, x, y : nat
+Hev : find_matching (a :: is) k x = Some y
+
+========================= (1 / 1)
+
+x + y = k
+```
+
+Following the footsteps of our informal description of the inductive case,
+Coq has us prove our property for `(a :: is)`, or the list `is` to which
+`a` is being prepended. Like before, we assume that our property holds for `is`.
+This is represented in the __induction hypothesis__ `IHis`. It states that if we
+`find_matching` finds a `y` in `is`, it must add up to `k`. However, `IHis`
+doesn't tell us anything about `a :: is`: that's our job. We also still have
+`Hev`, which is our assumption that `find_matching` finds a `y` in `(a :: is)`.
+Running `simpl in Hev` gives us the following:
+
+```
+Hev : (if x + a =? k then Some a else find_matching is k x) = Some y
+```
+
+The result of `find_matching` now depends on whether or not the new element `a`
+adds up to `k`. We're not sure if this is the case. Fortunately, we don't need to be!
+If we can prove that the `y` that `find_matching` finds is correct regardless
+of whether `a` adds up to `k` or not, we're good to go! To do this,
+we perform case analysis using `destruct`.
+
+Our particular use of `destruct` says: check any possible value for `Nat.eqb (x+a) k`,
+and create an equation `Heq` that tells us what that value is. `Nat.eqb` returns a boolean
+value, and so `destruct` generates two new goals: one where the function returns `true`,
+and one where it returns `false`. We start with the former. Here's the proof state:
+
+```
+a : nat
+is : list nat
+IHis : forall k x y : nat, find_matching is k x = Some y -> x + y = k
+k, x, y : nat
+Heq : (x + a =? k) = true
+Hev : Some a = Some y
+
+========================= (1 / 1)
+
+x + y = k
+```
+
+There is a new hypothesis: `Heq`. It tells us that we're currently
+considering the case where `Nat.eqb` evaluates to `true`. Also,
+`Hev` has been considerably simplified: now that we know the condition
+of the `if` expression, we can just replace it with the `then` branch.
+
+Looking at `Hev`, we can see that `a` is equal to `y`. After all,
+if they weren't, `Some a` wouldn't equal to `Some y`. To make Coq
+take this information into account, we use `injection`. This will create
+a new hypothesis, `a = y`. But if one is equal to the other, why don't we
+just use only one of these variables everywhere? We do exactly that by using
+`subst`, which replaces `a` with `y` everywhere in our proof.
+
+The proof state is now:
+
+```
+is : list nat
+IHis : forall k x y : nat, find_matching is k x = Some y -> x + y = k
+k, x, y : nat
+Heq : (x + y =? k) = true
+
+========================= (1 / 1)
+
+x + y = k
+```
+
+We're close, but there's one more detail to keep in mind. Our goal, `x + y = k`,
+is the __proposition__ that `x + y` is equal to `k`. However, `Heq` tells us
+that the __function__ `eqb` evaluates to `true`. These are fundamentally different.
+One talks about mathematical equality, while the other about some function `eqb`
+defined somewhere in Coq's standard library. Who knows - maybe there's a bug in
+Coq's implementation! Fortunately, Coq comes with a proof that if two things
+are equal according to `eqb`, they are mathematically equal. This proof is
+called `eqb_nat_eq`. We tell Coq to use this with `apply`. Our proof goal changes to:
+
+```
+true = (x + y =? k)
+```
+
+This is _almost_ like `Heq`, but flipped. Instead of manually flipping it and using `apply`
+with `Heq`, I let Coq figure the rest of the work out using `auto`.
+
+Phew! All this for the `true` case of `eqb`. Next, what happens if `x + a` does not equal `k`?
+Here's the proof state at this time:
+
+```
+a : nat
+is : list nat
+IHis : forall k x y : nat, find_matching is k x = Some y -> x + y = k
+k, x, y : nat
+Heq : (x + a =? k) = false
+Hev : find_matching is k x = Some y
+
+========================= (1 / 1)
+
+x + y = k
+```
+
+Since `a` was not what it was looking for, `find_matching` moved on to `is`. But hey,
+we're in the inductive case! We are assuming that `find_matching` will work properly
+with the list `is`. Since `find_matching` found its `y` in `is`, this should be all we need!
+We use our induction hypothesis `IHis` with `apply`. `IHis` itself does not know that
+`find_matching` moved on to `is`, so it asks us to prove it. Fortunately, `Hev` tells us
+exactly that, so we use `assumption`, and the proof is complete! Quod erat demonstrandum, QED!
+
+### The Rest of the Owl
+Here are a couple of other properties of `find_matching`. For brevity's sake, I will
+not go through their proofs step-by-step. I find that the best way to understand
+Coq proofs is to actually step through them in the IDE!
+
+First on the list is `find_matching_skip`. Here's the type:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 38 39 >}}
+
+It reads: if we correctly find a number in a small list `is`, we can find that same number
+even if another number is prepended to `is`. That makes sense: _adding_ a number to
+a list doesn't remove whatever we found in it! I used this lemma to prove another,
+`find_matching_works`:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 49 50 >}}
+
+This reads, if there _is_ an element `y` in `is` that adds up to `k` with `x`, then
+`find_matching` will find it. This is an important property. After all, if it didn't
+hold, it would mean that `find_matching` would occasionally fail to find a matching
+number, even though it's there! We can't have that.
+
+Finally, we want to specify what it means for `find_sum`, or solution function, to actually
+work. The naive definition would be:
+
+> Given a list of integers, `find_sum` always finds a pair of numbers that add up to `k`.
+
+Unfortunately, this is not true. What if, for instance, we have `find_sum` an empty list?
+There are no numbers from that list to find and add together. Even a non-empty list
+may not include such a pair! We need a way to characterize valid input lists. I claim
+that all lists from this Advent of Code puzzle are guaranteed to have two numbers that
+add up to our goal, and that these numbers are not equal to each other. In Coq,
+we state this as follows:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 4 5 >}}
+
+This defines a new property, `has_pair t is` (read "`is` has a pair of numbers that add to `t`"),
+which means:
+
+* There are two numbers `n1` and `n2` such that,
+* They are not equal to each other (`n1 <> n2`) and (`/\`),
+* The number `n1` is an element of `is` (`In n1 is`) and,
+* The number `n2` is an element of `is` (`In n2 is`) and,
+* The two numbers add up to `t` (`n1 + n2 = t`).
+
+When making claims about the correctness of our algorithm, we will assume that this
+property holds. Finally, here's the theorem we want to prove:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 64 66 >}}
+
+It reads, "for any total `k` and list `is`, if `is` has a pair of numbers that add to `k`,
+then `find_sum` will return a pair of numbers `x` and `y` that add to `k`".
+There's some nuance here. We hardly reference the `has_pair` property in this definition,
+and for good reason. Our `has_pair` hypothesis only says that there is _at least one_
+pair of numbers in `is` that meets our criteria. However, this pair need not be the only
+one, nor does it need to be the one returned by `find_sum`! However, if we have many pairs,
+we want to confirm that `find_sum` will find one of them. Finally, here is the proof.
+I will not be able to go through it in detail in this post, but I did comment it to
+make it easier to read:
+
+{{< codelines "Coq" "aoc-coq/day1.v" 67 102 >}}
+
+Coq seems happy with it, and so am I! The bug I mentioned earlier popped up on line 96.
+I had accidentally made `find_sum` return `None` if it couldn't find a complement
+for the `x` it encountered. This meant that it never recursed into the remaining
+list `xs`, and thus, the pair was never found at all! It this became impossible
+to prove that `find_some` will return `Some y`, and I had to double back
+and check my definitions.
+
+I hope you enjoyed this post! If you're interested to learn more about Coq, I strongly recommend
+checking out [Software Foundations](https://softwarefoundations.cis.upenn.edu/), a series
+of books on Coq written as comments in a Coq source file! In particular, check out
+[Logical Foundations](https://softwarefoundations.cis.upenn.edu/lf-current/index.html)
+for an introduction to using Coq. Thanks for reading!