Actually push the rest of the new article

2021-11-28 17:33:04 -08:00 · 2021-11-28 17:33:04 -08:00 · 2ce351f7ef
commit 2ce351f7ef
parent 826dde759f
1 changed files with 304 additions and 9 deletions
--- a/content/blog/coq_dawn_eval.md
+++ b/content/blog/coq_dawn_eval.md
@ -124,7 +124,7 @@ Coq, why don't we use another language? Coq's extraction mechanism allows us to

 I added the following code to the end of my file:

-{{< codelines "Coq" "dawn/DawnEval.v" 219 223 >}}
+{{< codelines "Coq" "dawn/DawnEval.v" 231 235 >}}

 Coq happily produces a new file, `UccGen.hs` with a lot of code. It's not exactly the most
 aesthetic; here's a quick peek:
@ -172,11 +172,12 @@ Okay, so `true false or` evaluates to `true`, much like our semantics claims.
 However, does our evaluator _always_ match the semantics? So far, we have not
 claimed or verified that it does. Let's try giving it a shot.

+#### First Steps and Evaluation Chains
 The first thing we can do is show that if we have a proof that `e` takes
 initial stack `vs` to final stack `vs'`, then each
 `eval_step` "makes progress" towards `vs'` (it doesn't simply _return_
 `vs'`, since it only takes a single step and doesn't always complete
-the evaluation). We also want to show that if the semanics dictates
+the evaluation). We also want to show that if the semantics dictates
 `e` finishes in stack `vs'`, then `eval_step` will never return an error.
 Thus, we have two possibilities:

@ -236,9 +237,9 @@ exists ei1 ei2 vsi1 vsi2,

 This is awful! Not only is this a lot of writing, but it also makes various
 sequences of steps have a different "shape". This way, we can't make
-proofs about evalutions of an _arbitrary_ number of steps. Not all is lost, though: if we squint
+proofs about evaluations of an _arbitrary_ number of steps. Not all is lost, though: if we squint
 a little at the last example (three steps), a pattern starts to emerge.
-First, let's re-arrange the `exists` qualifiers:
+First, let's re-arrange the `exists` quantifiers:

 ```Coq
 exists ei1 vsi1, eval_step vs e = middle ei1 vsi1 /\  (* Cons *)
@ -274,7 +275,7 @@ eval_chain vs e vs'
 ```

 Evaluation chains have a couple of interesting properties. First and foremost,
-they can be "concatenated". This is analagous to the `Sem_e_comp` rule
+they can be "concatenated". This is analogous to the `Sem_e_comp` rule
 in our original semantics: if an expression `e1` starts in stack `vs`
 and finishes in stack `vs'`, and another expression starts in stack `vs'`
 and finishes in stack `vs''`, then we can compose these two expressions,
@ -283,7 +284,7 @@ and the result will start in `vs` and finish in `vs''`.
 {{< codelines "Coq" "dawn/DawnEval.v" 69 75 >}}

 The proof is very short. We go
-{{< sidenote "right" "concat-note" "by induction on the left evaluation" >}}
+{{< sidenote "right" "concat-note" "by induction on the left evaluation chain" >}}
 It's not a coincidence that defining something like <code>(++)</code>
 (list concatenation) in Haskell typically starts by pattern matching
 on the <em>left</em> list. In fact, proofs by <code>induction</code>
@ -294,8 +295,8 @@ in sidenotes, but if you're playing with the
 the <a href="https://en.wikipedia.org/wiki/Fixed-point_combinator">fixed point
 combinator</a> used to implement recursion.
 {{< /sidenote >}}
-chain (the one for `e1`). Coq takes care of most of the rest with `auto`.
-The key to this proof is that whatever `P` is cotained within a "node"
+(the one for `e1`). Coq takes care of most of the rest with `auto`.
+The key to this proof is that whatever `P` is contained within a "node"
 in the left chain determines how `eval_step (e_comp e1 e2)` behaves. Whether
 `e1` evaluates to `final` or `middle`, the composition evaluates to `middle`
 (a composition of two expressions cannot be done in one step), so we always
@ -307,7 +308,7 @@ create new nodes when appending to an empty list.
 * If the step was `final`, then
  the rest of the steps use only `e2`, and good news, we already have a chain
  for that!
-* Otherwise, the step was `middle`, an we stll have a chain for some
+* Otherwise, the step was `middle`, an we still have a chain for some
  intermediate program `ei` that starts in some stack `vsi` and ends in `vs'`.
  By induction, we know that _this_ chain can be concatenated with the one
  for `e2`, resulting in a chain for `e_comp ei e2`. All that remains is to
@ -323,3 +324,297 @@ one for each composed expression:

 While interesting, this particular fact isn't used anywhere else in this
 post, and I will omit the proof here.
+
+#### The Forward Direction
+Let's try rewording our very first proof, `eval_step_correct`, using
+chains. The _direction_ remains the same: given that an expression
+produces a final stack under the formal semantics, we need to
+prove that this expression _evaluates_ to the same final stack using
+a sequence of `eval_step`.
+
+{{< codelines "Coq" "dawn/DawnEval.v" 93 96 >}}
+
+The power of this theorem is roughly the same as that of
+the original one: we can use `eval_step_correct` to build up a chain
+by applying it over and over, and we can take the "first element"
+of a chain to serve as a witness for `eval_step_correct`. However,
+the formulation is arguably significantly cleaner, and contains
+a proof for _all_ steps right away, rather than a single one.
+
+Before we go through the proof, notice that there's actually _two_
+theorems being stated here, which depend on each other. This
+is not surprising given that our semantics are given using two
+data types, `Sem_expr` and `Sem_int`, each of which contains the other.
+Regular proofs by induction, which work on only one of the data types,
+break down because we can't make claims "by induction" about the _other_
+type. For example, while going by induction on `Sem_expr`, we run into
+issues in the `e_int` case while handling \\(\\text{apply}\\). We know
+a single step -- popping the value being run from the stack. But what then?
+The rule for \\(\\text{apply}\\) in `Sem_int` contains another `Sem_expr`
+which confirms that the quoted value properly evaluates. But this other
+`Sem_expr` isn't directly a child of the "bigger" `Sem_expr`, and so we
+don't get an inductive hypothesis about it. We know nothing about it; we're stuck.
+
+We therefore have two theorems declared together using `with` (just like we
+used `with` to declare `Sem_expr` and `Sem_int`). We have to prove both,
+which is once again a surprisingly easy task thanks to Coq's `auto`. Let's
+start with the first theorem, the one for expression semantics.
+
+{{< codelines "Coq" "dawn/DawnEval.v" 98 107 >}}
+
+We go by induction on the semantics data type. There are therefore three cases:
+intrinsics, quotes, and composition. The intrinsic case is handed right
+off to the second theorem (which we'll see momentarily). The quote case
+is very simple since quoted expressions are simply pushed onto the stack in
+a single step (we thus use `chain_final`). Finally, in the composition
+case, we have two sub-proofs, one for each expression being evaluated.
+By induction, we know that each of these sub-proofs can be turned into
+a chain, and we use `eval_chain_merge` to combine these two chains into one.
+That's it.
+
+Now, let's try the second theorem. The code is even shorter:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 108 115 >}}
+
+The first command, `inversion Hsem`, lets us go case-by-case on the various
+ways an intrinsic can be evaluated. Most intrinsics are quite boring;
+in our evaluator, they only need a single step, and their semantics
+rules guarantee that the stack has the exact kind of data that the evaluator
+expects. We dismiss such cases with `apply chain_final; auto`. The only
+time this doesn't work is when we encounter the \\(\\text{apply}\\) intrinsic;
+in that case, however, we can simply use the first theorem we proved.
+
+#### A Quick Aside: Automation Using `Ltac2`
+Going in the other direction will involve lots of situations in which we _know_
+that `eval_step` evaluated to something. Here's a toy proof built around
+one such case:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 237 246  "hl_lines=5-8">}}
+
+The proof claims that if the `swap` instruction was evaluated to something,
+then the initial stack must contain two values, and the final stack must
+have those two values on top but flipped. This is very easy to prove, since
+that's the exact behavior of `eval_step` for the `swap` intrinsic. However,
+notice how much boilerplate we're writing: lines 241 through 244 deal entirely
+with ensuring that our stack does, indeed, have two values. This is trivially true:
+if the stack _didn't_ have two values, it wouldn't evaluate to `final`, but to `error`.
+However, it takes some effort to convince Coq of this.
+
+This was just for a single intrinsic. What if we're trying to prove something
+for _every_ intrinsic? Things will get messy very quickly. We can't even re-use
+our `destruct vs` code, because different intrinsics need stacks with different numbers
+of values (they have a different _arity_); if we try to handle all cases with at most
+2 values on the stack, we'd have to prove the same thing twice for simpler intrinsics.
+In short, proofs will look like a mess.
+
+Things don't have to be this way, though! The boilerplate code is very repetitive, and
+this makes it a good candidate for automation. For this, Coq has `Ltac2`.
+
+In short, `Ltac2` is another mini-language contained within Coq. We can use it to put
+into code the decision making that we'd be otherwise doing on our own. For example,
+here's a tactic that checks if the current proof has a hypothesis in which
+an intrinsic is evaluated to something:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 141 148 >}}
+
+`Ltac2` has a pattern matching construct much like Coq itself does, but it can
+pattern match on Coq expressions and proof goals. When pattern matching
+on goals, we use the following notation: `[ h1 : t1, ..., hn : tn |- g ]`, which reads:
+
+> Given hypotheses `h1` through `hn`, of types `t1` through `tn` respectively,
+  we need to prove `g`.
+
+In our pattern match, then, we're ignoring the actual thing we're supposed to prove
+(the tactic we're writing won't be that smart; its user -- ourselves -- will need
+to know when to use it). We are, however, searching for a hypothesis in the form
+`eval_step ?a (e_int ?b) = ?c`. The three variables, `?a`, `?b`, and `?c` are placeholders
+which can be matched with anything. Thus, we expect any hypothesis in which
+an intrinsic is evaluated to anything else (by the way, Coq checks all hypotheses one-by-one,
+starting at the most recent and finishing with the oldest). The code on lines 144 through 146 actually
+performs the `destruct` calls, as well as the `inversion` attempts that complete
+any proofs with an inconsistent assumption (like `err = final vs'`).
+
+So how do these lines work? Well, first we need to get a handle on the hypothesis we found.
+Various tactics can manipulate the proof state, and cause hypotheses to disappear;
+by calling `Control.hyp` on `h`, we secure a more permanent hold on our evaluation assumption.
+In the next line, we call _another_ `Ltac2` function, `destruct_int_stack`, which unpacks
+the stack exactly as many times as necessary. We'll look at this function in a moment.
+Finally, we use regular old tactics. Specifically, we use `inversion` (as mentioned before).
+I use `fail` after `inversion` to avoid cluttering the proof in case there's _not_
+a contradiction.
+
+On to `destruct_int_stack`. The definition is very simple:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 139 139 >}}
+
+The main novelty is that this function takes two arguments, both of which are Coq expressions:
+`int`, or the intrinsic being evaluated, and `va`, the stack that's being analyzed. The `constr`
+type in Coq holds terms. Let's look at `int_arity` next:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 128 137 >}}
+
+This is a pretty standard pattern match that assigns to each expression its arity. However, we're
+case analyzing over _literally any possible Coq expression_, so we need to handle the case in which
+the expression isn't actually a specific intrinsic. For this, we use `Control.throw` with
+the `Not_intrinsic` exception.
+
+Wait a moment, does Coq just happen to include a `Not_intrinsic` exception? No, it does not.
+We have to register that one ourselves. In `Ltac2`, the type of exception (`exn`) is _extensible_,
+which means we can add on to it. We add just a single constructor with no arguments:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 118 118 >}}
+
+Finally, unpacking the value stack. Here's the code for that:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 120 126 >}}
+
+This function accepts the arity of an operation, and unpacks the stack that many times.
+`Ltac2` doesn't have a way to pattern match on numbers, so we resort to good old "less than or equal",
+and and `if`/`else`. If the arity is zero (or less than zero, since it's an integer), we don't
+need to unpack anymore. This is our base case. Otherwise, we generate two new free variables
+(we can't just hardcode `v1` and `v2`, since they may be in use elsewhere in the proof). To this
+end we use `Fresh.in_goal` and give it a "base symbol" to build on. We then use
+`destruct`, passing it our "scrutinee" (the expression being destructed) and the names
+we generated for its components. This generates multiple goals; the `Control.enter` tactic
+is used to run code for each one of these goals. In the non-empty list case, we try to break
+apart its tail as necessary by recursively calling `destruct_n`.
+
+That's pretty much it! We can now use our tactic from Coq like any other. Rewriting
+our earlier proof about \\(\\text{swap}\\), we now only need to handle the valid case:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 248 254 >}}
+
+We'll be using this new `ensure_valid_stack` tactic in our subsequent proofs.
+
+#### The Backward Direction
+After our last proof before the `Ltac2` diversion, we know that our evaluator matches our semantics, but only when a `Sem_expr`
+object exists for our program. However, invalid programs don't have a `Sem_expr` object
+(there's no way to prove that they evaluate to anything). Does our evaluator behave
+properly on "bad" programs, too? We've not ruled out that our evaluator produces junk
+`final` or `middle` outputs whenever it encounters an error. We need another theorem:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 215 216 >}}
+
+That is, if our evalutor reaches a final state, this state matches our semantics.
+This proof is most conveniently split into two pieces. The first piece says that
+if our evaluator completes in a single step, then this step matches our semantics.
+
+{{< codelines "Coq" "dawn/DawnEval.v" 158 175 >}}
+
+The statement for a non-`final` step is more involved; the one step by itself need
+not match the semantics, since it only carries out a part of a computation. However,
+we _can_ say that if the "rest of the program" matches the semantics, then so does
+the whole expression.
+
+{{< codelines "Coq" "dawn/DawnEval.v" 177 213 >}}
+
+Finally, we snap these two pieces together in `eval_step_sem_back`:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 215 222 >}}
+
+We have now proved complete equivalence: our evaluator completes in a final state
+if and only if our semantics lead to this same final state. As a corollary of this,
+we can see that if a program is invalid (it doesn't evaluate to anything under our semantics),
+then our evaluator won't finish in a `final` state:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 224 229 >}}
+
+### Making a Mini-REPL
+We can now be pretty confident about our evaluator. However, this whole business with using GHCi
+to run our programs is rather inconvenient; I'd rather write `[drop]` than `E_quote (E_int Drop)`.
+I'd also rather _read_ the former instead of the latter. We can do some work in Haskell to make
+playing with our interpreter more convenient.
+
+#### Pretty Printing Code
+Coq generated Haskell data types that correspond precisely to the data types we defined for our
+proofs. We can use the standard type class mechanism in Haskell to define how they should be
+printed:
+
+{{< codelines "Haskell" "dawn/Ucc.hs" 8 22 >}}
+
+Now our expressions and values print a lot nicer:
+
+```
+ghci> true
+[swap drop]
+ghci> false
+[drop]
+ghci> UccGen.or
+clone apply
+```
+
+#### Reading Code In
+The Parsec library in Haskell can be used to convert text back into data structures.
+It's not too hard to create a parser for UCC:
+
+{{< codelines "Haskell" "dawn/Ucc.hs" 24 44 >}}
+
+Now, `parseExpression` can be used to read code:
+
+```
+ghci> parseExpression "[drop] [drop]"
+Right [drop] [drop]
+```
+
+#### The REPL
+We now literally have the three pieces of a read-evaluate-print loop. All that's left
+is putting them together:
+
+{{< codelines "Haskell" "dawn/Ucc.hs" 53 64 >}}
+
+We now have our own little verified evaluator:
+
+```
+$ ghci Ucc
+$ ghc -main-is Ucc Ucc
+$ ./Ucc
+> [drop] [drop] clone apply
+[[drop]]
+> [drop] [swap drop] clone apply
+[[swap drop]]
+> [swap drop] [drop] clone apply
+[[swap drop]]
+> [swap drop] [swap drop] clone apply
+[[swap drop]]
+```
+
+### Potential Future Work
+We now have a verified UCC evaluator in Haskell. What now?
+
+1. We only verified the evaluation component of our REPL. In fact,
+   the whole thing is technically a bit buggy due to the parser:
+   ```
+   > [drop] drop coq is a weird name to say out loud in interviews
+   []
+   ```
+   Shouldn't this be an error? Do you see why it's not? There's work in
+   writing formally verified parsers; some time ago I saw a Galois
+   talk about a [formally verified parser generator](https://galois.com/blog/2019/09/tech-talk-a-verified-ll1-parser-generator/).
+   We could use this formally verified parser generator to create our parsers, and then be
+   sure that our grammar is precisely followed.
+2. We could try make our evaluator a bit smarter. One thing we could definitely do
+   is maker our REPL support variables. Then, we would be able to write:
+   ```
+   > true = [swap drop]
+   > false = [drop]
+   > or = clone apply
+   > true false or
+   [swap drop]
+   ```
+   There are different ways to go about this. One way is to extend our `expr` data type
+   with a variable constructor. This would complicate the semantics (a _lot_), but it would
+   allow us to prove facts about variables. Alternatively, we could implement expressions
+   as syntax sugar in our parser. Using a variable would be the same as simply
+   pasting in the variable's definition. This is pretty much what the Dawn article
+   seems to be doing.
+3. We could prove more things. Can we confirm, once and for all, the correctness of \\(\\text{quote}_n\\),
+   for _any_ \\(n\\)? Is there is a generalized way of converting inductive data types into a UCC encoding?
+   Or could we perhaps formally verify the following comment from Lobsters:
+
+   > with the encoding of natural numbers given, n+1 contains the definition of n duplicated two times.
+     This means that the encoding of n has size exponential in n, which seems extremely impractical. 
+   
+   The world's our oyster!
+
+Despite all of these exciting possibilities, this is where I stop, for now. I hope you enjoyed this article,
+and thank you for reading!