diff --git a/content/blog/coq_dawn_eval.md b/content/blog/coq_dawn_eval.md new file mode 100644 index 0000000..3af286d --- /dev/null +++ b/content/blog/coq_dawn_eval.md @@ -0,0 +1,274 @@ +--- +title: "A Verified Evaluator for the Untyped Concatenative Calculus" +date: 2021-11-27T20:24:57-08:00 +draft: true +tags: ["Dawn", "Coq", "Programming Languages"] +--- + +Earlier, I wrote [an article]({{< relref "./coq_dawn" >}}) in which I used Coq to +encode the formal semantics of [Dawn's Untyped Concatenative Calculus](https://www.dawn-lang.org/posts/foundations-ucc/), +and to prove a few thing about the mini-language. Though I'm quite happy with how that turned out, +my article was missing something that's present on the original Dawn page -- an evaluator. In this article, I'll define +an evaluator function in Coq, prove that it's equivalent to Dawn's formal semantics, +and extract all of this into usable Haskell code. + +### Changes Since Last Time +In trying to write and verify this evaluator, I decided to make changes to the way I formalized +the UCC. Remember how we used a predicate, `IsValue`, to tag expressions that were values? +It turns out that this is a very cumbersome approach. For one thing, this formalization +allows for the case in which the exact same expression is a value for two different +reasons. Although `IsValue` has only one constructor (`Val_quote`), it's actually +{{< sidenote "right" "hott-note" "not provable" >}} +Interestingly, it's also not provable that any two proofs of \(a = b\) are equal, +even though equality only has one constructor, \(\text{eq_refl}\ :\ a \rightarrow (a = a) \). +Under the homotopic interpretation +of type theory, this corresponds to the fact that two paths from \(a\) to \(b\) need +not be homotopic (continuously deformable) to each other.
+
+As an intuitive example, imagine wrapping a string around a pole. Holding the ends of +the string in place, there's nothing you can do to "unwrap" the string. This string +is thus not deformable into a string that starts and stops at the same points, +but does not go around the pole. +{{< /sidenote >}} +that any two proofs of `IsValue e` are equal. I ended up getting into a lot of losing +arguments with the Coq runtime about whether or not two stacks are actually the same. +Also, some of the semantic rules expected a value on the stack with a particular proof +for `IsValue`, and rejected the exact same stack with a generic value. + +Thus, I switched from our old implementation: + +{{< codelines "Coq" "dawn/Dawn.v" 19 22 >}} + +To the one I originally had in mind: + +{{< codelines "Coq" "dawn/DawnV2.v" 19 19 >}} + +I then had the following function to convert a value back into an equivalent expression: + +{{< codelines "Coq" "dawn/DawnV2.v" 22 25 >}} + +I replaced instances of `projT1` with instances of `value_to_expr`. + +### Where We Are +At the end of my previous article, we ended up with a Coq encoding of the big-step +[operational semantics](https://en.wikipedia.org/wiki/Operational_semantics) +of UCC, as well as some proofs of correctness about the derived forms from +the article (like \\(\\text{quote}_3\\) and \\(\\text{rotate}_3\\)). The trouble +is, despite having our operational semantics, we can't make our Coq +code run anything. This is for several reasons: + +1. Our definitions only let us to _confirm_ that given some + initial stack, a program ends up in some other final stack. We even have a + little `Ltac2` tactic to help us automate this kind of proof. However, in an evaluator, + the final stack is not known until the program finishes running. We can't + confirm the result of evaluation, we need to _find_ it. +2. To address the first point, we could try write a function that takes a program + and an initial stack, and produces a final stack, as well as a proof that + the program would evaluate to this stack under our semantics. However, + it's quite easy to write a non-terminating UCC program, whereas functions + in Coq _have to terminate!_ We can't write a terminating function to + run non-terminating code. + +So, is there anything we can do? No, there isn't. Writing an evaluator in Coq +is just not possible. The end, thank you for reading. + +Just kidding -- there's definitely a way to get our code evaluating, but it +will look a little bit strange. + +### A Step-by-Step Evaluator +The trick is to recognize that program evaluation +occurs in steps. There may well be an infinite number of steps, if the program +is non-terminating, but there's always a step we can take. That is, unless +an invalid instruction is run, like trying to clone from an empty stack, or unless +the program finished running. You don't need a non-terminating function to just +give you a next step, if one exists. We can write such a function in Coq. + +Here's a new data type that encodes the three situations we mentioned in the +previous paragraph. Its constructors (one per case) are as follows: + +1. `err` - there are no possible evaluation steps due to an error. +3. `middle e s` - the evaluation took a step; the stack changed to `s`, and the rest of the program is `e`. +2. `final s` - there are no possible evaluation steps because the evaluation is complete, +leaving a final stack `s`. + +{{< codelines "Coq" "dawn/DawnEval.v" 6 9 >}} + +We can now write a function that tries to execute a single step given an expression. + +{{< codelines "Coq" "dawn/DawnEval.v" 11 27 >}} + +Most intrinsics, by themselves, complete after just one step. For instance, a program +consisting solely of \\(\\text{swap}\\) will either fail (if the stack doesn't have enough +values), or it will swap the top two values and be done. We list only "correct" cases, +and resort to a "catch-all" case on line 26 that returns an error. The one multi-step +intrinsic is \\(\\text{apply}\\), which can evaluate an arbitrary expression from the stack. +In this case, the "one step" consists of popping the quoted value from the stack; the +"remaining program" is precisely the expression that was popped. + +Quoting an expression also always completes in one step (it simply places the quoted +expression on the stack). The really interesting case is composition. Expressions +are evaluated left-to-right, so we first determine what kind of step the left expression (`e1`) +can take. We may need more than one step to finish up with `e1`, so there's a good chance it +returns a "rest program" `e1'` and a stack `vs'`. If this happens, to complete evaluation of +\\(e_1\\ e_2\\), we need to first finish evaluating \\(e_1'\\), and then evaluate \\(e_2\\). +Thus, the "rest of the program" is \\(e_1'\\ e_2\\), or `e_comp e1' e2`. On the other hand, +if `e1` finished evaluating, we still need to evaluate `e2`, so our "rest of the program" +is \\(e_2\\), or `e2`. If evaluating `e1` led to an error, then so did evaluating `e_comp e1 e2`, +and we return `err`. + +### Extracting Code +Just knowing a single step is not enough to run the code. We need something that repeatedly +tries to take a step, as long as it's possible. However, this part is once again +not possible in Coq, as it brings back the possibility of non-termination. So if we can't use +Coq, why don't we use another language? Coq's extraction mechanism allows us to do just that. + +I added the following code to the end of my file: + +{{< codelines "Coq" "dawn/DawnEval.v" 219 223 >}} + +Coq happily produces a new file, `UccGen.hs` with a lot of code. It's not exactly the most +aesthetic; here's a quick peek: + +```Haskell +data Intrinsic = + Swap + | Clone + | Drop + | Quote + | Compose + | Apply + +data Expr = + E_int Intrinsic + | E_quote Expr + | E_comp Expr Expr + +data Value = + V_quote Expr + +-- ... more +``` + +All that's left is to make a new file, `Ucc.hs`. I use a different file so that +changes I make aren't overridden by Coq each time I run extraction. In this +file, we place the "glue code" that tries running one step after another: + +{{< codelines "Coq" "dawn/Ucc.hs" 46 51 >}} + +Finally, loading up GHCi using `ghci Ucc.hs`, I can run the following commands: + +``` +ghci> fromList = foldl1 E_comp +ghci> test = eval [] $ fromList [true, false, UccGen.or] +ghci> :f test +test = Just [V_quote (E_comp (E_int Swap) (E_int Drop))] +``` + +That is, applying `or` to `true` and `false` results a stack with only `true` on top. +As expected, and proven by our semantics! + +### Proving Equivalence +Okay, so `true false or` evaluates to `true`, much like our semantics claims. +However, does our evaluator _always_ match the semantics? So far, we have not +claimed or verified that it does. Let's try giving it a shot. + +The first thing we can do is show that if we have a proof that `e` takes +initial stack `vs` to final stack `vs'`, then each +`eval_step` "makes progress" towards `vs'` (it doesn't simply _return_ +`vs'`, since it only takes a single step and doesn't always complete +the evaluation). We also want to show that if the semanics dictates +`e` finishes in stack `vs'`, then `eval_step` will never return an error. +Thus, we have two possibilities: + +* `eval_step` returns `final`. In this case, for it to match our semantics, + the final stack must be the same as `vs'`. Here's the relevant section + from the Coq file: + + {{< codelines "Coq" "dawn/DawnEval.v" 30 30 >}} +* `eval_step` returns `middle`. In this case, the "rest of the program" needs + to evaluate to `vs'` according to our semantics (otherwise, taking a step + has changed the program's final outcome, which should not happen). + We need to quantify the new variables (specifically, the "rest of the + program", which we'll call `ei`, and the "stack after one step", `vsi`), + for which we use Coq's `exists` clause. The whole relevant statement is as + follows: + + {{< codelines "Coq" "dawn/DawnEval.v" 31 33 >}} + +The whole theorem claim is as follows: + +{{< codelines "Coq" "dawn/DawnEval.v" 29 33 >}} + +I have the Coq proof script for this (in fact, you can click the link +at the top of the code block to view the original source file). However, +there's something unsatisfying about this statement. In particular, +how do we prove that an entire sequence of steps evaluates +to something? We'd have to examine the first step, checking if +it's a "final" step or a "middle" step; if it's a "middle" step, +we'd have to move on to the "rest of the program" and repeat the process. +Each time we'd have to "retrieve" `ei` and `vsi` from `eval_step_correct`, +and feed it back to `eval_step`. + +I'll do you one better: how do we even _say_ that an expression "evaluates +step-by-step to final stack `vs'`"? For one step, we can say: + +```Coq +eval_step vs e = final vs' +``` + +For two steps, we'd have to assert the existence of an intermediate +expression (the "rest of the program" after the first step): + +```Coq +exists ei vsi, + eval_step vs e = middle ei vsi /\ (* First step to intermediate expression. *) + eval_step vsi ei = final vs' (* Second step to final state. *) +``` + +For three steps: + +```Coq +exists ei1 ei2 vsi1 vsi2, + eval_step vs e = middle ei1 vsi1 /\ (* First step to intermediate expression. *) + eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Second intermediate step *) + eval_step vsi2 ei2 = final vs' (* Second step to final state. *) +``` + +This is awful! Not only is this a lot of writing, but it also makes various +sequences of steps have a different "shape". This way, we can't make +proofs about evalutions of an _arbitrary_ number of steps. Not all is lost, though: if we squint +a little at the last example (three steps), a pattern starts to emerge. +First, let's re-arrange the `exists` qualifiers: + +```Coq +exists ei1 vsi1, eval_step vs e = middle ei1 vsi1 /\ (* Cons *) +exists ei2 vsi2, eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Cons *) +eval_step vsi2 ei2 = final vs' (* Nil *) +``` + +If you squint at this, it kind of looks like a list! The "empty" +part of a list is the final step, while the "cons" part is a middle step. The +analogy holds up for another reason: an "empty" sequence has zero intermediate +expressions, while each "cons" introduces a single new intermediate +program. Perhaps we can define a new data type that matches this? Indeed +we can! + +{{< codelines "Coq" "dawn/DawnEval.v" 64 67 >}} + +The new data type is parameterized by the initial and final stacks, as well +as the expression that starts in the former and ends in the latter. +Then, consider the following _type_: + +```Coq +eval_chain nil (e_comp (e_comp true false) or) (true :: nil) +``` + +This is the type of sequences (or chains) of steps corresponding to the +evaluation of the program \\(\\text{true}\\ \\text{false}\\ \\text{or}\\), +starting in an empty stack and evaluating to a stack with only true on top. +Thus to say that an expression evaluates to some final stack `vs'`, in +_some unknown number of steps_, it's sufficient to write: + +```Coq +eval_chain vs e vs' +```