From 6c1940f5d287917864564e05ddf4e35136220baa Mon Sep 17 00:00:00 2001
From: Danila Fedorin <danila.fedorin@gmail.com>
Date: Sun, 28 Nov 2021 01:09:26 -0800
Subject: [PATCH] Get started on a post about a UCC evaluator

---
 content/blog/coq_dawn_eval.md | 274 ++++++++++++++++++++++++++++++++++
 1 file changed, 274 insertions(+)
 create mode 100644 content/blog/coq_dawn_eval.md
diff --git a/content/blog/coq_dawn_eval.md b/content/blog/coq_dawn_eval.md
new file mode 100644
index 0000000..3af286d
--- /dev/null
+++ b/content/blog/coq_dawn_eval.md
@@ -0,0 +1,274 @@
+---
+title: "A Verified Evaluator for the Untyped Concatenative Calculus"
+date: 2021-11-27T20:24:57-08:00
+draft: true
+tags: ["Dawn", "Coq", "Programming Languages"]
+---
+
+Earlier, I wrote [an article]({{< relref "./coq_dawn" >}}) in which I used Coq to
+encode the formal semantics of [Dawn's Untyped Concatenative Calculus](https://www.dawn-lang.org/posts/foundations-ucc/),
+and to prove a few thing about the mini-language. Though I'm quite happy with how that turned out,
+my article was missing something that's present on the original Dawn page -- an evaluator. In this article, I'll define
+an evaluator function in Coq, prove that it's equivalent to Dawn's formal semantics,
+and extract all of this into usable Haskell code.
+
+### Changes Since Last Time
+In trying to write and verify this evaluator, I decided to make changes to the way I formalized
+the UCC. Remember how we used a predicate, `IsValue`, to tag expressions that were values?
+It turns out that this is a very cumbersome approach. For one thing, this formalization
+allows for the case in which the exact same expression is a value for two different
+reasons. Although `IsValue` has only one constructor (`Val_quote`), it's actually
+{{< sidenote "right" "hott-note" "not provable" >}}
+Interestingly, it's also not provable that any two proofs of \(a = b\) are equal,
+even though equality only has one constructor, \(\text{eq_refl}\ :\ a \rightarrow (a = a) \).
+Under the <a href="https://homotopytypetheory.org/book/">homotopic interpretation</a>
+of type theory, this corresponds to the fact that two paths from \(a\) to \(b\) need
+not be homotopic (continuously deformable) to each other.<br>
+<br>
+As an intuitive example, imagine wrapping a string around a pole. Holding the ends of
+the string in place, there's nothing you can do to "unwrap" the string. This string
+is thus not deformable into a string that starts and stops at the same points,
+but does not go around the pole.
+{{< /sidenote >}}
+that any two proofs of `IsValue e` are equal. I ended up getting into a lot of losing
+arguments with the Coq runtime about whether or not two stacks are actually the same.
+Also, some of the semantic rules expected a value on the stack with a particular proof
+for `IsValue`, and rejected the exact same stack with a generic value.
+
+Thus, I switched from our old implementation:
+
+{{< codelines "Coq" "dawn/Dawn.v" 19 22 >}}
+
+To the one I originally had in mind:
+
+{{< codelines "Coq" "dawn/DawnV2.v" 19 19 >}}
+
+I then had the following function to convert a value back into an equivalent expression:
+
+{{< codelines "Coq" "dawn/DawnV2.v" 22 25 >}}
+
+I replaced instances of `projT1` with instances of `value_to_expr`.
+
+### Where We Are
+At the end of my previous article, we ended up with a Coq encoding of the big-step
+[operational semantics](https://en.wikipedia.org/wiki/Operational_semantics)
+of UCC, as well as some proofs of correctness about the derived forms from
+the article (like \\(\\text{quote}_3\\) and \\(\\text{rotate}_3\\)). The trouble
+is, despite having our operational semantics, we can't make our Coq
+code run anything. This is for several reasons:
+
+1. Our definitions only let us to _confirm_ that given some
+   initial stack, a program ends up in some other final stack. We even have a
+   little `Ltac2` tactic to help us automate this kind of proof. However, in an evaluator,
+   the final stack is not known until the program finishes running. We can't
+   confirm the result of evaluation, we need to _find_ it.
+2. To address the first point, we could try write a function that takes a program
+   and an initial stack, and produces a final stack, as well as a proof that
+   the program would evaluate to this stack under our semantics. However,
+   it's quite easy to write a non-terminating UCC program, whereas functions
+   in Coq _have to terminate!_ We can't write a terminating function to
+   run non-terminating code.
+
+So, is there anything we can do? No, there isn't. Writing an evaluator in Coq
+is just not possible. The end, thank you for reading.
+
+Just kidding -- there's definitely a way to get our code evaluating, but it
+will look a little bit strange.
+
+### A Step-by-Step Evaluator
+The trick is to recognize that program evaluation
+occurs in steps. There may well be an infinite number of steps, if the program
+is non-terminating, but there's always a step we can take. That is, unless
+an invalid instruction is run, like trying to clone from an empty stack, or unless
+the program finished running. You don't need a non-terminating function to just
+give you a next step, if one exists. We can write such a function in Coq.
+
+Here's a new data type that encodes the three situations we mentioned in the
+previous paragraph. Its constructors (one per case) are as follows:
+
+1. `err` - there are no possible evaluation steps due to an error.
+3. `middle e s` - the evaluation took a step; the stack changed to `s`, and the rest of the program is `e`.
+2. `final s` - there are no possible evaluation steps because the evaluation is complete, 
+leaving a final stack `s`.
+
+{{< codelines "Coq" "dawn/DawnEval.v" 6 9 >}}
+
+We can now write a function that tries to execute a single step given an expression.
+
+{{< codelines "Coq" "dawn/DawnEval.v" 11 27 >}}
+
+Most intrinsics, by themselves, complete after just one step. For instance, a program
+consisting solely of \\(\\text{swap}\\) will either fail (if the stack doesn't have enough
+values), or it will swap the top two values and be done. We list only "correct" cases,
+and resort to a "catch-all" case on line 26 that returns an error. The one multi-step
+intrinsic is \\(\\text{apply}\\), which can evaluate an arbitrary expression from the stack.
+In this case, the "one step" consists of popping the quoted value from the stack; the
+"remaining program" is precisely the expression that was popped.
+
+Quoting an expression also always completes in one step (it simply places the quoted
+expression on the stack). The really interesting case is composition. Expressions
+are evaluated left-to-right, so we first determine what kind of step the left expression (`e1`)
+can take. We may need more than one step to finish up with `e1`, so there's a good chance it
+returns a "rest program" `e1'` and a stack `vs'`. If this happens, to complete evaluation of
+\\(e_1\\ e_2\\), we need to first finish evaluating \\(e_1'\\), and then evaluate \\(e_2\\).
+Thus, the "rest of the program" is \\(e_1'\\ e_2\\), or `e_comp e1' e2`. On the other hand,
+if `e1` finished evaluating, we still need to evaluate `e2`, so our "rest of the program"
+is \\(e_2\\), or `e2`. If evaluating `e1` led to an error, then so did evaluating `e_comp e1 e2`,
+and we return `err`.
+
+### Extracting Code
+Just knowing a single step is not enough to run the code. We need something that repeatedly
+tries to take a step, as long as it's possible. However, this part is once again
+not possible in Coq, as it brings back the possibility of non-termination. So if we can't use
+Coq, why don't we use another language? Coq's extraction mechanism allows us to do just that.
+
+I added the following code to the end of my file:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 219 223 >}}
+
+Coq happily produces a new file, `UccGen.hs` with a lot of code. It's not exactly the most
+aesthetic; here's a quick peek:
+
+```Haskell
+data Intrinsic =
+   Swap
+ | Clone
+ | Drop
+ | Quote
+ | Compose
+ | Apply
+
+data Expr =
+   E_int Intrinsic
+ | E_quote Expr
+ | E_comp Expr Expr
+
+data Value =
+   V_quote Expr
+
+-- ... more
+```
+
+All that's left is to make a new file, `Ucc.hs`. I use a different file so that
+changes I make aren't overridden by Coq each time I run extraction. In this
+file, we place the "glue code" that tries running one step after another:
+
+{{< codelines "Coq" "dawn/Ucc.hs" 46 51 >}}
+
+Finally, loading up GHCi using `ghci Ucc.hs`, I can run the following commands:
+
+```
+ghci> fromList = foldl1 E_comp
+ghci> test = eval [] $ fromList [true, false, UccGen.or]
+ghci> :f test
+test = Just [V_quote (E_comp (E_int Swap) (E_int Drop))]
+```
+
+That is, applying `or` to `true` and `false` results a stack with only `true` on top.
+As expected, and proven by our semantics!
+
+### Proving Equivalence
+Okay, so `true false or` evaluates to `true`, much like our semantics claims.
+However, does our evaluator _always_ match the semantics? So far, we have not
+claimed or verified that it does. Let's try giving it a shot.
+
+The first thing we can do is show that if we have a proof that `e` takes
+initial stack `vs` to final stack `vs'`, then each
+`eval_step` "makes progress" towards `vs'` (it doesn't simply _return_
+`vs'`, since it only takes a single step and doesn't always complete
+the evaluation). We also want to show that if the semanics dictates
+`e` finishes in stack `vs'`, then `eval_step` will never return an error.
+Thus, we have two possibilities:
+
+* `eval_step` returns `final`. In this case, for it to match our semantics,
+  the final stack must be the same as `vs'`. Here's the relevant section
+  from the Coq file:
+
+  {{< codelines "Coq" "dawn/DawnEval.v" 30 30 >}}
+* `eval_step` returns `middle`. In this case, the "rest of the program" needs
+  to evaluate to `vs'` according to our semantics (otherwise, taking a step
+  has changed the program's final outcome, which should not happen).
+  We need to quantify the new variables (specifically, the "rest of the
+  program", which we'll call `ei`, and the "stack after one step", `vsi`),
+  for which we use Coq's `exists` clause. The whole relevant statement is as
+  follows:
+
+  {{< codelines "Coq" "dawn/DawnEval.v" 31 33 >}}
+
+The whole theorem claim is as follows:
+
+{{< codelines "Coq" "dawn/DawnEval.v" 29 33 >}}
+
+I have the Coq proof script for this (in fact, you can click the link
+at the top of the code block to view the original source file). However,
+there's something unsatisfying about this statement. In particular,
+how do we prove that an entire sequence of steps evaluates
+to something? We'd have to examine the first step, checking if
+it's a "final" step or a "middle" step; if it's a "middle" step,
+we'd have to move on to the "rest of the program" and repeat the process.
+Each time we'd have to "retrieve" `ei` and `vsi` from `eval_step_correct`,
+and feed it back to `eval_step`.
+
+I'll do you one better: how do we even _say_ that an expression "evaluates
+step-by-step to final stack `vs'`"? For one step, we can say:
+
+```Coq
+eval_step vs e = final vs'
+```
+
+For two steps, we'd have to assert the existence of an intermediate
+expression (the "rest of the program" after the first step):
+
+```Coq
+exists ei vsi,
+    eval_step vs e = middle ei vsi /\ (* First step to intermediate expression. *)
+    eval_step vsi ei = final vs' (* Second step to final state. *)
+```
+
+For three steps:
+
+```Coq
+exists ei1 ei2 vsi1 vsi2,
+    eval_step vs e = middle ei1 vsi1 /\ (* First step to intermediate expression. *)
+    eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Second intermediate step *)
+    eval_step vsi2 ei2 = final vs' (* Second step to final state. *)
+```
+
+This is awful! Not only is this a lot of writing, but it also makes various
+sequences of steps have a different "shape". This way, we can't make
+proofs about evalutions of an _arbitrary_ number of steps. Not all is lost, though: if we squint
+a little at the last example (three steps), a pattern starts to emerge.
+First, let's re-arrange the `exists` qualifiers:
+
+```Coq
+exists ei1 vsi1, eval_step vs e = middle ei1 vsi1 /\  (* Cons *)
+exists ei2 vsi2, eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Cons *)
+eval_step vsi2 ei2 = final vs' (* Nil *)
+```
+
+If you squint at this, it kind of looks like a list! The "empty"
+part of a list is the final step, while the "cons" part is a middle step. The
+analogy holds up for another reason: an "empty" sequence has zero intermediate
+expressions, while each "cons" introduces a single new intermediate
+program. Perhaps we can define a new data type that matches this? Indeed
+we can!
+
+{{< codelines "Coq" "dawn/DawnEval.v" 64 67 >}}
+
+The new data type is parameterized by the initial and final stacks, as well
+as the expression that starts in the former and ends in the latter.
+Then, consider the following _type_:
+
+```Coq
+eval_chain nil (e_comp (e_comp true false) or) (true :: nil)
+```
+
+This is the type of sequences (or chains) of steps corresponding to the
+evaluation of the program \\(\\text{true}\\ \\text{false}\\ \\text{or}\\),
+starting in an empty stack and evaluating to a stack with only true on top.
+Thus to say that an expression evaluates to some final stack `vs'`, in
+_some unknown number of steps_, it's sufficient to write:
+
+```Coq
+eval_chain vs e vs'
+```