blog-static/content/blog/coq_dawn_eval.md

---
title: "A Verified Evaluator for the Untyped Concatenative Calculus"
date: 2021-11-27T20:24:57-08:00
draft: true
tags: ["Dawn", "Coq", "Programming Languages"]
---

Earlier, I wrote [an article]({{< relref "./coq_dawn" >}}) in which I used Coq to
encode the formal semantics of [Dawn's Untyped Concatenative Calculus](https://www.dawn-lang.org/posts/foundations-ucc/),
and to prove a few thing about the mini-language. Though I'm quite happy with how that turned out,
my article was missing something that's present on the original Dawn page -- an evaluator. In this article, I'll define
an evaluator function in Coq, prove that it's equivalent to Dawn's formal semantics,
and extract all of this into usable Haskell code.

### Changes Since Last Time
In trying to write and verify this evaluator, I decided to make changes to the way I formalized
the UCC. Remember how we used a predicate, `IsValue`, to tag expressions that were values?
It turns out that this is a very cumbersome approach. For one thing, this formalization
allows for the case in which the exact same expression is a value for two different
reasons. Although `IsValue` has only one constructor (`Val_quote`), it's actually
{{< sidenote "right" "hott-note" "not provable" >}}
Interestingly, it's also not provable that any two proofs of \(a = b\) are equal,
even though equality only has one constructor, \(\text{eq_refl}\ :\ a \rightarrow (a = a) \).
Under the <a href="https://homotopytypetheory.org/book/">homotopic interpretation</a>
of type theory, this corresponds to the fact that two paths from \(a\) to \(b\) need
not be homotopic (continuously deformable) to each other.<br>
<br>
As an intuitive example, imagine wrapping a string around a pole. Holding the ends of
the string in place, there's nothing you can do to "unwrap" the string. This string
is thus not deformable into a string that starts and stops at the same points,
but does not go around the pole.
{{< /sidenote >}}
that any two proofs of `IsValue e` are equal. I ended up getting into a lot of losing
arguments with the Coq runtime about whether or not two stacks are actually the same.
Also, some of the semantic rules expected a value on the stack with a particular proof
for `IsValue`, and rejected the exact same stack with a generic value.

Thus, I switched from our old implementation:

{{< codelines "Coq" "dawn/Dawn.v" 19 22 >}}

To the one I originally had in mind:

{{< codelines "Coq" "dawn/DawnV2.v" 19 19 >}}

I then had the following function to convert a value back into an equivalent expression:

{{< codelines "Coq" "dawn/DawnV2.v" 22 25 >}}

I replaced instances of `projT1` with instances of `value_to_expr`.

### Where We Are
At the end of my previous article, we ended up with a Coq encoding of the big-step
[operational semantics](https://en.wikipedia.org/wiki/Operational_semantics)
of UCC, as well as some proofs of correctness about the derived forms from
the article (like \\(\\text{quote}_3\\) and \\(\\text{rotate}_3\\)). The trouble
is, despite having our operational semantics, we can't make our Coq
code run anything. This is for several reasons:

1. Our definitions only let us to _confirm_ that given some
   initial stack, a program ends up in some other final stack. We even have a
   little `Ltac2` tactic to help us automate this kind of proof. However, in an evaluator,
   the final stack is not known until the program finishes running. We can't
   confirm the result of evaluation, we need to _find_ it.
2. To address the first point, we could try write a function that takes a program
   and an initial stack, and produces a final stack, as well as a proof that
   the program would evaluate to this stack under our semantics. However,
   it's quite easy to write a non-terminating UCC program, whereas functions
   in Coq _have to terminate!_ We can't write a terminating function to
   run non-terminating code.

So, is there anything we can do? No, there isn't. Writing an evaluator in Coq
is just not possible. The end, thank you for reading.

Just kidding -- there's definitely a way to get our code evaluating, but it
will look a little bit strange.

### A Step-by-Step Evaluator
The trick is to recognize that program evaluation
occurs in steps. There may well be an infinite number of steps, if the program
is non-terminating, but there's always a step we can take. That is, unless
an invalid instruction is run, like trying to clone from an empty stack, or unless
the program finished running. You don't need a non-terminating function to just
give you a next step, if one exists. We can write such a function in Coq.

Here's a new data type that encodes the three situations we mentioned in the
previous paragraph. Its constructors (one per case) are as follows:

1. `err` - there are no possible evaluation steps due to an error.
3. `middle e s` - the evaluation took a step; the stack changed to `s`, and the rest of the program is `e`.
2. `final s` - there are no possible evaluation steps because the evaluation is complete, 
leaving a final stack `s`.

{{< codelines "Coq" "dawn/DawnEval.v" 6 9 >}}

We can now write a function that tries to execute a single step given an expression.

{{< codelines "Coq" "dawn/DawnEval.v" 11 27 >}}

Most intrinsics, by themselves, complete after just one step. For instance, a program
consisting solely of \\(\\text{swap}\\) will either fail (if the stack doesn't have enough
values), or it will swap the top two values and be done. We list only "correct" cases,
and resort to a "catch-all" case on line 26 that returns an error. The one multi-step
intrinsic is \\(\\text{apply}\\), which can evaluate an arbitrary expression from the stack.
In this case, the "one step" consists of popping the quoted value from the stack; the
"remaining program" is precisely the expression that was popped.

Quoting an expression also always completes in one step (it simply places the quoted
expression on the stack). The really interesting case is composition. Expressions
are evaluated left-to-right, so we first determine what kind of step the left expression (`e1`)
can take. We may need more than one step to finish up with `e1`, so there's a good chance it
returns a "rest program" `e1'` and a stack `vs'`. If this happens, to complete evaluation of
\\(e_1\\ e_2\\), we need to first finish evaluating \\(e_1'\\), and then evaluate \\(e_2\\).
Thus, the "rest of the program" is \\(e_1'\\ e_2\\), or `e_comp e1' e2`. On the other hand,
if `e1` finished evaluating, we still need to evaluate `e2`, so our "rest of the program"
is \\(e_2\\), or `e2`. If evaluating `e1` led to an error, then so did evaluating `e_comp e1 e2`,
and we return `err`.

### Extracting Code
Just knowing a single step is not enough to run the code. We need something that repeatedly
tries to take a step, as long as it's possible. However, this part is once again
not possible in Coq, as it brings back the possibility of non-termination. So if we can't use
Coq, why don't we use another language? Coq's extraction mechanism allows us to do just that.

I added the following code to the end of my file:

{{< codelines "Coq" "dawn/DawnEval.v" 219 223 >}}

Coq happily produces a new file, `UccGen.hs` with a lot of code. It's not exactly the most
aesthetic; here's a quick peek:

```Haskell
data Intrinsic =
   Swap
 | Clone
 | Drop
 | Quote
 | Compose
 | Apply

data Expr =
   E_int Intrinsic
 | E_quote Expr
 | E_comp Expr Expr

data Value =
   V_quote Expr

-- ... more
```

All that's left is to make a new file, `Ucc.hs`. I use a different file so that
changes I make aren't overridden by Coq each time I run extraction. In this
file, we place the "glue code" that tries running one step after another:

{{< codelines "Coq" "dawn/Ucc.hs" 46 51 >}}

Finally, loading up GHCi using `ghci Ucc.hs`, I can run the following commands:

```
ghci> fromList = foldl1 E_comp
ghci> test = eval [] $ fromList [true, false, UccGen.or]
ghci> :f test
test = Just [V_quote (E_comp (E_int Swap) (E_int Drop))]
```

That is, applying `or` to `true` and `false` results a stack with only `true` on top.
As expected, and proven by our semantics!

### Proving Equivalence
Okay, so `true false or` evaluates to `true`, much like our semantics claims.
However, does our evaluator _always_ match the semantics? So far, we have not
claimed or verified that it does. Let's try giving it a shot.

The first thing we can do is show that if we have a proof that `e` takes
initial stack `vs` to final stack `vs'`, then each
`eval_step` "makes progress" towards `vs'` (it doesn't simply _return_
`vs'`, since it only takes a single step and doesn't always complete
the evaluation). We also want to show that if the semanics dictates
`e` finishes in stack `vs'`, then `eval_step` will never return an error.
Thus, we have two possibilities:

* `eval_step` returns `final`. In this case, for it to match our semantics,
  the final stack must be the same as `vs'`. Here's the relevant section
  from the Coq file:

  {{< codelines "Coq" "dawn/DawnEval.v" 30 30 >}}
* `eval_step` returns `middle`. In this case, the "rest of the program" needs
  to evaluate to `vs'` according to our semantics (otherwise, taking a step
  has changed the program's final outcome, which should not happen).
  We need to quantify the new variables (specifically, the "rest of the
  program", which we'll call `ei`, and the "stack after one step", `vsi`),
  for which we use Coq's `exists` clause. The whole relevant statement is as
  follows:

  {{< codelines "Coq" "dawn/DawnEval.v" 31 33 >}}

The whole theorem claim is as follows:

{{< codelines "Coq" "dawn/DawnEval.v" 29 33 >}}

I have the Coq proof script for this (in fact, you can click the link
at the top of the code block to view the original source file). However,
there's something unsatisfying about this statement. In particular,
how do we prove that an entire sequence of steps evaluates
to something? We'd have to examine the first step, checking if
it's a "final" step or a "middle" step; if it's a "middle" step,
we'd have to move on to the "rest of the program" and repeat the process.
Each time we'd have to "retrieve" `ei` and `vsi` from `eval_step_correct`,
and feed it back to `eval_step`.

I'll do you one better: how do we even _say_ that an expression "evaluates
step-by-step to final stack `vs'`"? For one step, we can say:

```Coq
eval_step vs e = final vs'
```

For two steps, we'd have to assert the existence of an intermediate
expression (the "rest of the program" after the first step):

```Coq
exists ei vsi,
    eval_step vs e = middle ei vsi /\ (* First step to intermediate expression. *)
    eval_step vsi ei = final vs' (* Second step to final state. *)
```

For three steps:

```Coq
exists ei1 ei2 vsi1 vsi2,
    eval_step vs e = middle ei1 vsi1 /\ (* First step to intermediate expression. *)
    eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Second intermediate step *)
    eval_step vsi2 ei2 = final vs' (* Second step to final state. *)
```

This is awful! Not only is this a lot of writing, but it also makes various
sequences of steps have a different "shape". This way, we can't make
proofs about evalutions of an _arbitrary_ number of steps. Not all is lost, though: if we squint
a little at the last example (three steps), a pattern starts to emerge.
First, let's re-arrange the `exists` qualifiers:

```Coq
exists ei1 vsi1, eval_step vs e = middle ei1 vsi1 /\  (* Cons *)
exists ei2 vsi2, eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Cons *)
eval_step vsi2 ei2 = final vs' (* Nil *)
```

If you squint at this, it kind of looks like a list! The "empty"
part of a list is the final step, while the "cons" part is a middle step. The
analogy holds up for another reason: an "empty" sequence has zero intermediate
expressions, while each "cons" introduces a single new intermediate
program. Perhaps we can define a new data type that matches this? Indeed
we can!

{{< codelines "Coq" "dawn/DawnEval.v" 64 67 >}}

The new data type is parameterized by the initial and final stacks, as well
as the expression that starts in the former and ends in the latter.
Then, consider the following _type_:

```Coq
eval_chain nil (e_comp (e_comp true false) or) (true :: nil)
```

This is the type of sequences (or chains) of steps corresponding to the
evaluation of the program \\(\\text{true}\\ \\text{false}\\ \\text{or}\\),
starting in an empty stack and evaluating to a stack with only true on top.
Thus to say that an expression evaluates to some final stack `vs'`, in
_some unknown number of steps_, it's sufficient to write:

```Coq
eval_chain vs e vs'
```
Get started on a post about a UCC evaluator 2021-11-28 01:09:26 -08:00			`---`
			`title: "A Verified Evaluator for the Untyped Concatenative Calculus"`
			`date: 2021-11-27T20:24:57-08:00`
			`draft: true`
			`tags: ["Dawn", "Coq", "Programming Languages"]`
			`---`

			`Earlier, I wrote [an article]({{< relref "./coq_dawn" >}}) in which I used Coq to`
			`encode the formal semantics of [Dawn's Untyped Concatenative Calculus](https://www.dawn-lang.org/posts/foundations-ucc/),`
			`and to prove a few thing about the mini-language. Though I'm quite happy with how that turned out,`
			`my article was missing something that's present on the original Dawn page -- an evaluator. In this article, I'll define`
			`an evaluator function in Coq, prove that it's equivalent to Dawn's formal semantics,`
			`and extract all of this into usable Haskell code.`

			`### Changes Since Last Time`
			`In trying to write and verify this evaluator, I decided to make changes to the way I formalized`
			the UCC. Remember how we used a predicate, `IsValue`, to tag expressions that were values?
			`It turns out that this is a very cumbersome approach. For one thing, this formalization`
			`allows for the case in which the exact same expression is a value for two different`
			reasons. Although `IsValue` has only one constructor (`Val_quote`), it's actually
			`{{< sidenote "right" "hott-note" "not provable" >}}`
			`Interestingly, it's also not provable that any two proofs of \(a = b\) are equal,`
			`even though equality only has one constructor, \(\text{eq_refl}\ :\ a \rightarrow (a = a) \).`
			`Under the <a href="https://homotopytypetheory.org/book/">homotopic interpretation</a>`
			`of type theory, this corresponds to the fact that two paths from \(a\) to \(b\) need`
			`not be homotopic (continuously deformable) to each other.<br>`
			`<br>`
			`As an intuitive example, imagine wrapping a string around a pole. Holding the ends of`
			`the string in place, there's nothing you can do to "unwrap" the string. This string`
			`is thus not deformable into a string that starts and stops at the same points,`
			`but does not go around the pole.`
			`{{< /sidenote >}}`
			that any two proofs of `IsValue e` are equal. I ended up getting into a lot of losing
			`arguments with the Coq runtime about whether or not two stacks are actually the same.`
			`Also, some of the semantic rules expected a value on the stack with a particular proof`
			for `IsValue`, and rejected the exact same stack with a generic value.

			`Thus, I switched from our old implementation:`

			`{{< codelines "Coq" "dawn/Dawn.v" 19 22 >}}`

			`To the one I originally had in mind:`

			`{{< codelines "Coq" "dawn/DawnV2.v" 19 19 >}}`

			`I then had the following function to convert a value back into an equivalent expression:`

			`{{< codelines "Coq" "dawn/DawnV2.v" 22 25 >}}`

			I replaced instances of `projT1` with instances of `value_to_expr`.

			`### Where We Are`
			`At the end of my previous article, we ended up with a Coq encoding of the big-step`
			`[operational semantics](https://en.wikipedia.org/wiki/Operational_semantics)`
			`of UCC, as well as some proofs of correctness about the derived forms from`
			`the article (like \\(\\text{quote}_3\\) and \\(\\text{rotate}_3\\)). The trouble`
			`is, despite having our operational semantics, we can't make our Coq`
			`code run anything. This is for several reasons:`

			`1. Our definitions only let us to _confirm_ that given some`
			`initial stack, a program ends up in some other final stack. We even have a`
			little `Ltac2` tactic to help us automate this kind of proof. However, in an evaluator,
			`the final stack is not known until the program finishes running. We can't`
			`confirm the result of evaluation, we need to _find_ it.`
			`2. To address the first point, we could try write a function that takes a program`
			`and an initial stack, and produces a final stack, as well as a proof that`
			`the program would evaluate to this stack under our semantics. However,`
			`it's quite easy to write a non-terminating UCC program, whereas functions`
			`in Coq _have to terminate!_ We can't write a terminating function to`
			`run non-terminating code.`

			`So, is there anything we can do? No, there isn't. Writing an evaluator in Coq`
			`is just not possible. The end, thank you for reading.`

			`Just kidding -- there's definitely a way to get our code evaluating, but it`
			`will look a little bit strange.`

			`### A Step-by-Step Evaluator`
			`The trick is to recognize that program evaluation`
			`occurs in steps. There may well be an infinite number of steps, if the program`
			`is non-terminating, but there's always a step we can take. That is, unless`
			`an invalid instruction is run, like trying to clone from an empty stack, or unless`
			`the program finished running. You don't need a non-terminating function to just`
			`give you a next step, if one exists. We can write such a function in Coq.`

			`Here's a new data type that encodes the three situations we mentioned in the`
			`previous paragraph. Its constructors (one per case) are as follows:`

			1. `err` - there are no possible evaluation steps due to an error.
			3. `middle e s` - the evaluation took a step; the stack changed to `s`, and the rest of the program is `e`.
			2. `final s` - there are no possible evaluation steps because the evaluation is complete,
			leaving a final stack `s`.

			`{{< codelines "Coq" "dawn/DawnEval.v" 6 9 >}}`

			`We can now write a function that tries to execute a single step given an expression.`

			`{{< codelines "Coq" "dawn/DawnEval.v" 11 27 >}}`

			`Most intrinsics, by themselves, complete after just one step. For instance, a program`
			`consisting solely of \\(\\text{swap}\\) will either fail (if the stack doesn't have enough`
			`values), or it will swap the top two values and be done. We list only "correct" cases,`
			`and resort to a "catch-all" case on line 26 that returns an error. The one multi-step`
			`intrinsic is \\(\\text{apply}\\), which can evaluate an arbitrary expression from the stack.`
			`In this case, the "one step" consists of popping the quoted value from the stack; the`
			`"remaining program" is precisely the expression that was popped.`

			`Quoting an expression also always completes in one step (it simply places the quoted`
			`expression on the stack). The really interesting case is composition. Expressions`
			are evaluated left-to-right, so we first determine what kind of step the left expression (`e1`)
			can take. We may need more than one step to finish up with `e1`, so there's a good chance it
			returns a "rest program" `e1'` and a stack `vs'`. If this happens, to complete evaluation of
			`\\(e_1\\ e_2\\), we need to first finish evaluating \\(e_1'\\), and then evaluate \\(e_2\\).`
			Thus, the "rest of the program" is \\(e_1'\\ e_2\\), or `e_comp e1' e2`. On the other hand,
			if `e1` finished evaluating, we still need to evaluate `e2`, so our "rest of the program"
			is \\(e_2\\), or `e2`. If evaluating `e1` led to an error, then so did evaluating `e_comp e1 e2`,
			and we return `err`.

			`### Extracting Code`
			`Just knowing a single step is not enough to run the code. We need something that repeatedly`
			`tries to take a step, as long as it's possible. However, this part is once again`
			`not possible in Coq, as it brings back the possibility of non-termination. So if we can't use`
			`Coq, why don't we use another language? Coq's extraction mechanism allows us to do just that.`

			`I added the following code to the end of my file:`

			`{{< codelines "Coq" "dawn/DawnEval.v" 219 223 >}}`

			Coq happily produces a new file, `UccGen.hs` with a lot of code. It's not exactly the most
			`aesthetic; here's a quick peek:`

			```Haskell
			`data Intrinsic =`
			`Swap`
			`\| Clone`
			`\| Drop`
			`\| Quote`
			`\| Compose`
			`\| Apply`

			`data Expr =`
			`E_int Intrinsic`
			`\| E_quote Expr`
			`\| E_comp Expr Expr`

			`data Value =`
			`V_quote Expr`

			`-- ... more`
			```

			All that's left is to make a new file, `Ucc.hs`. I use a different file so that
			`changes I make aren't overridden by Coq each time I run extraction. In this`
			`file, we place the "glue code" that tries running one step after another:`

			`{{< codelines "Coq" "dawn/Ucc.hs" 46 51 >}}`

			Finally, loading up GHCi using `ghci Ucc.hs`, I can run the following commands:

			```
			`ghci> fromList = foldl1 E_comp`
			`ghci> test = eval [] $ fromList [true, false, UccGen.or]`
			`ghci> :f test`
			`test = Just [V_quote (E_comp (E_int Swap) (E_int Drop))]`
			```

			That is, applying `or` to `true` and `false` results a stack with only `true` on top.
			`As expected, and proven by our semantics!`

			`### Proving Equivalence`
			Okay, so `true false or` evaluates to `true`, much like our semantics claims.
			`However, does our evaluator _always_ match the semantics? So far, we have not`
			`claimed or verified that it does. Let's try giving it a shot.`

			The first thing we can do is show that if we have a proof that `e` takes
			initial stack `vs` to final stack `vs'`, then each
			`eval_step` "makes progress" towards `vs'` (it doesn't simply _return_
			`vs'`, since it only takes a single step and doesn't always complete
			`the evaluation). We also want to show that if the semanics dictates`
			`e` finishes in stack `vs'`, then `eval_step` will never return an error.
			`Thus, we have two possibilities:`

			* `eval_step` returns `final`. In this case, for it to match our semantics,
			the final stack must be the same as `vs'`. Here's the relevant section
			`from the Coq file:`

			`{{< codelines "Coq" "dawn/DawnEval.v" 30 30 >}}`
			* `eval_step` returns `middle`. In this case, the "rest of the program" needs
			to evaluate to `vs'` according to our semantics (otherwise, taking a step
			`has changed the program's final outcome, which should not happen).`
			`We need to quantify the new variables (specifically, the "rest of the`
			program", which we'll call `ei`, and the "stack after one step", `vsi`),
			for which we use Coq's `exists` clause. The whole relevant statement is as
			`follows:`

			`{{< codelines "Coq" "dawn/DawnEval.v" 31 33 >}}`

			`The whole theorem claim is as follows:`

			`{{< codelines "Coq" "dawn/DawnEval.v" 29 33 >}}`

			`I have the Coq proof script for this (in fact, you can click the link`
			`at the top of the code block to view the original source file). However,`
			`there's something unsatisfying about this statement. In particular,`
			`how do we prove that an entire sequence of steps evaluates`
			`to something? We'd have to examine the first step, checking if`
			`it's a "final" step or a "middle" step; if it's a "middle" step,`
			`we'd have to move on to the "rest of the program" and repeat the process.`
			Each time we'd have to "retrieve" `ei` and `vsi` from `eval_step_correct`,
			and feed it back to `eval_step`.

			`I'll do you one better: how do we even _say_ that an expression "evaluates`
			step-by-step to final stack `vs'`"? For one step, we can say:

			```Coq
			`eval_step vs e = final vs'`
			```

			`For two steps, we'd have to assert the existence of an intermediate`
			`expression (the "rest of the program" after the first step):`

			```Coq
			`exists ei vsi,`
			`eval_step vs e = middle ei vsi /\ (* First step to intermediate expression. *)`
			`eval_step vsi ei = final vs' (* Second step to final state. *)`
			```

			`For three steps:`

			```Coq
			`exists ei1 ei2 vsi1 vsi2,`
			`eval_step vs e = middle ei1 vsi1 /\ (* First step to intermediate expression. *)`
			`eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Second intermediate step *)`
			`eval_step vsi2 ei2 = final vs' (* Second step to final state. *)`
			```

			`This is awful! Not only is this a lot of writing, but it also makes various`
			`sequences of steps have a different "shape". This way, we can't make`
			`proofs about evalutions of an _arbitrary_ number of steps. Not all is lost, though: if we squint`
			`a little at the last example (three steps), a pattern starts to emerge.`
			First, let's re-arrange the `exists` qualifiers:

			```Coq
			`exists ei1 vsi1, eval_step vs e = middle ei1 vsi1 /\ (* Cons *)`
			`exists ei2 vsi2, eval_step vsi1 ei1 = middle ei2 vsi2 /\ (* Cons *)`
			`eval_step vsi2 ei2 = final vs' (* Nil *)`
			```

			`If you squint at this, it kind of looks like a list! The "empty"`
			`part of a list is the final step, while the "cons" part is a middle step. The`
			`analogy holds up for another reason: an "empty" sequence has zero intermediate`
			`expressions, while each "cons" introduces a single new intermediate`
			`program. Perhaps we can define a new data type that matches this? Indeed`
			`we can!`

			`{{< codelines "Coq" "dawn/DawnEval.v" 64 67 >}}`

			`The new data type is parameterized by the initial and final stacks, as well`
			`as the expression that starts in the former and ends in the latter.`
			`Then, consider the following _type_:`

			```Coq
			`eval_chain nil (e_comp (e_comp true false) or) (true :: nil)`
			```

			`This is the type of sequences (or chains) of steps corresponding to the`
			`evaluation of the program \\(\\text{true}\\ \\text{false}\\ \\text{or}\\),`
			`starting in an empty stack and evaluating to a stack with only true on top.`
			Thus to say that an expression evaluates to some final stack `vs'`, in
			`_some unknown number of steps_, it's sufficient to write:`

			```Coq
			`eval_chain vs e vs'`
			```