Actually push the rest of the new article

This commit is contained in:
Danila Fedorin 2021-11-28 17:33:04 -08:00
parent 826dde759f
commit 2ce351f7ef
1 changed files with 304 additions and 9 deletions

View File

@ -124,7 +124,7 @@ Coq, why don't we use another language? Coq's extraction mechanism allows us to
I added the following code to the end of my file:
{{< codelines "Coq" "dawn/DawnEval.v" 219 223 >}}
{{< codelines "Coq" "dawn/DawnEval.v" 231 235 >}}
Coq happily produces a new file, `UccGen.hs` with a lot of code. It's not exactly the most
aesthetic; here's a quick peek:
@ -172,11 +172,12 @@ Okay, so `true false or` evaluates to `true`, much like our semantics claims.
However, does our evaluator _always_ match the semantics? So far, we have not
claimed or verified that it does. Let's try giving it a shot.
#### First Steps and Evaluation Chains
The first thing we can do is show that if we have a proof that `e` takes
initial stack `vs` to final stack `vs'`, then each
`eval_step` "makes progress" towards `vs'` (it doesn't simply _return_
`vs'`, since it only takes a single step and doesn't always complete
the evaluation). We also want to show that if the semanics dictates
the evaluation). We also want to show that if the semantics dictates
`e` finishes in stack `vs'`, then `eval_step` will never return an error.
Thus, we have two possibilities:
@ -236,9 +237,9 @@ exists ei1 ei2 vsi1 vsi2,
This is awful! Not only is this a lot of writing, but it also makes various
sequences of steps have a different "shape". This way, we can't make
proofs about evalutions of an _arbitrary_ number of steps. Not all is lost, though: if we squint
proofs about evaluations of an _arbitrary_ number of steps. Not all is lost, though: if we squint
a little at the last example (three steps), a pattern starts to emerge.
First, let's re-arrange the `exists` qualifiers:
First, let's re-arrange the `exists` quantifiers:
```Coq
exists ei1 vsi1, eval_step vs e = middle ei1 vsi1 /\ (* Cons *)
@ -274,7 +275,7 @@ eval_chain vs e vs'
```
Evaluation chains have a couple of interesting properties. First and foremost,
they can be "concatenated". This is analagous to the `Sem_e_comp` rule
they can be "concatenated". This is analogous to the `Sem_e_comp` rule
in our original semantics: if an expression `e1` starts in stack `vs`
and finishes in stack `vs'`, and another expression starts in stack `vs'`
and finishes in stack `vs''`, then we can compose these two expressions,
@ -283,7 +284,7 @@ and the result will start in `vs` and finish in `vs''`.
{{< codelines "Coq" "dawn/DawnEval.v" 69 75 >}}
The proof is very short. We go
{{< sidenote "right" "concat-note" "by induction on the left evaluation" >}}
{{< sidenote "right" "concat-note" "by induction on the left evaluation chain" >}}
It's not a coincidence that defining something like <code>(++)</code>
(list concatenation) in Haskell typically starts by pattern matching
on the <em>left</em> list. In fact, proofs by <code>induction</code>
@ -294,8 +295,8 @@ in sidenotes, but if you're playing with the
the <a href="https://en.wikipedia.org/wiki/Fixed-point_combinator">fixed point
combinator</a> used to implement recursion.
{{< /sidenote >}}
chain (the one for `e1`). Coq takes care of most of the rest with `auto`.
The key to this proof is that whatever `P` is cotained within a "node"
(the one for `e1`). Coq takes care of most of the rest with `auto`.
The key to this proof is that whatever `P` is contained within a "node"
in the left chain determines how `eval_step (e_comp e1 e2)` behaves. Whether
`e1` evaluates to `final` or `middle`, the composition evaluates to `middle`
(a composition of two expressions cannot be done in one step), so we always
@ -307,7 +308,7 @@ create new nodes when appending to an empty list.
* If the step was `final`, then
the rest of the steps use only `e2`, and good news, we already have a chain
for that!
* Otherwise, the step was `middle`, an we stll have a chain for some
* Otherwise, the step was `middle`, an we still have a chain for some
intermediate program `ei` that starts in some stack `vsi` and ends in `vs'`.
By induction, we know that _this_ chain can be concatenated with the one
for `e2`, resulting in a chain for `e_comp ei e2`. All that remains is to
@ -323,3 +324,297 @@ one for each composed expression:
While interesting, this particular fact isn't used anywhere else in this
post, and I will omit the proof here.
#### The Forward Direction
Let's try rewording our very first proof, `eval_step_correct`, using
chains. The _direction_ remains the same: given that an expression
produces a final stack under the formal semantics, we need to
prove that this expression _evaluates_ to the same final stack using
a sequence of `eval_step`.
{{< codelines "Coq" "dawn/DawnEval.v" 93 96 >}}
The power of this theorem is roughly the same as that of
the original one: we can use `eval_step_correct` to build up a chain
by applying it over and over, and we can take the "first element"
of a chain to serve as a witness for `eval_step_correct`. However,
the formulation is arguably significantly cleaner, and contains
a proof for _all_ steps right away, rather than a single one.
Before we go through the proof, notice that there's actually _two_
theorems being stated here, which depend on each other. This
is not surprising given that our semantics are given using two
data types, `Sem_expr` and `Sem_int`, each of which contains the other.
Regular proofs by induction, which work on only one of the data types,
break down because we can't make claims "by induction" about the _other_
type. For example, while going by induction on `Sem_expr`, we run into
issues in the `e_int` case while handling \\(\\text{apply}\\). We know
a single step -- popping the value being run from the stack. But what then?
The rule for \\(\\text{apply}\\) in `Sem_int` contains another `Sem_expr`
which confirms that the quoted value properly evaluates. But this other
`Sem_expr` isn't directly a child of the "bigger" `Sem_expr`, and so we
don't get an inductive hypothesis about it. We know nothing about it; we're stuck.
We therefore have two theorems declared together using `with` (just like we
used `with` to declare `Sem_expr` and `Sem_int`). We have to prove both,
which is once again a surprisingly easy task thanks to Coq's `auto`. Let's
start with the first theorem, the one for expression semantics.
{{< codelines "Coq" "dawn/DawnEval.v" 98 107 >}}
We go by induction on the semantics data type. There are therefore three cases:
intrinsics, quotes, and composition. The intrinsic case is handed right
off to the second theorem (which we'll see momentarily). The quote case
is very simple since quoted expressions are simply pushed onto the stack in
a single step (we thus use `chain_final`). Finally, in the composition
case, we have two sub-proofs, one for each expression being evaluated.
By induction, we know that each of these sub-proofs can be turned into
a chain, and we use `eval_chain_merge` to combine these two chains into one.
That's it.
Now, let's try the second theorem. The code is even shorter:
{{< codelines "Coq" "dawn/DawnEval.v" 108 115 >}}
The first command, `inversion Hsem`, lets us go case-by-case on the various
ways an intrinsic can be evaluated. Most intrinsics are quite boring;
in our evaluator, they only need a single step, and their semantics
rules guarantee that the stack has the exact kind of data that the evaluator
expects. We dismiss such cases with `apply chain_final; auto`. The only
time this doesn't work is when we encounter the \\(\\text{apply}\\) intrinsic;
in that case, however, we can simply use the first theorem we proved.
#### A Quick Aside: Automation Using `Ltac2`
Going in the other direction will involve lots of situations in which we _know_
that `eval_step` evaluated to something. Here's a toy proof built around
one such case:
{{< codelines "Coq" "dawn/DawnEval.v" 237 246 "hl_lines=5-8">}}
The proof claims that if the `swap` instruction was evaluated to something,
then the initial stack must contain two values, and the final stack must
have those two values on top but flipped. This is very easy to prove, since
that's the exact behavior of `eval_step` for the `swap` intrinsic. However,
notice how much boilerplate we're writing: lines 241 through 244 deal entirely
with ensuring that our stack does, indeed, have two values. This is trivially true:
if the stack _didn't_ have two values, it wouldn't evaluate to `final`, but to `error`.
However, it takes some effort to convince Coq of this.
This was just for a single intrinsic. What if we're trying to prove something
for _every_ intrinsic? Things will get messy very quickly. We can't even re-use
our `destruct vs` code, because different intrinsics need stacks with different numbers
of values (they have a different _arity_); if we try to handle all cases with at most
2 values on the stack, we'd have to prove the same thing twice for simpler intrinsics.
In short, proofs will look like a mess.
Things don't have to be this way, though! The boilerplate code is very repetitive, and
this makes it a good candidate for automation. For this, Coq has `Ltac2`.
In short, `Ltac2` is another mini-language contained within Coq. We can use it to put
into code the decision making that we'd be otherwise doing on our own. For example,
here's a tactic that checks if the current proof has a hypothesis in which
an intrinsic is evaluated to something:
{{< codelines "Coq" "dawn/DawnEval.v" 141 148 >}}
`Ltac2` has a pattern matching construct much like Coq itself does, but it can
pattern match on Coq expressions and proof goals. When pattern matching
on goals, we use the following notation: `[ h1 : t1, ..., hn : tn |- g ]`, which reads:
> Given hypotheses `h1` through `hn`, of types `t1` through `tn` respectively,
we need to prove `g`.
In our pattern match, then, we're ignoring the actual thing we're supposed to prove
(the tactic we're writing won't be that smart; its user -- ourselves -- will need
to know when to use it). We are, however, searching for a hypothesis in the form
`eval_step ?a (e_int ?b) = ?c`. The three variables, `?a`, `?b`, and `?c` are placeholders
which can be matched with anything. Thus, we expect any hypothesis in which
an intrinsic is evaluated to anything else (by the way, Coq checks all hypotheses one-by-one,
starting at the most recent and finishing with the oldest). The code on lines 144 through 146 actually
performs the `destruct` calls, as well as the `inversion` attempts that complete
any proofs with an inconsistent assumption (like `err = final vs'`).
So how do these lines work? Well, first we need to get a handle on the hypothesis we found.
Various tactics can manipulate the proof state, and cause hypotheses to disappear;
by calling `Control.hyp` on `h`, we secure a more permanent hold on our evaluation assumption.
In the next line, we call _another_ `Ltac2` function, `destruct_int_stack`, which unpacks
the stack exactly as many times as necessary. We'll look at this function in a moment.
Finally, we use regular old tactics. Specifically, we use `inversion` (as mentioned before).
I use `fail` after `inversion` to avoid cluttering the proof in case there's _not_
a contradiction.
On to `destruct_int_stack`. The definition is very simple:
{{< codelines "Coq" "dawn/DawnEval.v" 139 139 >}}
The main novelty is that this function takes two arguments, both of which are Coq expressions:
`int`, or the intrinsic being evaluated, and `va`, the stack that's being analyzed. The `constr`
type in Coq holds terms. Let's look at `int_arity` next:
{{< codelines "Coq" "dawn/DawnEval.v" 128 137 >}}
This is a pretty standard pattern match that assigns to each expression its arity. However, we're
case analyzing over _literally any possible Coq expression_, so we need to handle the case in which
the expression isn't actually a specific intrinsic. For this, we use `Control.throw` with
the `Not_intrinsic` exception.
Wait a moment, does Coq just happen to include a `Not_intrinsic` exception? No, it does not.
We have to register that one ourselves. In `Ltac2`, the type of exception (`exn`) is _extensible_,
which means we can add on to it. We add just a single constructor with no arguments:
{{< codelines "Coq" "dawn/DawnEval.v" 118 118 >}}
Finally, unpacking the value stack. Here's the code for that:
{{< codelines "Coq" "dawn/DawnEval.v" 120 126 >}}
This function accepts the arity of an operation, and unpacks the stack that many times.
`Ltac2` doesn't have a way to pattern match on numbers, so we resort to good old "less than or equal",
and and `if`/`else`. If the arity is zero (or less than zero, since it's an integer), we don't
need to unpack anymore. This is our base case. Otherwise, we generate two new free variables
(we can't just hardcode `v1` and `v2`, since they may be in use elsewhere in the proof). To this
end we use `Fresh.in_goal` and give it a "base symbol" to build on. We then use
`destruct`, passing it our "scrutinee" (the expression being destructed) and the names
we generated for its components. This generates multiple goals; the `Control.enter` tactic
is used to run code for each one of these goals. In the non-empty list case, we try to break
apart its tail as necessary by recursively calling `destruct_n`.
That's pretty much it! We can now use our tactic from Coq like any other. Rewriting
our earlier proof about \\(\\text{swap}\\), we now only need to handle the valid case:
{{< codelines "Coq" "dawn/DawnEval.v" 248 254 >}}
We'll be using this new `ensure_valid_stack` tactic in our subsequent proofs.
#### The Backward Direction
After our last proof before the `Ltac2` diversion, we know that our evaluator matches our semantics, but only when a `Sem_expr`
object exists for our program. However, invalid programs don't have a `Sem_expr` object
(there's no way to prove that they evaluate to anything). Does our evaluator behave
properly on "bad" programs, too? We've not ruled out that our evaluator produces junk
`final` or `middle` outputs whenever it encounters an error. We need another theorem:
{{< codelines "Coq" "dawn/DawnEval.v" 215 216 >}}
That is, if our evalutor reaches a final state, this state matches our semantics.
This proof is most conveniently split into two pieces. The first piece says that
if our evaluator completes in a single step, then this step matches our semantics.
{{< codelines "Coq" "dawn/DawnEval.v" 158 175 >}}
The statement for a non-`final` step is more involved; the one step by itself need
not match the semantics, since it only carries out a part of a computation. However,
we _can_ say that if the "rest of the program" matches the semantics, then so does
the whole expression.
{{< codelines "Coq" "dawn/DawnEval.v" 177 213 >}}
Finally, we snap these two pieces together in `eval_step_sem_back`:
{{< codelines "Coq" "dawn/DawnEval.v" 215 222 >}}
We have now proved complete equivalence: our evaluator completes in a final state
if and only if our semantics lead to this same final state. As a corollary of this,
we can see that if a program is invalid (it doesn't evaluate to anything under our semantics),
then our evaluator won't finish in a `final` state:
{{< codelines "Coq" "dawn/DawnEval.v" 224 229 >}}
### Making a Mini-REPL
We can now be pretty confident about our evaluator. However, this whole business with using GHCi
to run our programs is rather inconvenient; I'd rather write `[drop]` than `E_quote (E_int Drop)`.
I'd also rather _read_ the former instead of the latter. We can do some work in Haskell to make
playing with our interpreter more convenient.
#### Pretty Printing Code
Coq generated Haskell data types that correspond precisely to the data types we defined for our
proofs. We can use the standard type class mechanism in Haskell to define how they should be
printed:
{{< codelines "Haskell" "dawn/Ucc.hs" 8 22 >}}
Now our expressions and values print a lot nicer:
```
ghci> true
[swap drop]
ghci> false
[drop]
ghci> UccGen.or
clone apply
```
#### Reading Code In
The Parsec library in Haskell can be used to convert text back into data structures.
It's not too hard to create a parser for UCC:
{{< codelines "Haskell" "dawn/Ucc.hs" 24 44 >}}
Now, `parseExpression` can be used to read code:
```
ghci> parseExpression "[drop] [drop]"
Right [drop] [drop]
```
#### The REPL
We now literally have the three pieces of a read-evaluate-print loop. All that's left
is putting them together:
{{< codelines "Haskell" "dawn/Ucc.hs" 53 64 >}}
We now have our own little verified evaluator:
```
$ ghci Ucc
$ ghc -main-is Ucc Ucc
$ ./Ucc
> [drop] [drop] clone apply
[[drop]]
> [drop] [swap drop] clone apply
[[swap drop]]
> [swap drop] [drop] clone apply
[[swap drop]]
> [swap drop] [swap drop] clone apply
[[swap drop]]
```
### Potential Future Work
We now have a verified UCC evaluator in Haskell. What now?
1. We only verified the evaluation component of our REPL. In fact,
the whole thing is technically a bit buggy due to the parser:
```
> [drop] drop coq is a weird name to say out loud in interviews
[]
```
Shouldn't this be an error? Do you see why it's not? There's work in
writing formally verified parsers; some time ago I saw a Galois
talk about a [formally verified parser generator](https://galois.com/blog/2019/09/tech-talk-a-verified-ll1-parser-generator/).
We could use this formally verified parser generator to create our parsers, and then be
sure that our grammar is precisely followed.
2. We could try make our evaluator a bit smarter. One thing we could definitely do
is maker our REPL support variables. Then, we would be able to write:
```
> true = [swap drop]
> false = [drop]
> or = clone apply
> true false or
[swap drop]
```
There are different ways to go about this. One way is to extend our `expr` data type
with a variable constructor. This would complicate the semantics (a _lot_), but it would
allow us to prove facts about variables. Alternatively, we could implement expressions
as syntax sugar in our parser. Using a variable would be the same as simply
pasting in the variable's definition. This is pretty much what the Dawn article
seems to be doing.
3. We could prove more things. Can we confirm, once and for all, the correctness of \\(\\text{quote}_n\\),
for _any_ \\(n\\)? Is there is a generalized way of converting inductive data types into a UCC encoding?
Or could we perhaps formally verify the following comment from Lobsters:
> with the encoding of natural numbers given, n+1 contains the definition of n duplicated two times.
This means that the encoding of n has size exponential in n, which seems extremely impractical.
The world's our oyster!
Despite all of these exciting possibilities, this is where I stop, for now. I hope you enjoyed this article,
and thank you for reading!