Actually push the rest of the new article
This commit is contained in:
parent
826dde759f
commit
2ce351f7ef
@ -124,7 +124,7 @@ Coq, why don't we use another language? Coq's extraction mechanism allows us to
|
||||
|
||||
I added the following code to the end of my file:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 219 223 >}}
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 231 235 >}}
|
||||
|
||||
Coq happily produces a new file, `UccGen.hs` with a lot of code. It's not exactly the most
|
||||
aesthetic; here's a quick peek:
|
||||
@ -172,11 +172,12 @@ Okay, so `true false or` evaluates to `true`, much like our semantics claims.
|
||||
However, does our evaluator _always_ match the semantics? So far, we have not
|
||||
claimed or verified that it does. Let's try giving it a shot.
|
||||
|
||||
#### First Steps and Evaluation Chains
|
||||
The first thing we can do is show that if we have a proof that `e` takes
|
||||
initial stack `vs` to final stack `vs'`, then each
|
||||
`eval_step` "makes progress" towards `vs'` (it doesn't simply _return_
|
||||
`vs'`, since it only takes a single step and doesn't always complete
|
||||
the evaluation). We also want to show that if the semanics dictates
|
||||
the evaluation). We also want to show that if the semantics dictates
|
||||
`e` finishes in stack `vs'`, then `eval_step` will never return an error.
|
||||
Thus, we have two possibilities:
|
||||
|
||||
@ -236,9 +237,9 @@ exists ei1 ei2 vsi1 vsi2,
|
||||
|
||||
This is awful! Not only is this a lot of writing, but it also makes various
|
||||
sequences of steps have a different "shape". This way, we can't make
|
||||
proofs about evalutions of an _arbitrary_ number of steps. Not all is lost, though: if we squint
|
||||
proofs about evaluations of an _arbitrary_ number of steps. Not all is lost, though: if we squint
|
||||
a little at the last example (three steps), a pattern starts to emerge.
|
||||
First, let's re-arrange the `exists` qualifiers:
|
||||
First, let's re-arrange the `exists` quantifiers:
|
||||
|
||||
```Coq
|
||||
exists ei1 vsi1, eval_step vs e = middle ei1 vsi1 /\ (* Cons *)
|
||||
@ -274,7 +275,7 @@ eval_chain vs e vs'
|
||||
```
|
||||
|
||||
Evaluation chains have a couple of interesting properties. First and foremost,
|
||||
they can be "concatenated". This is analagous to the `Sem_e_comp` rule
|
||||
they can be "concatenated". This is analogous to the `Sem_e_comp` rule
|
||||
in our original semantics: if an expression `e1` starts in stack `vs`
|
||||
and finishes in stack `vs'`, and another expression starts in stack `vs'`
|
||||
and finishes in stack `vs''`, then we can compose these two expressions,
|
||||
@ -283,7 +284,7 @@ and the result will start in `vs` and finish in `vs''`.
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 69 75 >}}
|
||||
|
||||
The proof is very short. We go
|
||||
{{< sidenote "right" "concat-note" "by induction on the left evaluation" >}}
|
||||
{{< sidenote "right" "concat-note" "by induction on the left evaluation chain" >}}
|
||||
It's not a coincidence that defining something like <code>(++)</code>
|
||||
(list concatenation) in Haskell typically starts by pattern matching
|
||||
on the <em>left</em> list. In fact, proofs by <code>induction</code>
|
||||
@ -294,8 +295,8 @@ in sidenotes, but if you're playing with the
|
||||
the <a href="https://en.wikipedia.org/wiki/Fixed-point_combinator">fixed point
|
||||
combinator</a> used to implement recursion.
|
||||
{{< /sidenote >}}
|
||||
chain (the one for `e1`). Coq takes care of most of the rest with `auto`.
|
||||
The key to this proof is that whatever `P` is cotained within a "node"
|
||||
(the one for `e1`). Coq takes care of most of the rest with `auto`.
|
||||
The key to this proof is that whatever `P` is contained within a "node"
|
||||
in the left chain determines how `eval_step (e_comp e1 e2)` behaves. Whether
|
||||
`e1` evaluates to `final` or `middle`, the composition evaluates to `middle`
|
||||
(a composition of two expressions cannot be done in one step), so we always
|
||||
@ -307,7 +308,7 @@ create new nodes when appending to an empty list.
|
||||
* If the step was `final`, then
|
||||
the rest of the steps use only `e2`, and good news, we already have a chain
|
||||
for that!
|
||||
* Otherwise, the step was `middle`, an we stll have a chain for some
|
||||
* Otherwise, the step was `middle`, an we still have a chain for some
|
||||
intermediate program `ei` that starts in some stack `vsi` and ends in `vs'`.
|
||||
By induction, we know that _this_ chain can be concatenated with the one
|
||||
for `e2`, resulting in a chain for `e_comp ei e2`. All that remains is to
|
||||
@ -323,3 +324,297 @@ one for each composed expression:
|
||||
|
||||
While interesting, this particular fact isn't used anywhere else in this
|
||||
post, and I will omit the proof here.
|
||||
|
||||
#### The Forward Direction
|
||||
Let's try rewording our very first proof, `eval_step_correct`, using
|
||||
chains. The _direction_ remains the same: given that an expression
|
||||
produces a final stack under the formal semantics, we need to
|
||||
prove that this expression _evaluates_ to the same final stack using
|
||||
a sequence of `eval_step`.
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 93 96 >}}
|
||||
|
||||
The power of this theorem is roughly the same as that of
|
||||
the original one: we can use `eval_step_correct` to build up a chain
|
||||
by applying it over and over, and we can take the "first element"
|
||||
of a chain to serve as a witness for `eval_step_correct`. However,
|
||||
the formulation is arguably significantly cleaner, and contains
|
||||
a proof for _all_ steps right away, rather than a single one.
|
||||
|
||||
Before we go through the proof, notice that there's actually _two_
|
||||
theorems being stated here, which depend on each other. This
|
||||
is not surprising given that our semantics are given using two
|
||||
data types, `Sem_expr` and `Sem_int`, each of which contains the other.
|
||||
Regular proofs by induction, which work on only one of the data types,
|
||||
break down because we can't make claims "by induction" about the _other_
|
||||
type. For example, while going by induction on `Sem_expr`, we run into
|
||||
issues in the `e_int` case while handling \\(\\text{apply}\\). We know
|
||||
a single step -- popping the value being run from the stack. But what then?
|
||||
The rule for \\(\\text{apply}\\) in `Sem_int` contains another `Sem_expr`
|
||||
which confirms that the quoted value properly evaluates. But this other
|
||||
`Sem_expr` isn't directly a child of the "bigger" `Sem_expr`, and so we
|
||||
don't get an inductive hypothesis about it. We know nothing about it; we're stuck.
|
||||
|
||||
We therefore have two theorems declared together using `with` (just like we
|
||||
used `with` to declare `Sem_expr` and `Sem_int`). We have to prove both,
|
||||
which is once again a surprisingly easy task thanks to Coq's `auto`. Let's
|
||||
start with the first theorem, the one for expression semantics.
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 98 107 >}}
|
||||
|
||||
We go by induction on the semantics data type. There are therefore three cases:
|
||||
intrinsics, quotes, and composition. The intrinsic case is handed right
|
||||
off to the second theorem (which we'll see momentarily). The quote case
|
||||
is very simple since quoted expressions are simply pushed onto the stack in
|
||||
a single step (we thus use `chain_final`). Finally, in the composition
|
||||
case, we have two sub-proofs, one for each expression being evaluated.
|
||||
By induction, we know that each of these sub-proofs can be turned into
|
||||
a chain, and we use `eval_chain_merge` to combine these two chains into one.
|
||||
That's it.
|
||||
|
||||
Now, let's try the second theorem. The code is even shorter:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 108 115 >}}
|
||||
|
||||
The first command, `inversion Hsem`, lets us go case-by-case on the various
|
||||
ways an intrinsic can be evaluated. Most intrinsics are quite boring;
|
||||
in our evaluator, they only need a single step, and their semantics
|
||||
rules guarantee that the stack has the exact kind of data that the evaluator
|
||||
expects. We dismiss such cases with `apply chain_final; auto`. The only
|
||||
time this doesn't work is when we encounter the \\(\\text{apply}\\) intrinsic;
|
||||
in that case, however, we can simply use the first theorem we proved.
|
||||
|
||||
#### A Quick Aside: Automation Using `Ltac2`
|
||||
Going in the other direction will involve lots of situations in which we _know_
|
||||
that `eval_step` evaluated to something. Here's a toy proof built around
|
||||
one such case:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 237 246 "hl_lines=5-8">}}
|
||||
|
||||
The proof claims that if the `swap` instruction was evaluated to something,
|
||||
then the initial stack must contain two values, and the final stack must
|
||||
have those two values on top but flipped. This is very easy to prove, since
|
||||
that's the exact behavior of `eval_step` for the `swap` intrinsic. However,
|
||||
notice how much boilerplate we're writing: lines 241 through 244 deal entirely
|
||||
with ensuring that our stack does, indeed, have two values. This is trivially true:
|
||||
if the stack _didn't_ have two values, it wouldn't evaluate to `final`, but to `error`.
|
||||
However, it takes some effort to convince Coq of this.
|
||||
|
||||
This was just for a single intrinsic. What if we're trying to prove something
|
||||
for _every_ intrinsic? Things will get messy very quickly. We can't even re-use
|
||||
our `destruct vs` code, because different intrinsics need stacks with different numbers
|
||||
of values (they have a different _arity_); if we try to handle all cases with at most
|
||||
2 values on the stack, we'd have to prove the same thing twice for simpler intrinsics.
|
||||
In short, proofs will look like a mess.
|
||||
|
||||
Things don't have to be this way, though! The boilerplate code is very repetitive, and
|
||||
this makes it a good candidate for automation. For this, Coq has `Ltac2`.
|
||||
|
||||
In short, `Ltac2` is another mini-language contained within Coq. We can use it to put
|
||||
into code the decision making that we'd be otherwise doing on our own. For example,
|
||||
here's a tactic that checks if the current proof has a hypothesis in which
|
||||
an intrinsic is evaluated to something:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 141 148 >}}
|
||||
|
||||
`Ltac2` has a pattern matching construct much like Coq itself does, but it can
|
||||
pattern match on Coq expressions and proof goals. When pattern matching
|
||||
on goals, we use the following notation: `[ h1 : t1, ..., hn : tn |- g ]`, which reads:
|
||||
|
||||
> Given hypotheses `h1` through `hn`, of types `t1` through `tn` respectively,
|
||||
we need to prove `g`.
|
||||
|
||||
In our pattern match, then, we're ignoring the actual thing we're supposed to prove
|
||||
(the tactic we're writing won't be that smart; its user -- ourselves -- will need
|
||||
to know when to use it). We are, however, searching for a hypothesis in the form
|
||||
`eval_step ?a (e_int ?b) = ?c`. The three variables, `?a`, `?b`, and `?c` are placeholders
|
||||
which can be matched with anything. Thus, we expect any hypothesis in which
|
||||
an intrinsic is evaluated to anything else (by the way, Coq checks all hypotheses one-by-one,
|
||||
starting at the most recent and finishing with the oldest). The code on lines 144 through 146 actually
|
||||
performs the `destruct` calls, as well as the `inversion` attempts that complete
|
||||
any proofs with an inconsistent assumption (like `err = final vs'`).
|
||||
|
||||
So how do these lines work? Well, first we need to get a handle on the hypothesis we found.
|
||||
Various tactics can manipulate the proof state, and cause hypotheses to disappear;
|
||||
by calling `Control.hyp` on `h`, we secure a more permanent hold on our evaluation assumption.
|
||||
In the next line, we call _another_ `Ltac2` function, `destruct_int_stack`, which unpacks
|
||||
the stack exactly as many times as necessary. We'll look at this function in a moment.
|
||||
Finally, we use regular old tactics. Specifically, we use `inversion` (as mentioned before).
|
||||
I use `fail` after `inversion` to avoid cluttering the proof in case there's _not_
|
||||
a contradiction.
|
||||
|
||||
On to `destruct_int_stack`. The definition is very simple:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 139 139 >}}
|
||||
|
||||
The main novelty is that this function takes two arguments, both of which are Coq expressions:
|
||||
`int`, or the intrinsic being evaluated, and `va`, the stack that's being analyzed. The `constr`
|
||||
type in Coq holds terms. Let's look at `int_arity` next:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 128 137 >}}
|
||||
|
||||
This is a pretty standard pattern match that assigns to each expression its arity. However, we're
|
||||
case analyzing over _literally any possible Coq expression_, so we need to handle the case in which
|
||||
the expression isn't actually a specific intrinsic. For this, we use `Control.throw` with
|
||||
the `Not_intrinsic` exception.
|
||||
|
||||
Wait a moment, does Coq just happen to include a `Not_intrinsic` exception? No, it does not.
|
||||
We have to register that one ourselves. In `Ltac2`, the type of exception (`exn`) is _extensible_,
|
||||
which means we can add on to it. We add just a single constructor with no arguments:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 118 118 >}}
|
||||
|
||||
Finally, unpacking the value stack. Here's the code for that:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 120 126 >}}
|
||||
|
||||
This function accepts the arity of an operation, and unpacks the stack that many times.
|
||||
`Ltac2` doesn't have a way to pattern match on numbers, so we resort to good old "less than or equal",
|
||||
and and `if`/`else`. If the arity is zero (or less than zero, since it's an integer), we don't
|
||||
need to unpack anymore. This is our base case. Otherwise, we generate two new free variables
|
||||
(we can't just hardcode `v1` and `v2`, since they may be in use elsewhere in the proof). To this
|
||||
end we use `Fresh.in_goal` and give it a "base symbol" to build on. We then use
|
||||
`destruct`, passing it our "scrutinee" (the expression being destructed) and the names
|
||||
we generated for its components. This generates multiple goals; the `Control.enter` tactic
|
||||
is used to run code for each one of these goals. In the non-empty list case, we try to break
|
||||
apart its tail as necessary by recursively calling `destruct_n`.
|
||||
|
||||
That's pretty much it! We can now use our tactic from Coq like any other. Rewriting
|
||||
our earlier proof about \\(\\text{swap}\\), we now only need to handle the valid case:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 248 254 >}}
|
||||
|
||||
We'll be using this new `ensure_valid_stack` tactic in our subsequent proofs.
|
||||
|
||||
#### The Backward Direction
|
||||
After our last proof before the `Ltac2` diversion, we know that our evaluator matches our semantics, but only when a `Sem_expr`
|
||||
object exists for our program. However, invalid programs don't have a `Sem_expr` object
|
||||
(there's no way to prove that they evaluate to anything). Does our evaluator behave
|
||||
properly on "bad" programs, too? We've not ruled out that our evaluator produces junk
|
||||
`final` or `middle` outputs whenever it encounters an error. We need another theorem:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 215 216 >}}
|
||||
|
||||
That is, if our evalutor reaches a final state, this state matches our semantics.
|
||||
This proof is most conveniently split into two pieces. The first piece says that
|
||||
if our evaluator completes in a single step, then this step matches our semantics.
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 158 175 >}}
|
||||
|
||||
The statement for a non-`final` step is more involved; the one step by itself need
|
||||
not match the semantics, since it only carries out a part of a computation. However,
|
||||
we _can_ say that if the "rest of the program" matches the semantics, then so does
|
||||
the whole expression.
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 177 213 >}}
|
||||
|
||||
Finally, we snap these two pieces together in `eval_step_sem_back`:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 215 222 >}}
|
||||
|
||||
We have now proved complete equivalence: our evaluator completes in a final state
|
||||
if and only if our semantics lead to this same final state. As a corollary of this,
|
||||
we can see that if a program is invalid (it doesn't evaluate to anything under our semantics),
|
||||
then our evaluator won't finish in a `final` state:
|
||||
|
||||
{{< codelines "Coq" "dawn/DawnEval.v" 224 229 >}}
|
||||
|
||||
### Making a Mini-REPL
|
||||
We can now be pretty confident about our evaluator. However, this whole business with using GHCi
|
||||
to run our programs is rather inconvenient; I'd rather write `[drop]` than `E_quote (E_int Drop)`.
|
||||
I'd also rather _read_ the former instead of the latter. We can do some work in Haskell to make
|
||||
playing with our interpreter more convenient.
|
||||
|
||||
#### Pretty Printing Code
|
||||
Coq generated Haskell data types that correspond precisely to the data types we defined for our
|
||||
proofs. We can use the standard type class mechanism in Haskell to define how they should be
|
||||
printed:
|
||||
|
||||
{{< codelines "Haskell" "dawn/Ucc.hs" 8 22 >}}
|
||||
|
||||
Now our expressions and values print a lot nicer:
|
||||
|
||||
```
|
||||
ghci> true
|
||||
[swap drop]
|
||||
ghci> false
|
||||
[drop]
|
||||
ghci> UccGen.or
|
||||
clone apply
|
||||
```
|
||||
|
||||
#### Reading Code In
|
||||
The Parsec library in Haskell can be used to convert text back into data structures.
|
||||
It's not too hard to create a parser for UCC:
|
||||
|
||||
{{< codelines "Haskell" "dawn/Ucc.hs" 24 44 >}}
|
||||
|
||||
Now, `parseExpression` can be used to read code:
|
||||
|
||||
```
|
||||
ghci> parseExpression "[drop] [drop]"
|
||||
Right [drop] [drop]
|
||||
```
|
||||
|
||||
#### The REPL
|
||||
We now literally have the three pieces of a read-evaluate-print loop. All that's left
|
||||
is putting them together:
|
||||
|
||||
{{< codelines "Haskell" "dawn/Ucc.hs" 53 64 >}}
|
||||
|
||||
We now have our own little verified evaluator:
|
||||
|
||||
```
|
||||
$ ghci Ucc
|
||||
$ ghc -main-is Ucc Ucc
|
||||
$ ./Ucc
|
||||
> [drop] [drop] clone apply
|
||||
[[drop]]
|
||||
> [drop] [swap drop] clone apply
|
||||
[[swap drop]]
|
||||
> [swap drop] [drop] clone apply
|
||||
[[swap drop]]
|
||||
> [swap drop] [swap drop] clone apply
|
||||
[[swap drop]]
|
||||
```
|
||||
|
||||
### Potential Future Work
|
||||
We now have a verified UCC evaluator in Haskell. What now?
|
||||
|
||||
1. We only verified the evaluation component of our REPL. In fact,
|
||||
the whole thing is technically a bit buggy due to the parser:
|
||||
```
|
||||
> [drop] drop coq is a weird name to say out loud in interviews
|
||||
[]
|
||||
```
|
||||
Shouldn't this be an error? Do you see why it's not? There's work in
|
||||
writing formally verified parsers; some time ago I saw a Galois
|
||||
talk about a [formally verified parser generator](https://galois.com/blog/2019/09/tech-talk-a-verified-ll1-parser-generator/).
|
||||
We could use this formally verified parser generator to create our parsers, and then be
|
||||
sure that our grammar is precisely followed.
|
||||
2. We could try make our evaluator a bit smarter. One thing we could definitely do
|
||||
is maker our REPL support variables. Then, we would be able to write:
|
||||
```
|
||||
> true = [swap drop]
|
||||
> false = [drop]
|
||||
> or = clone apply
|
||||
> true false or
|
||||
[swap drop]
|
||||
```
|
||||
There are different ways to go about this. One way is to extend our `expr` data type
|
||||
with a variable constructor. This would complicate the semantics (a _lot_), but it would
|
||||
allow us to prove facts about variables. Alternatively, we could implement expressions
|
||||
as syntax sugar in our parser. Using a variable would be the same as simply
|
||||
pasting in the variable's definition. This is pretty much what the Dawn article
|
||||
seems to be doing.
|
||||
3. We could prove more things. Can we confirm, once and for all, the correctness of \\(\\text{quote}_n\\),
|
||||
for _any_ \\(n\\)? Is there is a generalized way of converting inductive data types into a UCC encoding?
|
||||
Or could we perhaps formally verify the following comment from Lobsters:
|
||||
|
||||
> with the encoding of natural numbers given, n+1 contains the definition of n duplicated two times.
|
||||
This means that the encoding of n has size exponential in n, which seems extremely impractical.
|
||||
|
||||
The world's our oyster!
|
||||
|
||||
Despite all of these exciting possibilities, this is where I stop, for now. I hope you enjoyed this article,
|
||||
and thank you for reading!
|
||||
|
Loading…
Reference in New Issue
Block a user