blog-static/content/blog/01_aoc_coq.md
Danila Fedorin 189422bf1e Convert AoC Coq article to new math delimiters
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-05-13 18:16:20 -07:00

942 lines
45 KiB
Markdown

---
title: "Advent of Code in Coq - Day 8"
date: 2021-01-10T22:48:39-08:00
tags: ["Advent of Code", "Coq"]
series: "Advent of Code in Coq"
---
Huh? We're on day 8? What happened to days 2 through 7?
Well, for the most part, I didn't think they were that interesting from the Coq point of view.
Day 7 got close, but not close enough to inspire me to create a formalization. Day 8, on the other
hand, is
{{< sidenote "right" "pl-note" "quite interesting," >}}
Especially to someone like me who's interested in programming languages!
{{< /sidenote >}} and took quite some time to formalize.
As before, here's an (abridged) description of the problem:
> Given a tiny assembly-like language, determine the state of its accumulator
> when the same instruction is executed twice.
Before we start on the Coq formalization, let's talk about an idea from
Programming Language Theory (PLT), _big step operational semantics_.
### Big Step Operational Semantics
What we have in Advent of Code's Day 8 is, undeniably, a small programming language.
We are tasked with executing this language, or, in PLT lingo, defining its _semantics_.
There are many ways of doing this - at university, I've been taught of [denotational](https://en.wikipedia.org/wiki/Denotational_semantics), [axiomatic](https://en.wikipedia.org/wiki/Axiomatic_semantics),
and [operational](https://en.wikipedia.org/wiki/Operational_semantics) semantics.
I believe that Coq's mechanism of inductive definitions lends itself very well
to operational semantics, so we'll take that route. But even "operational semantics"
doesn't refer to a concrete technique - we have a choice between small-step (structural) and
big-step (natural) operational semantics. The former describe the minimal "steps" a program
takes as it's being evaluated, while the latter define the final results of evaluating a program.
I decided to go with big-step operational semantics, since they're more intutive (natural!).
So, how does one go about "[defining] the final results of evaluating a program?" Most commonly,
we go about using _inference rules_. Let's talk about those next.
#### Inference Rules
Inference rules are a very general notion. The describe how we can determine (infer) a conclusion
from a set of assumptions. It helps to look at an example. Here's a silly little inference rule:
{{< latex >}}
\frac
{\text{I'm allergic to cats} \quad \text{My friend has a cat}}
{\text{I will not visit my friend very much}}
{{< /latex >}}
It reads, "if I'm allergic to cats, and if my friend has a cat, then I will not visit my friend very much".
Here, "I'm allergic to cats" and "my friend has a cat" are _premises_, and "I will not visit my friend very much" is
a _conclusion_. An inference rule states that if all its premises are true, then its conclusion must be true.
Here's another inference rule, this time with some mathematical notation instead of words:
{{< latex >}}
\frac
{n < m}
{n + 1 < m + 1}
{{< /latex >}}
This one reads, "if \(n\) is less than \(m\), then \(n+1\) is less than \(m+1\)". We can use inference
rules to define various constructs. As an example, let's define what it means for a natural number to be even.
It takes two rules:
{{< latex >}}
\frac
{}
{0 \; \text{is even}}
\quad
\frac
{n \; \text{is even}}
{n+2 \; \text{is even}}
{{< /latex >}}
First of all, zero is even. We take this as fact - there are no premises for the first rule, so they
are all trivially true. Next, if a number is even, then adding 2 to that number results in another
even number. Using the two of these rules together, we can correctly determine whether any number
is or isn't even. We start knowing that 0 is even. Adding 2 we learn that 2 is even, and adding 2
again we see that 4 is even, as well. We can continue this to determine that 6, 8, 10, and so on
are even too. Never in this process will we visit the numbers 1 or 3 or 5, and that's good - they're not even!
Let's now extend this notion to programming languages, starting with a simple arithmetic language.
This language is made up of natural numbers and the \(\square\) operation, which represents the addition
of two numbers. Again, we need two rules:
{{< latex >}}
\frac
{n \in \mathbb{N}}
{n \; \text{evaluates to} \; n}
\quad
\frac
{e_1 \; \text{evaluates to} \; n_1 \quad e_2 \; \text{evaluates to} \; n_2}
{e_1 \square e_2 \; \text{evaluates to} \; n_1 + n_2}
{{< /latex >}}
First, let me explain myself. I used \(\square\) to demonstrate two important points. First, languages can be made of
any kind of characters we want; it's the rules that we define that give these languages meaning.
Second, while \(\square\) is the addition operation _in our language_, \(+\) is the _mathematical addition operator_.
They are not the same - we use the latter to define how the former works.
Finally, writing "evaluates to" gets quite tedious, especially for complex languages. Instead,
PLT people use notation to make their semantics more concise. The symbol \(\Downarrow\) is commonly
used to mean "evaluates to"; thus, \(e \Downarrow v\) reads "the expression \(e\) evaluates to the value \(v\).
Using this notation, our rules start to look like the following:
{{< latex >}}
\frac
{n \in \mathbb{N}}
{n \Downarrow n}
\quad
\frac
{e_1 \Downarrow n_1 \quad e_2 \Downarrow n_2}
{e_1 \square e_2 \Downarrow n_1 + n_2}
{{< /latex >}}
If nothing else, these are way more compact! Though these may look intimidating at first, it helps to
simply read each symbol as its English meaning.
#### Encoding Inference Rules in Coq
Now that we've seen what inference rules are, we can take a look at how they can be represented in Coq.
We can use Coq's `Inductive` mechanism to define the rules. Let's start with our "is even" property.
```Coq
Inductive is_even : nat -> Prop :=
| zero_even : is_even 0
| plustwo_even : is_even n -> is_even (n+2).
```
The first line declares the property `is_even`, which, given a natural number, returns proposition.
This means that `is_even` is not a proposition itself, but `is_even 0`, `is_even 1`, and `is_even 2`
are all propositions.
The following two lines each encode one of our aforementioned inference rules. The first rule, `zero_even`,
is of type `is_even 0`. The `zero_even` rule doesn't require any arguments, and we can use it to create
a proof that 0 is even. On the other hand, the `plustwo_even` rule _does_ require an argument, `is_even n`.
To construct a proof that a number `n+2` is even using `plustwo_even`, we need to provide a proof
that `n` itself is even. From this definition we can see a general principle: we encode each inference
rule as constructor of an inductive Coq type. Each rule encoded in this manner takes as arguments
the proofs of its premises, and returns a proof of its conclusion.
For another example, let's encode our simple addition language. First, we have to define the language
itself:
```Coq
Inductive tinylang : Type :=
| number (n : nat) : tinylang
| box (e1 e2 : tinylang) : tinylang.
```
This defines the two elements of our example language: `number n` corresponds to \(n\), and `box e1 e2` corresponds
to \(e_1 \square e_2\). Finally, we define the inference rules:
```Coq {linenos=true}
Inductive tinylang_sem : tinylang -> nat -> Prop :=
| number_sem : forall (n : nat), tinylang_sem (number n) n
| box_sem : forall (e1 e2 : tinylang) (n1 n2 : nat),
tinylang_sem e1 n1 -> tinylang_sem e2 n2 ->
tinylang_sem (box e1 e2) (n1 + n2).
```
When we wrote our rules earlier, by using arbitrary variables like \(e_1\) and \(n_1\), we implicitly meant
that our rules work for _any_ number or expression. When writing Coq we have to make this assumption explicit
by using `forall`. For instance, the rule on line 2 reads, "for any number `n`, the expression `n` evaluates to `n`".
#### Semantics of Our Language
We've now written some example big-step operational semantics, both "on paper" and in Coq. Now, it's time to take a look at
the specific semantics of the language from Day 8! Our language consists of a few parts.
First, there are three opcodes: \(\texttt{jmp}\), \(\texttt{nop}\), and \(\texttt{add}\). Opcodes, combined
with an integer, make up an instruction. For example, the instruction \(\texttt{add} \; 3\) will increase the
content of the accumulator by three. Finally, a program consists of a sequence of instructions; They're separated
by newlines in the puzzle input, but we'll instead separate them by semicolons. For example, here's a complete program.
{{< latex >}}
\texttt{add} \; 0; \; \texttt{nop} \; 2; \; \texttt{jmp} \; -2
{{< /latex >}}
Now, let's try evaluating this program. Starting at the beginning and with 0 in the accumulator,
it will add 0 to the accumulator (keeping it the same),
do nothing, and finally jump back to the beginning. At this point, it will try to run the addition instruction again,
which is not allowed; thus, the program will terminate.
Did you catch that? The semantics of this language will require more information than just our program itself (which we'll denote by \(p\)).
* First, to evaluate the program we will need a program counter, \(\textit{c}\). This program counter
will tell us the position of the instruction to be executed next. It can also point past the last instruction,
which means our program terminated successfully.
* Next, we'll need the accumulator \(a\). Addition instructions can change the accumulator, and we will be interested
in the number that ends up in the accumulator when our program finishes executing.
* Finally, and more subtly, we'll need to keep track of the states we visited. For instance,
in the course of evaluating our program above, we encounter the \((c, a)\) pair of \((0, 0)\) twice: once
at the beginning, and once at the end. However, whereas at the beginning we have not yet encountered the addition
instruction, at the end we have, so the evaluation behaves differently. To make the proofs work better in Coq,
we'll use a set \(v\) of
{{< sidenote "right" "allowed-note" "allowed (valid) program counters (as opposed to visited program counters)." >}}
Whereas the set of "visited" program counters keeps growing as our evaluation continues,
the set of "allowed" program counters keeps shrinking. Because the "allowed" set never stops shrinking,
assuming we're starting with a finite set, our execution will eventually terminate.
{{< /sidenote >}}
Now we have all the elements of our evaluation. Let's define some notation. A program starts at some state,
and terminates in another, possibly different state. In the course of a regular evaluation, the program
never changes; only the state does. So I propose this (rather unorthodox) notation:
{{< latex >}}
(c, a, v) \Rightarrow_p (c', a', v')
{{< /latex >}}
This reads, "after starting at program counter \(c\), accumulator \(a\), and set of valid addresses \(v\),
the program \(p\) terminates with program counter \(c'\), accumulator \(a'\), and set of valid addresses \(v'\)".
Before creating the inference rules for this evaluation relation, let's define the effect of evaluating a single
instruction, using notation \((c, a) \rightarrow_i (c', a')\). An addition instruction changes the accumulator,
and increases the program counter by 1.
{{< latex >}}
\frac{}
{(c, a) \rightarrow_{\texttt{add} \; n} (c+1, a+n)}
{{< /latex >}}
A no-op instruction does even less. All it does is increment the program counter.
{{< latex >}}
\frac{}
{(c, a) \rightarrow_{\texttt{nop} \; n} (c+1, a)}
{{< /latex >}}
Finally, a jump instruction leaves the accumulator intact, but adds a number to the program counter itself!
{{< latex >}}
\frac{}
{(c, a) \rightarrow_{\texttt{jmp} \; n} (c+n, a)}
{{< /latex >}}
None of these rules have any premises, and they really are quite simple. Now, let's define the rules
for evaluating a program. First of all, a program starting in a state that is not considered "valid"
is done evaluating, and is in a "failed" state.
{{< latex >}}
\frac{c \not \in v \quad c \not= \text{length}(p)}
{(c, a, v) \Rightarrow_{p} (c, a, v)}
{{< /latex >}}
We use \(\text{length}(p)\) to represent the number of instructions in \(p\). Note the second premise:
even if our program counter \(c\) is not included in the valid set, if it's "past the end of the program",
the program terminates in an "ok" state.
{{< sidenote "left" "avoid-c-note" "Here's a rule for terminating in the \"ok\" state:" >}}
In the presented rule, we don't use the variable <code>c</code> all that much, and we know its concrete
value (from the equality premise). We could thus avoid introducing the name \(c\) by
replacing it with said known value:
{{< latex >}}
\frac{}
{(\text{length}(p), a, v) \Rightarrow_{p} (\text{length}(p), a, v)}
{{< /latex >}}
This introduces some duplication, but that is really because all "base case" evaluation rules
start and stop in the same state. To work around this, we could define a separate proposition
to mean "program \(p\) is done in state \(s\)", then \(s\) will really only need to occur once,
and so will \(\text{length}(p)\). This is, in fact, what we will do later on,
since being able to talk abut "programs being done" will help us with
components of our proof.
{{< /sidenote >}}
{{< latex >}}
\frac{c = \text{length}(p)}
{(c, a, v) \Rightarrow_{p} (c, a, v)}
{{< /latex >}}
When our program counter reaches the end of the program, we are also done evaluating it. Even though
both rules {{< sidenote "right" "redundant-note" "lead to the same conclusion," >}}
In fact, if the end of the program is never included in the valid set, the second rule is completely redundant.
{{< /sidenote >}}
it helps to distinguish the two possible outcomes. Finally, if neither of the termination conditions are met,
our program can take a step, and continue evaluating from there.
{{< latex >}}
\frac{c \in v \quad p[c] = i \quad (c, a) \rightarrow_i (c', a') \quad (c', a', v - \{c\}) \Rightarrow_p (c'', a'', v'')}
{(c, a, v) \Rightarrow_{p} (c'', a'', v'')}
{{< /latex >}}
This is quite a rule. A lot of things need to work out for a program to evauate from a state that isn't
currently the final state:
* The current program counter \(c\) must be valid. That is, it must be an element of \(v\).
* This program counter must correspond to an instruction \(i\) in \(p\), which we write as \(p[c] = i\).
* This instruction must be executed, changing our program counter from \(c\) to \(c'\) and our
accumulator from \(a\) to \(a'\). The set of valid instructions will no longer include \(c\),
and will become \(v - \{c\}\).
* Our program must then finish executing, starting at state
\((c', a', v - \{c\})\), and ending in some (unknown) state \((c'', a'', v'')\).
If all of these conditions are met, our program, starting at \((c, a, v)\), will terminate in the state \((c'', a'', v'')\). This third rule completes our semantics; a program being executed will keep running instructions using the third rule, until it finally
hits an invalid program counter (terminating with the first rule) or gets to the end of the program (terminating with the second rule).
#### Aside: Vectors and Finite \\(\\mathbb{N}\\)
We'll be getting to the Coq implementation of our semantics soon, but before we do:
what type should \(c\) be? It's entirely possible for an instruction like \(\texttt{jmp} \; -10000\)
to throw our program counter way before the first instruction of our program, so at first, it seems
as though we should use an integer. But the prompt doesn't even specify what should happen in this
case - it only says an instruction shouldn't be run twice. The "valid set", although it may help resolve
this debate, is our invention, and isn't part of the original specification.
There is, however, something we can infer from this problem. Since the problem of jumping "too far behind" or
"too far ahead" is never mentioned, we can assume that _all jumps will lead either to an instruction,
or right to the end of a program_. This means that \(c\) is a natural number, with
{{< latex >}}
0 \leq c \leq \text{length}(p)
{{< /latex >}}
In a language like Coq, it's possible to represent such a number. Since we've gotten familliar with
inference rules, let's present two rules that define such a number:
{{< latex >}}
\frac
{n \in \mathbb{N}^+}
{Z : \text{Fin} \; n}
\quad
\frac
{f : \text{Fin} \; n}
{S f : \text{Fin} \; (n+1)}
{{< /latex >}}
This is a variation of the [Peano encoding](https://wiki.haskell.org/Peano_numbers) of natural numbers.
It reads as follows: zero (\(Z\)) is a finite natural number less than any positive natural number \(n\). Then, if a finite natural number
\(f\) is less than \(n\), then adding one to that number (using the successor function \(S\))
will create a natural number less than \(n+1\). We encode this in Coq as follows
([originally from here](https://coq.inria.fr/library/Coq.Vectors.Fin.html#t)):
```Coq
Inductive t : nat -> Set :=
| F1 : forall {n}, t (S n)
| FS : forall {n}, t n -> t (S n).
```
The `F1` constructor here is equivalent to our \(Z\), and `FS` is equivalent to our \(S\).
To represent positive natural numbers \(\mathbb{N}^+\), we simply take a regular natural
number from \(\mathbb{N}\) and find its successor using `S` (simply adding 1). Again, we have
to explicitly use `forall` in our type signatures.
We can use a similar technique to represent a list with a known number of elements, known
in the Idris and Coq world as a vector. Again, we only need two inference rules to define such
a vector:
{{< latex >}}
\frac
{t : \text{Type}}
{[] : \text{Vec} \; t \; 0}
\quad
\frac
{x : \text{t} \quad \textit{xs} : \text{Vec} \; t \; n}
{(x::\textit{xs}) : \text{Vec} \; t \; (n+1)}
{{< /latex >}}
These rules read: the empty list \([]\) is zero-length vector of any type \(t\). Then,
if we take an element \(x\) of type \(t\), and an \(n\)-long vector \(\textit{xs}\) of \(t\),
then we can prepend \(x\) to \(\textit{xs}\) and get an \((n+1)\)-long vector of \(t\).
In Coq, we write this as follows ([originally from here](https://coq.inria.fr/library/Coq.Vectors.VectorDef.html#t)):
```Coq
Inductive t A : nat -> Type :=
| nil : t A 0
| cons : forall (h:A) (n:nat), t A n -> t A (S n).
```
The `nil` constructor represents the empty list \([]\), and `cons` represents
the operation of prepending an element (called `h` in the code and \(x\) in our inference rules)
to another vector of length \(n\), which remains unnamed in the code but is called \(\textit{xs}\) in our rules.
These two definitions work together quite well. For instance, suppose we have a vector of length \(n\).
If we were to access its elements by indices starting at 0, we'd be allowed to access indices 0 through \(n-1\).
These are precisely the values of the finite natural numbers less than \(n\), \(\text{Fin} \; n \).
Thus, given such an index \(\text{Fin} \; n\) and a vector \(\text{Vec} \; t \; n\), we are guaranteed
to be able to retrieve the element at the given index! In our code, we will not have to worry about bounds checking.
Of course, if our program has \(n\) elements, our program counter will be a finite number less than \(n+1\),
since there's always the possibility of it pointing past the instructions, indicating that we've finished
running the program. This leads to some minor complications: we can't safely access the program instruction
at index \(\text{Fin} \; (n+1)\). We can solve this problem by considering two cases:
either our index points one past the end of the program (in which case its value is exactly the finite
representation of \(n\)), or it's less than \(n\), in which case we can "tighten" the upper bound,
and convert that index into a \(\text{Fin} \; n\). We formalize it in a lemma:
{{< codelines "Coq" "aoc-2020/day8.v" 80 82 >}}
There's a little bit of a gotcha here. Instead of translating our above statement literally,
and returning a value that's the result of "tightening" our input `f`, we return a value
`f'` that can be "weakened" to `f`. This is because "tightening" is not a total function -
it's not always possible to convert a \(\text{Fin} \; (n+1)\) into a \(\text{Fin} \; n\).
However, "weakening" \(\text{Fin} \; n\) _is_ a total function, since a number less than \(n\)
is, by the transitive property of a total order, also less than \(n+1\).
The Coq proof for this claim is as follows:
{{< codelines "Coq" "aoc-2020/day8.v" 88 97 >}}
The `Fin.rectS` function is a convenient way to perform inductive proofs over
our finite natural numbers. Informally, our proof proceeds as follows:
* If the current finite natural number is zero, take a look at the "bound" (which
we assume is nonzero, since there isn't a natural number less than zero).
* If this "bounding number" is one, our `f` can't be tightened any further,
since doing so would create a number less than zero. Fortunately, in this case,
`n` must be `0`, so `f` is the finite representation of `n`.
* Otherwise, `f` is most definitely a weakened version of another `f'`,
since the tightest possible type for zero has a "bounding number" of one, and
our "bounding number" is greater than that. We return a tighter version of our finite zero.
* If our number is a successor of another finite number, we check if that other number
can itself be tightened.
* If it can't be tightened, then our smaller number is a finite representation of
`n-1`. This, in turn, means that adding one to it will be the finite representation
of `n` (if \(x\) is equal to \(n-1\), then \(x+1\) is equal to \(n\)).
* If it _can_ be tightened, then so can the successor (if \(x\) is less
than \(n-1\), then \(x+1\) is less than \(n\)).
Next, let's talk about addition, specifically the kind of addition done by the \(\texttt{jmp}\) instruction.
We can always add an integer to a natural number, but we can at best guarantee that the result
will be an integer. For instance, we can add `-1000` to `1`, and get `-999`, which is _not_ a natural
number. We implement this kind of addition in a function called `jump_t`:
{{< codelines "Coq" "aoc-2020/day8.v" 56 56 >}}
At the moment, its definition is not particularly important. What is important, though,
is that it takes a bounded natural number `pc` (our program counter), an integer `off`
(the offset provided by the jump instruction) and returns another integer representing
the final offset. Why are integers of type `t`? Well, it so happens
that Coq provides facilities for working with arbitrary implementations of integers,
without relying on how they are implemented under the hood. This can be seen in its
[`Coq.ZArith.Int`](https://coq.inria.fr/library/Coq.ZArith.Int.html) module,
which describes what functions and types an implementation of integers should provide.
Among those is `t`, the type of an integer in such an arbitrary implementation. We too
will not make an assumption about how the integers are implemented, and simply
use this generic `t` from now on.
Now, suppose we wanted to write a function that _does_ return a valid program
counter after adding the offset to it. Since it's possible for this function to fail
(for instance, if the offset is very negative), it has to return `option (fin (S n))`.
That is, this function may either fail (returning `None`) or succeed, returning
`Some f`, where `f` is of type `fin (S n)`, aka \(\text{Fin} \; (n + 1)\). Here's
the function in Coq (again, don't worry too much about the definition):
{{< codelines "Coq" "aoc-2020/day8.v" 61 61 >}}
We will make use of this function when we define and verify our semantics.
Let's take a look at that next.
#### Semantics in Coq
Now that we've seen finite sets and vectors, it's time to use them to
encode our semantics in Coq. Before we do anything else, we need
to provide Coq definitions for the various components of our
language, much like what we did with `tinylang`. We can start with opcodes:
{{< codelines "Coq" "aoc-2020/day8.v" 20 23 >}}
Now we can define a few other parts of our language and semantics, namely
states, instructions and programs (which I called "inputs" since, we'll, they're
our puzzle input). A state is simply the 3-tuple of the program counter, the set
of valid program counters, and the accumulator. We write it as follows:
{{< codelines "Coq" "aoc-2020/day8.v" 33 33 >}}
The star `*` is used here to represent a [product type](https://en.wikipedia.org/wiki/Product_type)
rather than arithmetic multiplication. Our state type accepts an argument,
`n`, much like a finite natural number or a vector. In fact, this `n` is passed on
to the state's program counter and set types. Rightly, a state for a program
of length \(n\) will not be of the same type as a state for a program of length \(n+1\).
An instruction is also a tuple, but this time containing only two elements: the opcode and
the number. We write this as follows:
{{< codelines "Coq" "aoc-2020/day8.v" 36 36 >}}
Finally, we have to define the type of a program. This type will also be
indexed by `n`, the program's length. A program of length `n` is simply a
vector of instructions `inst` of length `n`. This leads to the following
definition:
{{< codelines "Coq" "aoc-2020/day8.v" 38 38 >}}
So far, so good! Finally, it's time to get started on the semantics themselves.
We begin with the inductive definition of \((\rightarrow_i)\).
I think this is fairly straightforward. However, we do use
`t` instead of \(n\) from the rules, and we use `FS`
instead of \(+1\). Also, we make the formerly implicit
assumption that \(c+n\) is valid explicit, by
providing a proof that `valid_jump_t pc t = Some pc'`.
{{< codelines "Coq" "aoc-2020/day8.v" 103 110 >}}
Next, it will help us to combine the premises for
"failed" and "ok" terminations into Coq data types.
This will make it easier for us to formulate a lemma later on.
Here are the definitions:
{{< codelines "Coq" "aoc-2020/day8.v" 112 117 >}}
Since all of out "termination" rules start and
end in the same state, there's no reason to
write that state twice. Thus, both `done`
and `stuck` only take the input `inp`,
and the state, which includes the accumulator
`acc`, the set of allowed program counters `v`, and
the program counter at which the program came to an end.
When the program terminates successfully, this program
counter will be equal to the length of the program `n`,
so we use `nat_to_fin n`. On the other hand, if the program
terminates in as stuck state, it must be that it terminated
at a program counter that points to an instruction. Thus, this
program counter is actually a \(\text{Fin} \; n\), and not
a \(\text{Fin} \ (n+1)\), and is not in the set of allowed program counters.
We use the same "weakening" trick we saw earlier to represent
this.
Finally, we encode the three inference rules we came up with:
{{< codelines "Coq" "aoc-2020/day8.v" 119 126 >}}
Notice that we fused two of the premises in the last rule.
Instead of naming the instruction at the current program
counter (by writing \(p[c] = i\)) and using it in another premise, we simply use
`nth inp pc`, which corresponds to \(p[c]\) in our
"paper" semantics.
Before we go on writing some actual proofs, we have
one more thing we have to address. Earlier, we said:
> All jumps will lead either to an instruction, or right to the end of a program.
To make Coq aware of this constraint, we'll have to formalize it. To
start off, we'll define the notion of a "valid instruction", which is guaranteed
to keep the program counter in the correct range.
There are a couple of ways to do this, but we'll use yet another definition based
on inference rules. First, though, observe that the same instruction may be valid
for one program, and invalid for another. For instance, \(\texttt{jmp} \; 100\)
is perfectly valid for a program with thousands of instructions, but if it occurs
in a program with only 3 instructions, it will certainly lead to disaster. Specifically,
the validity of an instruction depends on the length of the program in which it resides,
and the program counter at which it's encountered.
Thus, we refine our idea of validity to "being valid for a program of length \(n\) at program counter \(f\)".
For this, we can use the following two inference rules:
{{< latex >}}
\frac
{c : \text{Fin} \; n}
{\texttt{add} \; t \; \text{valid for} \; n, c }
\quad
\frac
{c : \text{Fin} \; n \quad o \in \{\texttt{nop}, \texttt{jmp}\} \quad J_v(c, t) = \text{Some} \; c' }
{o \; t \; \text{valid for} \; n, c }
{{< /latex >}}
The first rule states that if a program has length \(n\), then \(\texttt{add}\) is valid
at any program counter whose value is less than \(n\). This is because running
\(\texttt{add}\) will increment the program counter \(c\) by 1,
and thus, create a new program counter that's less than \(n+1\),
which, as we discussed above, is perfectly valid.
The second rule works for the other two instructions. It has an extra premise:
the result of `jump_valid_t` (written as \(J_v\)) has to be \(\text{Some} \; c'\),
that is, `jump_valid_t` must succeed. Note that we require this even for no-ops,
since it later turns out that one of the them may be a jump after all.
We now have our validity rules. If an instruction satisfies them for a given program
and at a given program counter, evaluating it will always result in a program counter that has a proper value.
We encode the rules in Coq as follows:
{{< codelines "Coq" "aoc-2020/day8.v" 152 157 >}}
Note that we have three rules instead of two. This is because we "unfolded"
\(o\) from our second rule: rather than using set notation (or "or"), we
just generated two rules that vary in nothing but the operation involved.
Of course, we must have that every instruction in a program is valid.
We don't really need inference rules for this, as much as a "forall" quantifier.
A program \(p\) of length \(n\) is valid if the following holds:
{{< latex >}}
\forall (c : \text{Fin} \; n). p[c] \; \text{valid for} \; n, c
{{< /latex >}}
That is, for every possible in-bounds program counter \(c\),
the instruction at the program counter is valid. We can now
encode this in Coq, too:
{{< codelines "Coq" "aoc-2020/day8.v" 160 161 >}}
In the above, `n` is made implicit where possible.
Since \(c\) (called `pc` in the code) is of type \(\text{Fin} \; n\), there's no
need to write \(n\) _again_. The curly braces tell Coq to infer that
argument where possible.
### Proving Termination
Here we go! It's finally time to make some claims about our
definitions. Who knows - maybe we wrote down total garbage!
We will be creating several related lemmas and theorems.
All of them share two common assumptions:
* We have some valid program `inp` of length `n`.
* This program is a valid input, that is, `valid_input` holds for it.
There's no sense in arguing based on an invalid input program.
We represent these grouped assumptions by opening a Coq
`Section`, which we call `ValidInput`, and listing our assumptions:
{{< codelines "Coq" "aoc-2020/day8.v" 163 166 >}}
We had to also explicitly mention the length `n` of our program.
From now on, the variables `n`, `inp`, and `Hv` will be
available to all of the proofs we write in this section.
The first proof is rather simple. The claim is:
> For our valid program, at any program counter `pc`
and accumulator `acc`, there must exist another program
counter `pc'` and accumulator `acc'` such that the
instruction evaluation relation \((\rightarrow_i)\)
connects the two. That is, valid addresses aside,
we can always make a step.
Here is this claim encoded in Coq:
{{< codelines "Coq" "aoc-2020/day8.v" 168 169 >}}
We start our proof by introducing all the relevant variables into
the global context. I've mentioned this when I wrote about
day 1, but here's the gist: the `intros` keyword takes
variables from a `forall`, and makes them concrete.
In short, `intros x` is very much like saying "suppose
we have an `x`", and going on with the proof.
{{< codelines "Coq" "aoc-2020/day8.v" 170 171 >}}
Here, we said "take any program counter `pc` and any
accumulator `acc`". Now what? Well, first of all,
we want to take a look at the instruction at the current
`pc`. We know that this instruction is a combination
of an opcode and a number, so we use `destruct` to get
access to both of these parts:
{{< codelines "Coq" "aoc-2020/day8.v" 172 172 >}}
Now, Coq reports the following proof state:
```
1 subgoal
n : nat
inp : input n
Hv : valid_input inp
pc : Fin.t n
acc : t
o : opcode
t0 : t
Hop : nth inp pc = (o, t0)
========================= (1 / 1)
exists (pc' : fin (S n)) (acc' : t),
step_noswap (o, t0) (pc, acc) (pc', acc')
```
We have some opcode `o`, and some associated number
`t0`, and we must show that there exist a `pc'`
and `acc'` to which we can move on. To prove
that something exists in Coq, we must provide
an instance of that "something". If we claim
that there exists a dog that's not a good boy,
we better have this elusive creature in hand.
In other words, proofs in Coq are [constructive](https://en.wikipedia.org/wiki/Constructive_proof).
Without knowing the kind of operation we're dealing with, we can't
say for sure how the step will proceed. Thus, we proceed by
case analysis on `o`.
{{< codelines "Coq" "aoc-2020/day8.v" 173 173 >}}
There are three possible cases we have to consider,
one for each type of instruction.
* If the instruction is \(\texttt{add}\), we know
that `pc' = pc + 1` and `acc' = acc + t0`. That is,
the program counter is simply incremented, and the accumulator
is modified with the number part of the instruction.
* If the instruction is \(\texttt{nop}\), the program
coutner will again be incremented (`pc' = pc + 1`),
but the accumulator will stay the same, so `acc' = acc`.
* If the instruction is \(\texttt{jmp}\), things are
more complicated. We must rely on the assumption
that our input is valid, which tells us that adding
`t0` to our `pc` will result in `Some f`, and not `None`.
Given this, we have `pc' = f`, and `acc' = acc`.
This is how these three cases are translated to Coq:
{{< codelines "Coq" "aoc-2020/day8.v" 174 177 >}}
For the first two cases, we simply provide the
values we expect for `pc'` and `acc'`, and
apply the corresponding inference rule that
is satisfied by these values. For the third case, we have
to invoke `Hv`, the hypothesis that our input is valid.
In particular, we care about the instruction at `pc`,
so we use `specialize` to plug `pc` into the more general
hypothesis. We then replace `nth inp pc` with its known
value, `(jmp, t0)`. This tells us the following, in Coq's words:
```
Hv : valid_inst (jmp, t0) pc
```
That is, `(jmp, t0)` is a valid instruction at `pc`. Then, using
Coq's `inversion` tactic, we ask: how is this possible? There is
only one inference rule that gives us such a conclusion, and it is named `valid_inst_jmp`
in our Coq code. Since we have a proof that our `jmp` is valid,
it must mean that this rule was used. Furthermore, since this
rule requires that `valid_jump_t` evaluates to `Some f'`, we know
that this must be the case here! Coq now has adds the following
two lines to our proof state:
```
f' : fin (S n)
H0 : valid_jump_t pc t0 = Some f'
```
Finally, we specify, as mentioned earlier, that `pc' = f'` and `acc' = acc`.
As before, we apply the corresponding step rule for `jmp`. When it asks
for a proof that `valid_jump_t` produces a valid program counter,
we hand it `H0` using `apply H0`. And with that, Coq is happy!
Next, we prove a claim that a valid program can always do _something_,
and that something is one of three things:
* It can terminate in the "ok" state if the program counter
reaches the programs' end.
* It can terminate with an error if it's currently at a program
counter that is not included in the valid set.
* Otherwise, it can run the current instruction and advance
to a "next" state.
Alternatively, we could say that one of the inference rules
for \((\Rightarrow_p)\) must apply. This is not the case if the input
is not valid, since, as I said
before, an arbitrary input program can lead us to jump
to a negative address (or to an address _way_ past the end of the program).
Here's the claim, translated to Coq:
{{< codelines "Coq" "aoc-2020/day8.v" 181 186 >}}
Informally, we can prove this as follows:
* If the current program counter is equal to the length
of the program, we've reached the end. Thus, the program
can terminate in the "ok" state.
* Otherwise, the current program counter must be
less than the length of the program.
* If we've already encountered this program counter (that is,
if it's gone from the set of valid program counters),
then the program will terminate in the "error" state.
* Otherwise, the program counter is in the set of
valid instructions. By our earlier theorem, in a valid
program, the instruction at any program counter can be correctly
executed, taking us to the next state. Now too
our program can move to this next state.
Below is the Coq translation of the above.
{{< codelines "Coq" "aoc-2020/day8.v" 187 203 >}}
It doesn't seem like we're that far from being done now.
A program can always take a step, and each time it does,
the set of valid program counters decreases in size. Eventually,
this set will become empty, so if nothing else, our program will
eventually terminate in an "error" state. Thus, it will stop
running no matter what.
This seems like a task for induction, in this case on the size
of the valid set. In particular, strong mathematical induction
{{< sidenote "right" "strong-induction-note" "seem to work best." >}}
Why strong induction? If we remove a single element from a set,
its size should decrease strictly by 1. Thus, why would we need
to care about sets of <em>all</em> sizes less than the current
set's size?<br>
<br>
Unfortunately, we're not working with purely mathematical sets.
Coq's default facility for sets is simply a layer on top
of good old lists, and makes no effort to be "correct by construction".
It is thus perfectly possible to have a "set" which inlcudes an element
twice. Depending on the implementation of <code>set_remove</code>,
we may end up removing the repeated element multiple times, thereby
shrinking the length of our list by more than 1. I'd rather
not worry about implementation details like that.
{{< /sidenote >}}
Someone on StackOverflow [implemented this](https://stackoverflow.com/questions/45872719/how-to-do-induction-on-the-length-of-a-list-in-coq),
so I'll just use it. The Coq theorem corresonding to strong induction
on the length of a list is as follows:
{{< codelines "Coq" "aoc-2020/day8.v" 205 207 >}}
It reads,
> If for some list `l`, the property `P` holding for all lists
shorter than `l` means that it also holds for `l` itself, then
`P` holds for all lists.
This is perhaps not particularly elucidating. We can alternatively
think of this as trying to prove some property for all lists `l`.
We start with all empty lists. Here, we have nothing else to rely
on; there are no lists shorter than the empty list, and our property
must hold for all empty lists. Then, we move on to proving
the property for all lists of length 1, already knowing that it holds
for all empty lists. Once we're done there, we move on to proving
that `P` holds for all lists of length 2, now knowing that it holds
for all empty lists _and_ all lists of length 1. We continue
doing this, eventually covering lists of any length.
Before proving termination, there's one last thing we have to
take care off. Coq's standard library does not come with
a proof that removing an element from a set makes it smaller;
we have to provide it ourselves. Here's the claim encoded
in Coq:
{{< codelines "Coq" "aoc-2020/day8.v" 217 219 >}}
This reads, "if a set `s` contains a finite natural
number `f`, removing `f` from `s` reduces the set's size".
The details of the proof are not particularly interesting,
and I hope that you understand intuitively why this is true.
Finally, we make our termination claim.
{{< codelines "Coq" "aoc-2020/day8.v" 230 231 >}}
It's quite a strong claim - given _any_ program counter,
set of valid addresses, and accumulator, a valid input program
will terminate. Let's take a look at the proof.
{{< codelines "Coq" "aoc-2020/day8.v" 232 234 >}}
We use `intros` again. However, it brings in variables
in order, and we really only care about the _second_ variable.
We thus `intros` the first two, and then "put back" the first
one using `generalize dependent`. Then, we proceed by
induction on length, as seen above.
{{< codelines "Coq" "aoc-2020/day8.v" 235 236>}}
Now we're in the "inductive step". Our inductive hypothesis
is that any set of valid addresses smaller than the current one will
guarantee that the program will terminate. We must show
that using our set, too, will guarantee termination. We already
know that a valid input, given a state, can have one of three
possible outcomes: "ok" termination, "failed" termination,
or a "step". We use `destruct` to take a look at each of these
in turn. The first two cases ("ok" termination and "failed" termination)
are fairly trivial:
{{< codelines "Coq" "aoc-2020/day8.v" 237 240 >}}
We basically connect the dots between the premises (in a form like `done`)
and the corresponding inference rule (`run_noswap_ok`). The more
interesting case is when we can take a step.
{{< codelines "Coq" "aoc-2020/day8.v" 241 253 >}}
Since we know we can take a step, we know that we'll be removing
the current program counter from the set of valid addresses. This
set must currently contain the present program counter (since otherwise
we'd have "failed"), and thus will shrink when we remove it. This,
in turn, lets us use the inductive hypothesis: it tells us that no matter the
program counter or accumulator, if we start with this new "shrunk"
set, we will terminate in some state. Coq's constructive
nature helps us here: it doesn't just tells us that there is some state
in which we terminate - it gives us that state! We use `edestruct` to get
a handle on this final state, which Coq automatically names `x`. At this
time Coq still isn't convinced that our new set is smaller, so we invoke
our earlier `set_remove_length` theorem to placate it.
We now have all the pieces: we know that we can take a step, removing
the current program counter from our current set. We also know that
with that newly shrunken set, we'll terminate in some final state `x`.
Thus, all that's left to say is to apply our "step" rule. It asks
us for three things:
1. That the current program counter is in the set. We've long since
established this, and `auto` takes care of that.
2. That a step is possible. We've already established this, too,
since we're in the "can take a step" case. We apply `Hst`,
the hypothesis that confirms that we can, indeed, step.
3. That we terminate after this. The `x` we got
from our induction hypothesis came with a proof that
running with the "next" program counter and accumulator
will result in termination. We apply this proof, automatically
named `H0` by Coq.
And that's it! We've proved that a program terminates no matter what.
This has also (almost!) given us a solution to part 1. Consider the case
in which we start with program counter 0, accumulator 0, and the "full"
set of allowed program counters. Since our proof works for _all_ configurations,
it will also work for this one. Furthermore, since Coq proofs are constructive,
this proof will __return to us the final program counter and accumulator!__
This is precisely what we'd need to solve part 1.
But wait, almost? What's missing? We're missing a few implementation details:
* We've not provided a concrete impelmentation of integers. The simplest
thing to do here would be to use [`Coq.ZArith.BinInt`](https://coq.inria.fr/library/Coq.ZArith.BinInt.html),
for which there is a module [`Z_as_Int`](https://coq.inria.fr/library/Coq.ZArith.Int.html#Z_as_Int)
that provides `t` and friends.
* We assumed (reasonably, I would say) that it's possible to convert a natural
number to an integer. If we're using the aforementioned `BinInt` module,
we can use [`Z.of_nat`](https://coq.inria.fr/library/Coq.ZArith.BinIntDef.html#Z.of_nat).
* We also assumed (still reasonably) that we can try convert an integer
back to a finite natural number, failing if it's too small or too large.
There's no built-in function for this, but `Z`, for one, distinguishes
between the "positive", "zero", and "negative" cases, and we have
`Pos.to_nat` for the positive case.
Well, I seem to have covered all the implementation details. Why not just
go ahead and solve the problem? I tried, and ran into two issues:
* Although this is "given", we assumed that our input program will be
valid. For us to use the result of our Coq proof, we need to provide it
a constructive proof that our program is valid. Creating this proof is tedious
in theory, and quite difficult in practice: I've run into a
strange issue trying to pattern match on finite naturals.
* Even supposing we _do_ have a proof of validity, I'm not certain
if it's possible to actually extract an answer from it. It seems
that Coq distinguishes between proofs (things of type `Prop`) and
values (things of type `Set`). things of types `Prop` are supposed
to be _erased_. This means that when you convert Coq code,
to, say, Haskell, you will see no trace of any `Prop`s in that generated
code. Unfortunately, this also means we
[can't use our proofs to construct values](https://stackoverflow.com/questions/27322979/why-coq-doesnt-allow-inversion-destruct-etc-when-the-goal-is-a-type),
even though our proof objects do indeed contain them.
So, we "theoretically" have a solution to part 1, down to the algorithm
used to compute it and a proof that our algorithm works. In "reality", though, we
can't actually use this solution to procure an answer. Like we did with day 1, we'll have
to settle for only a proof.
Let's wrap up for this post. It would be more interesting to devise and
formally verify an algorithm for part 2, but this post has already gotten
quite long and contains a lot of information. Perhaps I will revisit this
at a later time. Thanks for reading!