---
title: "Advent of Code in Coq - Day 8"
date: 2020-12-31T17:55:25-08:00
tags: ["Advent of Code", "Coq"]
draft: true
---

Huh? We're on day 8? What happened to days 2 through 7?

Well, for the most part, I didn't think they were that interesting from the Coq point of view. Day 7 got close, but not close enough to inspire me to create a formalization. Day 8, on the other hand, is {{< sidenote "right" "pl-note" "quite interesting," >}} Especially to someone like me who's interested in programming languages! {{< /sidenote >}} and took quite some time to formalize.

As before, here's an (abridged) description of the problem:

> Given a tiny assembly-like language, determine the state of its accumulator when the same instruction is executed twice.

Before we start on the Coq formalization, let's talk about an idea from Programming Language Theory (PLT), big step operational semantics.

### Big Step Operational Semantics

What we have in Advent of Code's Day 8 is, undeniably, a small programming language. We are tasked with executing this language, or, in PLT lingo, defining its semantics. There are many ways of doing this - at university, I was taught denotational, axiomatic, and operational semantics. I believe that Coq's mechanism of inductive definitions lends itself very well to operational semantics, so we'll take that route. But even "operational semantics" doesn't refer to a concrete technique - we have a choice between small-step (structural) and big-step (natural) operational semantics. The former describe the minimal "steps" a program takes as it's being evaluated, while the latter define the final results of evaluating a program. I decided to go with big-step operational semantics, since they're more intuitive (natural!).

So, how does one go about "[defining] the final results of evaluating a program"? Most commonly, we do so using inference rules. Let's talk about those next.

### Inference Rules

Inference rules are a very general notion. They describe how we can determine (infer) a conclusion from a set of assumptions. It helps to look at an example. Here's a silly little inference rule:

{{< latex >}} \frac {\text{I'm allergic to cats} \quad \text{My friend has a cat}} {\text{I will not visit my friend very much}} {{< /latex >}}

It reads, "if I'm allergic to cats, and if my friend has a cat, then I will not visit my friend very much". Here, "I'm allergic to cats" and "my friend has a cat" are premises, and "I will not visit my friend very much" is a conclusion. An inference rule states that if all its premises are true, then its conclusion must be true. Here's another inference rule, this time with some mathematical notation instead of words:

{{< latex >}} \frac {n < m} {n + 1 < m + 1} {{< /latex >}}

This one reads, "if \(n\) is less than \(m\), then \(n+1\) is less than \(m+1\)". We can use inference rules to define various constructs. As an example, let's define what it means for a natural number to be even. It takes two rules:

{{< latex >}} \frac {} {0 \; \text{is even}} \quad \frac {n \; \text{is even}} {n+2 \; \text{is even}} {{< /latex >}}

First of all, zero is even. We take this as fact - there are no premises for the first rule, so they are all trivially true. Next, if a number is even, then adding 2 to that number results in another even number. Using the two of these rules together, we can correctly determine whether any number is or isn't even. We start knowing that 0 is even. Adding 2 we learn that 2 is even, and adding 2 again we see that 4 is even, as well. We can continue this to determine that 6, 8, 10, and so on are even too. Never in this process will we visit the numbers 1 or 3 or 5, and that's good - they're not even!
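For instance, here's the full derivation showing that 4 is even, built by stacking the second rule (twice) on top of the first:

{{< latex >}} \frac {\dfrac {\dfrac{}{0 \; \text{is even}}} {2 \; \text{is even}}} {4 \; \text{is even}} {{< /latex >}}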

Let's now extend this notion to programming languages, starting with a simple arithmetic language. This language is made up of natural numbers and the \(\square\) operation, which represents the addition of two numbers. Again, we need two rules:

{{< latex >}} \frac {n \in \mathbb{N}} {n \; \text{evaluates to} \; n} \quad \frac {e_1 \; \text{evaluates to} \; n_1 \quad e_2 \; \text{evaluates to} \; n_2} {e_1 \square e_2 \; \text{evaluates to} \; n_1 + n_2} {{< /latex >}}

First, let me explain myself. I used \(\square\) to demonstrate two important points. First, languages can be made of any kind of characters we want; it's the rules that we define that give these languages meaning. Second, while \(\square\) is the addition operation in our language, \(+\) is the mathematical addition operator. They are not the same - we use the latter to define how the former works.

Finally, writing "evaluates to" gets quite tedious, especially for complex languages. Instead, PLT people use notation to make their semantics more concise. The symbol \(\Downarrow\) is commonly used to mean "evaluates to"; thus, \(e \Downarrow v\) reads "the expression \(e\) evaluates to the value \(v\)". Using this notation, our rules start to look like the following:

{{< latex >}} \frac {n \in \mathbb{N}} {n \Downarrow n} \quad \frac {e_1 \Downarrow n_1 \quad e_2 \Downarrow n_2} {e_1 \square e_2 \Downarrow n_1 + n_2} {{< /latex >}}

If nothing else, these are way more compact! Though these may look intimidating at first, it helps to simply read each symbol as its English meaning.
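For example, here's how the two rules fit together to evaluate the expression \((1 \square 2) \square 3\); the number rule handles each leaf, and the \(\square\) rule combines the results:

{{< latex >}} \frac {\dfrac {\dfrac{1 \in \mathbb{N}}{1 \Downarrow 1} \quad \dfrac{2 \in \mathbb{N}}{2 \Downarrow 2}} {1 \square 2 \Downarrow 3} \quad \dfrac{3 \in \mathbb{N}}{3 \Downarrow 3}} {(1 \square 2) \square 3 \Downarrow 6} {{< /latex >}}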

### Encoding Inference Rules in Coq

Now that we've seen what inference rules are, we can take a look at how they can be represented in Coq. We can use Coq's Inductive mechanism to define the rules. Let's start with our "is even" property.

Inductive is_even : nat -> Prop :=
    | zero_even : is_even 0
    | plustwo_even : forall (n : nat), is_even n -> is_even (n+2).

The first line declares the property is_even, which, given a natural number, returns a proposition. This means that is_even is not a proposition itself, but is_even 0, is_even 1, and is_even 2 are all propositions.

The following two lines each encode one of our aforementioned inference rules. The first rule, zero_even, is of type is_even 0. The zero_even rule doesn't require any arguments, and we can use it to create a proof that 0 is even. On the other hand, the plustwo_even rule does require an argument, is_even n. To construct a proof that a number n+2 is even using plustwo_even, we need to provide a proof that n itself is even. From this definition we can see a general principle: we encode each inference rule as a constructor of an inductive Coq type. Each rule encoded in this manner takes as arguments the proofs of its premises, and returns a proof of its conclusion.
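To see these constructors in action, here's a small example proof (not from the original solution; the name four_is_even is just for illustration) that 4 is even. Each apply step corresponds to one use of an inference rule, mirroring the on-paper reasoning from before:

```Coq
Example four_is_even : is_even 4.
Proof.
  apply (plustwo_even 2). (* 2 + 2 computes to 4, so the goal becomes is_even 2 *)
  apply (plustwo_even 0). (* 0 + 2 computes to 2, so the goal becomes is_even 0 *)
  apply zero_even.        (* zero is even, by the first rule *)
Qed.
```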

For another example, let's encode our simple addition language. First, we have to define the language itself:

Inductive tinylang : Type :=
    | number (n : nat) : tinylang
    | box (e1 e2 : tinylang) : tinylang.

This defines the two elements of our example language: number n corresponds to \(n\), and box e1 e2 corresponds to \(e_1 \square e_2\). Finally, we define the inference rules:

Inductive tinylang_sem : tinylang -> nat -> Prop :=
    | number_sem : forall (n : nat), tinylang_sem (number n) n
    | box_sem : forall (e1 e2 : tinylang) (n1 n2 : nat),
        tinylang_sem e1 n1 -> tinylang_sem e2 n2 ->
        tinylang_sem (box e1 e2) (n1 + n2).

When we wrote our rules earlier, by using arbitrary variables like \(e_1\) and \(n_1\), we implicitly meant that our rules work for any number or expression. When writing Coq we have to make this assumption explicit by using forall. For instance, the rule on line 2 reads, "for any number n, the expression n evaluates to n".
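As a quick sanity check (again, not part of the solution itself), we can use these constructors to prove that a particular expression evaluates to the value we expect:

```Coq
(* (1 □ 2) evaluates to 3: apply box_sem, then number_sem for each operand. *)
Example box_example : tinylang_sem (box (number 1) (number 2)) 3.
Proof.
  apply (box_sem (number 1) (number 2) 1 2); apply number_sem.
Qed.
```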

### Semantics of Our Language

We've now written some example big-step operational semantics, both "on paper" and in Coq. Now, it's time to take a look at the specific semantics of the language from Day 8! Our language consists of a few parts.

First, there are three opcodes: \(\texttt{jmp}\), \(\texttt{nop}\), and \(\texttt{add}\). Opcodes, combined with an integer, make up an instruction. For example, the instruction \(\texttt{add} \; 3\) will increase the content of the accumulator by three. Finally, a program consists of a sequence of instructions; they're separated by newlines in the puzzle input, but we'll instead separate them by semicolons. For example, here's a complete program:

{{< latex >}} \texttt{add} \; 0; \; \texttt{nop} \; 2; \; \texttt{jmp} \; -2 {{< /latex >}}

Now, let's try evaluating this program. Starting at the beginning and with 0 in the accumulator, it will add 0 to the accumulator (keeping it the same), do nothing, and finally jump back to the beginning. At this point, it will try to run the addition instruction again, which is not allowed; thus, the program will terminate.

Did you catch that? The semantics of this language will require more information than just our program itself (which we'll denote \(p\)).

  • First, to evaluate the program we will need a program counter, \(\textit{c}\). This program counter will tell us the position of the instruction to be executed next. It can also point past the last instruction, which means our program terminated successfully.
  • Next, we'll need the accumulator \(a\). Addition instructions can change the accumulator, and we will be interested in the number that ends up in the accumulator when our program finishes executing.
  • Finally, and more subtly, we'll need to keep track of the states we visited. For instance, in the course of evaluating our program above, we encounter the \((c, a)\) pair of \((0, 0)\) twice: once at the beginning, and once at the end. However, whereas at the beginning we have not yet encountered the addition instruction, at the end we have, so the evaluation behaves differently. To make the proofs work better in Coq, we'll use a set \(v\) of {{< sidenote "right" "allowed-note" "allowed (valid) program counters (as opposed to visited program counters)." >}} Whereas the set of "visited" program counters keeps growing as our evaluation continues, the set of "allowed" program counters keeps shrinking. Because the "allowed" set never stops shrinking, assuming we're starting with a finite set, our execution will eventually terminate. {{< /sidenote >}}

Now we have all the elements of our evaluation. Let's define some notation. A program starts at some state, and terminates in another, possibly different state. In the course of a regular evaluation, the program never changes; only the state does. So I propose this (rather unorthodox) notation:

{{< latex >}} (c, a, v) \Rightarrow_p (c', a', v') {{< /latex >}}

This reads, "after starting at program counter \(c\), accumulator \(a\), and set of valid addresses \(v\), the program \(p\) terminates with program counter \(c'\), accumulator \(a'\), and set of valid addresses \(v'\)". Before creating the inference rules for this evaluation relation, let's define the effect of evaluating a single instruction, using notation \((c, a) \rightarrow_i (c', a')\). An addition instruction changes the accumulator, and increases the program counter by 1.

{{< latex >}} \frac{} {(c, a) \rightarrow_{\texttt{add} \; n} (c+1, a+n)} {{< /latex >}}

A no-op instruction does even less. All it does is increment the program counter.

{{< latex >}} \frac{} {(c, a) \rightarrow_{\texttt{nop} \; n} (c+1, a)} {{< /latex >}}

Finally, a jump instruction leaves the accumulator intact, but adds a number to the program counter itself!

{{< latex >}} \frac{} {(c, a) \rightarrow_{\texttt{jmp} \; n} (c+n, a)} {{< /latex >}}

None of these rules have any premises, and they really are quite simple. Now, let's define the rules for evaluating a program. First of all, a program whose program counter is not considered "valid" is done evaluating, and is in a "failed" state.

{{< latex >}} \frac{c \not \in v \quad c \not= \text{length}(p)} {(c, a, v) \Rightarrow_{p} (c, a, v)} {{< /latex >}}

We use \(\text{length}(p)\) to represent the number of instructions in \(p\). Note the second premise: even if our program counter \(c\) is not included in the valid set, if it's "past the end of the program", the program terminates in an "ok" state. Here's a rule for terminating in the "ok" state:

{{< latex >}} \frac{c = \text{length}(p)} {(c, a, v) \Rightarrow_{p} (c, a, v)} {{< /latex >}}

{{< todo >}} We can make this closer to the Coq version. {{< /todo >}} When our program counter reaches the end of the program, we are also done evaluating it. Even though both rules {{< sidenote "right" "redundant-note" "lead to the same conclusion," >}} In fact, if the end of the program is never included in the valid set, the second rule is completely redundant. {{< /sidenote >}} it helps to distinguish the two possible outcomes. Finally, if neither of the termination conditions are met, our program can take a step, and continue evaluating from there.

{{< latex >}} \frac{c \in v \quad p[c] = i \quad (c, a) \rightarrow_i (c', a') \quad (c', a', v - \{c\}) \Rightarrow_p (c'', a'', v'')} {(c, a, v) \Rightarrow_{p} (c'', a'', v'')} {{< /latex >}}

This is quite a rule. A lot of things need to work out for a program to evaluate from a state that isn't already a final state:

  • The current program counter \(c\) must be valid. That is, it must be an element of \(v\).
  • This program counter must correspond to an instruction \(i\) in \(p\), which we write as \(p[c] = i\).
  • This instruction must be executed, changing our program counter from \(c\) to \(c'\) and our accumulator from \(a\) to \(a'\). The set of valid instructions will no longer include \(c\), and will become \(v - \{c\}\).
  • Our program must then finish executing, starting at state \((c', a', v - \{c\})\), and ending in some (unknown) state \((c'', a'', v'')\).

If all of these conditions are met, our program, starting at \((c, a, v)\), will terminate in the state \((c'', a'', v'')\). This third rule completes our semantics; a program being executed will keep running instructions using the third rule, until it finally hits an invalid program counter (terminating with the first rule) or gets to the end of the program (terminating with the second rule).
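To make these rules concrete, let's trace the example program \(\texttt{add} \; 0; \; \texttt{nop} \; 2; \; \texttt{jmp} \; -2\) from earlier. Starting from \((0, 0, \{0, 1, 2\})\), the third rule fires three times; on each line below, the left side is the single-instruction step taken, and \(v\) is the valid set in effect when that instruction runs:

{{< latex >}} \begin{aligned} (0, 0) &\rightarrow_{\texttt{add} \; 0} (1, 0) & v &= \{0, 1, 2\} \\ (1, 0) &\rightarrow_{\texttt{nop} \; 2} (2, 0) & v &= \{1, 2\} \\ (2, 0) &\rightarrow_{\texttt{jmp} \; -2} (0, 0) & v &= \{2\} \end{aligned} {{< /latex >}}

After the jump, the valid set is empty, so the program counter \(0\) is no longer allowed, and it's also not equal to \(\text{length}(p) = 3\). The first rule therefore applies, and the whole evaluation is summarized as \((0, 0, \{0, 1, 2\}) \Rightarrow_p (0, 0, \{\})\): a "failed" state.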

### Aside: Vectors and Finite \(\mathbb{N}\)

We'll be getting to the Coq implementation of our semantics soon, but before we do: what type should \(c\) be? It's entirely possible for an instruction like \(\texttt{jmp} \; -10000\) to throw our program counter way before the first instruction of our program, so at first, it seems as though we should use an integer. But the prompt doesn't even specify what should happen in this case - it only says an instruction shouldn't be run twice. The "valid set", although it may help resolve this debate, is our invention, and isn't part of the original specification.

There is, however, something we can infer from this problem. Since the problem of jumping "too far behind" or "too far ahead" is never mentioned, we can assume that all jumps will lead either to an instruction, or right to the end of a program. This means that \(c\) is a natural number, with

{{< latex >}} 0 \leq c \leq \text{length}(p) {{< /latex >}}

In a language like Coq, it's possible to represent such a number. Since we've gotten familiar with inference rules, let's present the two rules that define these bounded numbers:

{{< latex >}} \frac {n \in \mathbb{N}^+} {Z : \text{Fin} \; n} \quad \frac {f : \text{Fin} \; n} {S f : \text{Fin} \; (n+1)} {{< /latex >}}

This is a variation of the Peano encoding of natural numbers. It reads as follows: zero (\(Z\)) is a finite natural number less than any positive natural number \(n\). Then, if a finite natural number \(f\) is less than \(n\), then adding one to that number (using the successor function \(S\)) will create a natural number less than \(n+1\). We encode this in Coq as follows (this definition comes from the Coq standard library's Fin module):

Inductive t : nat -> Set :=
    | F1 : forall {n}, t (S n)
    | FS : forall {n}, t n -> t (S n).

The F1 constructor here is equivalent to our \(Z\), and FS is equivalent to our \(S\). To represent positive natural numbers \(\mathbb{N}^+\), we simply take a regular natural number from \(\mathbb{N}\) and find its successor using S (simply adding 1). Again, we have to explicitly use forall in our type signatures.
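To get a feel for this type, here are the three inhabitants of \(\text{Fin} \; 3\), assuming the definition above is in scope (in the standard library it lives in the Fin module, as Fin.t, Fin.F1, and Fin.FS):

```Coq
(* The values of t 3, i.e. the natural numbers less than 3. *)
Check (F1 : t 3).          (* represents 0 *)
Check (FS F1 : t 3).       (* represents 1 *)
Check (FS (FS F1) : t 3).  (* represents 2 *)
```

There is no way to write a fourth value of this type: wrapping one more FS around FS (FS F1) would produce a value of type t 4 (or larger), never t 3.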

We can use a similar technique to represent a list with a known number of elements, known in the Idris and Coq world as a vector. Again, we only need two inference rules to define such a vector:

{{< latex >}} \frac {t : \text{Type}} {[] : \text{Vec} \; t \; 0} \quad \frac {x : t \quad \textit{xs} : \text{Vec} \; t \; n} {(x::\textit{xs}) : \text{Vec} \; t \; (n+1)} {{< /latex >}}

These rules read: the empty list \([]\) is a zero-length vector of any type \(t\). Then, if we take an element \(x\) of type \(t\), and an \(n\)-long vector \(\textit{xs}\) of \(t\), then we can prepend \(x\) to \(\textit{xs}\) and get an \((n+1)\)-long vector of \(t\). In Coq, we write this as follows (this definition comes from the Coq standard library's Vector module):

Inductive t A : nat -> Type :=
    | nil : t A 0
    | cons : forall (h:A) (n:nat), t A n -> t A (S n).

The nil constructor represents the empty list \([]\), and cons represents the operation of prepending an element (called h in the code and \(x\) in our inference rules) to another vector of length \(n\), which remains unnamed in the code but is called \(\textit{xs}\) in our rules.

These two definitions work together quite well. For instance, suppose we have a vector of length \(n\). If we were to access its elements by indices starting at 0, we'd be allowed to access indices 0 through \(n-1\). These are precisely the values of the finite natural numbers less than \(n\), \(\text{Fin} \; n \). Thus, given such an index \(\text{Fin} \; n\) and a vector \(\text{Vec} \; t \; n\), we are guaranteed to be able to retrieve the element at the given index! In our code, we will not have to worry about bounds checking.
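Here's a small sketch of that guarantee (not code from the solution, just an illustration using the standard library's Vector and Fin modules). Vector.nth takes a vector of length \(n\) and an index of type \(\text{Fin} \; n\), so an out-of-bounds lookup can't even be written:

```Coq
Require Import Coq.Vectors.Vector Coq.Vectors.Fin.

(* A vector of length 3, and a total (bounds-check-free) lookup at index 1. *)
Definition example_vec : Vector.t nat 3 :=
  Vector.cons nat 10 2 (Vector.cons nat 20 1 (Vector.cons nat 30 0 (Vector.nil nat))).

Compute Vector.nth example_vec (Fin.FS Fin.F1). (* = 20 : nat *)
```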

Of course, if our program has \(n\) elements, our program counter will be a finite number less than \(n+1\), since there's always the possibility of it pointing past the instructions, indicating that we've finished running the program. This leads to some minor complications: we can't safely access the program instruction at index \(\text{Fin} \; (n+1)\). We can solve this problem by considering two cases: either our index points one past the end of the program (in which case its value is exactly the finite representation of \(n\)), or it's less than \(n\), in which case we can "tighten" the upper bound, and convert that index into a \(\text{Fin} \; n\). We formalize it in a lemma:

{{< codelines "Coq" "aoc-2020/day8.v" 80 82 >}}

{{< todo >}}Prove this (at least informally) {{< /todo >}}

There's a little bit of a gotcha here. Instead of translating our above statement literally, and returning a value that's the result of "tightening" our input f, we return a value f' that can be "weakened" to f. This is because "tightening" is not a total function - it's not always possible to convert a \(\text{Fin} \; (n+1)\) into a \(\text{Fin} \; n\). However, "weakening" \(\text{Fin} \; n\) is a total function, since a number less than \(n\) is, by the transitive property of a total order, also less than \(n+1\).
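For intuition, here's one way a "weakening" function could be written. This is only a sketch (the name weaken is mine; it is not the helper used in day8.v), but it shows why the operation is total: each constructor of \(\text{Fin} \; n\) maps directly onto the corresponding constructor of \(\text{Fin} \; (n+1)\).

```Coq
Require Import Coq.Vectors.Fin.

(* Reinterpret a number less than n as a number less than n + 1,
   without changing the number itself. *)
Fixpoint weaken {n : nat} (f : Fin.t n) : Fin.t (S n) :=
  match f in Fin.t n' return Fin.t (S n') with
  | F1 => F1
  | FS f' => FS (weaken f')
  end.
```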

Next, let's talk about addition, specifically the kind of addition done by the \(\texttt{jmp}\) instruction. We can always add an integer to a natural number, but we can at best guarantee that the result will be an integer. For instance, we can add -1000 to 1, and get -999, which is not a natural number. We implement this kind of addition in a function called jump_t:

{{< codelines "Coq" "aoc-2020/day8.v" 56 56 >}}

At the moment, its definition is not particularly important. What is important, though, is that it takes a bounded natural number pc (our program counter), an integer off (the offset provided by the jump instruction) and returns another integer representing the final offset. Why are integers of type t? Well, it so happens that Coq provides facilities for working with arbitrary implementations of integers, without relying on how they are implemented under the hood. This can be seen in its Coq.ZArith.Int module, which describes what functions and types an implementation of integers should provide. Among those is t, the type of an integer in such an arbitrary implementation. We too will not make an assumption about how the integers are implemented, and will simply use this generic t from now on.
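To illustrate the mechanism (this is a hypothetical sketch, not how day8.v is actually structured), we can write a functor over any module satisfying the Int interface, and only later plug in a concrete implementation such as the standard library's Z_as_Int, which is backed by Z:

```Coq
Require Import Coq.ZArith.Int.

(* A functor parameterized by an arbitrary integer implementation. *)
Module OffsetArithmetic (I : Int).
  (* Inside the functor, I.t is all we know about integers. *)
  Definition offset := I.t.
End OffsetArithmetic.

(* Instantiate it with the standard implementation backed by Z. *)
Module ZOffsets := OffsetArithmetic Z_as_Int.
```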

### Semantics in Coq

Now that we've seen finite sets and vectors, it's time to use them to encode our semantics in Coq. Let's start with jumps. Suppose we wanted to write a function that does return a valid program counter after adding the offset to it. Since it's possible for this function to fail (for instance, if the offset is very negative), it has to return option (fin (S n)). That is, this function may either fail (returning None) or succeed, returning Some f, where f is of type fin (S n), aka \(\text{Fin} \; (n + 1)\). Here's the function in Coq (again, don't worry too much about the definition):

{{< codelines "Coq" "aoc-2020/day8.v" 61 61 >}}

But earlier, didn't we say:

> All jumps will lead either to an instruction, or right to the end of a program.

To make Coq aware of this constraint, we'll have to formalize this notion. To start off, we'll define the notion of a "valid instruction", which is guaranteed to keep the program counter in the correct range. There are a couple of ways to do this, but we'll use yet another definition based on inference rules. First, though, observe that the same instruction may be valid for one program, and invalid for another. For instance, \(\texttt{jmp} \; 100\) is perfectly valid for a program with thousands of instructions, but if it occurs in a program with only 3 instructions, it will certainly lead to disaster. Specifically, the validity of an instruction depends on the length of the program in which it resides, and the program counter at which it's encountered. Thus, we refine our idea of validity to "being valid for a program of length n at program counter f". For this, we can use the following two inference rules:

{{< latex >}} \frac {c : \text{Fin} \; n} {\texttt{add} \; t \; \text{valid for} \; n, c } \quad \frac {c : \text{Fin} \; n \quad o \in \{\texttt{nop}, \texttt{jmp}\} \quad J_v(c, t) = \text{Some} \; c' } {o \; t \; \text{valid for} \; n, c } {{< /latex >}}

The first rule states that an \(\texttt{add}\) instruction is valid in a program of length \(n\) at any program counter \(c\) less than \(n\). This is because running \(\texttt{add}\) will increment the program counter \(c\) by 1, and thus, create a new program counter that's less than \(n+1\), which, as we discussed above, is perfectly valid.

The second rule works for the other two instructions. It has an extra premise: the result of jump_valid_t (written as \(J_v\)) has to be \(\text{Some} \; c'\), that is, jump_valid_t must succeed. Now, if an instruction satisfies these validity rules for a given program at a given program counter, evaluating it will always result in a program counter that has a proper value.

We encode this in Coq as follows:

{{< codelines "Coq" "aoc-2020/day8.v" 152 157 >}}

Note that we have three rules instead of two. This is because we "unfolded" \(o\) from our second rule: rather than using set notation (or "or"), we just generated two rules that vary in nothing but the operation involved.

Of course, we must have that every instruction in a program is valid. We don't really need inference rules for this, as much as a "forall" quantifier. A program \(p\) of length \(n\) is valid if the following holds:

{{< latex >}} \forall (c : \text{Fin} \; n). \; p[c] \; \text{valid for} \; n, c {{< /latex >}}

That is, for every possible in-bounds program counter \(c\), the instruction at the program counter is valid. We can now encode this in Coq, too:

{{< codelines "Coq" "aoc-2020/day8.v" 160 161 >}}

In the above, we use input n to mean "a program of length n". This is just an alias for vect inst n, a vector of instructions of length n. Here, n is made implicit where possible. Since \(c\) (called pc in the code) is of type \(\text{Fin} \; n\), there's no need to write \(n\) again.

Finally, it's time to get started on the semantics themselves. We start with the inductive definition of \((\rightarrow_i)\). I think this is fairly straightforward. We use t instead of \(n\) from the rules, and we use FS instead of \(+1\). Also, we make the formerly implicit assumption that \(c+n\) is valid explicit, by providing a proof that valid_jump_t pc t = Some pc'.

{{< codelines "Coq" "aoc-2020/day8.v" 103 110 >}}

Next, let's combine the premises for the "failed" and "ok" terminations into Coq data types; this will help us formulate a lemma later on. Here they are:

{{< codelines "Coq" "aoc-2020/day8.v" 112 117 >}}

Since all of our "termination" rules start and end in the same state, there's no reason to write that state twice. Thus, both done and stuck only take the input inp and the state, which includes the accumulator acc, the set of allowed program counters v, and the program counter at which the program came to an end. When the program terminates successfully, this program counter will be equal to the length of the program n, so we use nat_to_fin n. On the other hand, if the program terminates in a stuck state, it must be that it terminated at a program counter that points to an instruction. Thus, this program counter is actually a \(\text{Fin} \; n\), and not a \(\text{Fin} \; (n+1)\) (we use the same "weakening" trick we saw earlier), and is not in the set of allowed program counters.

Finally, we encode the three inference rules we came up with:

{{< codelines "Coq" "aoc-2020/day8.v" 119 126 >}}