blog-static/content/blog/coq_docs/index.md

5.4 KiB
Raw Blame History

title date expirydate draft tags
I Don't Like Coq's Documentation 2021-11-24T21:48:59-08:00 2021-11-24T21:48:59-08:00 true
Coq

Recently, I wrote an article on [Formalizing Dawn's Core Calculus in Coq]({{< relref "./coq_dawn.md" >}}). One of the proofs (specifically, correctness of \(\text{quote}_3\)) was the best candidate I've ever encountered for proof automation. I knew that proof automation was possible from the second book of Software Foundations, Programming Language Foundations. I went there to learn more, and started my little journey in picking up Coq's Ltac2.

Before I go any further, let me say that I'd self-describe as an "advanced beginner" in Coq, maybe "intermediate" on a good day. I am certainly far from a master. I will also say that I am quite young, and thus possibly spoiled by the explosion of well-documented languages, tools, and libraries. I don't frequently check man pages, and I don't often read straight up technical manuals. Maybe the fault lies with me. Nevertheles, I feel like I am where I am in the process of learning Coq in part because of the state of its learning resources. As a case study, let's take my attempt to automate away a pretty simple proof.

Grammars instead of Examples

I did not specifically remember the chapter of Software Foundation in which tactics were introduced. Instead of skimming through chapters until I found it, I tried to look up "coq custom tactic". The first thing that comes up is the page for Ltac.

After a brief explanation of what Ltac is, the documentation jumps straight into the grammar of the entire tactic language. Here' a screenshot of what that looks like:

{{< figure src="ltac_grammar.png" caption="The grammar of Ltac from the Coq page." class="large" alt="A grammar for the Ltac language within Coq. The grammar is in BackusNaur form, and has 9 nonterminals." >}}

Good old Backus-Naur form. Now, my two main areas of interest are programming language theory and compilers, and so I'm no stranger to BNF. In fact, the first time I saw such a documentation page (most pages describing Coq language feature have some production rules), I didn't even consciously process that I was looking at grammar rules. However, and despite CompCert (a compiler) being probably the most well known example of a project in Coq, I don't think that Coq is made just for people familiar with PLT or compilers. I don't think it should be expected for the average newcomer to Coq (or the average beginner-ish person like me) to know how to read production rules, or to know what terminals and nonterminals are. Logical Founadtions sure managed to explain Gallina without resorting to BNFs.

And even if I, as a newcomer, know what BNF is, and how to read the rules, there's another layer to this specification: the precedence of various operators is encoded in the BNF rules. This is a pretty common pattern for writing down programming language grammars; for each level of operator precedence, there's another nonterminal. We have ltac_expr4 for sequencing (the least binding operator), and ltac_expr3 for "level 3 tactics", ltac_expr2 for addition, logical "or", and "level 2 tactics". The creators of this documentation page clearly knew what they were getting at here, and I've seen this pattern enough times to recognize it right away. But if you haven't seen this pattern before (and why should you have?), you'll need to take some time to decipher the rules. That's time that you'd rather spend trying to write your tactic.

The page could just as well have mentioned the types of constructs in Ltac, and given a table of their relative precedence. This could be an opportunity to give an example of what a program in Ltac looks like. Instead, despite having seen all of these nonterminals, I still don't have an image in my mind's eye of what the language looks like. And better yet, I think that the grammar's incorrect:

ltac_expr4 ::= ltac_expr3 ; ltac_expr3|binder_tactic

The way this is written, there's no way to sequence (using a semicolon) more than two things. The semicolon only occurs on level four, and both nonterminals in this rule are level three. However, Coq is perfectly happy to accept the following:

Ltac test := auto; auto; auto.

In the Ltac2 grammar, this is written the way I'd expect:

ltac2_expr ::= ltac2_expr5 ; ltac2_expr

Let's do a quick recap. We have an encoding that requires a degree of experience with grammars and programming languages to be useful to the reader, and this encoding {{< sidenote "right" "errors-note" "leaves room for errors," >}} Here's a question to ponder: how come this error has gone unnoticed? Surely people used this page to learn Ltac. I believe that part of the reason is pattern matching: an experienced reader will recognize the "precedence trick", and quickly scan the grammar levels to estblish precedence. The people writing and proofreading this documentation likely read it this way, too. {{< /sidenote >}} errors that actually appear in practice. {{< sidenote "left" "achilles-note" "We have a very precise, yet incorrect definition." >}} Amusingly, I think this is very close to what is considered the achilles heel of formal verification: software that precisely adheres to an incorrect or incomplete specification. {{< /sidenote >}} Somehow, "a sequence of statements separated by a semicolon" seems like a simpler explanation.