diff --git a/content/blog/coq_docs/index.md b/content/blog/coq_docs/index.md new file mode 100644 index 0000000..2472932 --- /dev/null +++ b/content/blog/coq_docs/index.md @@ -0,0 +1,83 @@ +--- +title: "I Don't Like Coq's Documentation" +date: 2021-11-24T21:48:59-08:00 +draft: true +tags: ["Coq"] +--- + +Recently, I wrote an article on [Formalizing Dawn's Core Calculus in Coq]({{< relref "./coq_dawn.md" >}}). +One of the proofs (specifically, correctness of \\(\\text{quote}_3\\)) was the best candidate I've ever +encountered for proof automation. I knew that proof automation was possible from the second book of Software +Foundations, [Programming Language Foundations](https://softwarefoundations.cis.upenn.edu/plf-current/index.html). +I went there to learn more, and started my little journey in picking up Coq's `Ltac2`. + +Before I go any further, let me say that I'd self-describe as an "advanced beginner" in Coq, maybe "intermediate" +on a good day. I am certainly far from a master. I will also say that I am quite young, and thus possibly spoiled +by the explosion of well-documented languages, tools, and libraries. I don't frequently check `man` pages, and I don't +often read straight up technical manuals. Maybe the fault lies with me. +Nevertheles, I feel like I am where I am in the process of learning Coq +in part because of the state of its learning resources. +As a case study, let's take my attempt to automate away a pretty simple proof. + +### Grammars instead of Examples +I did not specifically remember the chapter of Software Foundation in which tactics were introduced. +Instead of skimming through chapters until I found it, I tried to look up "coq custom tactic". The +first thing that comes up is the page for `Ltac`. + +After a brief explanation of what `Ltac` is, the documentation jumps straight into the grammar of the entire +tactic language. Here' a screenshot of what that looks like: + +{{< figure src="ltac_grammar.png" caption="The grammar of Ltac from the Coq page." class="large" alt="A grammar for the `Ltac` language within Coq. The grammar is in Backus–Naur form, and has 9 nonterminals." >}} + +Good old Backus-Naur form. Now, my two main areas of interest are programming language theory and compilers, and so I'm no stranger to BNF. In fact, the first +time I saw such a documentation page (most pages describing Coq language feature have some production rules), I didn't even consciously process that I was looking +at grammar rules. However, and despite CompCert (a compiler) being probably the most well known example of a project in Coq, I don't think that Coq is made _just_ +for people familiar with PLT or compilers. I don't think it should be expected for the average newcomer to Coq (or the average beginner-ish person like me) to know how to read production rules, or to know +what terminals and nonterminals are. Logical Founadtions sure managed to explain Gallina without resorting to BNFs. + +And even if I, as a newcomer, know what BNF is, and how to read the rules, there's another layer to this specification: +the precedence of various operators is encoded in the BNF rules. This is a pretty common pattern +for writing down programming language grammars; for each level of operator precedence, there's another +nonterminal. We have `ltac_expr4` for sequencing (the least binding operator), and `ltac_expr3` for "level 3 tactics", +`ltac_expr2` for addition, logical "or", and "level 2 tactics". The creators of this documentation page clearly knew +what they were getting at here, and I've seen this pattern enough times to recognize it right away. But if you +_haven't_ seen this pattern before (and why should you have?), you'll need to take some time to decipher the rules. +That's time that you'd rather spend trying to write your tactic. + +The page could just as well have mentioned the types of constructs in `Ltac`, and given a table of their relative precedence. +This could be an opportunity to give an example of what a program in `Ltac` looks like. Instead, despite having seen all of these nonterminals, +I still don't have an image in my mind's eye of what the language looks like. And better yet, I think that the grammar's incorrect: + +``` +ltac_expr4 ::= ltac_expr3 ; ltac_expr3|binder_tactic +``` + +The way this is written, there's no way to sequence (using a semicolon) more than two things. The semicolon +only occurs on level four, and both nonterminals in this rule are level three. However, Coq is perfectly happy +to accept the following: + +```Coq +Ltac test := auto; auto; auto. +``` + +In the `Ltac2` grammar, this is written the way I'd expect: + +``` +ltac2_expr ::= ltac2_expr5 ; ltac2_expr +``` + +Let's do a quick recap. We have an encoding that requires a degree of experience with grammars and +programming languages to be useful to the reader, _and_ this encoding +{{< sidenote "right" "errors-note" "leaves room for errors," >}} +Here's a question to ponder: how come this error has gone unnoticed? Surely +people used this page to learn Ltac. I believe that part of the reason is +pattern matching: an experienced reader will recognize the "precedence trick", and quickly +scan the grammar levels to estblish precedence. The people writing and proofreading this +documentation likely read it this way, too. +{{< /sidenote >}} +errors that _actually appear in practice_. +{{< sidenote "left" "achilles-note" "We have a very precise, yet incorrect definition." >}} +Amusingly, I think this is very close to what is considered the achilles heel of formal verification: +software that precisely adheres to an incorrect or incomplete specification. +{{< /sidenote >}} +Somehow, "a sequence of statements separated by a semicolon" seems like a simpler explanation. diff --git a/content/blog/coq_docs/ltac_grammar.png b/content/blog/coq_docs/ltac_grammar.png new file mode 100644 index 0000000..3949f87 Binary files /dev/null and b/content/blog/coq_docs/ltac_grammar.png differ