blog-static/content/blog/coq_docs/index.md

85 lines
5.4 KiB
Markdown
Raw Normal View History

---
title: "I Don't Like Coq's Documentation"
date: 2021-11-24T21:48:59-08:00
expirydate: 2021-11-24T21:48:59-08:00
draft: true
tags: ["Coq"]
---
Recently, I wrote an article on [Formalizing Dawn's Core Calculus in Coq]({{< relref "./coq_dawn.md" >}}).
One of the proofs (specifically, correctness of \\(\\text{quote}_3\\)) was the best candidate I've ever
encountered for proof automation. I knew that proof automation was possible from the second book of Software
Foundations, [Programming Language Foundations](https://softwarefoundations.cis.upenn.edu/plf-current/index.html).
I went there to learn more, and started my little journey in picking up Coq's `Ltac2`.
Before I go any further, let me say that I'd self-describe as an "advanced beginner" in Coq, maybe "intermediate"
on a good day. I am certainly far from a master. I will also say that I am quite young, and thus possibly spoiled
by the explosion of well-documented languages, tools, and libraries. I don't frequently check `man` pages, and I don't
often read straight up technical manuals. Maybe the fault lies with me.
Nevertheles, I feel like I am where I am in the process of learning Coq
in part because of the state of its learning resources.
As a case study, let's take my attempt to automate away a pretty simple proof.
### Grammars instead of Examples
I did not specifically remember the chapter of Software Foundation in which tactics were introduced.
Instead of skimming through chapters until I found it, I tried to look up "coq custom tactic". The
first thing that comes up is the page for `Ltac`.
After a brief explanation of what `Ltac` is, the documentation jumps straight into the grammar of the entire
tactic language. Here' a screenshot of what that looks like:
{{< figure src="ltac_grammar.png" caption="The grammar of Ltac from the Coq page." class="large" alt="A grammar for the `Ltac` language within Coq. The grammar is in BackusNaur form, and has 9 nonterminals." >}}
Good old Backus-Naur form. Now, my two main areas of interest are programming language theory and compilers, and so I'm no stranger to BNF. In fact, the first
time I saw such a documentation page (most pages describing Coq language feature have some production rules), I didn't even consciously process that I was looking
at grammar rules. However, and despite CompCert (a compiler) being probably the most well known example of a project in Coq, I don't think that Coq is made _just_
for people familiar with PLT or compilers. I don't think it should be expected for the average newcomer to Coq (or the average beginner-ish person like me) to know how to read production rules, or to know
what terminals and nonterminals are. Logical Founadtions sure managed to explain Gallina without resorting to BNFs.
And even if I, as a newcomer, know what BNF is, and how to read the rules, there's another layer to this specification:
the precedence of various operators is encoded in the BNF rules. This is a pretty common pattern
for writing down programming language grammars; for each level of operator precedence, there's another
nonterminal. We have `ltac_expr4` for sequencing (the least binding operator), and `ltac_expr3` for "level 3 tactics",
`ltac_expr2` for addition, logical "or", and "level 2 tactics". The creators of this documentation page clearly knew
what they were getting at here, and I've seen this pattern enough times to recognize it right away. But if you
_haven't_ seen this pattern before (and why should you have?), you'll need to take some time to decipher the rules.
That's time that you'd rather spend trying to write your tactic.
The page could just as well have mentioned the types of constructs in `Ltac`, and given a table of their relative precedence.
This could be an opportunity to give an example of what a program in `Ltac` looks like. Instead, despite having seen all of these nonterminals,
I still don't have an image in my mind's eye of what the language looks like. And better yet, I think that the grammar's incorrect:
```
ltac_expr4 ::= ltac_expr3 ; ltac_expr3|binder_tactic
```
The way this is written, there's no way to sequence (using a semicolon) more than two things. The semicolon
only occurs on level four, and both nonterminals in this rule are level three. However, Coq is perfectly happy
to accept the following:
```Coq
Ltac test := auto; auto; auto.
```
In the `Ltac2` grammar, this is written the way I'd expect:
```
ltac2_expr ::= ltac2_expr5 ; ltac2_expr
```
Let's do a quick recap. We have an encoding that requires a degree of experience with grammars and
programming languages to be useful to the reader, _and_ this encoding
{{< sidenote "right" "errors-note" "leaves room for errors," >}}
Here's a question to ponder: how come this error has gone unnoticed? Surely
people used this page to learn <code>Ltac</code>. I believe that part of the reason is
pattern matching: an experienced reader will recognize the "precedence trick", and quickly
scan the grammar levels to estblish precedence. The people writing and proofreading this
documentation likely read it this way, too.
{{< /sidenote >}}
errors that _actually appear in practice_.
{{< sidenote "left" "achilles-note" "We have a very precise, yet incorrect definition." >}}
Amusingly, I think this is very close to what is considered the achilles heel of formal verification:
software that precisely adheres to an incorrect or incomplete specification.
{{< /sidenote >}}
Somehow, "a sequence of statements separated by a semicolon" seems like a simpler explanation.