Start on a draft about Agda and Hugo

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
2024-05-25 23:28:18 -07:00 · 2024-05-25 23:28:18 -07:00 · 6a168f2fe1
commit 6a168f2fe1
parent 04f12b545d
1 changed files with 256 additions and 0 deletions
--- a/content/blog/agda_hugo.md
+++ b/content/blog/agda_hugo.md
@ -0,0 +1,256 @@
+---
+title: "Integrating Agda's HTML Output with Hugo"
+date: 2024-05-25T21:02:10-07:00
+draft: true
+tags: ["Agda", "Hugo", "Ruby", "Nix"]
+---
+
+One of my favorite things about Agda are its clickable HTML pages. If you don't
+know what they are, that's pages like [`Data.List.Properties`](https://agda.github.io/agda-stdlib/master/Data.List.Properties.html);
+they just give the code from a particular Agda file, but make every identifier
+clickable. Then, if you see some variable or function that you don't know, you
+can just click it and jump right to it! It makes exploring the documentation
+a lot smoother. I've found that these HTML pages provide all the information
+I need for writing proofs.
+
+Recently, I've been writing a fair bit about Agda; mostly about the patterns
+that I've learned about, such as the ["is something" pattern]({{< relref "agda_is_pattern" >}})
+and the ["deeply embedded expression" trick]({{< relref "agda_expr_pattern" >}}).
+I've found myself wanting to click on definitions in my own code blocks; recently,
+I got this working, and I wanted to share how I did it, in case someone else
+wants to integrate Agda into their own static website. Though my stack
+is based on Hugo, the general idea should work with any other static site
+generator.
+
+### TL;DR and Demo
+
+I wrote a script to transfer links from an Agda HTML file into Hugo's HTML
+output, making it possible to embellish "plain" Hugo output with Agda's
+'go-to-definition links'. It looks like this. Here's an Agda code block
+defining an 'expression' data type, from a project of mine:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 543 546 >}}
+
+And here's the denotational semantics for that expression:
+
+{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 586 589 >}}
+
+Notice that you can click `Expr`, `_∪_`, `⟦`, etc.! All of this integrates
+with my existing Hugo site, and only required a little bit of additional
+metadata to make it work.
+
+Now, the details. Right now, the solution is pretty tailored to my site
+and workflow, but the core of the script -- a piece that transfers links
+from an Agda HTML file into a syntax highlighted Hugo HTML block -- should
+be fairly reusable.
+
+### The Constraints
+The goal was simple: to allow the code blocks on my Hugo-generated site to
+have links that take the user to the definition of a given symbol.
+Specifically, if the symbol occurs somewhere on the same blog page, the link
+should take the user there (and not to a regular `Module.html` file). That
+way, the reader can not only get to the code that they want to see, but also
+have a chance to read the surrounding prose in properly-rendered Markdown.
+
+Next, unlike standard "literate Agda" files, my blog posts are not single
+`.agda` files with Markdown in comments. Rather, I use regular Hugo
+Markdown, and present portions of an existing project, weaving together many
+files, and showing the fragments out of order. So, my tool needs to support
+links that come from distinct modules, in any order.
+
+Additionally, I've recently been writing a whole series about an Agda project
+of mine; in this series, I gradually build up to the final product, explaining
+one or two modules at a time. I would expect that links on pages in this series
+could jump to other pages in the same series: if I cover module `A` in part 1,
+then write `A.f` in part 2, clicking on `A` -- and maybe `f` -- should take
+the reader back to the first part's page; once again, this would help provide
+them with the surrounding explanation.
+
+Finally, I wanted the Agda code to appear exactly the same as any other code
+on my site, including the Hugo-provided syntax highlighting and theme. This
+ruled out just copy-pasting pieces of the Agda-generated HTML in place of
+code blocks on my page (and redirecting the links). Thought it was not
+a hard requirement, I also hoped to include Agda code in the same
+manner that I include all other code: [my `codelines` shortcode]({{< relref "codelines" >}}).
+In brief, the `codelines` shortcode creates a syntax-highlighted code block,
+as well as a surrounding "context" that says what file the code is from,
+which lines are listed, and where to find the full code (e.g., on my Git server).
+It looks something like this:
+
+{{< codelines "Agda" "agda-spa/Language.agda" 20 27 >}}
+
+In summary:
+
+1. I want to create cross-links between symbols in Agda blocks in a blog post.
+2. These code blocks could include code from disjoint files, and be out of order.
+3. Code blocks among a whole series of posts should be cross-linked too.
+4. The code blocks should be syntax highlighted the same way as the rest of the
+   code on the site.
+5. Ideally, I should be able to use my regular method for referencing code.
+
+I've hit all of these requirements; now it's time to dig into how I got there.
+
+### Implementation
+
+It's pretty much a no-go to try to resolve Agda from Hugo, or perform some
+sort of "heuristic" to detect cross-links. Agda is a very complex programming
+language, and Hugo's templating engine, though powerful, is just not
+up to this task. Fortunately, Agda has support for
+[HTML output using the `--html` flag](https://agda.readthedocs.io/en/v2.6.4.3-r1/tools/generating-html.html).
+As a build step, I can invoke Agda on files that are referenced by my blog,
+and generate HTML. This would decidedly slow down the site build process,
+but it would guarantee accurate link information.
+
+
+On the other hand, to satisfy the 4th constraint, I need to somehow mimic --
+or keep -- the format of Hugo's existing HTML output. The easiest way to
+do this without worrying about breaking changes and version incompatibility
+is to actually use the existing syntax-highlighted HTML, and annotate it
+with links as I discover them. Effectively, what I need to do is a "link
+transfer": I need to identify regions of code that are highlighted in Agda's HTML,
+find those regions in Hugo's HTML output, and mark them with links. In addition,
+I'll need to fix up the links themselves: the HTML output assumes that each
+Agda file is its own HTML page, but this is ruled out by the second constraint.
+
+As a little visualization, the overall problems looks something like this:
+
+````Agda {linenos=table}
+-- Agda's HTML output (blocks of 't' are links):
+-- |tttttt| |tttt|  |t|  |t| |ttttt|
+    module   ModX  ( x  : T ) where
+-- |tttttt| |tt|t|  |t|  |t| |ttttt|
+-- Hugo's HTML output (blocks of 't' are syntax highlighting spans)
+````
+
+Both Agda and Hugo output a preformatted code block, decorated with various
+inline HTMl that indicates information (token color for Hugo; symbol IDs and
+links in Agda). However, Agda and Hugo do not use the same process to create
+this decorated output; it's entirely possible -- and not uncommon -- for
+Hugo and Agda to produce misaligned HTML nodes. In my diagram above,
+this is reflected as `ModX` being considered a single token by Agda, but
+two tokens (`Mod` and `X`) by the syntax highlighter. As a result, it's
+difficult to naively iterate the two HTML formats in parallel.
+
+What I ended up doing is translating Agda's HTML output into offsets and data
+about the code block's _plain text_ -- the source code being decorated.
+Both the Agda and Hugo HTML describe the same code; thus, the plain text
+is the common denominator between the two.
+
+I wrote a Ruby script to extract the decorations from the Agda output; here
+it is in slightly abridged form. You can find the [original `agda.rb` file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb).
+
+```Ruby
+# Traverse the preformatted Agda block in the given Agda HTML file
+# and find which textual ranges have IDs and links to other ranges.
+# Store this information in a hash, line => links[]
+def process_agda_html_file(file)
+  document = Nokogiri::HTML.parse(File.open(file))
+  pre_code = document.css("pre.Agda")[0]
+
+  # The traversal postorder; we always visit children before their
+  # parents, and we visit leaves in sequence.
+  line_infos = []
+  offset = 0 # Column index within the current Agda source code line
+  line = 1
+  pre_code.traverse do |at|
+    # Text nodes are always leaves; visiting a new leaf means we've advanced
+    # in the text by the length of that text. However, if there are newlines
+    # -- since this is a preformatted block -- we also advanced by a line.
+    # At this time, do not support links that span multiple lines, but
+    # Agda doesn't produce those either.
+    if at.text?
+      if at.content.include? "\n"
+        raise "no support for links with newlines inside" if at.parent.name != "pre"
+
+        # Increase the line and track the final offset. Written as a loop
+        # in case we eventually want to add some handling for the pieces
+        # sandwiched between newlines.
+        at.content.split("\n", -1).each_with_index do |bit, idx|
+          line += 1 unless idx == 0
+          offset = bit.length
+        end
+      else
+        # It's not a newline node. Just adjust the offset within the plain text.
+        offset += at.content.length
+      end
+    elsif at.name == "a"
+      # Agda emits both links and things-to-link-to as 'a' nodes.
+
+      line_info = line_infos.fetch(line) { line_infos[line] = [] }
+      href = at.attribute("href")
+      id = at.attribute("id")
+      if href or id
+        new_node = { :from => offset-at.content.length, :to => offset }
+        new_node[:href] = href if href
+        new_node[:id] = id if id
+
+        line_info << new_node
+      end
+    end
+  end
+  return line_infos
+end
+```
+
+This script takes an Agda HTML file and returns a map in which each line
+of the Agda source code is associated with a list of ranges; the ranges
+indicate links or places that can be linked to. For example, for the `ModX`
+example above, the script might produce:
+
+```Ruby
+3 => [
+  { :from => 3, :to => 9, id => "..." },       # Agda creates <a> nodes even for keywords.
+  { :from => 12, :to => 16, id => "ModX-id" }, # The IDs Agda generates aren't usually this nice.
+  { :from => 20, :to => 21, id => "x-id" },
+]
+```
+
+
+{{< todo >}}This isn't as important probably, but might be worth talking about. {{< /todo >}}
+
+The most challenging step is probably to identify the Agda "projects" that need
+to be built. Since different articles have different modules (possibly with
+the same name), I would need to keep them separate. Also, I'm not ruling
+out the possibility of one project including another as a submodule. To
+make this work, I wrote a little Ruby script to find all Agda files,
+guess their project folder, and invoke the Agda compiler there. It boils
+down to something like this:
+
+```Ruby
+# For each Agda file, find the most specific project / subproject to which
+# it belongs.
+files_for_paths = {}
+Dir.glob("**/*.agda", base: root_path) do |agda_file|
+  best_path = max_path.call(agda_file)
+  files_for_path = files_for_paths.fetch(best_path) do
+    files_for_paths[best_path] = []
+  end
+
+  # Strip the project prefix from the Agda file's path, since
+  # Agda compiler will be invoked in the project's folder.
+  files_for_path << agda_file[best_path.length + File::SEPARATOR.length..-1]
+end
+
+original_wd = Dir.getwd
+files_for_paths.each do |path, files|
+  # There might be a cleaner way of doing this, but it's convenient.
+  Dir.chdir(original_wd)
+  Dir.chdir(File.join(root_path, path))
+
+  # Wherever the target directory is, create a folder that corresponds to
+  # the project being built, to avoid "cross-contaminating" the output
+  # folder with distinct modules with the same name.
+  html_dir = File.join [target_dir, path, "html"]
+  FileUtils.mkdir_p html_dir
+
+  # Just shell out to Agda using the --html folder.
+  files.each do |file|
+    command = "#{ARGV[0]} --local-interfaces #{file} --html --html-dir=#{html_dir}"
+    puts command
+    puts `#{command}`
+  end
+end
+```
+
+In short, it traverses all the folders in my `code` directory -- which is where
+I keep my code, looking for Agda source files. Once it finds them,