diff --git a/content/blog/agda_hugo.md b/content/blog/agda_hugo.md new file mode 100644 index 0000000..e1413c1 --- /dev/null +++ b/content/blog/agda_hugo.md @@ -0,0 +1,256 @@ +--- +title: "Integrating Agda's HTML Output with Hugo" +date: 2024-05-25T21:02:10-07:00 +draft: true +tags: ["Agda", "Hugo", "Ruby", "Nix"] +--- + +One of my favorite things about Agda are its clickable HTML pages. If you don't +know what they are, that's pages like [`Data.List.Properties`](https://agda.github.io/agda-stdlib/master/Data.List.Properties.html); +they just give the code from a particular Agda file, but make every identifier +clickable. Then, if you see some variable or function that you don't know, you +can just click it and jump right to it! It makes exploring the documentation +a lot smoother. I've found that these HTML pages provide all the information +I need for writing proofs. + +Recently, I've been writing a fair bit about Agda; mostly about the patterns +that I've learned about, such as the ["is something" pattern]({{< relref "agda_is_pattern" >}}) +and the ["deeply embedded expression" trick]({{< relref "agda_expr_pattern" >}}). +I've found myself wanting to click on definitions in my own code blocks; recently, +I got this working, and I wanted to share how I did it, in case someone else +wants to integrate Agda into their own static website. Though my stack +is based on Hugo, the general idea should work with any other static site +generator. + +### TL;DR and Demo + +I wrote a script to transfer links from an Agda HTML file into Hugo's HTML +output, making it possible to embellish "plain" Hugo output with Agda's +'go-to-definition links'. It looks like this. Here's an Agda code block +defining an 'expression' data type, from a project of mine: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 543 546 >}} + +And here's the denotational semantics for that expression: + +{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 586 589 >}} + +Notice that you can click `Expr`, `_∪_`, `⟦`, etc.! All of this integrates +with my existing Hugo site, and only required a little bit of additional +metadata to make it work. + +Now, the details. Right now, the solution is pretty tailored to my site +and workflow, but the core of the script -- a piece that transfers links +from an Agda HTML file into a syntax highlighted Hugo HTML block -- should +be fairly reusable. + +### The Constraints +The goal was simple: to allow the code blocks on my Hugo-generated site to +have links that take the user to the definition of a given symbol. +Specifically, if the symbol occurs somewhere on the same blog page, the link +should take the user there (and not to a regular `Module.html` file). That +way, the reader can not only get to the code that they want to see, but also +have a chance to read the surrounding prose in properly-rendered Markdown. + +Next, unlike standard "literate Agda" files, my blog posts are not single +`.agda` files with Markdown in comments. Rather, I use regular Hugo +Markdown, and present portions of an existing project, weaving together many +files, and showing the fragments out of order. So, my tool needs to support +links that come from distinct modules, in any order. + +Additionally, I've recently been writing a whole series about an Agda project +of mine; in this series, I gradually build up to the final product, explaining +one or two modules at a time. I would expect that links on pages in this series +could jump to other pages in the same series: if I cover module `A` in part 1, +then write `A.f` in part 2, clicking on `A` -- and maybe `f` -- should take +the reader back to the first part's page; once again, this would help provide +them with the surrounding explanation. + +Finally, I wanted the Agda code to appear exactly the same as any other code +on my site, including the Hugo-provided syntax highlighting and theme. This +ruled out just copy-pasting pieces of the Agda-generated HTML in place of +code blocks on my page (and redirecting the links). Thought it was not +a hard requirement, I also hoped to include Agda code in the same +manner that I include all other code: [my `codelines` shortcode]({{< relref "codelines" >}}). +In brief, the `codelines` shortcode creates a syntax-highlighted code block, +as well as a surrounding "context" that says what file the code is from, +which lines are listed, and where to find the full code (e.g., on my Git server). +It looks something like this: + +{{< codelines "Agda" "agda-spa/Language.agda" 20 27 >}} + +In summary: + +1. I want to create cross-links between symbols in Agda blocks in a blog post. +2. These code blocks could include code from disjoint files, and be out of order. +3. Code blocks among a whole series of posts should be cross-linked too. +4. The code blocks should be syntax highlighted the same way as the rest of the + code on the site. +5. Ideally, I should be able to use my regular method for referencing code. + +I've hit all of these requirements; now it's time to dig into how I got there. + +### Implementation + +It's pretty much a no-go to try to resolve Agda from Hugo, or perform some +sort of "heuristic" to detect cross-links. Agda is a very complex programming +language, and Hugo's templating engine, though powerful, is just not +up to this task. Fortunately, Agda has support for +[HTML output using the `--html` flag](https://agda.readthedocs.io/en/v2.6.4.3-r1/tools/generating-html.html). +As a build step, I can invoke Agda on files that are referenced by my blog, +and generate HTML. This would decidedly slow down the site build process, +but it would guarantee accurate link information. + + +On the other hand, to satisfy the 4th constraint, I need to somehow mimic -- +or keep -- the format of Hugo's existing HTML output. The easiest way to +do this without worrying about breaking changes and version incompatibility +is to actually use the existing syntax-highlighted HTML, and annotate it +with links as I discover them. Effectively, what I need to do is a "link +transfer": I need to identify regions of code that are highlighted in Agda's HTML, +find those regions in Hugo's HTML output, and mark them with links. In addition, +I'll need to fix up the links themselves: the HTML output assumes that each +Agda file is its own HTML page, but this is ruled out by the second constraint. + +As a little visualization, the overall problems looks something like this: + +````Agda {linenos=table} +-- Agda's HTML output (blocks of 't' are links): +-- |tttttt| |tttt| |t| |t| |ttttt| + module ModX ( x : T ) where +-- |tttttt| |tt|t| |t| |t| |ttttt| +-- Hugo's HTML output (blocks of 't' are syntax highlighting spans) +```` + +Both Agda and Hugo output a preformatted code block, decorated with various +inline HTMl that indicates information (token color for Hugo; symbol IDs and +links in Agda). However, Agda and Hugo do not use the same process to create +this decorated output; it's entirely possible -- and not uncommon -- for +Hugo and Agda to produce misaligned HTML nodes. In my diagram above, +this is reflected as `ModX` being considered a single token by Agda, but +two tokens (`Mod` and `X`) by the syntax highlighter. As a result, it's +difficult to naively iterate the two HTML formats in parallel. + +What I ended up doing is translating Agda's HTML output into offsets and data +about the code block's _plain text_ -- the source code being decorated. +Both the Agda and Hugo HTML describe the same code; thus, the plain text +is the common denominator between the two. + +I wrote a Ruby script to extract the decorations from the Agda output; here +it is in slightly abridged form. You can find the [original `agda.rb` file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb). + +```Ruby +# Traverse the preformatted Agda block in the given Agda HTML file +# and find which textual ranges have IDs and links to other ranges. +# Store this information in a hash, line => links[] +def process_agda_html_file(file) + document = Nokogiri::HTML.parse(File.open(file)) + pre_code = document.css("pre.Agda")[0] + + # The traversal postorder; we always visit children before their + # parents, and we visit leaves in sequence. + line_infos = [] + offset = 0 # Column index within the current Agda source code line + line = 1 + pre_code.traverse do |at| + # Text nodes are always leaves; visiting a new leaf means we've advanced + # in the text by the length of that text. However, if there are newlines + # -- since this is a preformatted block -- we also advanced by a line. + # At this time, do not support links that span multiple lines, but + # Agda doesn't produce those either. + if at.text? + if at.content.include? "\n" + raise "no support for links with newlines inside" if at.parent.name != "pre" + + # Increase the line and track the final offset. Written as a loop + # in case we eventually want to add some handling for the pieces + # sandwiched between newlines. + at.content.split("\n", -1).each_with_index do |bit, idx| + line += 1 unless idx == 0 + offset = bit.length + end + else + # It's not a newline node. Just adjust the offset within the plain text. + offset += at.content.length + end + elsif at.name == "a" + # Agda emits both links and things-to-link-to as 'a' nodes. + + line_info = line_infos.fetch(line) { line_infos[line] = [] } + href = at.attribute("href") + id = at.attribute("id") + if href or id + new_node = { :from => offset-at.content.length, :to => offset } + new_node[:href] = href if href + new_node[:id] = id if id + + line_info << new_node + end + end + end + return line_infos +end +``` + +This script takes an Agda HTML file and returns a map in which each line +of the Agda source code is associated with a list of ranges; the ranges +indicate links or places that can be linked to. For example, for the `ModX` +example above, the script might produce: + +```Ruby +3 => [ + { :from => 3, :to => 9, id => "..." }, # Agda creates nodes even for keywords. + { :from => 12, :to => 16, id => "ModX-id" }, # The IDs Agda generates aren't usually this nice. + { :from => 20, :to => 21, id => "x-id" }, +] +``` + + +{{< todo >}}This isn't as important probably, but might be worth talking about. {{< /todo >}} + +The most challenging step is probably to identify the Agda "projects" that need +to be built. Since different articles have different modules (possibly with +the same name), I would need to keep them separate. Also, I'm not ruling +out the possibility of one project including another as a submodule. To +make this work, I wrote a little Ruby script to find all Agda files, +guess their project folder, and invoke the Agda compiler there. It boils +down to something like this: + +```Ruby +# For each Agda file, find the most specific project / subproject to which +# it belongs. +files_for_paths = {} +Dir.glob("**/*.agda", base: root_path) do |agda_file| + best_path = max_path.call(agda_file) + files_for_path = files_for_paths.fetch(best_path) do + files_for_paths[best_path] = [] + end + + # Strip the project prefix from the Agda file's path, since + # Agda compiler will be invoked in the project's folder. + files_for_path << agda_file[best_path.length + File::SEPARATOR.length..-1] +end + +original_wd = Dir.getwd +files_for_paths.each do |path, files| + # There might be a cleaner way of doing this, but it's convenient. + Dir.chdir(original_wd) + Dir.chdir(File.join(root_path, path)) + + # Wherever the target directory is, create a folder that corresponds to + # the project being built, to avoid "cross-contaminating" the output + # folder with distinct modules with the same name. + html_dir = File.join [target_dir, path, "html"] + FileUtils.mkdir_p html_dir + + # Just shell out to Agda using the --html folder. + files.each do |file| + command = "#{ARGV[0]} --local-interfaces #{file} --html --html-dir=#{html_dir}" + puts command + puts `#{command}` + end +end +``` + +In short, it traverses all the folders in my `code` directory -- which is where +I keep my code, looking for Agda source files. Once it finds them,