--- title: "Integrating Agda's HTML Output with Hugo" date: 2024-05-25T21:02:10-07:00 draft: true tags: ["Agda", "Hugo", "Ruby", "Nix"] --- One of my favorite things about Agda are its clickable HTML pages. If you don't know what they are, that's pages like [`Data.List.Properties`](https://agda.github.io/agda-stdlib/master/Data.List.Properties.html); they just give the code from a particular Agda file, but make every identifier clickable. Then, if you see some variable or function that you don't know, you can just click it and jump right to it! It makes exploring the documentation a lot smoother. I've found that these HTML pages provide all the information I need for writing proofs. Recently, I've been writing a fair bit about Agda; mostly about the patterns that I've learned about, such as the ["is something" pattern]({{< relref "agda_is_pattern" >}}) and the ["deeply embedded expression" trick]({{< relref "agda_expr_pattern" >}}). I've found myself wanting to click on definitions in my own code blocks; recently, I got this working, and I wanted to share how I did it, in case someone else wants to integrate Agda into their own static website. Though my stack is based on Hugo, the general idea should work with any other static site generator. ### TL;DR and Demo I wrote a script to transfer links from an Agda HTML file into Hugo's HTML output, making it possible to embellish "plain" Hugo output with Agda's 'go-to-definition links'. It looks like this. Here's an Agda code block defining an 'expression' data type, from a project of mine: {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 543 546 >}} And here's the denotational semantics for that expression: {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 586 589 >}} Notice that you can click `Expr`, `_∪_`, `⟦`, etc.! All of this integrates with my existing Hugo site, and only required a little bit of additional metadata to make it work. Now, the details. Right now, the solution is pretty tailored to my site and workflow, but the core of the script -- a piece that transfers links from an Agda HTML file into a syntax highlighted Hugo HTML block -- should be fairly reusable. ### The Constraints The goal was simple: to allow the code blocks on my Hugo-generated site to have links that take the user to the definition of a given symbol. Specifically, if the symbol occurs somewhere on the same blog page, the link should take the user there (and not to a regular `Module.html` file). That way, the reader can not only get to the code that they want to see, but also have a chance to read the surrounding prose in properly-rendered Markdown. Next, unlike standard "literate Agda" files, my blog posts are not single `.agda` files with Markdown in comments. Rather, I use regular Hugo Markdown, and present portions of an existing project, weaving together many files, and showing the fragments out of order. So, my tool needs to support links that come from distinct modules, in any order. Additionally, I've recently been writing a whole series about an Agda project of mine; in this series, I gradually build up to the final product, explaining one or two modules at a time. I would expect that links on pages in this series could jump to other pages in the same series: if I cover module `A` in part 1, then write `A.f` in part 2, clicking on `A` -- and maybe `f` -- should take the reader back to the first part's page; once again, this would help provide them with the surrounding explanation. Finally, I wanted the Agda code to appear exactly the same as any other code on my site, including the Hugo-provided syntax highlighting and theme. This ruled out just copy-pasting pieces of the Agda-generated HTML in place of code blocks on my page (and redirecting the links). Thought it was not a hard requirement, I also hoped to include Agda code in the same manner that I include all other code: [my `codelines` shortcode]({{< relref "codelines" >}}). In brief, the `codelines` shortcode creates a syntax-highlighted code block, as well as a surrounding "context" that says what file the code is from, which lines are listed, and where to find the full code (e.g., on my Git server). It looks something like this: {{< codelines "Agda" "agda-spa/Language.agda" 20 27 >}} In summary: 1. I want to create cross-links between symbols in Agda blocks in a blog post. 2. These code blocks could include code from disjoint files, and be out of order. 3. Code blocks among a whole series of posts should be cross-linked too. 4. The code blocks should be syntax highlighted the same way as the rest of the code on the site. 5. Ideally, I should be able to use my regular method for referencing code. I've hit all of these requirements; now it's time to dig into how I got there. ### Implementation It's pretty much a no-go to try to resolve Agda from Hugo, or perform some sort of "heuristic" to detect cross-links. Agda is a very complex programming language, and Hugo's templating engine, though powerful, is just not up to this task. Fortunately, Agda has support for [HTML output using the `--html` flag](https://agda.readthedocs.io/en/v2.6.4.3-r1/tools/generating-html.html). As a build step, I can invoke Agda on files that are referenced by my blog, and generate HTML. This would decidedly slow down the site build process, but it would guarantee accurate link information. On the other hand, to satisfy the 4th constraint, I need to somehow mimic -- or keep -- the format of Hugo's existing HTML output. The easiest way to do this without worrying about breaking changes and version incompatibility is to actually use the existing syntax-highlighted HTML, and annotate it with links as I discover them. Effectively, what I need to do is a "link transfer": I need to identify regions of code that are highlighted in Agda's HTML, find those regions in Hugo's HTML output, and mark them with links. In addition, I'll need to fix up the links themselves: the HTML output assumes that each Agda file is its own HTML page, but this is ruled out by the second constraint. As a little visualization, the overall problems looks something like this: ````Agda {linenos=table} -- Agda's HTML output (blocks of 't' are links): -- |tttttt| |tttt| |t| |t| |ttttt| module ModX ( x : T ) where -- |tttttt| |tt|t| |t| |t| |ttttt| -- Hugo's HTML output (blocks of 't' are syntax highlighting spans) ```` Both Agda and Hugo output a preformatted code block, decorated with various inline HTMl that indicates information (token color for Hugo; symbol IDs and links in Agda). However, Agda and Hugo do not use the same process to create this decorated output; it's entirely possible -- and not uncommon -- for Hugo and Agda to produce misaligned HTML nodes. In my diagram above, this is reflected as `ModX` being considered a single token by Agda, but two tokens (`Mod` and `X`) by the syntax highlighter. As a result, it's difficult to naively iterate the two HTML formats in parallel. What I ended up doing is translating Agda's HTML output into offsets and data about the code block's _plain text_ -- the source code being decorated. Both the Agda and Hugo HTML describe the same code; thus, the plain text is the common denominator between the two. I wrote a Ruby script to extract the decorations from the Agda output; here it is in slightly abridged form. You can find the [original `agda.rb` file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb). ```Ruby # Traverse the preformatted Agda block in the given Agda HTML file # and find which textual ranges have IDs and links to other ranges. # Store this information in a hash, line => links[] def process_agda_html_file(file) document = Nokogiri::HTML.parse(File.open(file)) pre_code = document.css("pre.Agda")[0] # The traversal postorder; we always visit children before their # parents, and we visit leaves in sequence. line_infos = [] offset = 0 # Column index within the current Agda source code line line = 1 pre_code.traverse do |at| # Text nodes are always leaves; visiting a new leaf means we've advanced # in the text by the length of that text. However, if there are newlines # -- since this is a preformatted block -- we also advanced by a line. # At this time, do not support links that span multiple lines, but # Agda doesn't produce those either. if at.text? if at.content.include? "\n" raise "no support for links with newlines inside" if at.parent.name != "pre" # Increase the line and track the final offset. Written as a loop # in case we eventually want to add some handling for the pieces # sandwiched between newlines. at.content.split("\n", -1).each_with_index do |bit, idx| line += 1 unless idx == 0 offset = bit.length end else # It's not a newline node. Just adjust the offset within the plain text. offset += at.content.length end elsif at.name == "a" # Agda emits both links and things-to-link-to as 'a' nodes. line_info = line_infos.fetch(line) { line_infos[line] = [] } href = at.attribute("href") id = at.attribute("id") if href or id new_node = { :from => offset-at.content.length, :to => offset } new_node[:href] = href if href new_node[:id] = id if id line_info << new_node end end end return line_infos end ``` This script takes an Agda HTML file and returns a map in which each line of the Agda source code is associated with a list of ranges; the ranges indicate links or places that can be linked to. For example, for the `ModX` example above, the script might produce: ```Ruby 3 => [ { :from => 3, :to => 9, id => "..." }, # Agda creates nodes even for keywords. { :from => 12, :to => 16, id => "ModX-id" }, # The IDs Agda generates aren't usually this nice. { :from => 20, :to => 21, id => "x-id" }, ] ``` {{< todo >}}This isn't as important probably, but might be worth talking about. {{< /todo >}} The most challenging step is probably to identify the Agda "projects" that need to be built. Since different articles have different modules (possibly with the same name), I would need to keep them separate. Also, I'm not ruling out the possibility of one project including another as a submodule. To make this work, I wrote a little Ruby script to find all Agda files, guess their project folder, and invoke the Agda compiler there. It boils down to something like this: ```Ruby # For each Agda file, find the most specific project / subproject to which # it belongs. files_for_paths = {} Dir.glob("**/*.agda", base: root_path) do |agda_file| best_path = max_path.call(agda_file) files_for_path = files_for_paths.fetch(best_path) do files_for_paths[best_path] = [] end # Strip the project prefix from the Agda file's path, since # Agda compiler will be invoked in the project's folder. files_for_path << agda_file[best_path.length + File::SEPARATOR.length..-1] end original_wd = Dir.getwd files_for_paths.each do |path, files| # There might be a cleaner way of doing this, but it's convenient. Dir.chdir(original_wd) Dir.chdir(File.join(root_path, path)) # Wherever the target directory is, create a folder that corresponds to # the project being built, to avoid "cross-contaminating" the output # folder with distinct modules with the same name. html_dir = File.join [target_dir, path, "html"] FileUtils.mkdir_p html_dir # Just shell out to Agda using the --html folder. files.each do |file| command = "#{ARGV[0]} --local-interfaces #{file} --html --html-dir=#{html_dir}" puts command puts `#{command}` end end ``` In short, it traverses all the folders in my `code` directory -- which is where I keep my code, looking for Agda source files. Once it finds them,