--- title: "Integrating Agda's HTML Output with Hugo" date: 2024-05-30T00:29:26-07:00 tags: ["Agda", "Hugo", "Ruby"] --- One of my favorite things about Agda are its clickable HTML pages. If you don't know what they are, that's pages like [`Data.List.Properties`](https://agda.github.io/agda-stdlib/master/Data.List.Properties.html); they just give the code from a particular Agda file, but make every identifier clickable. Then, if you see some variable or function that you don't know, you can just click it and jump right to it! It makes exploring the documentation a lot smoother. I've found that these HTML pages provide all the information I need for writing proofs. Recently, I've been writing a fair bit about Agda; mostly about the patterns that I've learned about, such as the ["is something" pattern]({{< relref "agda_is_pattern" >}}) and the ["deeply embedded expression" trick]({{< relref "agda_expr_pattern" >}}). I've found myself wanting to click on definitions in my own code blocks; recently, I got this working, and I wanted to share how I did it, in case someone else wants to integrate Agda into their own static website. Though my stack is based on Hugo, the general idea should work with any other static site generator. ### TL;DR and Demo I wrote a script to transfer links from an Agda HTML file into Hugo's HTML output, making it possible to embellish "plain" Hugo output with Agda's 'go-to-definition links'. It looks like this. Here's an Agda code block defining an 'expression' data type, from a project of mine: {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 543 546 >}} And here's the denotational semantics for that expression: {{< codelines "Agda" "agda-spa/Lattice/Map.agda" 586 589 >}} Notice that you can click `Expr`, `_∪_`, `⟦`, etc.! All of this integrates with my existing Hugo site, and only required a little bit of additional metadata to make it work. The conversion is implemented as [a Ruby script](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb); this script transfers the link structure from an Agda-generated documentation HTML file onto lightly-annotated Hugo code blocks. To use the script, your Hugo theme (or your Markdown content) must annotate the code blocks with several properties: * `data-agda-block`, which marks code that needs to be processed. * `data-file-path`, which tells the script what Agda file provided the code in the block, and therefore what Agda HTML file should be searched for links. * `data-first-line` and `data-last-line`, which tell the script what section of the Agda HTML file should be searched for said links. Given this -- and a couple of other assumptions, such as that all Agda projects are in a `code/` folder, the script post-processes the HTML files automatically. Right now, the solution is pretty tailored to my site and workflow, but the core of the script -- the piece that transfers links from an Agda HTML file into a syntax-highlighted Hugo HTML block -- should be fairly reusable. Now, the details. ### The Constraints The goal was simple: to allow the code blocks on my Hugo-generated site to have links that take the user to the definition of a given symbol. Specifically, if the symbol occurs somewhere on the same blog page, the link should take the user there (and not to a regular `Module.html` file). That way, the reader can not only get to the code that they want to see, but also have a chance to read the surrounding prose in properly-rendered Markdown. Next, unlike standard "literate Agda" files, my blog posts are not single `.agda` files with Markdown in comments. Rather, I use regular Hugo Markdown, and present portions of an existing project, weaving together many files, and showing the fragments out of order. So, my tool needs to support links that come from distinct modules, in any order. Additionally, I've recently been writing a whole series about an Agda project of mine; in this series, I gradually build up to the final product, explaining one or two modules at a time. I would expect that links on pages in this series could jump to other pages in the same series: if I cover module `A` in part 1, then write `A.f` in part 2, clicking on `A` -- and maybe `f` -- should take the reader back to the first part's page; once again, this would help provide them with the surrounding explanation. Finally, I wanted the Agda code to appear exactly the same as any other code on my site, including the Hugo-provided syntax highlighting and theme. This ruled out just copy-pasting pieces of the Agda-generated HTML in place of code blocks on my page (and redirecting the links). Thought it was not a hard requirement, I also hoped to include Agda code in the same manner that I include all other code: [my `codelines` shortcode]({{< relref "codelines" >}}). In brief, the `codelines` shortcode creates a syntax-highlighted code block, as well as a surrounding "context" that says what file the code is from, which lines are listed, and where to find the full code (e.g., on my Git server). It looks something like this: {{< codelines "Agda" "agda-spa/Language.agda" 20 27 >}} In summary: 1. I want to create cross-links between symbols in Agda blocks in a blog post. 2. These code blocks could include code from disjoint files, and be out of order. 3. Code blocks among a whole series of posts should be cross-linked too. 4. The code blocks should be syntax highlighted the same way as the rest of the code on the site. 5. Ideally, I should be able to use my regular method for referencing code. I've hit all of these requirements; now it's time to dig into how I got there. ### Implementation #### Processing Agda's HTML Output It's pretty much a no-go to try to resolve Agda from Hugo, or perform some sort of "heuristic" to detect cross-links. Agda is a very complex programming language, and Hugo's templating engine, though powerful, is just not up to this task. Fortunately, Agda has support for [HTML output using the `--html` flag](https://agda.readthedocs.io/en/v2.6.4.3-r1/tools/generating-html.html). As a build step, I can invoke Agda on files that are referenced by my blog, and generate HTML. This would decidedly slow down the site build process, but it would guarantee accurate link information. On the other hand, to satisfy the 4th constraint, I need to somehow mimic -- or keep -- the format of Hugo's existing HTML output. The easiest way to do this without worrying about breaking changes and version incompatibility is to actually use the existing syntax-highlighted HTML, and annotate it with links as I discover them. Effectively, what I need to do is a "link transfer": I need to identify regions of code that are highlighted in Agda's HTML, find those regions in Hugo's HTML output, and mark them with links. In addition, I'll need to fix up the links themselves: the HTML output assumes that each Agda file is its own HTML page, but this is ruled out by the second constraint of mine. As a little visualization, the overall problems looks something like this: ````Agda {linenos=table} -- Agda's HTML output (blocks of 't' are links): -- |tttttt| |tttt| |t| |t| |ttttt| module ModX ( x : T ) where -- |tttttt| |tt|t| |t| |t| |ttttt| -- Hugo's HTML output (blocks of 't' are syntax highlighting spans) ```` Both Agda and Hugo output a preformatted code block, decorated with various inline HTMl that indicates information (token color for Hugo; symbol IDs and links in Agda). However, Agda and Hugo do not use the same process to create this decorated output; it's entirely possible -- and not uncommon -- for Hugo and Agda to produce misaligned HTML nodes. In my diagram above, this is reflected as `ModX` being considered a single token by Agda, but two tokens (`Mod` and `X`) by the syntax highlighter. As a result, it's difficult to naively iterate the two HTML formats in parallel. What I ended up doing is translating Agda's HTML output into offsets and data about the code block's _plain text_ -- the source code being decorated. Both the Agda and Hugo HTML describe the same code; thus, the plain text is the common denominator between the two. {#plain-text} I wrote a Ruby script to extract the decorations from the Agda output; here it is in slightly abridged form. You can find the [original `agda.rb` file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb). ```Ruby # Traverse the preformatted Agda block in the given Agda HTML file # and find which textual ranges have IDs and links to other ranges. # Store this information in a hash, line => links[] def process_agda_html_file(file) document = Nokogiri::HTML.parse(File.open(file)) pre_code = document.css("pre.Agda")[0] # The traversal is postorder; we always visit children before their # parents, and we visit leaves in sequence. line_infos = [] offset = 0 # Column index within the current Agda source code line line = 1 pre_code.traverse do |at| # Text nodes are always leaves; visiting a new leaf means we've advanced # in the text by the length of that text. However, if there are newlines # -- since this is a preformatted block -- we also advanced by a line. # At this time, do not support links that span multiple lines, but # Agda doesn't produce those either. if at.text? if at.content.include? "\n" raise "no support for links with newlines inside" if at.parent.name != "pre" # Increase the line and track the final offset. Written as a loop # in case we eventually want to add some handling for the pieces # sandwiched between newlines. at.content.split("\n", -1).each_with_index do |bit, idx| line += 1 unless idx == 0 offset = bit.length end else # It's not a newline node. Just adjust the offset within the plain text. offset += at.content.length end elsif at.name == "a" # Agda emits both links and things-to-link-to as 'a' nodes. line_info = line_infos.fetch(line) { line_infos[line] = [] } href = at.attribute("href") id = at.attribute("id") if href or id new_node = { :from => offset-at.content.length, :to => offset } new_node[:href] = href if href new_node[:id] = id if id line_info << new_node end end end return line_infos end ``` This script takes an Agda HTML file and returns a map in which each line of the Agda source code is associated with a list of ranges; the ranges indicate links or places that can be linked to. For example, for the `ModX` example above, the script might produce: ```Ruby 3 => [ { :from => 3, :to => 9, id => "..." }, # Agda creates nodes even for keywords. { :from => 12, :to => 16, id => "ModX-id" }, # The IDs Agda generates aren't usually this nice. { :from => 20, :to => 21, id => "x-id" }, ] ``` #### Modifying Hugo's HTML Given such line information, the next step is to transfer it onto existing Hugo HTML files. Within a file, I've made my `codelines` shortcode emit custom attributes that can be used to find syntax-highlighted Agda code. The chief such attribute is `data-agda-block`; my script traverses all elements with this attribute. ```Ruby def process_source_file(file, document) # Process each highlight group that's been marked as an Agda file. document.css('div[data-agda-block]').each do |t| # ... ``` To figure out which Agda HTML file to use, and which lines to search for links, the script also expects some additional attributes. ```Ruby # ... first_line, last_line = nil, nil if first_line_attr = t.attribute("data-first-line") first_line = first_line_attr.to_s.to_i end if last_line_attr = t.attribute("data-last-line") last_line = last_line_attr.to_s.to_i end if first_line and last_line line_range = first_line..last_line else # no line number attributes = the code block contains the whole file line_range = 1.. end full_path = t.attribute("data-file-path").to_s # ... ``` At this point, the Agda file could be in some nested directory, like `A/B/C/File.agda`. However, the project root -- the place where Agda modules are compiled from -- could be any one of the folders `A`, `B`, or `C`. Thus, the fully qualified module name for `File.agda` could be `File`, `C.File`, `B.C.File`, or `A.B.C.File`. Since Agda's HTML output produces files named after the fully qualified module name, the script needs to guess what the module file is. This is where some conventions come in play: I keep my code in folders directly nested within a top-level `code` directory; thus, I'll have folders `project1` or `project2` inside `code`, and those will always be project roots. As a result, I guess that the first directory relative to `code` should be discarded, while the rest should be included in the path. The only exception to this is Git submodules: if an Agda file is included using a submodule, the root directory of the submodule is considered the Agda project root. My Hugo theme indicates the submodule using an additional `data-base-path` attribute; in all, that leads to the following logic: ```Ruby # ... full_path_dirs = Pathname(full_path).each_filename.to_a base_path = t.attribute("data-base-path").to_s base_dir_depth = 0 if base_path.empty? # No submodules were used. Assume code/ is the root. # The path of the file is given relative to `code`, so need # to strip only the one outermost directory. base_dir_depth = 1 base_path = full_path_dirs[0] else # The code is in a submodule. Assume that the base path / submodule # root is the Agda module root, ignore all folders before that. base_path_dirs = Pathname(base_path).each_filename.to_a base_dir_depth = base_path_dirs.length end # ... ``` With that, the script determines the actual HTML file path --- by assuming that there's an `html` folder in the same place as the Agda project root --- and runs the above `process_agda_html_file`: ```Ruby # ... dirs_in_base = full_path_dirs[base_dir_depth..-1] html_file = dirs_in_base.join(".").gsub(/\.agda$/, ".html") html_path = File.join(["code", base_path, "html", html_file]) agda_info = process_agda_html_file(html_path) # ... ``` The next step is specific to the output of Hugo's syntax highlighter, [Chroma](https://github.com/alecthomas/chroma). When line numbers are enabled -- and they are on my site -- Chroma generates a table that, at some point, contains a bunch of `span` HTML nodes, each with the `line` class. Each such `span` corresponds to a single line of output; naturally, the first one contains the code from `first_line`, the second from `first_line + 1`, and so on until `last_line`. This is quite convenient, because it saves the headache of counting newlines the way that the Agda processing code above has to. For each line of syntax-highlighted code, the script retrieves the corresponding list of links that were collected from the Agda HTML file. ```Ruby # ... lines = t.css("pre.chroma code[data-lang] .line") lines.zip(line_range).each do |line, line_no| line_info = agda_info[line_no] next unless line_info # ... ``` The subsequent traversal -- which picks out the plain text of the Agda file as [reasoned above](#plain-text) -- is very similar to the previous one. Here too there's an `offset` variable, which gets incremented with the length of a new plain text pieces. Since we know the lines match up to `span`s, there's no need to count newlines. ```Ruby # ... offset = 0 line.traverse do |lt| if lt.text? content = lt.content new_offset = offset + content.length # ... ``` At this point, we have a line number, and an offset within that line number that describes the portion of the text under consideration. We can traverse all the links for the line, and find ones that mark a piece of text somewhere in this range. For the time being -- since inserting overlapping spans is quite complicated -- I require the links to lie entirely within a particular plain text region. As a result, if Chroma splits a single Agda identifier into several tokens, it will not be linked. For now, this seems like the most conservative and safe approach. ```Ruby # ... matching_links = line_info.links.filter do |link| link[:from] >= offset and link[:to] <= new_offset end # ... ``` All that's left is to slice up the plain text fragment into a bunch of HTML pieces: the substrings that are links will turn into `a` HTML nodes, while the substrings that are "in between" the links will be left over as plain text nodes. The code to do so is relatively verbose, but not all that complicated. ```Ruby replace_with = [] replace_offset = 0 matching_links.each do |match| # The link's range is an offset from the beginning of the line, # but the text piece we're splitting up might be partway into # the line. Convert the link coordinates to piece-relative ones. relative_from = match[:from] - offset relative_to = match[:to] - offset # If the previous link ended some time before the new link # began (or if the current link is the first one, and is not # at the beginning), ensure that the plain text "in between" # is kept. replace_with << content[replace_offset...relative_from] tag = (match.include? :href) ? 'a' : 'span' new_node = Nokogiri::XML::Node.new(tag, document) if match.include? :href # For nodes with links, note what they're referring to, so # we can adjust their hrefs when we assign global IDs. href = match[:href].to_s new_node['href'] = note_used_href file, new_node, href end if match.include? :id # For nodes with IDs visible in the current Hugo file, we'll # want to redirect links that previously go to other Agda # module HTML files. So, note the ID that we want to redirect, # and pick a new unique ID to replace it with. id = match[:id].to_s new_node['id'] = note_defined_href file, "#{html_file}##{id}" end new_node.content = content[relative_from...relative_to] replace_with << new_node replace_offset = relative_to end replace_with << content[replace_offset..-1] ``` There's a little bit of a subtlety in the above code: specifically, I use the `note_used_href` and `note_defined_href` methods. These are important for rewriting links. Like I mentioned earlier, Agda's HTML output assumes that each source file should produce a single HTML file -- named after its qualified module -- and creates links accordingly. However, my blog posts interweave multiple source files. Some links that would've jumped to a different file must now point to an internal identifier within the page. Another important aspect of the transformation is that, since I'm pulling HTML files from distinct files, it's not guaranteed that each of them will have a unique `id` attribute. After all, Agda just assigns sequential numbers to each node that it generates; it would only take, e.g., including the first line from two distinct modules to end up with two nodes with `id="1"`. The solution is then twofold: 1. Track all the nodes referencing a particular `href` (made up of an HTML file and a numerical identifier, like `File.html#123`). When we pick new IDs -- thus guaranteeing their uniqueness -- we'll visit all the nodes that refer to the old ID and HTML file, and update their `href`. 2. Track all existing Agda HTML IDs that we're inserting. If we transfer an `` onto the Hugo content, we know we'll need to pick a new ID for it (since `1234` need not be unique), and that we'll need to redirect the other links to that new ID as the previous bullet describes. Here's how these two methods work: ```Ruby def note_defined_href(file, href) file_hrefs = @local_seen_hrefs.fetch(file) do @local_seen_hrefs[file] = {} end uniq_id = file_hrefs.fetch(href) do new_id = "agda-unique-ident-#{@id_counter}" @id_counter += 1 file_hrefs[href] = new_id end unless @global_seen_hrefs.include? href @global_seen_hrefs[href] = { :file => file, :id => uniq_id } end return uniq_id end def note_used_href(file, node, href) ref_list = @nodes_referencing_href.fetch(href) { @nodes_referencing_href[href] = [] } ref_list << { :file => file, :node => node } return href end ``` Note that they use class variables: these are methods on a `FileGroup` class. I've omitted the various classes I've declared from the above code for brevity, but here it makes sense to show them. Like I mentioned earlier, you can view the [complete code here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/6a168f2fe144850ed3a81b796e07266cbf80f382/agda.rb). Interestingly, `note_defined_href` makes use of _two_ global maps: `@local_seen_hrefs` and `@global_seen_hrefs`. This helps satisfy the third constraint above, which is linking between code defined in the same series. The logic is as follows: when rewriting a link to a new HTML file and ID, if the code we're trying to link to exists on the current page, we should link to that. Otherwise, if the code we're trying to link to was presented in a different part of the series, then we should link to that other part. So, we consult the "local" map for `href`s that will be rewritten to HTML nodes in the current file, and as a fallback, consult the "global" map for `hrefs` that were introduced in other parts. The `note_defined_href` populates both maps, and is "biased" towards the first occurrence of a piece of code: if posts A and B define a function `f`, and post C only references `f`, then that link will go to post A's definition, which came earlier. The other method, `note_used_href`, is simpler. It just appends to a list of Nokogiri HTML nodes that reference a given `href`. We keep track of the file in which the reference occurred so we can be sure to consult the right sub-map of `@local_seen_hrefs` when checking for in-page rewrites. After running `process_source_file` on all Hugo HTML files within a particular series, the following holds true: * We have inserted `span` or `a` nodes wherever Agda's original output had nodes with `id` or `href` elements. This is with the exception of the case where Hugo's inline HTML doesn't "line up" with Agda's inline HTML, which I've only found to happen when the leading character of an identifier is a digit. * We have picked new IDs for each HTML node we inserted that had an ID, noting them both globally and for the current file. We noted their original `href` value (in the form `File.html#123`) and that it should be transformed into our globally-unique identifiers, in the form `agda-unique-ident-1234`. * For each HTML node we inserted that links to another, we noted the `href` of the reference (also in the form `File.html#123`). Now, all that's left is to redirect the `href`s of the nodes we inserted from their old values to the new ones. I do this by iterating over `@nodes_referencing_href`, which contains every link we inserted. ```Ruby def cross_link_files @nodes_referencing_href.each do |href, references| references.each do |reference| file = reference[:file] node = reference[:node] local_targets = @local_seen_hrefs[file] if local_targets.include? href # A code block in this file provides this href, create a local link. node['href'] = "##{local_targets[href]}" elsif @global_seen_hrefs.include? href # A code block in this series, but not in this file, defines # this href. Create a cross-file link. target = @global_seen_hrefs[href] other_file = target[:file] id = target[:id] relpath = Pathname.new(other_file).dirname.relative_path_from(Pathname.new(file).dirname) node['href'] = "#{relpath}##{id}" else # No definitions in any blog page. For now, just delete the anchor. node.replace node.content end end end end ``` Notice that for the time being, I simply remove links to Agda definitions that didn't occur in the Hugo post. Ideally, this would link to the plain, non-blog documentation page generated by Agda; however, this requires either hosting those documentation pages, or expecting the Agda standard library HTML pages to remain stable and hosted at a fixed URL. Neither was simple enough to do, so I opted for the conservative "just don't insert links" approach. And that's all of the approach that I wanted to show off today! There are other details, like finding posts in the same series (I achieve this with a `meta` element) and invoking `agda --html` on the necessary source files (my [`build-agda-html.rb`](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/build-agda-html.rb) script is how I personally do this), but I don't think it's all that valuable to describe them here. Unfortunately, the additional metadata I had my theme insert makes it harder for others to use this approach out of the box. However, I hope that by sharing my experience, others who write Agda and post about it might be able to get a similar solution working. And of course, it's always fun to write about a recent project or endeavor. Happy (dependently typed) programming and blogging!