2 changed files with 46 additions and 349 deletions
--- a/content/blog/agda_hugo.md
+++ b/content/blog/agda_hugo.md
@ -1,7 +1,8 @@
 ---
 title: "Integrating Agda's HTML Output with Hugo"
-date: 2024-05-30T00:29:26-07:00
+date: 2024-05-25T21:02:10-07:00
-tags: ["Agda", "Hugo", "Ruby"]
+draft: true
 tags: ["Agda", "Hugo", "Ruby", "Nix"]
 ---
 One of my favorite things about Agda are its clickable HTML pages. If you don't
@ -36,29 +37,13 @@ And here's the denotational semantics for that expression:
 Notice that you can click `Expr`, `_∪_`, `⟦`, etc.! All of this integrates
 with my existing Hugo site, and only required a little bit of additional
-metadata to make it work. The conversion is implemented as
+metadata to make it work.
 [a Ruby script](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb);
 this script transfers the link structure from an Agda-generated documentation
 HTML file onto lightly-annotated Hugo code blocks.
-To use the script, your Hugo theme (or your Markdown content) must
+Now, the details. Right now, the solution is pretty tailored to my site
-annotate the code blocks with several properties:
+and workflow, but the core of the script -- a piece that transfers links
-* `data-agda-block`, which marks code that needs to be processed.
+from an Agda HTML file into a syntax highlighted Hugo HTML block -- should
 * `data-file-path`, which tells the script what Agda file provided the
  code in the block, and therefore what Agda HTML file should be searched
  for links.
 * `data-first-line` and `data-last-line`, which tell the script what
  section of the Agda HTML file should be searched for said links.
 Given this -- and a couple of other assumptions, such as that all Agda
 projects are in a `code/<project>` folder, the script post-processes
 the HTML files automatically. Right now, the solution is pretty tailored to my
 site and workflow, but the core of the script -- the piece that transfers links
 from an Agda HTML file into a syntax-highlighted Hugo HTML block -- should
 be fairly reusable.
 Now, the details.
 ### The Constraints
 The goal was simple: to allow the code blocks on my Hugo-generated site to
 have links that take the user to the definition of a given symbol.
@ -107,7 +92,6 @@ I've hit all of these requirements; now it's time to dig into how I got there.
 ### Implementation
 #### Processing Agda's HTML Output
 It's pretty much a no-go to try to resolve Agda from Hugo, or perform some
 sort of "heuristic" to detect cross-links. Agda is a very complex programming
 language, and Hugo's templating engine, though powerful, is just not
@ -126,7 +110,7 @@ with links as I discover them. Effectively, what I need to do is a "link
 transfer": I need to identify regions of code that are highlighted in Agda's HTML,
 find those regions in Hugo's HTML output, and mark them with links. In addition,
 I'll need to fix up the links themselves: the HTML output assumes that each
-Agda file is its own HTML page, but this is ruled out by the second constraint of mine.
+Agda file is its own HTML page, but this is ruled out by the second constraint.
 As a little visualization, the overall problems looks something like this:
@ -151,7 +135,6 @@ What I ended up doing is translating Agda's HTML output into offsets and data
 about the code block's _plain text_ -- the source code being decorated.
 Both the Agda and Hugo HTML describe the same code; thus, the plain text
 is the common denominator between the two.
 {#plain-text}
 I wrote a Ruby script to extract the decorations from the Agda output; here
 it is in slightly abridged form. You can find the [original `agda.rb` file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb).
@ -164,7 +147,7 @@ def process_agda_html_file(file)
  document = Nokogiri::HTML.parse(File.open(file))
  pre_code = document.css("pre.Agda")[0]
-  # The traversal is postorder; we always visit children before their
+  # The traversal postorder; we always visit children before their
  # parents, and we visit leaves in sequence.
  line_infos = []
  offset = 0 # Column index within the current Agda source code line
@ -222,338 +205,52 @@ example above, the script might produce:
 ]
 ```
 #### Modifying Hugo's HTML
-Given such line information, the next step is to transfer it onto existing
+{{< todo >}}This isn't as important probably, but might be worth talking about. {{< /todo >}}
-Hugo HTML files. Within a file, I've made my `codelines` shortcode emit
+
-custom attributes that can be used to find syntax-highlighted Agda code.
+The most challenging step is probably to identify the Agda "projects" that need
-The chief such attribute is `data-agda-block`; my script traverses all
+to be built. Since different articles have different modules (possibly with
-elements with this attribute.
+the same name), I would need to keep them separate. Also, I'm not ruling
 out the possibility of one project including another as a submodule. To
 make this work, I wrote a little Ruby script to find all Agda files,
 guess their project folder, and invoke the Agda compiler there. It boils
 down to something like this:
 ```Ruby
-def process_source_file(file, document)
+# For each Agda file, find the most specific project / subproject to which
-  # Process each highlight group that's been marked as an Agda file.
+# it belongs.
-  document.css('div[data-agda-block]').each do |t|
+files_for_paths = {}
-    # ...
+Dir.glob("**/*.agda", base: root_path) do |agda_file|
-```
+  best_path = max_path.call(agda_file)
-
+  files_for_path = files_for_paths.fetch(best_path) do
-To figure out which Agda HTML file to use, and which lines to search for links,
+    files_for_paths[best_path] = []
 the script also expects some additional attributes.
 ```Ruby
    # ...
    first_line, last_line = nil, nil
    if first_line_attr = t.attribute("data-first-line")
      first_line = first_line_attr.to_s.to_i
    end
    if last_line_attr = t.attribute("data-last-line")
      last_line = last_line_attr.to_s.to_i
  end
-    if first_line and last_line
+  # Strip the project prefix from the Agda file's path, since
-      line_range = first_line..last_line
+  # Agda compiler will be invoked in the project's folder.
-    else
+  files_for_path << agda_file[best_path.length + File::SEPARATOR.length..-1]
      # no line number attributes = the code block contains the whole file
      line_range = 1..
 end
-    full_path = t.attribute("data-file-path").to_s
+original_wd = Dir.getwd
-    # ...
+files_for_paths.each do |path, files|
-```
+  # There might be a cleaner way of doing this, but it's convenient.
  Dir.chdir(original_wd)
  Dir.chdir(File.join(root_path, path))
-At this point, the Agda file could be in some nested directory, like
+  # Wherever the target directory is, create a folder that corresponds to
-`A/B/C/File.agda`. However, the project root -- the place where Agda modules
+  # the project being built, to avoid "cross-contaminating" the output
-are compiled from -- could be any one of the folders `A`, `B`, or `C`.
+  # folder with distinct modules with the same name.
-Thus, the fully qualified module name for `File.agda` could be `File`,
+  html_dir = File.join [target_dir, path, "html"]
-`C.File`, `B.C.File`, or `A.B.C.File`. Since Agda's HTML output produces
+  FileUtils.mkdir_p html_dir
 files named after the fully qualified module name, the script needs to guess
 what the module file is. This is where some conventions come in play:
 I keep my code in folders directly nested within a top-level `code` directory;
 thus, I'll have folders `project1` or `project2` inside `code`, and those
 will always be project roots. As a result,
 I guess that the first directory relative to `code` should be discarded,
 while the rest should be included in the path. The only exception to this is
 Git submodules: if an Agda file is included using a submodule, the root
 directory of the submodule is considered the Agda project root. My Hugo theme
 indicates the submodule using an additional `data-base-path` attribute; in all,
 that leads to the following logic:
-```Ruby
+  # Just shell out to Agda using the --html folder.
-    # ...
+  files.each do |file|
-    full_path_dirs = Pathname(full_path).each_filename.to_a
+    command = "#{ARGV[0]} --local-interfaces #{file} --html --html-dir=#{html_dir}"
-    base_path = t.attribute("data-base-path").to_s
+    puts command
-    base_dir_depth = 0
+    puts `#{command}`
    if base_path.empty?
      # No submodules were used. Assume code/<X> is the root.
      # The path of the file is given relative to `code`, so need
      # to strip only the one outermost directory.
      base_dir_depth = 1
      base_path = full_path_dirs[0]
    else
      # The code is in a submodule. Assume that the base path / submodule
      # root is the Agda module root, ignore all folders before that.
      base_path_dirs = Pathname(base_path).each_filename.to_a
      base_dir_depth = base_path_dirs.length
    end
    # ...
 ```
 With that, the script determines the actual HTML file path ---
 by assuming that there's an `html` folder in the same place as the Agda
 project root ---  and runs the above `process_agda_html_file`:
 ```Ruby
    # ...
    dirs_in_base = full_path_dirs[base_dir_depth..-1]
    html_file = dirs_in_base.join(".").gsub(/\.agda$/, ".html")
    html_path = File.join(["code", base_path, "html", html_file])
    agda_info = process_agda_html_file(html_path)
    # ...
 ```
 The next step is specific to the output of Hugo's syntax highlighter,
 [Chroma](https://github.com/alecthomas/chroma). When line numbers are enabled
 -- and they are on my site -- Chroma generates a table that, at some point,
 contains a bunch of `span` HTML nodes, each with the `line` class. Each
 such `span` corresponds to a single line of output; naturally, the first
 one contains the code from `first_line`, the second from `first_line + 1`,
 and so on until `last_line`. This is quite convenient, because it saves the
 headache of counting newlines the way that the Agda processing code above has to.
 For each line of syntax-highlighted code, the script retrieves the corresponding
 list of links that were collected from the Agda HTML file.
 ```Ruby
    # ...
    lines = t.css("pre.chroma code[data-lang] .line")
    lines.zip(line_range).each do |line, line_no|
      line_info = agda_info[line_no]
      next unless line_info
      # ...
 ```
 The subsequent traversal -- which picks out the plain text of the Agda file
 as [reasoned above](#plain-text) -- is very similar to the previous
 one. Here too there's an `offset` variable, which gets incremented with
 the length of a new plain text pieces. Since we know the lines match up
 to `span`s, there's no need to count newlines.
 ```Ruby
      # ...
      offset = 0
      line.traverse do |lt|
        if lt.text?
          content = lt.content
          new_offset = offset + content.length
          # ...
 ```
 At this point, we have a line number, and an offset within that line number
 that describes the portion of the text under consideration. We can
 traverse all the links for the line, and find ones that mark a piece of
 text somewhere in this range. For the time being -- since inserting overlapping
 spans is quite complicated -- I require the links to lie entirely within a
 particular plain text region. As a result, if Chroma splits a single Agda
 identifier into several tokens, it will not be linked. For now, this seems
 like the most conservative and safe approach.
 ```Ruby
          # ...
          matching_links = line_info.links.filter do |link|
            link[:from] >= offset and link[:to] <= new_offset
          end
          # ...
 ```
 All that's left is to slice up the plain text fragment into a bunch of HTML
 pieces: the substrings that are links will turn into `a` HTML nodes, while
 the substrings that are "in between" the links will be left over as plain
 text nodes. The code to do so is relatively verbose, but not all that complicated.
 ```Ruby
          replace_with = []
          replace_offset = 0
          matching_links.each do |match|
            # The link's range is an offset from the beginning of the line,
            # but the text piece we're splitting up might be partway into
            # the line. Convert the link coordinates to piece-relative ones.
            relative_from = match[:from] - offset
            relative_to = match[:to] - offset
            # If the previous link ended some time before the new link
            # began (or if the current link is the first one, and is not
            # at the beginning), ensure that the plain text "in between"
            # is kept.
            replace_with << content[replace_offset...relative_from]
            tag = (match.include? :href) ? 'a' : 'span'
            new_node = Nokogiri::XML::Node.new(tag, document)
            if match.include? :href
              # For nodes with links, note what they're referring to, so
              # we can adjust their hrefs when we assign global IDs.
              href = match[:href].to_s
              new_node['href'] = note_used_href file, new_node, href
            end
            if match.include? :id
              # For nodes with IDs visible in the current Hugo file, we'll
              # want to redirect links that previously go to other Agda
              # module HTML files. So, note the ID that we want to redirect,
              # and pick a new unique ID to replace it with.
              id = match[:id].to_s
              new_node['id'] = note_defined_href file, "#{html_file}##{id}"
            end
            new_node.content = content[relative_from...relative_to]
            replace_with << new_node
            replace_offset = relative_to
          end
          replace_with << content[replace_offset..-1]
 ```
 There's a little bit of a subtlety in the above code: specifically, I use
 the `note_used_href` and `note_defined_href` methods. These are important
 for rewriting links. Like I mentioned earlier, Agda's HTML output assumes
 that each source file should produce a single HTML file -- named after its
 qualified module -- and creates links accordingly. However, my blog posts
 interweave multiple source files. Some links that would've jumped to a different
 file must now point to an internal identifier within the page. Another
 important aspect of the transformation is that, since I'm pulling HTML files
 from distinct files, it's not guaranteed that each of them will have a unique
 `id` attribute. After all, Agda just assigns sequential numbers to each
 node that it generates; it would only take, e.g., including the first line
 from two distinct modules to end up with two nodes with `id="1"`. 
 The solution is then twofold:
 1. Track all the nodes referencing a particular `href` (made up of an HTML
   file and a numerical identifier, like `File.html#123`). When we pick
   new IDs -- thus guaranteeing their uniqueness -- we'll visit all the
   nodes that refer to the old ID and HTML file, and update their `href`.
 2. Track all existing Agda HTML IDs that we're inserting. If we transfer
   an `<a id="1234">` onto the Hugo content, we know we'll need to pick a new
   ID for it (since `1234` need not be unique), and that we'll need to redirect
   the other links to that new ID as the previous bullet describes.
 Here's how these two methods work:
 ```Ruby
 def note_defined_href(file, href)
  file_hrefs = @local_seen_hrefs.fetch(file) do
    @local_seen_hrefs[file] = {}
  end
  uniq_id = file_hrefs.fetch(href) do
    new_id = "agda-unique-ident-#{@id_counter}"
    @id_counter += 1
    file_hrefs[href] = new_id
  end
  unless @global_seen_hrefs.include? href
    @global_seen_hrefs[href] = { :file => file, :id => uniq_id }
  end
  return uniq_id
 end
 def note_used_href(file, node, href)
  ref_list = @nodes_referencing_href.fetch(href) { @nodes_referencing_href[href] = [] }
  ref_list << { :file => file, :node => node }
  return href
 end
 ```
 Note that they use class variables: these are methods on a `FileGroup` class.
 I've omitted the various classes I've declared from the above code for brevity,
 but here it makes sense to show them. Like I mentioned earlier, you can
 view the [complete code here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/6a168f2fe144850ed3a81b796e07266cbf80f382/agda.rb).
 Interestingly, `note_defined_href` makes use of _two_ global maps:
 `@local_seen_hrefs` and `@global_seen_hrefs`. This helps satisfy the third
 constraint above, which is linking between code defined in the same series.
 The logic is as follows: when rewriting a link to a new HTML file and ID,
 if the code we're trying to link to exists on the current page, we should link
 to that. Otherwise, if the code we're trying to link to was presented in
 a different part of the series, then we should link to that other part.
 So, we consult the "local" map for `href`s that will be rewritten to HTML
 nodes in the current file, and as a fallback, consult the "global" map for
 `hrefs` that were introduced in other parts. The `note_defined_href` populates
 both maps, and is "biased" towards the first occurrence of a piece of code:
 if posts A and B define a function `f`, and post C only references `f`, then
 that link will go to post A's definition, which came earlier.
 The other method, `note_used_href`, is simpler. It just appends to a list
 of Nokogiri HTML nodes that reference a given `href`. We keep track of the file
 in which the reference occurred so we can be sure to consult the right sub-map
 of `@local_seen_hrefs` when checking for in-page rewrites.
 After running `process_source_file` on all Hugo HTML files within a particular
 series, the following holds true:
 * We have inserted `span` or `a` nodes wherever Agda's original output
  had nodes with `id` or `href` elements. This is with the exception of the
  case where Hugo's inline HTML doesn't "line up" with Agda's inline HTML,
  which I've only found to happen when the leading character of an identifier is a digit.
 * We have picked new IDs for each HTML node we inserted that had an ID,
  noting them both globally and for the current file. We noted their original
  `href` value (in the form `File.html#123`) and that it should be transformed
  into our globally-unique identifiers, in the form `agda-unique-ident-1234`.
 * For each HTML node we inserted that links to another, we noted the `href`
  of the reference (also in the form `File.html#123`).
 Now, all that's left is to redirect the `href`s of the nodes we inserted
 from their old values to the new ones. I do this by iterating over `@nodes_referencing_href`,
 which contains every link we inserted.
 ```Ruby
 def cross_link_files
  @nodes_referencing_href.each do |href, references|
    references.each do |reference|
      file = reference[:file]
      node = reference[:node]
      local_targets = @local_seen_hrefs[file]
      if local_targets.include? href
        # A code block in this file provides this href, create a local link.
        node['href'] = "##{local_targets[href]}"
      elsif @global_seen_hrefs.include? href
        # A code block in this series, but not in this file, defines
        # this href. Create a cross-file link.
        target = @global_seen_hrefs[href]
        other_file = target[:file]
        id = target[:id]
        relpath = Pathname.new(other_file).dirname.relative_path_from(Pathname.new(file).dirname)
        node['href'] = "#{relpath}##{id}"
      else
        # No definitions in any blog page. For now, just delete the anchor.
        node.replace node.content
      end
    end
  end
 end
 ```
-Notice that for the time being, I simply remove links to Agda definitions that
+In short, it traverses all the folders in my `code` directory -- which is where
-didn't occur in the Hugo post. Ideally, this would link to the plain, non-blog
+I keep my code, looking for Agda source files. Once it finds them,
 documentation page generated by Agda; however, this requires either hosting
 those documentation pages, or expecting the Agda standard library HTML pages
 to remain stable and hosted at a fixed URL. Neither was simple enough to do,
 so I opted for the conservative "just don't insert links" approach.
 And that's all of the approach that I wanted to show off today! There
 are other details, like finding posts in the same series (I achieve this
 with a `meta` element) and invoking `agda --html` on the necessary source files
 (my [`build-agda-html.rb`](https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/build-agda-html.rb)
 script is how I personally do this), but I don't think it's all that valuable
 to describe them here.
 Unfortunately, the additional metadata I had my theme insert makes it
 harder for others to use this approach out of the box. However, I hope that by
 sharing my experience, others who write Agda and post about it might be able
 to get a similar solution working. And of course, it's always fun to write
 about a recent project or endeavor.
 Happy (dependently typed) programming and blogging!
--- a/themes/vanilla
+++ b/themes/vanilla
@ -1 +1 @@
-Subproject commit 97e965feece9fd3d3a5328156c043bd2a0cca80f
+Subproject commit 5bfbaf397b26e20b9f2e8f7e1246674e2c2417b3
		`@ -1 +1 @@`
			`Subproject commit 97e965feece9fd3d3a5328156c043bd2a0cca80f`				`Subproject commit 5bfbaf397b26e20b9f2e8f7e1246674e2c2417b3`