Start on a draft about Agda and Hugo

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
Danila Fedorin 2024-05-25 23:28:18 -07:00
parent 04f12b545d
commit 6a168f2fe1

256
content/blog/agda_hugo.md Normal file
View File

@ -0,0 +1,256 @@
---
title: "Integrating Agda's HTML Output with Hugo"
date: 2024-05-25T21:02:10-07:00
draft: true
tags: ["Agda", "Hugo", "Ruby", "Nix"]
---
One of my favorite things about Agda are its clickable HTML pages. If you don't
know what they are, that's pages like [`Data.List.Properties`](https://agda.github.io/agda-stdlib/master/Data.List.Properties.html);
they just give the code from a particular Agda file, but make every identifier
clickable. Then, if you see some variable or function that you don't know, you
can just click it and jump right to it! It makes exploring the documentation
a lot smoother. I've found that these HTML pages provide all the information
I need for writing proofs.
Recently, I've been writing a fair bit about Agda; mostly about the patterns
that I've learned about, such as the ["is something" pattern]({{< relref "agda_is_pattern" >}})
and the ["deeply embedded expression" trick]({{< relref "agda_expr_pattern" >}}).
I've found myself wanting to click on definitions in my own code blocks; recently,
I got this working, and I wanted to share how I did it, in case someone else
wants to integrate Agda into their own static website. Though my stack
is based on Hugo, the general idea should work with any other static site
generator.
### TL;DR and Demo
I wrote a script to transfer links from an Agda HTML file into Hugo's HTML
output, making it possible to embellish "plain" Hugo output with Agda's
'go-to-definition links'. It looks like this. Here's an Agda code block
defining an 'expression' data type, from a project of mine:
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 543 546 >}}
And here's the denotational semantics for that expression:
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 586 589 >}}
Notice that you can click `Expr`, `__`, `⟦`, etc.! All of this integrates
with my existing Hugo site, and only required a little bit of additional
metadata to make it work.
Now, the details. Right now, the solution is pretty tailored to my site
and workflow, but the core of the script -- a piece that transfers links
from an Agda HTML file into a syntax highlighted Hugo HTML block -- should
be fairly reusable.
### The Constraints
The goal was simple: to allow the code blocks on my Hugo-generated site to
have links that take the user to the definition of a given symbol.
Specifically, if the symbol occurs somewhere on the same blog page, the link
should take the user there (and not to a regular `Module.html` file). That
way, the reader can not only get to the code that they want to see, but also
have a chance to read the surrounding prose in properly-rendered Markdown.
Next, unlike standard "literate Agda" files, my blog posts are not single
`.agda` files with Markdown in comments. Rather, I use regular Hugo
Markdown, and present portions of an existing project, weaving together many
files, and showing the fragments out of order. So, my tool needs to support
links that come from distinct modules, in any order.
Additionally, I've recently been writing a whole series about an Agda project
of mine; in this series, I gradually build up to the final product, explaining
one or two modules at a time. I would expect that links on pages in this series
could jump to other pages in the same series: if I cover module `A` in part 1,
then write `A.f` in part 2, clicking on `A` -- and maybe `f` -- should take
the reader back to the first part's page; once again, this would help provide
them with the surrounding explanation.
Finally, I wanted the Agda code to appear exactly the same as any other code
on my site, including the Hugo-provided syntax highlighting and theme. This
ruled out just copy-pasting pieces of the Agda-generated HTML in place of
code blocks on my page (and redirecting the links). Thought it was not
a hard requirement, I also hoped to include Agda code in the same
manner that I include all other code: [my `codelines` shortcode]({{< relref "codelines" >}}).
In brief, the `codelines` shortcode creates a syntax-highlighted code block,
as well as a surrounding "context" that says what file the code is from,
which lines are listed, and where to find the full code (e.g., on my Git server).
It looks something like this:
{{< codelines "Agda" "agda-spa/Language.agda" 20 27 >}}
In summary:
1. I want to create cross-links between symbols in Agda blocks in a blog post.
2. These code blocks could include code from disjoint files, and be out of order.
3. Code blocks among a whole series of posts should be cross-linked too.
4. The code blocks should be syntax highlighted the same way as the rest of the
code on the site.
5. Ideally, I should be able to use my regular method for referencing code.
I've hit all of these requirements; now it's time to dig into how I got there.
### Implementation
It's pretty much a no-go to try to resolve Agda from Hugo, or perform some
sort of "heuristic" to detect cross-links. Agda is a very complex programming
language, and Hugo's templating engine, though powerful, is just not
up to this task. Fortunately, Agda has support for
[HTML output using the `--html` flag](https://agda.readthedocs.io/en/v2.6.4.3-r1/tools/generating-html.html).
As a build step, I can invoke Agda on files that are referenced by my blog,
and generate HTML. This would decidedly slow down the site build process,
but it would guarantee accurate link information.
On the other hand, to satisfy the 4th constraint, I need to somehow mimic --
or keep -- the format of Hugo's existing HTML output. The easiest way to
do this without worrying about breaking changes and version incompatibility
is to actually use the existing syntax-highlighted HTML, and annotate it
with links as I discover them. Effectively, what I need to do is a "link
transfer": I need to identify regions of code that are highlighted in Agda's HTML,
find those regions in Hugo's HTML output, and mark them with links. In addition,
I'll need to fix up the links themselves: the HTML output assumes that each
Agda file is its own HTML page, but this is ruled out by the second constraint.
As a little visualization, the overall problems looks something like this:
````Agda {linenos=table}
-- Agda's HTML output (blocks of 't' are links):
-- |tttttt| |tttt| |t| |t| |ttttt|
module ModX ( x : T ) where
-- |tttttt| |tt|t| |t| |t| |ttttt|
-- Hugo's HTML output (blocks of 't' are syntax highlighting spans)
````
Both Agda and Hugo output a preformatted code block, decorated with various
inline HTMl that indicates information (token color for Hugo; symbol IDs and
links in Agda). However, Agda and Hugo do not use the same process to create
this decorated output; it's entirely possible -- and not uncommon -- for
Hugo and Agda to produce misaligned HTML nodes. In my diagram above,
this is reflected as `ModX` being considered a single token by Agda, but
two tokens (`Mod` and `X`) by the syntax highlighter. As a result, it's
difficult to naively iterate the two HTML formats in parallel.
What I ended up doing is translating Agda's HTML output into offsets and data
about the code block's _plain text_ -- the source code being decorated.
Both the Agda and Hugo HTML describe the same code; thus, the plain text
is the common denominator between the two.
I wrote a Ruby script to extract the decorations from the Agda output; here
it is in slightly abridged form. You can find the [original `agda.rb` file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb).
```Ruby
# Traverse the preformatted Agda block in the given Agda HTML file
# and find which textual ranges have IDs and links to other ranges.
# Store this information in a hash, line => links[]
def process_agda_html_file(file)
document = Nokogiri::HTML.parse(File.open(file))
pre_code = document.css("pre.Agda")[0]
# The traversal postorder; we always visit children before their
# parents, and we visit leaves in sequence.
line_infos = []
offset = 0 # Column index within the current Agda source code line
line = 1
pre_code.traverse do |at|
# Text nodes are always leaves; visiting a new leaf means we've advanced
# in the text by the length of that text. However, if there are newlines
# -- since this is a preformatted block -- we also advanced by a line.
# At this time, do not support links that span multiple lines, but
# Agda doesn't produce those either.
if at.text?
if at.content.include? "\n"
raise "no support for links with newlines inside" if at.parent.name != "pre"
# Increase the line and track the final offset. Written as a loop
# in case we eventually want to add some handling for the pieces
# sandwiched between newlines.
at.content.split("\n", -1).each_with_index do |bit, idx|
line += 1 unless idx == 0
offset = bit.length
end
else
# It's not a newline node. Just adjust the offset within the plain text.
offset += at.content.length
end
elsif at.name == "a"
# Agda emits both links and things-to-link-to as 'a' nodes.
line_info = line_infos.fetch(line) { line_infos[line] = [] }
href = at.attribute("href")
id = at.attribute("id")
if href or id
new_node = { :from => offset-at.content.length, :to => offset }
new_node[:href] = href if href
new_node[:id] = id if id
line_info << new_node
end
end
end
return line_infos
end
```
This script takes an Agda HTML file and returns a map in which each line
of the Agda source code is associated with a list of ranges; the ranges
indicate links or places that can be linked to. For example, for the `ModX`
example above, the script might produce:
```Ruby
3 => [
{ :from => 3, :to => 9, id => "..." }, # Agda creates <a> nodes even for keywords.
{ :from => 12, :to => 16, id => "ModX-id" }, # The IDs Agda generates aren't usually this nice.
{ :from => 20, :to => 21, id => "x-id" },
]
```
{{< todo >}}This isn't as important probably, but might be worth talking about. {{< /todo >}}
The most challenging step is probably to identify the Agda "projects" that need
to be built. Since different articles have different modules (possibly with
the same name), I would need to keep them separate. Also, I'm not ruling
out the possibility of one project including another as a submodule. To
make this work, I wrote a little Ruby script to find all Agda files,
guess their project folder, and invoke the Agda compiler there. It boils
down to something like this:
```Ruby
# For each Agda file, find the most specific project / subproject to which
# it belongs.
files_for_paths = {}
Dir.glob("**/*.agda", base: root_path) do |agda_file|
best_path = max_path.call(agda_file)
files_for_path = files_for_paths.fetch(best_path) do
files_for_paths[best_path] = []
end
# Strip the project prefix from the Agda file's path, since
# Agda compiler will be invoked in the project's folder.
files_for_path << agda_file[best_path.length + File::SEPARATOR.length..-1]
end
original_wd = Dir.getwd
files_for_paths.each do |path, files|
# There might be a cleaner way of doing this, but it's convenient.
Dir.chdir(original_wd)
Dir.chdir(File.join(root_path, path))
# Wherever the target directory is, create a folder that corresponds to
# the project being built, to avoid "cross-contaminating" the output
# folder with distinct modules with the same name.
html_dir = File.join [target_dir, path, "html"]
FileUtils.mkdir_p html_dir
# Just shell out to Agda using the --html folder.
files.each do |file|
command = "#{ARGV[0]} --local-interfaces #{file} --html --html-dir=#{html_dir}"
puts command
puts `#{command}`
end
end
```
In short, it traverses all the folders in my `code` directory -- which is where
I keep my code, looking for Agda source files. Once it finds them,