25 KiB
title | date | tags | |||
---|---|---|---|---|---|
Integrating Agda's HTML Output with Hugo | 2024-05-30T00:29:26-07:00 |
|
One of my favorite things about Agda are its clickable HTML pages. If you don't
know what they are, that's pages like Data.List.Properties
;
they just give the code from a particular Agda file, but make every identifier
clickable. Then, if you see some variable or function that you don't know, you
can just click it and jump right to it! It makes exploring the documentation
a lot smoother. I've found that these HTML pages provide all the information
I need for writing proofs.
Recently, I've been writing a fair bit about Agda; mostly about the patterns that I've learned about, such as the ["is something" pattern]({{< relref "agda_is_pattern" >}}) and the ["deeply embedded expression" trick]({{< relref "agda_expr_pattern" >}}). I've found myself wanting to click on definitions in my own code blocks; recently, I got this working, and I wanted to share how I did it, in case someone else wants to integrate Agda into their own static website. Though my stack is based on Hugo, the general idea should work with any other static site generator.
TL;DR and Demo
I wrote a script to transfer links from an Agda HTML file into Hugo's HTML output, making it possible to embellish "plain" Hugo output with Agda's 'go-to-definition links'. It looks like this. Here's an Agda code block defining an 'expression' data type, from a project of mine:
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 543 546 >}}
And here's the denotational semantics for that expression:
{{< codelines "Agda" "agda-spa/Lattice/Map.agda" 586 589 >}}
Notice that you can click Expr
, _∪_
, ⟦
, etc.! All of this integrates
with my existing Hugo site, and only required a little bit of additional
metadata to make it work. The conversion is implemented as
a Ruby script;
this script transfers the link structure from an Agda-generated documentation
HTML file onto lightly-annotated Hugo code blocks.
To use the script, your Hugo theme (or your Markdown content) must annotate the code blocks with several properties:
data-agda-block
, which marks code that needs to be processed.data-file-path
, which tells the script what Agda file provided the code in the block, and therefore what Agda HTML file should be searched for links.data-first-line
anddata-last-line
, which tell the script what section of the Agda HTML file should be searched for said links.
Given this -- and a couple of other assumptions, such as that all Agda
projects are in a code/<project>
folder, the script post-processes
the HTML files automatically. Right now, the solution is pretty tailored to my
site and workflow, but the core of the script -- the piece that transfers links
from an Agda HTML file into a syntax-highlighted Hugo HTML block -- should
be fairly reusable.
Now, the details.
The Constraints
The goal was simple: to allow the code blocks on my Hugo-generated site to
have links that take the user to the definition of a given symbol.
Specifically, if the symbol occurs somewhere on the same blog page, the link
should take the user there (and not to a regular Module.html
file). That
way, the reader can not only get to the code that they want to see, but also
have a chance to read the surrounding prose in properly-rendered Markdown.
Next, unlike standard "literate Agda" files, my blog posts are not single
.agda
files with Markdown in comments. Rather, I use regular Hugo
Markdown, and present portions of an existing project, weaving together many
files, and showing the fragments out of order. So, my tool needs to support
links that come from distinct modules, in any order.
Additionally, I've recently been writing a whole series about an Agda project
of mine; in this series, I gradually build up to the final product, explaining
one or two modules at a time. I would expect that links on pages in this series
could jump to other pages in the same series: if I cover module A
in part 1,
then write A.f
in part 2, clicking on A
-- and maybe f
-- should take
the reader back to the first part's page; once again, this would help provide
them with the surrounding explanation.
Finally, I wanted the Agda code to appear exactly the same as any other code
on my site, including the Hugo-provided syntax highlighting and theme. This
ruled out just copy-pasting pieces of the Agda-generated HTML in place of
code blocks on my page (and redirecting the links). Thought it was not
a hard requirement, I also hoped to include Agda code in the same
manner that I include all other code: [my codelines
shortcode]({{< relref "codelines" >}}).
In brief, the codelines
shortcode creates a syntax-highlighted code block,
as well as a surrounding "context" that says what file the code is from,
which lines are listed, and where to find the full code (e.g., on my Git server).
It looks something like this:
{{< codelines "Agda" "agda-spa/Language.agda" 20 27 >}}
In summary:
- I want to create cross-links between symbols in Agda blocks in a blog post.
- These code blocks could include code from disjoint files, and be out of order.
- Code blocks among a whole series of posts should be cross-linked too.
- The code blocks should be syntax highlighted the same way as the rest of the code on the site.
- Ideally, I should be able to use my regular method for referencing code.
I've hit all of these requirements; now it's time to dig into how I got there.
Implementation
Processing Agda's HTML Output
It's pretty much a no-go to try to resolve Agda from Hugo, or perform some
sort of "heuristic" to detect cross-links. Agda is a very complex programming
language, and Hugo's templating engine, though powerful, is just not
up to this task. Fortunately, Agda has support for
HTML output using the --html
flag.
As a build step, I can invoke Agda on files that are referenced by my blog,
and generate HTML. This would decidedly slow down the site build process,
but it would guarantee accurate link information.
On the other hand, to satisfy the 4th constraint, I need to somehow mimic -- or keep -- the format of Hugo's existing HTML output. The easiest way to do this without worrying about breaking changes and version incompatibility is to actually use the existing syntax-highlighted HTML, and annotate it with links as I discover them. Effectively, what I need to do is a "link transfer": I need to identify regions of code that are highlighted in Agda's HTML, find those regions in Hugo's HTML output, and mark them with links. In addition, I'll need to fix up the links themselves: the HTML output assumes that each Agda file is its own HTML page, but this is ruled out by the second constraint of mine.
As a little visualization, the overall problems looks something like this:
1
2
3
4
5
-- Agda's HTML output (blocks of 't' are links):
-- |tttttt| |tttt| |t| |t| |ttttt|
module ModX ( x : T ) where
-- |tttttt| |tt|t| |t| |t| |ttttt|
-- Hugo's HTML output (blocks of 't' are syntax highlighting spans)
Both Agda and Hugo output a preformatted code block, decorated with various
inline HTMl that indicates information (token color for Hugo; symbol IDs and
links in Agda). However, Agda and Hugo do not use the same process to create
this decorated output; it's entirely possible -- and not uncommon -- for
Hugo and Agda to produce misaligned HTML nodes. In my diagram above,
this is reflected as ModX
being considered a single token by Agda, but
two tokens (Mod
and X
) by the syntax highlighter. As a result, it's
difficult to naively iterate the two HTML formats in parallel.
What I ended up doing is translating Agda's HTML output into offsets and data about the code block's plain text -- the source code being decorated. Both the Agda and Hugo HTML describe the same code; thus, the plain text is the common denominator between the two. {#plain-text}
I wrote a Ruby script to extract the decorations from the Agda output; here
it is in slightly abridged form. You can find the original agda.rb
file here.
# Traverse the preformatted Agda block in the given Agda HTML file
# and find which textual ranges have IDs and links to other ranges.
# Store this information in a hash, line => links[]
def process_agda_html_file(file)
document = Nokogiri::HTML.parse(File.open(file))
pre_code = document.css("pre.Agda")[0]
# The traversal is postorder; we always visit children before their
# parents, and we visit leaves in sequence.
line_infos = []
offset = 0 # Column index within the current Agda source code line
line = 1
pre_code.traverse do |at|
# Text nodes are always leaves; visiting a new leaf means we've advanced
# in the text by the length of that text. However, if there are newlines
# -- since this is a preformatted block -- we also advanced by a line.
# At this time, do not support links that span multiple lines, but
# Agda doesn't produce those either.
if at.text?
if at.content.include? "\n"
raise "no support for links with newlines inside" if at.parent.name != "pre"
# Increase the line and track the final offset. Written as a loop
# in case we eventually want to add some handling for the pieces
# sandwiched between newlines.
at.content.split("\n", -1).each_with_index do |bit, idx|
line += 1 unless idx == 0
offset = bit.length
end
else
# It's not a newline node. Just adjust the offset within the plain text.
offset += at.content.length
end
elsif at.name == "a"
# Agda emits both links and things-to-link-to as 'a' nodes.
line_info = line_infos.fetch(line) { line_infos[line] = [] }
href = at.attribute("href")
id = at.attribute("id")
if href or id
new_node = { :from => offset-at.content.length, :to => offset }
new_node[:href] = href if href
new_node[:id] = id if id
line_info << new_node
end
end
end
return line_infos
end
This script takes an Agda HTML file and returns a map in which each line
of the Agda source code is associated with a list of ranges; the ranges
indicate links or places that can be linked to. For example, for the ModX
example above, the script might produce:
3 => [
{ :from => 3, :to => 9, id => "..." }, # Agda creates <a> nodes even for keywords.
{ :from => 12, :to => 16, id => "ModX-id" }, # The IDs Agda generates aren't usually this nice.
{ :from => 20, :to => 21, id => "x-id" },
]
Modifying Hugo's HTML
Given such line information, the next step is to transfer it onto existing
Hugo HTML files. Within a file, I've made my codelines
shortcode emit
custom attributes that can be used to find syntax-highlighted Agda code.
The chief such attribute is data-agda-block
; my script traverses all
elements with this attribute.
def process_source_file(file, document)
# Process each highlight group that's been marked as an Agda file.
document.css('div[data-agda-block]').each do |t|
# ...
To figure out which Agda HTML file to use, and which lines to search for links, the script also expects some additional attributes.
# ...
first_line, last_line = nil, nil
if first_line_attr = t.attribute("data-first-line")
first_line = first_line_attr.to_s.to_i
end
if last_line_attr = t.attribute("data-last-line")
last_line = last_line_attr.to_s.to_i
end
if first_line and last_line
line_range = first_line..last_line
else
# no line number attributes = the code block contains the whole file
line_range = 1..
end
full_path = t.attribute("data-file-path").to_s
# ...
At this point, the Agda file could be in some nested directory, like
A/B/C/File.agda
. However, the project root -- the place where Agda modules
are compiled from -- could be any one of the folders A
, B
, or C
.
Thus, the fully qualified module name for File.agda
could be File
,
C.File
, B.C.File
, or A.B.C.File
. Since Agda's HTML output produces
files named after the fully qualified module name, the script needs to guess
what the module file is. This is where some conventions come in play:
I keep my code in folders directly nested within a top-level code
directory;
thus, I'll have folders project1
or project2
inside code
, and those
will always be project roots. As a result,
I guess that the first directory relative to code
should be discarded,
while the rest should be included in the path. The only exception to this is
Git submodules: if an Agda file is included using a submodule, the root
directory of the submodule is considered the Agda project root. My Hugo theme
indicates the submodule using an additional data-base-path
attribute; in all,
that leads to the following logic:
# ...
full_path_dirs = Pathname(full_path).each_filename.to_a
base_path = t.attribute("data-base-path").to_s
base_dir_depth = 0
if base_path.empty?
# No submodules were used. Assume code/<X> is the root.
# The path of the file is given relative to `code`, so need
# to strip only the one outermost directory.
base_dir_depth = 1
base_path = full_path_dirs[0]
else
# The code is in a submodule. Assume that the base path / submodule
# root is the Agda module root, ignore all folders before that.
base_path_dirs = Pathname(base_path).each_filename.to_a
base_dir_depth = base_path_dirs.length
end
# ...
With that, the script determines the actual HTML file path ---
by assuming that there's an html
folder in the same place as the Agda
project root --- and runs the above process_agda_html_file
:
# ...
dirs_in_base = full_path_dirs[base_dir_depth..-1]
html_file = dirs_in_base.join(".").gsub(/\.agda$/, ".html")
html_path = File.join(["code", base_path, "html", html_file])
agda_info = process_agda_html_file(html_path)
# ...
The next step is specific to the output of Hugo's syntax highlighter,
Chroma. When line numbers are enabled
-- and they are on my site -- Chroma generates a table that, at some point,
contains a bunch of span
HTML nodes, each with the line
class. Each
such span
corresponds to a single line of output; naturally, the first
one contains the code from first_line
, the second from first_line + 1
,
and so on until last_line
. This is quite convenient, because it saves the
headache of counting newlines the way that the Agda processing code above has to.
For each line of syntax-highlighted code, the script retrieves the corresponding list of links that were collected from the Agda HTML file.
# ...
lines = t.css("pre.chroma code[data-lang] .line")
lines.zip(line_range).each do |line, line_no|
line_info = agda_info[line_no]
next unless line_info
# ...
The subsequent traversal -- which picks out the plain text of the Agda file
as reasoned above -- is very similar to the previous
one. Here too there's an offset
variable, which gets incremented with
the length of a new plain text pieces. Since we know the lines match up
to span
s, there's no need to count newlines.
# ...
offset = 0
line.traverse do |lt|
if lt.text?
content = lt.content
new_offset = offset + content.length
# ...
At this point, we have a line number, and an offset within that line number that describes the portion of the text under consideration. We can traverse all the links for the line, and find ones that mark a piece of text somewhere in this range. For the time being -- since inserting overlapping spans is quite complicated -- I require the links to lie entirely within a particular plain text region. As a result, if Chroma splits a single Agda identifier into several tokens, it will not be linked. For now, this seems like the most conservative and safe approach.
# ...
matching_links = line_info.links.filter do |link|
link[:from] >= offset and link[:to] <= new_offset
end
# ...
All that's left is to slice up the plain text fragment into a bunch of HTML
pieces: the substrings that are links will turn into a
HTML nodes, while
the substrings that are "in between" the links will be left over as plain
text nodes. The code to do so is relatively verbose, but not all that complicated.
replace_with = []
replace_offset = 0
matching_links.each do |match|
# The link's range is an offset from the beginning of the line,
# but the text piece we're splitting up might be partway into
# the line. Convert the link coordinates to piece-relative ones.
relative_from = match[:from] - offset
relative_to = match[:to] - offset
# If the previous link ended some time before the new link
# began (or if the current link is the first one, and is not
# at the beginning), ensure that the plain text "in between"
# is kept.
replace_with << content[replace_offset...relative_from]
tag = (match.include? :href) ? 'a' : 'span'
new_node = Nokogiri::XML::Node.new(tag, document)
if match.include? :href
# For nodes with links, note what they're referring to, so
# we can adjust their hrefs when we assign global IDs.
href = match[:href].to_s
new_node['href'] = note_used_href file, new_node, href
end
if match.include? :id
# For nodes with IDs visible in the current Hugo file, we'll
# want to redirect links that previously go to other Agda
# module HTML files. So, note the ID that we want to redirect,
# and pick a new unique ID to replace it with.
id = match[:id].to_s
new_node['id'] = note_defined_href file, "#{html_file}##{id}"
end
new_node.content = content[relative_from...relative_to]
replace_with << new_node
replace_offset = relative_to
end
replace_with << content[replace_offset..-1]
There's a little bit of a subtlety in the above code: specifically, I use
the note_used_href
and note_defined_href
methods. These are important
for rewriting links. Like I mentioned earlier, Agda's HTML output assumes
that each source file should produce a single HTML file -- named after its
qualified module -- and creates links accordingly. However, my blog posts
interweave multiple source files. Some links that would've jumped to a different
file must now point to an internal identifier within the page. Another
important aspect of the transformation is that, since I'm pulling HTML files
from distinct files, it's not guaranteed that each of them will have a unique
id
attribute. After all, Agda just assigns sequential numbers to each
node that it generates; it would only take, e.g., including the first line
from two distinct modules to end up with two nodes with id="1"
.
The solution is then twofold:
- Track all the nodes referencing a particular
href
(made up of an HTML file and a numerical identifier, likeFile.html#123
). When we pick new IDs -- thus guaranteeing their uniqueness -- we'll visit all the nodes that refer to the old ID and HTML file, and update theirhref
. - Track all existing Agda HTML IDs that we're inserting. If we transfer
an
<a id="1234">
onto the Hugo content, we know we'll need to pick a new ID for it (since1234
need not be unique), and that we'll need to redirect the other links to that new ID as the previous bullet describes.
Here's how these two methods work:
def note_defined_href(file, href)
file_hrefs = @local_seen_hrefs.fetch(file) do
@local_seen_hrefs[file] = {}
end
uniq_id = file_hrefs.fetch(href) do
new_id = "agda-unique-ident-#{@id_counter}"
@id_counter += 1
file_hrefs[href] = new_id
end
unless @global_seen_hrefs.include? href
@global_seen_hrefs[href] = { :file => file, :id => uniq_id }
end
return uniq_id
end
def note_used_href(file, node, href)
ref_list = @nodes_referencing_href.fetch(href) { @nodes_referencing_href[href] = [] }
ref_list << { :file => file, :node => node }
return href
end
Note that they use class variables: these are methods on a FileGroup
class.
I've omitted the various classes I've declared from the above code for brevity,
but here it makes sense to show them. Like I mentioned earlier, you can
view the complete code here.
Interestingly, note_defined_href
makes use of two global maps:
@local_seen_hrefs
and @global_seen_hrefs
. This helps satisfy the third
constraint above, which is linking between code defined in the same series.
The logic is as follows: when rewriting a link to a new HTML file and ID,
if the code we're trying to link to exists on the current page, we should link
to that. Otherwise, if the code we're trying to link to was presented in
a different part of the series, then we should link to that other part.
So, we consult the "local" map for href
s that will be rewritten to HTML
nodes in the current file, and as a fallback, consult the "global" map for
hrefs
that were introduced in other parts. The note_defined_href
populates
both maps, and is "biased" towards the first occurrence of a piece of code:
if posts A and B define a function f
, and post C only references f
, then
that link will go to post A's definition, which came earlier.
The other method, note_used_href
, is simpler. It just appends to a list
of Nokogiri HTML nodes that reference a given href
. We keep track of the file
in which the reference occurred so we can be sure to consult the right sub-map
of @local_seen_hrefs
when checking for in-page rewrites.
After running process_source_file
on all Hugo HTML files within a particular
series, the following holds true:
- We have inserted
span
ora
nodes wherever Agda's original output had nodes withid
orhref
elements. This is with the exception of the case where Hugo's inline HTML doesn't "line up" with Agda's inline HTML, which I've only found to happen when the leading character of an identifier is a digit. - We have picked new IDs for each HTML node we inserted that had an ID,
noting them both globally and for the current file. We noted their original
href
value (in the formFile.html#123
) and that it should be transformed into our globally-unique identifiers, in the formagda-unique-ident-1234
. - For each HTML node we inserted that links to another, we noted the
href
of the reference (also in the formFile.html#123
).
Now, all that's left is to redirect the href
s of the nodes we inserted
from their old values to the new ones. I do this by iterating over @nodes_referencing_href
,
which contains every link we inserted.
def cross_link_files
@nodes_referencing_href.each do |href, references|
references.each do |reference|
file = reference[:file]
node = reference[:node]
local_targets = @local_seen_hrefs[file]
if local_targets.include? href
# A code block in this file provides this href, create a local link.
node['href'] = "##{local_targets[href]}"
elsif @global_seen_hrefs.include? href
# A code block in this series, but not in this file, defines
# this href. Create a cross-file link.
target = @global_seen_hrefs[href]
other_file = target[:file]
id = target[:id]
relpath = Pathname.new(other_file).dirname.relative_path_from(Pathname.new(file).dirname)
node['href'] = "#{relpath}##{id}"
else
# No definitions in any blog page. For now, just delete the anchor.
node.replace node.content
end
end
end
end
Notice that for the time being, I simply remove links to Agda definitions that didn't occur in the Hugo post. Ideally, this would link to the plain, non-blog documentation page generated by Agda; however, this requires either hosting those documentation pages, or expecting the Agda standard library HTML pages to remain stable and hosted at a fixed URL. Neither was simple enough to do, so I opted for the conservative "just don't insert links" approach.
And that's all of the approach that I wanted to show off today! There
are other details, like finding posts in the same series (I achieve this
with a meta
element) and invoking agda --html
on the necessary source files
(my build-agda-html.rb
script is how I personally do this), but I don't think it's all that valuable
to describe them here.
Unfortunately, the additional metadata I had my theme insert makes it harder for others to use this approach out of the box. However, I hope that by sharing my experience, others who write Agda and post about it might be able to get a similar solution working. And of course, it's always fun to write about a recent project or endeavor.
Happy (dependently typed) programming and blogging!