Write more about Agda+Hugo
Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
parent
5232f0a6e2
commit
13636a0d29
@ -2,7 +2,7 @@
|
||||
title: "Integrating Agda's HTML Output with Hugo"
|
||||
date: 2024-05-25T21:02:10-07:00
|
||||
draft: true
|
||||
tags: ["Agda", "Hugo", "Ruby", "Nix"]
|
||||
tags: ["Agda", "Hugo", "Ruby"]
|
||||
---
|
||||
|
||||
One of my favorite things about Agda are its clickable HTML pages. If you don't
|
||||
@ -37,13 +37,29 @@ And here's the denotational semantics for that expression:
|
||||
|
||||
Notice that you can click `Expr`, `_∪_`, `⟦`, etc.! All of this integrates
|
||||
with my existing Hugo site, and only required a little bit of additional
|
||||
metadata to make it work.
|
||||
metadata to make it work. The conversion is implemented as
|
||||
[a Ruby script](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb);
|
||||
this script transfers the link structure from an Agda-generated documentation
|
||||
HTML file onto lightly-annotated Hugo code blocks.
|
||||
|
||||
Now, the details. Right now, the solution is pretty tailored to my site
|
||||
and workflow, but the core of the script -- a piece that transfers links
|
||||
from an Agda HTML file into a syntax highlighted Hugo HTML block -- should
|
||||
To use the script, your Hugo theme (or your Markdown content) must
|
||||
annotate the code blocks with several properties:
|
||||
* `data-agda-block`, which marks code that needs to be processed.
|
||||
* `data-file-path`, which tells the script what Agda file provided the
|
||||
code in the block, and therefore what Agda HTML file should be searched
|
||||
for links.
|
||||
* `data-first-line` and `data-last-line`, which tell the script what
|
||||
section of the Agda HTML file should be searched for said links.
|
||||
|
||||
Given this -- and a couple of other assumptions, such as that all Agda
|
||||
projects are in a `code/<project>` folder, the script post-processes
|
||||
the HTML files automatically. Right now, the solution is pretty tailored to my
|
||||
site and workflow, but the core of the script -- the piece that transfers links
|
||||
from an Agda HTML file into a syntax-highlighted Hugo HTML block -- should
|
||||
be fairly reusable.
|
||||
|
||||
Now, the details.
|
||||
|
||||
### The Constraints
|
||||
The goal was simple: to allow the code blocks on my Hugo-generated site to
|
||||
have links that take the user to the definition of a given symbol.
|
||||
@ -84,7 +100,7 @@ In summary:
|
||||
1. I want to create cross-links between symbols in Agda blocks in a blog post.
|
||||
2. These code blocks could include code from disjoint files, and be out of order.
|
||||
3. Code blocks among a whole series of posts should be cross-linked too.
|
||||
4. The code blocks should be syntax highlighted the same way as the rest of the
|
||||
4. The code blocks should be syntax highlighting the same way as the rest of the
|
||||
code on the site.
|
||||
5. Ideally, I should be able to use my regular method for referencing code.
|
||||
|
||||
@ -92,6 +108,7 @@ I've hit all of these requirements; now it's time to dig into how I got there.
|
||||
|
||||
### Implementation
|
||||
|
||||
#### Processing Agda's HTML Output
|
||||
It's pretty much a no-go to try to resolve Agda from Hugo, or perform some
|
||||
sort of "heuristic" to detect cross-links. Agda is a very complex programming
|
||||
language, and Hugo's templating engine, though powerful, is just not
|
||||
@ -135,6 +152,7 @@ What I ended up doing is translating Agda's HTML output into offsets and data
|
||||
about the code block's _plain text_ -- the source code being decorated.
|
||||
Both the Agda and Hugo HTML describe the same code; thus, the plain text
|
||||
is the common denominator between the two.
|
||||
{#plain-text}
|
||||
|
||||
I wrote a Ruby script to extract the decorations from the Agda output; here
|
||||
it is in slightly abridged form. You can find the [original `agda.rb` file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/04f12b545d5692a78b1a2f13ef968417c749e295/agda.rb).
|
||||
@ -147,7 +165,7 @@ def process_agda_html_file(file)
|
||||
document = Nokogiri::HTML.parse(File.open(file))
|
||||
pre_code = document.css("pre.Agda")[0]
|
||||
|
||||
# The traversal postorder; we always visit children before their
|
||||
# The traversal is postorder; we always visit children before their
|
||||
# parents, and we visit leaves in sequence.
|
||||
line_infos = []
|
||||
offset = 0 # Column index within the current Agda source code line
|
||||
@ -205,6 +223,325 @@ example above, the script might produce:
|
||||
]
|
||||
```
|
||||
|
||||
#### Modifying Hugo's HTML
|
||||
|
||||
Given such line information, the next step is to transfer it onto existing
|
||||
Hugo HTML files. Within a file, I've made my `codelines` shortcode emit
|
||||
custom attributes that can be used to find syntax highlighting Agda code.
|
||||
The chief such attribute is `data-agda-block`; my script traverses all
|
||||
elements with this attribute.
|
||||
|
||||
```Ruby
|
||||
def process_source_file(file, document)
|
||||
# Process each highlight group that's been marked as an Agda file.
|
||||
document.css('div[data-agda-block]').each do |t|
|
||||
# ...
|
||||
```
|
||||
|
||||
To figure out which Agda HTML file to use, and which lines to search for links,
|
||||
the script also expects some additional attributes.
|
||||
|
||||
```Ruby
|
||||
# ...
|
||||
first_line, last_line = nil, nil
|
||||
|
||||
if first_line_attr = t.attribute("data-first-line")
|
||||
first_line = first_line_attr.to_s.to_i
|
||||
end
|
||||
if last_line_attr = t.attribute("data-last-line")
|
||||
last_line = last_line_attr.to_s.to_i
|
||||
end
|
||||
|
||||
if first_line and last_line
|
||||
line_range = first_line..last_line
|
||||
else
|
||||
# no line number attributes = the code block contains the whole file
|
||||
line_range = 1..
|
||||
end
|
||||
|
||||
full_path = t.attribute("data-file-path").to_s
|
||||
# ...
|
||||
```
|
||||
|
||||
At this point, the Agda file could be in some nested directory, like
|
||||
`A/B/C/File.agda`. However, the project root -- the place where Agda modules
|
||||
are compiled from -- could be any one of the folders `A`, `B`, or `C`.
|
||||
Thus, the fully qualified module name for `File.agda` could be `File`,
|
||||
`C.File`, `B.C.File`, or `A.B.C.File`. Since Agda's HTML output produces
|
||||
files named after the fully qualified module name, the script needs to guess
|
||||
what the module file is. This is where some conventions come in play:
|
||||
I keep my code in folders directly nested within a top-level `code` directory;
|
||||
thus, I'll have folders `project1` or `project2`. As a result,
|
||||
I guess that the first directory should be discarded, while the rest should be
|
||||
included in the path. The only exception to this is Git submodules: if an Agda
|
||||
file is included using a submodule, the root directory of the submodule is
|
||||
considered the Agda project root. My Hugo theme indicates the submodule using
|
||||
an additional `data-base-path` attribute; in all, that leads to the
|
||||
following logic:
|
||||
|
||||
```Ruby
|
||||
# ...
|
||||
full_path_dirs = Pathname(full_path).each_filename.to_a
|
||||
base_path = t.attribute("data-base-path").to_s
|
||||
base_dir_depth = 0
|
||||
if base_path.empty?
|
||||
# No submodules were used. Assume code/<X> is the root.
|
||||
# The path of the file is given relative to `code`, so need
|
||||
# to strip only the one outermost directory.
|
||||
base_dir_depth = 1
|
||||
base_path = full_path_dirs[0]
|
||||
else
|
||||
# The code is in a submodule. Assume that the base path / submodule
|
||||
# root is the Agda module root, ignore all folders before that.
|
||||
base_path_dirs = Pathname(base_path).each_filename.to_a
|
||||
base_dir_depth = base_path_dirs.length
|
||||
end
|
||||
# ...
|
||||
```
|
||||
|
||||
With that, the script determines the actual HTML file path ---
|
||||
by assuming that there's an `html` folder in the same place as the Agda
|
||||
project root --- and runs the above `process_agda_html_file`:
|
||||
|
||||
```Ruby
|
||||
# ...
|
||||
dirs_in_base = full_path_dirs[base_dir_depth..-1]
|
||||
html_file = dirs_in_base.join(".").gsub(/\.agda$/, ".html")
|
||||
html_path = File.join(["code", base_path, "html", html_file])
|
||||
|
||||
agda_info = process_agda_html_file(html_path)
|
||||
# ...
|
||||
```
|
||||
|
||||
The next step is specific to the output of Hugo's syntax highlighter,
|
||||
[Chroma](https://github.com/alecthomas/chroma). When line numbers are enabled
|
||||
-- and they are on my site -- Chroma generates a table that, at some point,
|
||||
contains a bunch of `span` HTML nodes, each with the `line` class. Each
|
||||
such `span` corresponds to a single line of output; naturally, the first
|
||||
one contains the code from `first_line`, the second from `first_line + 1`,
|
||||
and so on until `last_line`. This is quite convenient, because it saves the
|
||||
headache of counting newlines the way that the Agda processing code above has to.
|
||||
|
||||
For each line of syntax-highlighted code, the script retrieves the corresponding
|
||||
list of links that were collected from the Agda HTML file.
|
||||
|
||||
```Ruby
|
||||
# ...
|
||||
lines = t.css("pre.chroma code[data-lang] .line")
|
||||
lines.zip(line_range).each do |line, line_no|
|
||||
line_info = agda_info[line_no]
|
||||
next unless line_info
|
||||
|
||||
# ...
|
||||
```
|
||||
|
||||
The subsequent traversal -- which picks out the plain text of the Agda file
|
||||
as [reasoned above](#plain-text) -- is very similar to the previous
|
||||
one. Here too there's an `offset` variable, which gets incremented with
|
||||
the length of a new plain text pieces. Since we know the lines match up
|
||||
to `span`s, there's no need to count newlines.
|
||||
|
||||
```Ruby
|
||||
# ...
|
||||
offset = 0
|
||||
line.traverse do |lt|
|
||||
if lt.text?
|
||||
content = lt.content
|
||||
new_offset = offset + content.length
|
||||
|
||||
# ...
|
||||
```
|
||||
|
||||
At this point, we have a line number, and an offset within that line number
|
||||
that describes the portion of the text under consideration. We can
|
||||
traverse all the links for the line, and find ones that mark a piece of
|
||||
text somewhere in this range. For the time being -- since inserting overlapping
|
||||
spans is quite complicated -- I require the links to lie entirely within a
|
||||
particular plain text region. As a result, if Chroma splits a single Agda
|
||||
identifier into several tokens, it will not be linked. For now, this seems
|
||||
like the most conservative and safe approach.
|
||||
|
||||
```Ruby
|
||||
# ...
|
||||
matching_links = line_info.links.filter do |link|
|
||||
link[:from] >= offset and link[:to] <= new_offset
|
||||
end
|
||||
# ...
|
||||
```
|
||||
|
||||
All that's left is to slice up the plain text fragment into a bunch of HTML
|
||||
pieces: the substrings that are links will turn into `a` HTML nodes, while
|
||||
the substrings that are "in between" the links will be left over as plain
|
||||
text nodes. The code to do so is relatively verbose, but not all that complicated.
|
||||
|
||||
```Ruby
|
||||
replace_with = []
|
||||
replace_offset = 0
|
||||
matching_links.each do |match|
|
||||
# The link's range is an offset from the beginning of the line,
|
||||
# but the text piece we're splitting up might be partway into
|
||||
# the line. Convert the link coordinates to piece-relative ones.
|
||||
relative_from = match[:from] - offset
|
||||
relative_to = match[:to] - offset
|
||||
|
||||
# If the previous link ended some time before the new link
|
||||
# began (or if the current link is the first one, and is not
|
||||
# at the beginning), ensure that the plain text "in between"
|
||||
# is kept.
|
||||
replace_with << content[replace_offset...relative_from]
|
||||
|
||||
tag = (match.include? :href) ? 'a' : 'span'
|
||||
new_node = Nokogiri::XML::Node.new(tag, document)
|
||||
if match.include? :href
|
||||
# For nodes with links, note what they're referring to, so
|
||||
# we can adjust their hrefs when we assign global IDs.
|
||||
href = match[:href].to_s
|
||||
new_node['href'] = note_used_href file, new_node, href
|
||||
end
|
||||
if match.include? :id
|
||||
# For nodes with IDs visible in the current Hugo file, we'll
|
||||
# want to redirect links that previously go to other Agda
|
||||
# module HTML files. So, note the ID that we want to redirect,
|
||||
# and pick a new unique ID to replace it with.
|
||||
id = match[:id].to_s
|
||||
new_node['id'] = note_defined_href file, "#{html_file}##{id}"
|
||||
end
|
||||
new_node.content = content[relative_from...relative_to]
|
||||
|
||||
replace_with << new_node
|
||||
replace_offset = relative_to
|
||||
end
|
||||
replace_with << content[replace_offset..-1]
|
||||
```
|
||||
|
||||
There's a little bit of a subtlety in the above code: specifically, I use
|
||||
the `note_used_href` and `note_defined_href` methods. These are important
|
||||
for rewriting links. Like I mentioned earlier, Agda's HTML output assumes
|
||||
that each source file should produce a single HTML file -- named after its
|
||||
qualified module -- and creates links accordingly. However, my blog posts
|
||||
interweave multiple source files. Some links that would've jumped to a different
|
||||
file must now point to an internal identifier within the page. Another
|
||||
important aspect of the transformation is that, since I'm pulling HTML files
|
||||
from distinct files, it's not guaranteed that each of them will have a unique
|
||||
`id` attribute. After all, Agda just assigns sequential numbers to each
|
||||
node that it generates; it would only take, e.g., including the first line
|
||||
from two distinct modules to end up with two nodes with `id="1"`.
|
||||
|
||||
The solution is then twofold:
|
||||
|
||||
1. Track all the nodes referencing a particular `href` (made up of an HTML
|
||||
file and a numerical identifier, like `File.html#123`). When we pick
|
||||
new IDs -- thus guaranteeing their uniqueness -- we'll visit all the
|
||||
nodes that refer to the old ID and HTML file, and update their `href`.
|
||||
2. Track all existing Agda HTML IDs that we're inserting. If we transfer
|
||||
an `<a id="1234">` onto the Hugo content, we know we'll need to pick a new
|
||||
ID for it (since `1234` need not be unique), and that we'll need to redirect
|
||||
the other links to that new ID as the previous bullet describes.
|
||||
|
||||
Here's how these two methods work:
|
||||
|
||||
```Ruby
|
||||
def note_defined_href(file, href)
|
||||
file_hrefs = @local_seen_hrefs.fetch(file) do
|
||||
@local_seen_hrefs[file] = {}
|
||||
end
|
||||
|
||||
uniq_id = file_hrefs.fetch(href) do
|
||||
new_id = "agda-unique-ident-#{@id_counter}"
|
||||
@id_counter += 1
|
||||
file_hrefs[href] = new_id
|
||||
end
|
||||
|
||||
unless @global_seen_hrefs.include? href
|
||||
@global_seen_hrefs[href] = { :file => file, :id => uniq_id }
|
||||
end
|
||||
|
||||
return uniq_id
|
||||
end
|
||||
|
||||
def note_used_href(file, node, href)
|
||||
ref_list = @nodes_referencing_href.fetch(href) { @nodes_referencing_href[href] = [] }
|
||||
ref_list << { :file => file, :node => node }
|
||||
return href
|
||||
end
|
||||
```
|
||||
|
||||
Note that they use class variables: these are methods on a `FileGroup` class.
|
||||
I've omitted the various classes I've declared from the above code for brevity,
|
||||
but here it makes sense to show them. Like I mentioned earlier, you can
|
||||
view the [complete code here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/6a168f2fe144850ed3a81b796e07266cbf80f382/agda.rb).
|
||||
|
||||
Interestingly, `note_defined_href` makes use of _two_ global maps:
|
||||
`@local_seen_hrefs` and `@global_seen_hrefs`. This helps satisfy the third
|
||||
constraint above, which is linking between code defined in the same series.
|
||||
The logic is as follows: when rewriting a link to a new HTML file and ID,
|
||||
if the code we're trying to link to exists on the current page, we should link
|
||||
to that. Otherwise, if the code we're trying to link to was presented in
|
||||
a different part of the series, then we should link to that other part.
|
||||
So, we consult the "local" map for `href`s that will be rewritten to HTML
|
||||
nodes in the current file, and as a fallback, consult the "global" map for
|
||||
`hrefs` that were introduced in other parts. The `note_defined_href` populates
|
||||
both maps, and is "biased" towards the first occurrence of a piece of code:
|
||||
if posts A and B define a function `f`, and post C only references `f`, then
|
||||
that link will go to post A's definition, which came earlier.
|
||||
|
||||
The other method, `note_used_href`, is simpler. It just appends to a list
|
||||
of Nokogiri HTML nodes that reference a given `href`. We keep track of the file
|
||||
in which the reference occurred so we can be sure to consult the right sub-map
|
||||
of `@local_seen_hrefs` when checking for in-page rewrites.
|
||||
|
||||
After running `process_source_file` on all Hugo HTML files within a particular
|
||||
series, the following holds true:
|
||||
* We have inserted `span` or `a` nodes wherever Agda's original output
|
||||
had nodes with `id` or `href` elements. This is with the exception of the
|
||||
case where Hugo's inline HTML doesn't "line up" with Agda's inline HTML,
|
||||
which I've only found to happen when the leading character of an identifier is a digit.
|
||||
* We have picked new IDs for each HTML node we inserted that had an ID,
|
||||
noting them both globally and for the current file. We noted their original
|
||||
`href` value (in the form `File.html#123`) and that it should be transformed
|
||||
into our globally-unique identifiers, in the form `agda-unique-ident-1234`.
|
||||
* For each HTML node we inserted that links to another, we noted the `href`
|
||||
of the reference (also in the form `File.html#123`).
|
||||
|
||||
Now, all that's left is to redirect the `href`s of the nodes we inserted
|
||||
from their old values to the new ones. I do this by iterating over `@nodes_referencing_href`,
|
||||
which contains every link we inserted.
|
||||
|
||||
```Ruby
|
||||
def cross_link_files
|
||||
@nodes_referencing_href.each do |href, references|
|
||||
references.each do |reference|
|
||||
file = reference[:file]
|
||||
node = reference[:node]
|
||||
|
||||
local_targets = @local_seen_hrefs[file]
|
||||
if local_targets.include? href
|
||||
# A code block in this file provides this href, create a local link.
|
||||
node['href'] = "##{local_targets[href]}"
|
||||
elsif @global_seen_hrefs.include? href
|
||||
# A code block in this series, but not in this file, defines
|
||||
# this href. Create a cross-file link.
|
||||
target = @global_seen_hrefs[href]
|
||||
other_file = target[:file]
|
||||
id = target[:id]
|
||||
|
||||
relpath = Pathname.new(other_file).dirname.relative_path_from(Pathname.new(file).dirname)
|
||||
node['href'] = "#{relpath}##{id}"
|
||||
else
|
||||
# No definitions in any blog page. For now, just delete the anchor.
|
||||
node.replace node.content
|
||||
end
|
||||
end
|
||||
end
|
||||
end
|
||||
```
|
||||
|
||||
Notice that for the time being, I simply remove links to Agda definitions that
|
||||
didn't occur in the Hugo post. Ideally, this would link to the plain, non-blog
|
||||
documentation page generated by Agda; however, this requires either hosting
|
||||
those documentation pages, or expecting the Agda standard library HTML pages
|
||||
to remain stable and hosted at a fixed URL. Neither was simple enough to do,
|
||||
so I opted for the conservative "just don't insert links" approach.
|
||||
|
||||
{{< todo >}}This isn't as important probably, but might be worth talking about. {{< /todo >}}
|
||||
|
||||
@ -253,4 +590,4 @@ end
|
||||
```
|
||||
|
||||
In short, it traverses all the folders in my `code` directory -- which is where
|
||||
I keep my code, looking for Agda source files. Once it finds them,
|
||||
I keep my code -- looking for Agda source files. Once it finds them,
|
||||
|
Loading…
Reference in New Issue
Block a user