blog-static/content/blog/codelines/index.md

269 lines
12 KiB
Markdown
Raw Normal View History

2021-01-13 21:39:35 -08:00
---
title: "Pleasant Code Includes with Hugo"
date: 2021-01-13T21:31:29-08:00
tags: ["Hugo"]
---
Ever since I started [the compiler series]({{< relref "00_compiler_intro.md" >}}),
I began to include more and more fragments of code into my blog.
I didn't want to be copy-pasting my code between my project
and my Markdown files, so I quickly wrote up a Hugo [shortcode](https://gohugo.io/content-management/shortcodes/)
to pull in other files in the local directory. I've since improved on this
some more, so I thought I'd share what I created with others.
### Including Entire Files and Lines
My needs for snippets were modest at first. For the most part,
I had a single code file that I wanted to present, so it was
acceptable to plop it in the middle of my post in one piece.
The shortcode for that was quite simple:
```
{{ highlight (readFile (printf "code/%s" (.Get 1))) (.Get 0) "" }}
```
This leverages Hugo's built-in [`highlight`](https://gohugo.io/functions/highlight/)
function to provide syntax highlighting to the included snippet. Hugo
doesn't guess at the language of the code, so you have to manually provide
it. Calling this shortcode looks as follows:
```
{{</* codeblock "C++" "compiler/03/type.hpp" */>}}
```
Note that this implicitly adds the `code/` prefix to all
the files I include. This is a personal convention: I want
all my code to be inside a dedicated directory.
Of course, including entire files only takes you so far.
What if you only need to discuss a small part of your code?
Alternaitvely, what if you want to present code piece-by-piece,
in the style of literate programming? I quickly ran into the
need to do this, for which I wrote another shortcode:
```
{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
{{ $t := split $s "\n" }}
{{ if not (eq (int (.Get 2)) 1) }}
{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
{{ else }}
{{ .Scratch.Set "u" $t }}
{{ end }}
{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
{{ if (.Get 4) }}
{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
{{ else }}
{{ .Scratch.Set "opts" "" }}
{{ end }}
{{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}
```
This shortcode takes a language and a filename as before, but it also takes
the numbers of the first and last lines indicating the part of the code that should be included. After
splitting the contents of the file into lines, it throws away all lines before and
after the window of code that you want to include. It seems to me (from my commit history)
that Hugo's [`after`](https://gohugo.io/functions/after/) function (which should behave
similarly to Haskell's `drop`) doesn't like to be given an argument of `0`.
I had to add a special case for when this would occur, where I simply do not invoke `after` at all.
The shortcode can be used as follows:
```
{{</* codelines "C++" "compiler/04/ast.cpp" 19 22 */>}}
```
To support a fuller range of Hugo's functionality, I also added an optional argument that
accepts Hugo's Chroma settings. This way, I can do things like highlight certain
lines in my code snippet, which is done as follows:
```
{{</* codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 31 39 "hl_lines=7 8 9" */>}}
```
Note that the `hl_lines` field doesn't seem to work properly with `linenostart`, which means
that the highlighted lines are counted from 1 no matter what. This is why in the above snippet,
although I include lines 31 through 39, I feed lines 7, 8, and 9 to `hl_lines`. It's unusual,
but hey, it works!
### Linking to Referenced Code
Some time after implementing my initial system for including lines of code,
I got an email from a reader who pointed out that it was hard for them to find
the exact file I was referencing, and to view the surrounding context of the
presented lines. To address this, I decided that I'd include the link
to the file in question. After all, my website and all the associated
code is on a [Git server I host](https://dev.danilafe.com/Web-Projects/blog-static),
so any local file I'm referencing should -- assuming it was properly committed --
show up there, too. I hardcoded the URL of the `code` directory on the web interface,
and appended the relative path of each included file to it. The shortcode came out as follows:
```
{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
{{ $t := split $s "\n" }}
{{ if not (eq (int (.Get 2)) 1) }}
{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
{{ else }}
{{ .Scratch.Set "u" $t }}
{{ end }}
{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
{{ if (.Get 4) }}
{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
{{ else }}
{{ .Scratch.Set "opts" "" }}
{{ end }}
<div class="highlight-group">
<div class="highlight-label">From <a href="https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/{{ .Get 1 }}">{{ path.Base (.Get 1) }}</a>,
{{ if eq (.Get 2) (.Get 3) }}line {{ .Get 2 }}{{ else }} lines {{ .Get 2 }} through {{ .Get 3 }}{{ end }}</div>
{{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}
</div>
```
This results in code blocks like the one in the image below. The image
is the result of the `codelines` call for the Idris language, presented above.
{{< figure src="example.png" caption="An example of how the code looks." class="medium" >}}
I got a lot of mileage out of this setup . . . until I wanted to include code from _other_ git repositories.
For instance, I wanted to talk about my [Advent of Code](https://adventofcode.com/) submissions,
without having to copy-paste the code into my blog repository!
### Code from Submodules
My first thought when including code from other repositories was to use submodules.
This has the added advantage of "pinning" the version of the code I'm talking about,
which means that even if I push significant changes to the other repository, the code
in my blog will remain the same. This, in turn, means that all of my `codelines`
shortcodes will work as intended.
The problem is, most Git web interfaces (my own included) don't display paths corresponding
to submodules. Thus, even if all my code is checked out and Hugo correctly
pulls the selected lines into its HTML output, the _links to the file_ remain
broken!
There's no easy way to address this, particularly because _different submodules
can be located on different hosts_! The Git URL used for a submodule is
not known to Hugo (since, to the best of my knowledge, it can't run
shell commands), and it could reside on `dev.danilafe.com`, or `github.com`,
or elsewhere. Fortunately, it's fairly easy to tell when a file is part
of a submodule, and which submodule that is. It's sufficient to find
the longest submodule path that matches the selected file. If no
submodule path matches, then the file is part of the blog repository,
and no special action is needed.
Of course, this means that Hugo needs to be made aware of the various
submodules in my repository. It also needs to be aware of the submodules
_inside_ those submodules, and so on: it needs to be recursive. Git
has a command to list all submodules recursively:
```Bash
git submodule status --recursive
```
However, this only prints the commit, submodule path, and the upstream branch.
I don't think there's a way to list the remotes' URLs with this command; however,
we do _need_ the URLs, since that's how we create links to the Git web interfaces.
There's another issue: how do we let Hugo know about the various submodules,
even if we can find them? Hugo can read files, but doing any serious
text processing is downright impractical. However, Hugo
itself is not able to run commands, so it needs to be able to read in
the output of another command that _can_ find submodules.
I settled on using Hugo's `params` configuration option. This
allows users to communicate arbitrary properties to Hugo themes
and templates. In my case, I want to communicate a collection
of submodules. I didn't know about TOML's inline tables, so
I decided to represent this collection as a map of (meaningless)
submodule names to tables:
```TOML
[params]
[params.submoduleLinks]
[params.submoduleLinks.aoc2020]
url = "https://dev.danilafe.com/Advent-of-Code/AdventOfCode-2020/src/commit/7a8503c3fe1aa7e624e4d8672aa9b56d24b4ba82"
path = "aoc-2020"
```
Since it was seemingly impossible to wrangle Git into outputting
all of this information using one command, I decided
to write a quick Ruby script to generate a list of submodules
as follows. I had to use `cd` in one of my calls to Git
because Git's `--git-dir` option doesn't seem to work
with submodules, treating them like a "bare" checkout.
I also chose to use an allowlist of remote URLs,
since the URL format for linking to files in a
particular repository differs from service to service.
For now, I only use my own Git server, so only `dev.danilafe.com`
is allowed; however, just by adding `elsif`s to my code,
I can add other services in the future.
```Ruby
puts "[params]"
puts " [params.submoduleLinks]"
def each_submodule(base_path)
`cd #{base_path} && git submodule status`.lines do |line|
hash, path = line[1..].split " "
full_path = "#{base_path}/#{path}"
url = `git config --file #{base_path}/.gitmodules --get 'submodule.#{path}.url'`.chomp.delete_suffix(".git")
safe_name = full_path.gsub(/\/|-|_\./, "")
if url =~ /dev.danilafe.com/
file_url = "#{url}/src/commit/#{hash}"
else
raise "Submodule URL #{url.dump} not in a known format!"
end
yield ({ :path => full_path, :url => file_url, :name => safe_name })
each_submodule(full_path) { |m| yield m }
end
end
each_submodule(".") do |m|
next unless m[:path].start_with? "./code/"
puts " [params.submoduleLinks.#{m[:name].delete_prefix(".code")}]"
puts " url = #{m[:url].dump}"
puts " path = #{m[:path].delete_prefix("./code/").dump}"
end
```
I pipe the output of this script into a separate configuration file
called `config-gen.toml`, and then run Hugo as follows:
```
hugo --config config.toml,config-gen.toml
```
Finally, I had to modify my shortcode to find and handle the longest submodule prefix.
Here's the relevant portion, and you can
[view the entire file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/bfeae89ab52d1696c4a56768b7f0c6682efaff82/themes/vanilla/layouts/shortcodes/codelines.html).
```
{{ .Scratch.Set "bestLength" -1 }}
{{ .Scratch.Set "bestUrl" (printf "https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/%s" (.Get 1)) }}
{{ $filePath := (.Get 1) }}
{{ $scratch := .Scratch }}
{{ range $module, $props := .Site.Params.submoduleLinks }}
{{ $path := index $props "path" }}
{{ $bestLength := $scratch.Get "bestLength" }}
{{ if and (le $bestLength (len $path)) (hasPrefix $filePath $path) }}
{{ $scratch.Set "bestLength" (len $path) }}
{{ $scratch.Set "bestUrl" (printf "%s%s" (index $props "url") (strings.TrimPrefix $path $filePath)) }}
{{ end }}
{{ end }}
```
And that's what I'm using at the time of writing!
### Conclusion
My current system for code includes allows me to do the following
things:
* Include entire files or sections of files into the page. This
saves me from having to copy and paste code manually, which
is error prone and can cause inconsistencies.
* Provide links to the files I reference on my Git interface.
This allows users to easily view the entire file that I'm talking about.
* Correctly link to files in repositories other than my blog
repository, when they are included using submodules. This means
I don't need to manually copy and update code from other projects.
I hope some of these shortcodes and script come in handy for someone else.
Thank you for reading!