---
title: Pleasant Code Includes with Hugo
date: 2021-01-13T21:31:29-08:00
tags: ["Hugo"]
---

Ever since I started [the compiler series]({{< relref "00_compiler_intro.md" >}}), I've been including more and more fragments of code in my blog. I didn't want to copy-paste code between my project and my Markdown files, so I quickly wrote up a Hugo shortcode to pull in other files from the local directory. I've since improved on it some more, so I thought I'd share what I created with others.

Including Entire Files and Lines

My needs for snippets were modest at first. For the most part, I had a single code file that I wanted to present, so it was acceptable to plop it in the middle of my post in one piece. The shortcode for that was quite simple:

{{ highlight (readFile (printf "code/%s" (.Get 1))) (.Get 0) "" }}

This leverages Hugo's built-in highlight function to provide syntax highlighting to the included snippet. Hugo doesn't guess at the language of the code, so you have to manually provide it. Calling this shortcode looks as follows:

{{</* codeblock "C++" "compiler/03/type.hpp" */>}}

Note that this implicitly adds the code/ prefix to all the files I include. This is a personal convention: I want all my code to be inside a dedicated directory.

Of course, including entire files only takes you so far. What if you only need to discuss a small part of your code? Alternatively, what if you want to present code piece-by-piece, in the style of literate programming? I quickly ran into the need to do this, for which I wrote another shortcode:

{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
{{ $t := split $s "\n" }}
{{ if not (eq (int (.Get 2)) 1) }}
{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
{{ else }}
{{ .Scratch.Set "u" $t }}
{{ end }}
{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
{{ if (.Get 4) }}
{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
{{ else }}
{{ .Scratch.Set "opts" "" }}
{{ end }}
{{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}

This shortcode takes a language and a filename as before, but it also takes the numbers of the first and last lines indicating the part of the code that should be included. After splitting the contents of the file into lines, it throws away all lines before and after the window of code that you want to include. It seems to me (from my commit history) that Hugo's after function (which should behave similarly to Haskell's drop) doesn't like to be given an argument of 0. I had to add a special case for when this would occur, where I simply do not invoke after at all. The shortcode can be used as follows:

{{</* codelines "C++" "compiler/04/ast.cpp" 19 22 */>}}
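
To make the windowing logic a little more concrete, here is a minimal Ruby sketch of the same idea: drop the lines before the window, then keep only as many lines as the window contains. The `select_lines` helper and the sample text are made up for illustration; they are not part of the shortcode.

```ruby
# Return lines first_line..last_line (1-indexed, inclusive) of text.
# Hypothetical helper mirroring the shortcode's after/first combination.
def select_lines(text, first_line, last_line)
  lines = text.split("\n")
  # Drop everything before the window, then keep only the window itself.
  # Unlike Hugo's after, Ruby's drop accepts 0, so no special case is needed.
  lines.drop(first_line - 1).first(last_line - first_line + 1)
end

sample = "one\ntwo\nthree\nfour\nfive"
puts select_lines(sample, 2, 4).join("\n")
```

With the sample above, this prints the lines `two`, `three`, and `four`.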

To support a fuller range of Hugo's functionality, I also added an optional argument that accepts Hugo's Chroma settings. This way, I can do things like highlight certain lines in my code snippet, which is done as follows:

{{</* codelines "Idris" "typesafe-interpreter/TypesafeIntrV3.idr" 31 39 "hl_lines=7 8 9" */>}}

Note that the hl_lines field doesn't seem to work properly with linenostart, which means that the highlighted lines are counted from 1 no matter what. This is why in the above snippet, although I include lines 31 through 39, I feed lines 7, 8, and 9 to hl_lines. It's unusual, but hey, it works!
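
If you'd rather think in terms of the file's real line numbers, the conversion is simple arithmetic: subtract the first included line and add 1. Here's a small Ruby sketch of that calculation; `relative_hl_lines` is a hypothetical helper for illustration, not something Hugo or Chroma provides.

```ruby
# Convert absolute file line numbers into the 1-based offsets
# that hl_lines expects when counting from the start of the snippet.
# Hypothetical helper for illustration only.
def relative_hl_lines(first_line, absolute_lines)
  offsets = absolute_lines.map { |n| n - first_line + 1 }
  "hl_lines=#{offsets.join(' ')}"
end

# The snippet above starts at line 31; lines 37-39 of the file
# are the 7th through 9th lines of the snippet.
puts relative_hl_lines(31, [37, 38, 39])
```

This prints `hl_lines=7 8 9`, which matches the argument used in the call above.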

Linking to Referenced Code

Some time after implementing my initial system for including lines of code, I got an email from a reader who pointed out that it was hard for them to find the exact file I was referencing, and to view the surrounding context of the presented lines. To address this, I decided that I'd include a link to the file in question. After all, my website and all the associated code are on a Git server I host, so any local file I'm referencing should -- assuming it was properly committed -- show up there, too. I hardcoded the URL of the code directory on the web interface, and appended the relative path of each included file to it. The shortcode came out as follows:

{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
{{ $t := split $s "\n" }}
{{ if not (eq (int (.Get 2)) 1) }}
{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
{{ else }}
{{ .Scratch.Set "u" $t }}
{{ end }}
{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
{{ if (.Get 4) }}
{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
{{ else }}
{{ .Scratch.Set "opts" "" }}
{{ end }}
<div class="highlight-group">
    <div class="highlight-label">From <a href="https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/{{ .Get 1 }}">{{ path.Base (.Get 1) }}</a>,
        {{ if eq (.Get 2) (.Get 3) }}line {{ .Get 2 }}{{ else }} lines {{ .Get 2 }} through {{ .Get 3 }}{{ end }}</div>
    {{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}
</div>

This results in code blocks like the one in the image below. The image is the result of the codelines call for the Idris language, presented above.

{{< figure src="example.png" caption="An example of how the code looks." class="medium" >}}

I got a lot of mileage out of this setup... until I wanted to include code from other Git repositories. For instance, I wanted to talk about my Advent of Code submissions, without having to copy-paste the code into my blog repository!

Code from Submodules

My first thought when including code from other repositories was to use submodules. This has the added advantage of "pinning" the version of the code I'm talking about, which means that even if I push significant changes to the other repository, the code in my blog will remain the same. This, in turn, means that all of my codelines shortcodes will work as intended.

The problem is, most Git web interfaces (my own included) don't display paths corresponding to submodules. Thus, even if all my code is checked out and Hugo correctly pulls the selected lines into its HTML output, the links to the file remain broken!

There's no easy way to address this, particularly because different submodules can be located on different hosts! The Git URL used for a submodule is not known to Hugo (since, to the best of my knowledge, it can't run shell commands), and it could reside on dev.danilafe.com, or github.com, or elsewhere. Fortunately, it's fairly easy to tell when a file is part of a submodule, and which submodule that is. It's sufficient to find the longest submodule path that matches the selected file. If no submodule path matches, then the file is part of the blog repository, and no special action is needed.
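
As a rough illustration of the longest-match idea, here's a short Ruby sketch. The `owning_submodule` helper and the nested `aoc-2020/vendor/lib` path are hypothetical, made up for this example. Note one assumption: the sketch only counts a match on a directory boundary (the submodule path followed by `/`), which is a bit stricter than a plain string prefix check.

```ruby
# Return the longest submodule path that contains file_path,
# or nil if the file does not live in any submodule.
# Hypothetical helper; matching on "path/" is an assumption of this sketch.
def owning_submodule(file_path, submodule_paths)
  submodule_paths
    .select { |path| file_path.start_with?("#{path}/") }
    .max_by(&:length)
end

# "aoc-2020/vendor/lib" is an invented nested submodule for illustration.
submodules = ["aoc-2020", "aoc-2020/vendor/lib"]

p owning_submodule("aoc-2020/vendor/lib/util.rb", submodules)
p owning_submodule("aoc-2020/day1.cr", submodules)
p owning_submodule("compiler/03/type.hpp", submodules)
```

The first call picks the nested submodule because its path is the longer match, the second falls back to `aoc-2020`, and the third returns `nil`, meaning the file belongs to the blog repository itself.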

Of course, this means that Hugo needs to be made aware of the various submodules in my repository. It also needs to be aware of the submodules inside those submodules, and so on: it needs to be recursive. Git has a command to list all submodules recursively:

git submodule status --recursive

However, this only prints the commit, the submodule path, and the output of git describe for that commit (which often looks like a branch name). I don't think there's a way to list the remotes' URLs with this command, but we do need the URLs, since that's how we create links to the Git web interfaces.
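
For reference, each line of that output starts with a one-character status (a space when the submodule is in sync, `-` when it is not initialized, `+` when the checked-out commit differs from the recorded one, `U` for merge conflicts), followed by the commit hash, the path, and a description in parentheses. Here's a small Ruby sketch that picks such a line apart. The `parse_status_line` helper, the `code/aoc-2020` path, and the `(heads/master)` description in the sample are assumptions made for illustration.

```ruby
# One parsed line of `git submodule status` output.
SubmoduleStatus = Struct.new(:state, :commit, :path, :description)

# Hypothetical helper: split a status line into its fields.
def parse_status_line(line)
  state = line[0] # " ", "-", "+", or "U"
  # The description is missing for uninitialized submodules, so it may be nil.
  commit, path, description = line[1..-1].strip.split(" ", 3)
  SubmoduleStatus.new(state, commit, path, description)
end

# Sample line; the path and description are invented for this example.
sample = " 7a8503c3fe1aa7e624e4d8672aa9b56d24b4ba82 code/aoc-2020 (heads/master)\n"
status = parse_status_line(sample)
puts status.commit
puts status.path
```

Nothing in these fields tells us where the submodule is hosted, which is why the URL has to come from somewhere else.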

There's another issue: how do we let Hugo know about the various submodules, even if we can find them? Hugo can read files, but doing any serious text processing in its templates is downright impractical. Since Hugo itself is also not able to run commands, it needs to read in the output of another program that finds the submodules.

I settled on using Hugo's params configuration option. This allows users to communicate arbitrary properties to Hugo themes and templates. In my case, I want to communicate a collection of submodules. I didn't know about TOML's inline tables, so I decided to represent this collection as a map of (meaningless) submodule names to tables:

[params]
  [params.submoduleLinks]
    [params.submoduleLinks.aoc2020]
      url = "https://dev.danilafe.com/Advent-of-Code/AdventOfCode-2020/src/commit/7a8503c3fe1aa7e624e4d8672aa9b56d24b4ba82"
      path = "aoc-2020"

Since it was seemingly impossible to wrangle Git into outputting all of this information using one command, I decided to write a quick Ruby script to generate a list of submodules as follows. I had to use cd in one of my calls to Git because Git's --git-dir option doesn't seem to work with submodules, treating them like a "bare" checkout. I also chose to use an allowlist of remote URLs, since the URL format for linking to files in a particular repository differs from service to service. For now, I only use my own Git server, so only dev.danilafe.com is allowed; however, just by adding elsifs to my code, I can add other services in the future.

puts "[params]"
puts "  [params.submoduleLinks]"

def each_submodule(base_path)
  `cd #{base_path} && git submodule status`.lines do |line|
    hash, path = line[1..].split " "
    full_path = "#{base_path}/#{path}"
    url = `git config --file #{base_path}/.gitmodules --get 'submodule.#{path}.url'`.chomp.delete_suffix(".git")
    safe_name = full_path.gsub(/\/|-|_\./, "")

    if url =~ /dev\.danilafe\.com/
      file_url = "#{url}/src/commit/#{hash}"
    else
      raise "Submodule URL #{url.dump} not in a known format!"
    end

    yield ({ :path => full_path, :url => file_url, :name => safe_name })
    each_submodule(full_path) { |m| yield m }
  end
end

each_submodule(".") do |m|
  next unless m[:path].start_with? "./code/"
  puts "    [params.submoduleLinks.#{m[:name].delete_prefix(".code")}]"
  puts "      url = #{m[:url].dump}"
  puts "      path = #{m[:path].delete_prefix("./code/").dump}"
end

I pipe the output of this script into a separate configuration file called config-gen.toml, and then run Hugo as follows:

hugo --config config.toml,config-gen.toml

Finally, I had to modify my shortcode to find and handle the longest submodule prefix. Here's the relevant portion, and you can view the entire file here.

{{ .Scratch.Set "bestLength" -1 }}
{{ .Scratch.Set "bestUrl" (printf "https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/%s" (.Get 1)) }}
{{ $filePath := (.Get 1) }}
{{ $scratch := .Scratch }}
{{ range $module, $props := .Site.Params.submoduleLinks }}
{{ $path := index $props "path" }}
{{ $bestLength := $scratch.Get "bestLength" }}
{{ if and (le $bestLength (len $path)) (hasPrefix $filePath $path) }}
{{ $scratch.Set "bestLength" (len $path) }}
{{ $scratch.Set "bestUrl" (printf "%s%s" (index $props "url") (strings.TrimPrefix $path $filePath)) }}
{{ end }}
{{ end }}

And that's what I'm using at the time of writing!

Conclusion

My current system for code includes allows me to do the following things:

  • Include entire files or sections of files into the page. This saves me from having to copy and paste code manually, which is error-prone and can cause inconsistencies.
  • Provide links to the files I reference on my Git interface. This allows users to easily view the entire file that I'm talking about.
  • Correctly link to files in repositories other than my blog repository, when they are included using submodules. This means I don't need to manually copy and update code from other projects.

I hope some of these shortcodes and the script come in handy for someone else. Thank you for reading!