diff --git a/content/blog/codelines/example.png b/content/blog/codelines/example.png
new file mode 100644
index 0000000..aed915f
Binary files /dev/null and b/content/blog/codelines/example.png differ
diff --git a/content/blog/codelines/index.md b/content/blog/codelines/index.md
new file mode 100644
index 0000000..799b34b
--- /dev/null
+++ b/content/blog/codelines/index.md
@@ -0,0 +1,268 @@
+---
+title: "Pleasant Code Includes with Hugo"
+date: 2021-01-13T21:31:29-08:00
+tags: ["Hugo"]
+---
+
+Ever since I started [the compiler series]({{< relref "00_compiler_intro.md" >}}),
+I began to include more and more fragments of code into my blog.
+I didn't want to be copy-pasting my code between my project
+and my Markdown files, so I quickly wrote up a Hugo [shortcode](https://gohugo.io/content-management/shortcodes/)
+to pull in other files in the local directory. I've since improved on this
+some more, so I thought I'd share what I created with others.
+
+### Including Entire Files and Lines
+My needs for snippets were modest at first. For the most part,
+I had a single code file that I wanted to present, so it was
+acceptable to plop it in the middle of my post in one piece.
+The shortcode for that was quite simple:
+
+```
+{{ highlight (readFile (printf "code/%s" (.Get 1))) (.Get 0) "" }}
+```
+
+This leverages Hugo's built-in [`highlight`](https://gohugo.io/functions/highlight/)
+function to provide syntax highlighting to the included snippet. Hugo
+doesn't guess at the language of the code, so you have to manually provide
+it. Calling this shortcode looks as follows:
+
+```
+{{}}
+```
+
+Note that this implicitly adds the `code/` prefix to all
+the files I include. This is a personal convention: I want
+all my code to be inside a dedicated directory.
+
+Of course, including entire files only takes you so far.
+What if you only need to discuss a small part of your code?
+Alternatively, what if you want to present code piece-by-piece,
+in the style of literate programming? I quickly ran into the
+need to do this, for which I wrote another shortcode:
+
+```
+{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
+{{ $t := split $s "\n" }}
+{{ if not (eq (int (.Get 2)) 1) }}
+{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
+{{ else }}
+{{ .Scratch.Set "u" $t }}
+{{ end }}
+{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
+{{ if (.Get 4) }}
+{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
+{{ else }}
+{{ .Scratch.Set "opts" "" }}
+{{ end }}
+{{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}
+```
+
+This shortcode takes a language and a filename as before, but it also takes
+the numbers of the first and last lines indicating the part of the code that should be included. After
+splitting the contents of the file into lines, it throws away all lines before and
+after the window of code that you want to include. It seems to me (from my commit history)
+that Hugo's [`after`](https://gohugo.io/functions/after/) function (which should behave
+similarly to Haskell's `drop`) doesn't like to be given an argument of `0`.
+I had to add a special case for when this would occur, where I simply do not invoke `after` at all.
+The shortcode can be used as follows:
+
+```
+{{}}
+```
+
+To support a fuller range of Hugo's functionality, I also added an optional argument that
+accepts Hugo's Chroma settings. This way, I can do things like highlight certain
+lines in my code snippet, which is done as follows:
+
+```
+{{}}
+```
+
+Note that the `hl_lines` field doesn't seem to work properly with `linenostart`, which means
+that the highlighted lines are counted from 1 no matter what. This is why in the above snippet,
+although I include lines 31 through 39, I feed lines 7, 8, and 9 to `hl_lines`. It's unusual,
+but hey, it works!
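
To make the window arithmetic concrete, here is a minimal Ruby sketch of the same logic outside of Hugo. It is not part of the blog's build, and the helpers `line_window` and `relative_hl_lines` are hypothetical names. It assumes, per the observation above, that `hl_lines` counts from the first line of the snippet rather than from `linenostart`; under that assumption, highlighting snippet lines 7, 8, and 9 of a 31-through-39 window corresponds to file lines 37, 38, and 39.

```ruby
# A minimal sketch (not the blog's actual code) of the codelines
# shortcode's line arithmetic. Helper names are hypothetical.

# Return lines first..last (1-based, inclusive) of text, mirroring the
# shortcode's `after` (drop the lines before the window) followed by
# `first` (keep last - first + 1 lines). Ruby's Array#drop accepts 0,
# so no special case is needed when first == 1.
def line_window(text, first, last)
  text.split("\n").drop(first - 1).take(last - first + 1)
end

# Map absolute file line numbers to positions within the snippet,
# assuming hl_lines counts from 1 regardless of linenostart.
def relative_hl_lines(first, absolute_lines)
  absolute_lines.map { |n| n - first + 1 }
end

source = (1..50).map { |i| "line #{i}" }.join("\n")
window = line_window(source, 31, 39)
puts window.first                     # prints: line 31
puts window.last                      # prints: line 39
p relative_hl_lines(31, [37, 38, 39]) # prints: [7, 8, 9]
```

Joining those offsets with spaces gives `7 8 9`, the form Hugo's `hl_lines` option accepts (for example, `hl_lines=7 8 9`).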
+
+### Linking to Referenced Code
+Some time after implementing my initial system for including lines of code,
+I got an email from a reader who pointed out that it was hard for them to find
+the exact file I was referencing, and to view the surrounding context of the
+presented lines. To address this, I decided to include a link
+to the file in question. After all, my website and all the associated
+code are on a [Git server I host](https://dev.danilafe.com/Web-Projects/blog-static),
+so any local file I'm referencing should -- assuming it was properly committed --
+show up there, too. I hardcoded the URL of the `code` directory on the web interface,
+and appended the relative path of each included file to it. The shortcode came out as follows:
+
+```
+{{ $s := (readFile (printf "code/%s" (.Get 1))) }}
+{{ $t := split $s "\n" }}
+{{ if not (eq (int (.Get 2)) 1) }}
+{{ .Scratch.Set "u" (after (sub (int (.Get 2)) 1) $t) }}
+{{ else }}
+{{ .Scratch.Set "u" $t }}
+{{ end }}
+{{ $v := first (add (sub (int (.Get 3)) (int (.Get 2))) 1) (.Scratch.Get "u") }}
+{{ if (.Get 4) }}
+{{ .Scratch.Set "opts" (printf ",%s" (.Get 4)) }}
+{{ else }}
+{{ .Scratch.Set "opts" "" }}
+{{ end }}
+
+
+ From {{ path.Base (.Get 1) }},
+ {{ if eq (.Get 2) (.Get 3) }}line {{ .Get 2 }}{{ else }} lines {{ .Get 2 }} through {{ .Get 3 }}{{ end }}
+ {{ highlight (delimit $v "\n") (.Get 0) (printf "linenos=table,linenostart=%d%s" (.Get 2) (.Scratch.Get "opts")) }}
+
+```
+
+This results in code blocks like the one in the image below. The image
+is the result of the `codelines` call for the Idris language, presented above.
+
+{{< figure src="example.png" caption="An example of how the code looks." class="medium" >}}
+
+I got a lot of mileage out of this setup . . . until I wanted to include code from _other_ Git repositories.
+For instance, I wanted to talk about my [Advent of Code](https://adventofcode.com/) submissions,
+without having to copy-paste the code into my blog repository!
+
+### Code from Submodules
+My first thought when including code from other repositories was to use submodules.
+This has the added advantage of "pinning" the version of the code I'm talking about,
+which means that even if I push significant changes to the other repository, the code
+in my blog will remain the same. This, in turn, means that all of my `codelines`
+shortcodes will keep working as intended, since the line numbers they reference won't shift.
+
+The problem is, most Git web interfaces (my own included) don't display paths corresponding
+to submodules. Thus, even if all my code is checked out and Hugo correctly
+pulls the selected lines into its HTML output, the _links to the file_ remain
+broken!
+
+There's no easy way to address this, particularly because _different submodules
+can be located on different hosts_! The Git URL used for a submodule is
+not known to Hugo (since, to the best of my knowledge, Hugo can't run
+shell commands), and the submodule could reside on `dev.danilafe.com`, `github.com`,
+or elsewhere. Fortunately, it's fairly easy to tell when a file is part
+of a submodule, and which submodule that is. It's sufficient to find
+the longest submodule path that is a prefix of the selected file's path. If no
+submodule path matches, then the file is part of the blog repository,
+and no special action is needed.
+
+Of course, this means that Hugo needs to be made aware of the various
+submodules in my repository.
+It also needs to be aware of the submodules
+_inside_ those submodules, and so on: it needs to be recursive. Git
+has a command to list all submodules recursively:
+
+```Bash
+git submodule status --recursive
+```
+
+However, this only prints the commit, the submodule path, and the output of
+`git describe` for that commit.
+I don't think there's a way to list the remotes' URLs with this command; however,
+we do _need_ the URLs, since that's how we create links to the Git web interfaces.
+
+There's another issue: how do we let Hugo know about the various submodules,
+even if we can find them? Hugo can read files, but doing any serious
+text processing in its templates is downright impractical. Hugo
+itself also can't run commands, so it needs to read in
+the output of another command that _can_ find submodules.
+
+I settled on using Hugo's `params` configuration option. This
+allows users to communicate arbitrary properties to Hugo themes
+and templates. In my case, I want to communicate a collection
+of submodules. I didn't know about TOML's inline tables, so
+I decided to represent this collection as a map of (meaningless)
+submodule names to tables:
+
+```TOML
+[params]
+  [params.submoduleLinks]
+    [params.submoduleLinks.aoc2020]
+      url = "https://dev.danilafe.com/Advent-of-Code/AdventOfCode-2020/src/commit/7a8503c3fe1aa7e624e4d8672aa9b56d24b4ba82"
+      path = "aoc-2020"
+```
+
+Since it was seemingly impossible to wrangle Git into outputting
+all of this information using one command, I decided
+to write a quick Ruby script to generate a list of submodules
+as follows. I had to use `cd` in one of my calls to Git
+because Git's `--git-dir` option doesn't seem to work
+with submodules, treating them like a "bare" checkout.
+I also chose to use an allowlist of remote URLs,
+since the URL format for linking to files in a
+particular repository differs from service to service.
+For now, I only use my own Git server, so only `dev.danilafe.com`
+is allowed; however, just by adding `elsif`s to my code,
+I can add other services in the future.
+
+```Ruby
+puts "[params]"
+puts "  [params.submoduleLinks]"
+
+def each_submodule(base_path)
+  `cd #{base_path} && git submodule status`.lines do |line|
+    hash, path = line[1..].split " "
+    full_path = "#{base_path}/#{path}"
+    url = `git config --file #{base_path}/.gitmodules --get 'submodule.#{path}.url'`.chomp.delete_suffix(".git")
+    safe_name = full_path.gsub(/\/|-|_\./, "")
+
+    if url =~ /dev.danilafe.com/
+      file_url = "#{url}/src/commit/#{hash}"
+    else
+      raise "Submodule URL #{url.dump} not in a known format!"
+    end
+
+    yield ({ :path => full_path, :url => file_url, :name => safe_name })
+    each_submodule(full_path) { |m| yield m }
+  end
+end
+
+each_submodule(".") do |m|
+  next unless m[:path].start_with? "./code/"
+  puts "    [params.submoduleLinks.#{m[:name].delete_prefix(".code")}]"
+  puts "      url = #{m[:url].dump}"
+  puts "      path = #{m[:path].delete_prefix("./code/").dump}"
+end
+```
+
+I pipe the output of this script into a separate configuration file
+called `config-gen.toml`, and then run Hugo as follows:
+
+```
+hugo --config config.toml,config-gen.toml
+```
+
+Finally, I had to modify my shortcode to find and handle the longest submodule prefix.
+Here's the relevant portion, and you can
+[view the entire file here](https://dev.danilafe.com/Web-Projects/blog-static/src/commit/bfeae89ab52d1696c4a56768b7f0c6682efaff82/themes/vanilla/layouts/shortcodes/codelines.html).
+
+```
+{{ .Scratch.Set "bestLength" -1 }}
+{{ .Scratch.Set "bestUrl" (printf "https://dev.danilafe.com/Web-Projects/blog-static/src/branch/master/code/%s" (.Get 1)) }}
+{{ $filePath := (.Get 1) }}
+{{ $scratch := .Scratch }}
+{{ range $module, $props := .Site.Params.submoduleLinks }}
+{{ $path := index $props "path" }}
+{{ $bestLength := $scratch.Get "bestLength" }}
+{{ if and (le $bestLength (len $path)) (hasPrefix $filePath $path) }}
+{{ $scratch.Set "bestLength" (len $path) }}
+{{ $scratch.Set "bestUrl" (printf "%s%s" (index $props "url") (strings.TrimPrefix $path $filePath)) }}
+{{ end }}
+{{ end }}
+```
+
+And that's what I'm using at the time of writing!
+
+### Conclusion
+My current system for code includes allows me to do the following
+things:
+
+* Include entire files or sections of files into the page. This
+saves me from having to copy and paste code manually, which
+is error-prone and can cause inconsistencies.
+* Provide links to the files I reference on my Git interface.
+This allows users to easily view the entire file that I'm talking about.
+* Correctly link to files in repositories other than my blog
+repository, when they are included using submodules. This means
+I don't need to manually copy and update code from other projects.
+
+I hope some of these shortcodes and the script come in handy for someone else.
+Thank you for reading!