No commits in common. "fdaec6d5a9c4b5f206a6110855c7b634de4f480d" and "e9f2378b47542c0a8c12f6a365720d96f986630c" have entirely different histories.

@ -1,7 +1,8 @@
---
title: Rendering Mathematics On The Back End
date: 2020-07-21T14:54:26-07:00
tags: ["Website", "Nix", "Ruby", "KaTeX"]
---
Due to something of a streak of bad luck when it came to computers, I spent a
@ -25,7 +26,7 @@ for displaying things like inference rules, didn't work without
JavaScript. I was left with two options:
* Allow JavaScript, and continue using MathJax to render my math.
* Make it so that the mathematics is rendered on the back end.
I've [previously written about math rendering]({{< relref "math_rendering_is_wrong.md" >}}),
and made the observation that MathJax's output for LaTeX is __identical__
@ -89,7 +90,7 @@ to `node2nix`:
]
```
The Ruby script I wrote for this (more on that soon) required the `nokogiri` gem, which
I used for traversing the HTML generated for my site. Hugo was obviously required to
generate the HTML.
@ -104,13 +105,10 @@ page advertises server-side rendering. Their documentation [(link)](https://kate
even shows (at least as of the time this email was sent) that it renders both HTML
(to be arranged nicely with their CSS) for visuals and MathML for accessibility.
The author of the email then kindly provided a link to a page they generated using KaTeX and
some Bash scripts. The math on this page was rendered at the time it was generated.
This is a great point, and KaTeX is indeed usable for server-side rendering. But I've
seen few people actually use it. Unfortunately, as I pointed out in my previous post on the subject,
few tools actually take your HTML page and replace the LaTeX in it with rendered math.
Here's what I wrote about this last time:
> [In MathJax,] The bigger issue, though, was that the `page2html`
program, which rendered all the mathematics in a single HTML page,
@ -121,14 +119,8 @@ which replaced mathematical expressions in a page with their SVG forms.
This is still the case, in both MathJax and KaTeX. The ability
to render math in one step is the main selling point of front-end LaTeX renderers:
all you have to do is drop in a file from a CDN, and voila, you have your
math. There are no such easy answers for back-end rendering. In fact,
as we will soon see, it's not possible to just search-and-replace occurrences
of mathematics on your page, either. To actually get KaTeX working
on the back end, you need access to tools that handle the wide variety
of edge cases associated with HTML. Such tools, to my knowledge, do not
currently exist.
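To make the search-and-replace problem concrete, here is a hypothetical Ruby illustration. It is not taken from the script described in this post. A single regular expression run over the raw HTML cannot tell which `\(...\)` sits inside a `<code>` tag, so it rewrites both occurrences. The sample markup, the `\(...\)` delimiters, and the `fake_render` helper are all assumptions made up for this sketch.

```Ruby
# Hypothetical page fragment: one formula in prose, and the same formula
# inside <code>, where it is *about* LaTeX and should stay untouched.
page = '<p>The identity \(e^{i\pi} = -1\) is famous.</p>' \
       '<p>Type it as <code>\(e^{i\pi} = -1\)</code> in your source.</p>'

# Stand-in for a real renderer; an actual script would call KaTeX here.
def fake_render(latex)
  %(<span class="katex">#{latex}</span>)
end

# Naive approach: one regex over the raw HTML string. It matches the shortest
# run of text between \( and \), with no idea which element it sits in.
naive = page.gsub(/\\\((.+?)\\\)/) { fake_render(Regexp.last_match(1)) }

# Both formulas get replaced, including the one inside <code>.
puts naive
```

The string-level approach never sees elements at all, only characters. That is why the script described below parses the HTML first.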
I decided to write my own Ruby script to get the job done. From this script, I
would call the `katex` command-line program, which would perform
the heavy lifting of rendering the mathematics.
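Here is a minimal sketch of that call from Ruby. It assumes the `katex` CLI from the npm package is on the `PATH`, reads LaTeX from standard input, and accepts `--display-mode` for block math. The names `katex_command` and `render_with` are hypothetical and are not the helpers from the actual script. The command is passed in as a parameter so the plumbing can be tested without KaTeX installed.

```Ruby
require "open3"

# Arguments for the KaTeX command-line tool. `--display-mode` asks for
# block-style math; without it, KaTeX renders inline math.
def katex_command(display)
  display ? ["katex", "--display-mode"] : ["katex"]
end

# Feed `latex` to `command` on standard input and return what it prints.
# The command is a parameter so this plumbing can be exercised without KaTeX.
def render_with(command, latex)
  output, status = Open3.capture2(*command, stdin_data: latex)
  raise "#{command.first} exited with #{status.exitstatus}" unless status.success?

  output.strip
end

# Usage, assuming `katex` is on the PATH:
#   html = render_with(katex_command(true), "\\sum_{i=1}^n i")
```

Spawning one process per expression keeps things simple. The cost is paying process start-up time for every formula on every page.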
@ -178,16 +170,13 @@ end
There's a bit of a trick to the final layer of this script. We want to be
really careful about where we replace LaTeX, and where we don't. In
particular, we _don't_ want to go into the `code` tags. Otherwise,
it wouldn't be possible to talk about LaTeX code! I also suspect that
some captions, alt texts, and similar elements should be left alone.
However, I don't have those on my website (yet), and I won't worry about
them now. Either way, because of the code tags,
we can't just search-and-replace over the entire page; we need to be
context-aware. This is where `nokogiri` comes in. We parse the HTML, and iterate
over all of the 'text' nodes, calling `perform_katex_sub` on all
of those that _aren't_ inside code tags.
Fortunately, this kind of iteration is pretty easy to specify thanks to something called XPath.
This was my first time encountering it, but it seems extremely useful: it's
a sort of language for selecting XML nodes. First, you provide an 'axis',
which is used to specify the positions of the nodes you want to look at
@ -222,7 +211,7 @@ All in all:
//*[not(self::code)]/text()
```
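To check what this expression actually selects, here is a stdlib-only sketch that evaluates it with Ruby's REXML instead of `nokogiri`. REXML is swapped in only so the example runs without extra gems, and the markup is made up. The sketch also compares the expression with `//text()[not(ancestor::code)]`, a variant that excludes text nested anywhere inside a `code` element.

```Ruby
require "rexml/document"

# Made-up, well-formed markup: prose math, math directly in <code>, and math
# in a <span> nested inside <code> (a shape syntax highlighters tend to emit).
html = '<div><p>Inline \(x\) math</p><code>\(y\)</code>' \
       '<pre><code><span>\(z\)</span></code></pre></div>'
doc = REXML::Document.new(html)

# The XPath from the post: text nodes whose *immediate parent* is not <code>.
by_parent = REXML::XPath.match(doc, "//*[not(self::code)]/text()").map(&:value)

# Stricter variant: text nodes with no <code> element anywhere above them.
by_ancestor = REXML::XPath.match(doc, "//text()[not(ancestor::code)]").map(&:value)

p by_parent   # includes the \(z\) inside <span>, since its parent is <span>
p by_ancestor # only the paragraph's text
```

`//*[not(self::code)]/text()` only looks at a text node's immediate parent. If code blocks ever wrap their contents in extra elements, the `ancestor::` form is the safer choice.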
Finally, we use this XPath from `nokogiri`:
```Ruby {linenos=table}
files = ARGV[0..-1]
@ -247,8 +236,7 @@ I used Nix for this, but the below script will largely be compatible with a non-
I came up with the following, commenting on Nix-specific commands:
```Bash {linenos=table}
# Nix-specific; set up paths.
source $stdenv/setup
# Build site with Hugo
# The cp is Nix-specific; it copies the blog source into the current directory.
@ -278,7 +266,7 @@ take a few dozen seconds to run on my relatively small site. The
better approach would be to use a Node.js script, rather than a Ruby one,
to perform the conversion. KaTeX also provides a JavaScript API, so such a Node.js
script could find the files, parse the HTML, and perform the substitutions itself.
I did quite like using `nokogiri` here, though, and I hope that an equivalently
pleasant solution exists in JavaScript.
Re-rendering the whole website is also pretty wasteful. I rarely change the
@ -287,15 +275,6 @@ to re-run the script, and therefore re-render every page. This makes sense
for me, since I use Nix, and my builds are pretty much always performed
from scratch. For others, however, this may not be the best solution.
### Alternatives
The same person who sent me the original email above also pointed out
[this `pandoc` filter for KaTeX](https://github.com/Zaharid/pandoc_static_katex).
I do not use Pandoc, but from what I can see, this filter relies on
Pandoc's `Math` AST nodes, and applies KaTeX to each of those. This
should work, but wasn't applicable in my case, since Hugo's shortcodes
don't mix well with Pandoc. However, it certainly seems like a workable
solution.
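To show what "applies KaTeX to each `Math` node" involves, here is a rough Ruby sketch of that kind of AST walk. It is not the linked filter's code. It assumes Pandoc's JSON AST shape, where inline math looks like `{"t": "Math", "c": [{"t": "InlineMath"}, "x^2"]}`. The block passed to the hypothetical `replace_math` helper stands in for a real KaTeX call.

```Ruby
# Recursively rebuild a Pandoc JSON AST, swapping each Math node for a raw
# HTML inline. The block receives the LaTeX source and whether it is display
# math; a real filter would call KaTeX there.
def replace_math(node, &render)
  case node
  when Array
    node.map { |child| replace_math(child, &render) }
  when Hash
    if node["t"] == "Math"
      math_type, latex = node["c"]
      display = math_type["t"] == "DisplayMath"
      { "t" => "RawInline", "c" => ["html", render.call(latex, display)] }
    else
      node.transform_values { |value| replace_math(value, &render) }
    end
  else
    node # strings, numbers, nil: nothing to rewrite
  end
end

# Hypothetical AST for a paragraph reading "See $x^2$".
ast = {
  "blocks" => [
    { "t" => "Para",
      "c" => [{ "t" => "Str", "c" => "See" }, { "t" => "Space" },
              { "t" => "Math", "c" => [{ "t" => "InlineMath" }, "x^2"] }] }
  ]
}
result = replace_math(ast) { |tex, display| display ? "<div>#{tex}</div>" : "<span>#{tex}</span>" }
p result["blocks"][0]["c"][2]
```

As a JSON filter, the script would read the AST from standard input and print the rewritten AST, roughly `puts JSON.generate(replace_math(JSON.parse($stdin.read)) { ... })` after `require "json"`. Because Pandoc has already parsed the math into dedicated nodes, the "is this inside a code tag?" question from earlier never comes up.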
### Conclusion
With the removal of MathJax from my site, it is now completely JavaScript-free,
and contains virtually the same HTML that it did beforehand. This, I hope,