---
title: Rendering Mathematics On The Back End
date: 2020-07-15T15:27:19-07:00
draft: true
tags: ["Website", "Nix", "Ruby", "KaTeX", "Hugo"]
---

Due to something of a streak of bad luck when it came to computers, I spent a
significant amount of time using a Linux-based Chromebook, and then a
Pinebook Pro. It was, in some ways, enlightening. The things that I used to take
for granted with a 'powerful' machine now became a rare luxury: StackOverflow,
and other relatively static websites, took upwards of ten seconds to finish
loading. On Slack, each of my keypresses could take longer than 500ms to
appear on the screen, and sometimes, it would take several seconds. Some
websites would present me with a white screen, and remain that way for much
longer than I had time to wait. It was awful.

At one point, I installed uMatrix, and made it the default policy to block
all JavaScript. For the most part, this worked well. Of course, I had to
enable JavaScript for applications that needed to be interactive, like
Slack and Discord. But otherwise, I was able to browse the majority
of the websites I normally browse. This went on until I started working
on the [compiler series]({{< relref "00_compiler_intro.md" >}}) again,
and discovered that the LaTeX math on my page, which was required
for displaying things like inference rules, didn't work without
JavaScript. I was left with two options:

* Allow JavaScript, and continue using MathJax to render my math.
* Make it so that the mathematics is rendered on the back end.

I've [previously written about math rendering]({{< relref "math_rendering_is_wrong.md" >}}),
and made the observation that MathJax's output for LaTeX is __identical__
on every computer. From the MathJax 2.6 change log:

> _Improved CommonHTML output_. The CommonHTML output now provides the same layout quality and MathML support as the HTML-CSS and SVG output. It is on average 40% faster than the other outputs and the markup it produces are identical on all browsers and thus can also be pre-generated on the server via MathJax-node.

It seems absurd, then, to offload this kind of work onto the users, to
be done over and over again. As should be clear from the title of
this post, this made me settle on the second option: it was
__obviously within reach__, especially for a statically-generated website
like mine, to render math on the back end.

I settled on the following architecture:

* As before, I would generate my pages using Hugo.
* I would use the KaTeX NPM package to render math.
* To build the website no matter what computer I was on, I would use Nix.

It so happens that Nix isn't really required for using my approach in general.
I will give my setup here, but feel free to skip ahead.

### Setting Up A Nix Build
My `default.nix` file looks like this:

```Nix {linenos=table}
{ stdenv, hugo, fetchgit, pkgs, nodejs, ruby }:

let
  url = "https://dev.danilafe.com/Web-Projects/blog-static.git";
  rev = "<commit>";
  sha256 = "<hash>";
  requiredPackages = import ./required-packages.nix {
    inherit pkgs nodejs;
  };
in
  stdenv.mkDerivation {
    name = "blog-static";
    version = rev;
    src = fetchgit {
      inherit url rev sha256;
    };
    builder = ./builder.sh;
    converter = ./convert.rb;
    buildInputs = [
      hugo
      requiredPackages.katex
      (ruby.withPackages (ps: [ ps.nokogiri ]))
    ];
  }
```

I'm using `node2nix` to generate the `required-packages.nix` file, which allows me,
even from a sandboxed Nix build, to download and install `npm` packages. This is needed
so that I have access to the `katex` binary at build time. I fed the following JSON file
to `node2nix`:

```JSON {linenos=table}
[
  "katex"
]
```

The Ruby script I wrote for this (more on that soon) required the `nokogiri` gem, which
I used for traversing the HTML generated for my site. Hugo was obviously required to
generate the HTML.

### Converting LaTeX To HTML
After my first post complaining about the state of mathematics on the web, I received
the following email (which the author allowed me to share):

> Sorry for having a random stranger email you, but in your blog post
[(link)](https://danilafe.com/blog/math_rendering_is_wrong) you seem to focus on MathJax's
difficulty in rendering things server-side, while quietly ignoring that KaTeX's front
page advertises server-side rendering. Their documentation [(link)](https://katex.org/docs/options.html)
even shows (at least as of the time this email was sent) that it renders both HTML
(to be arranged nicely with their CSS) for visuals and MathML for accessibility.

This is a great point, and KaTeX is indeed usable for server-side rendering. But I've
seen few people who actually use it that way. Unfortunately, as I pointed out in my previous
post on the subject, few tools remain that actually take your HTML page and replace
the LaTeX in it with rendered math.

> [In MathJax,] The bigger issue, though, was that the `page2html`
program, which rendered all the mathematics in a single HTML page,
was gone. I found `tex2html` and `text2htmlcss`, which could only
render equations without the surrounding HTML. I also found `mjpage`,
which replaced mathematical expressions in a page with their SVG forms.

This is still the case, in both MathJax and KaTeX. The ability
to render math in one step is the main selling point of front-end LaTeX renderers:
all you have to do is drop in a file from a CDN, and voila, you have your
math. There are no such easy answers for back-end rendering.

So what _do_ I do? Well, there are two types of math on my website: inline math and display math.
On the command line ([here are the docs](https://katex.org/docs/cli.html)),
the distinction is made using the `--display-mode` argument (`-d` for short). So, the general
algorithm is to replace the code inside each `$$...$$` with its display-rendered version,
and the code inside each `\(...\)` with its inline-rendered version. I came up with
the following Ruby function:

```Ruby {linenos=table}
require 'open3'

# Render `string` by piping it to `command`, reusing earlier results from `cache`.
def render_cached(cache, command, string, render_comment = nil)
  cache.fetch(string) do |new|
    puts " Rendering #{render_comment || new}"
    cache[string] = Open3.popen3(command) do |i, o, e, t|
      i.write new  # send the LaTeX to the command's standard input
      i.close      # close stdin so the command knows the input is complete
      o.read.force_encoding(Encoding::UTF_8).strip
    end
  end
end
```

Here, the `cache` argument is used to prevent re-running the `katex` command
on an equation that was already rendered before (the output is the same, after all).
The `command` is the specific shell command that we want to invoke; this would
be either `katex` or `katex -d`. The `string` is the math equation to render,
and the `render_comment` is the string to print to the console instead of the equation
(so that long display math equations are not dumped to standard output).

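To see the caching behavior in isolation, here is a minimal sketch that swaps
`katex` for `tr a-z A-Z`, a stand-in command picked only because it exists on most
Unix-like systems. The `run_cached` helper and the `calls` list are hypothetical names
for this illustration, and `Open3.capture2` is used in place of `popen3` purely for brevity.

```ruby
require 'open3'

# Hypothetical helper: the same caching idea as render_cached, except that it
# records every input for which the external command was actually started.
def run_cached(cache, command, string, calls)
  cache.fetch(string) do |key|
    calls << key
    # capture2 feeds stdin_data to the process and returns [stdout, status].
    output, _status = Open3.capture2(command, stdin_data: key)
    cache[string] = output.strip
  end
end

cache = {}
calls = []
# `tr a-z A-Z` stands in for `katex`; it simply upper-cases its input.
first  = run_cached(cache, "tr a-z A-Z", "x^2", calls)
second = run_cached(cache, "tr a-z A-Z", "x^2", calls)
puts first
puts "commands run: #{calls.length}"
```

The second call finds the equation in the hash, so the block passed to `fetch` never
runs and no process is started.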

Then, given a substring of the HTML file, we use regular expressions
to find the `\(...\)` and `$$...$$`s, and use the `render_cached` method
on the LaTeX code inside.

```Ruby {linenos=table}
def perform_katex_sub(inline_cache, display_cache, content)
  # Inline math: \( ... \), where a backslash may appear unless it starts "\)".
  rendered = content.gsub(/\\\(((?:[^\\]|\\[^\)])*)\\\)/) do |match|
    render_cached(inline_cache, "katex", $~[1])
  end
  # Display math: $$ ... $$, where a lone "$" may appear inside.
  # (The inner "$" is escaped; unescaped, it would be an end-of-line anchor.)
  rendered = rendered.gsub(/\$\$((?:[^\$]|\$[^\$])*)\$\$/) do |match|
    render_cached(display_cache, "katex -d", $~[1], "display")
  end
  return rendered
end
```
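A quick way to check the two patterns is to run them with a stand-in renderer that
wraps the captured LaTeX in brackets instead of calling `katex`. This is only a sketch:
`mark_math` and the sample sentence are made up for illustration, and the display
pattern escapes its inner `$` on the assumption that a lone dollar sign should be
allowed inside display math.

```ruby
# Hypothetical stand-in for perform_katex_sub: rather than invoking katex,
# it wraps each captured piece of LaTeX so the match boundaries are visible.
def mark_math(content)
  inline  = /\\\(((?:[^\\]|\\[^\)])*)\\\)/
  display = /\$\$((?:[^\$]|\$[^\$])*)\$\$/
  content
    .gsub(inline)  { "[inline: #{$~[1]}]" }
    .gsub(display) { "[display: #{$~[1]}]" }
end

sample = 'Let \(x \in S\). Then $$\frac{a}{b}$$ holds, and $5 stays as is.'
result = mark_math(sample)
puts result
```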

There's a bit of a trick to the final layer of this script. We want to be
really careful about where we replace LaTeX, and where we don't. In
particular, we _don't_ want to go into the `code` tags. Otherwise,
I wouldn't be able to write about LaTeX code! Thus, we can't just
search-and-replace over the entire HTML document; we need to be context
aware. This is where `nokogiri` comes in. We parse the HTML, and iterate
over all of the 'text' nodes, calling `perform_katex_sub` on all
of those that _aren't_ inside code tags.

Fortunately, this is pretty easy to specify thanks to something called XPath.
This was my first time encountering it, but it seems extremely useful: it's
a sort of language for selecting XML nodes. First, you say where to look
relative to the node you're currently at (initially, the root of the document).
A single `/` looks at the immediate children
(this would be the `html` tag in a properly formatted document, I would imagine).
A double `//` looks at all the transitive children. That is, it will look at the
children of the root, then their children, and so on. Both are abbreviations for
what XPath calls an 'axis'; there's also the `self` axis, written `self::`,
which looks at the node itself.

After you've said where to look, you need to specify the type of node that you want to
select. We can write `code`, for instance, to pick only the `<code>....</code>` tags
from the axis we've chosen. We can also use `*` to select any element, and we can
use `text()` to select text nodes, such as the `Hello` inside of `<b>Hello</b>`.

We can also apply some more conditions to the nodes we pick using `[]`.
For us, the relevant feature here is `not(...)`, which allows us to
select nodes that do __not__ match a particular condition. This is all
we need to know.

We write:

* `//`, starting to search for nodes everywhere, not just the root of the document.
* `*`, to match _any_ element. We want to replace math inside of `div`s, `span`s, `nav`s,
  all of the `h`s, and so on.
* `[not(self::code)]`, cutting out all the `code` tags.
* `/`, now selecting the nodes that are immediate children of the nodes we've selected.
* `text()`, giving us the text contents of all the nodes we've selected.

All in all:

```
//*[not(self::code)]/text()
```
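To try the expression without installing any gems, the sketch below uses REXML
(which ships with Ruby) in place of Nokogiri. That substitution and the sample
markup are assumptions made for illustration only; REXML also requires well-formed
XML, which real HTML pages may not be.

```ruby
require 'rexml/document'

# Hypothetical sample: one paragraph, one plain <code> tag, and one <code> tag
# whose text sits inside a nested <span>.
xml = '<div><p>Math: \(x\)</p><code>\(raw\)</code>' \
      '<pre><code><span>\(y\)</span></code></pre></div>'
doc = REXML::Document.new(xml)

# Select text nodes whose parent element is anything other than <code>.
texts = REXML::XPath.match(doc, '//*[not(self::code)]/text()').map(&:value)
puts texts.inspect
```

One thing this makes visible: the predicate only looks at a text node's immediate
parent, so the text inside the `span` nested in `code` is still selected, while the
text directly inside `code` is not.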

Finally, we use this XPath from `nokogiri`:

```Ruby {linenos=table}
require 'nokogiri'

files = ARGV[0..-1]
inline_cache, display_cache = {}, {}

files.each do |file|
  puts "Rendering file: #{file}"
  document = Nokogiri::HTML.parse(File.open(file))
  # Visit every text node whose parent is not a <code> element.
  document.search('//*[not(self::code)]/text()').each do |t|
    t.replace(perform_katex_sub(inline_cache, display_cache, t.content))
  end
  File.write(file, document.to_html)
end
```

I named this script `convert.rb`; it's used from inside of the Nix expression
and its builder, which we will cover below.

### Tying it All Together
Finally, I wanted an end-to-end script to generate HTML pages and render the LaTeX in them.
I used Nix for this, but the below script will largely be compatible with a non-Nix system.
I came up with the following, commenting on Nix-specific commands:

```Bash {linenos=table}
source $stdenv/setup # Nix-specific; set up paths.

# Build site with Hugo
# The cp is Nix-specific; it copies the blog source into the current directory.
cp -r $src/* .
hugo --baseUrl="https://danilafe.com"

# Render math in HTML files (the pattern below only matches .html).
# $converter is Nix-specific; you can just use convert.rb.
find public/ -regex "public/.*\.html" | xargs ruby $converter

# Output result
# $out is Nix-specific; you can replace it with your destination folder.
mkdir $out
cp -r public/* $out/
```

This is it! Using the two scripts, `convert.rb` and `builder.sh`, I
was able to generate my blog with the math rendered on the back end.
Please note, though, that I had to add the KaTeX CSS to my website's
`<head>`.

### Caveats
The main caveat of my approach is performance. For every piece of
mathematics that I render, I invoke the `katex` command. This incurs
the penalty of Node's startup time, every time, and makes my approach
take a few dozen seconds to run on my relatively small site. The
better approach would be to use a NodeJS script, rather than a Ruby one,
to perform the conversion. KaTeX also provides an API, so such a NodeJS
script can find the files, parse the HTML, and perform the substitutions.
I did quite like using `nokogiri` here, though, and I hope that an equivalently
pleasant solution exists in JavaScript.

Re-rendering the whole website is also pretty wasteful. I rarely change the
mathematics on more than one page at a time, but every time I do so, I have
to re-run the script, and therefore re-render every page. This makes sense
for me, since I use Nix, and my builds are pretty much always performed
from scratch. On the other hand, for others, this may not be the best solution.

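For setups that keep files around between builds, one option is to persist the
equation cache to disk, so that equations which haven't changed are not rendered again.
The sketch below is an assumption about how that might look, not something my script
does; the `load_cache` and `save_cache` helpers and the `katex_cache.json` file name
are hypothetical, and the cached value is a placeholder rather than real `katex` output.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical helpers: read and write a { latex => html } hash as JSON.
def load_cache(path)
  File.exist?(path) ? JSON.parse(File.read(path)) : {}
end

def save_cache(path, cache)
  File.write(path, JSON.generate(cache))
end

dir = Dir.mktmpdir                          # scratch directory for the demo
path = File.join(dir, 'katex_cache.json')   # hypothetical file name

first_run = load_cache(path)                # nothing on disk yet: empty hash
first_run['x^2'] = '<span>rendered x^2</span>'  # placeholder for katex output
save_cache(path, first_run)

second_run = load_cache(path)               # a later build starts warm
puts second_run.key?('x^2')

FileUtils.remove_entry(dir)                 # clean up the scratch directory
```

A hash loaded this way could be passed wherever an empty `{}` cache would otherwise
be created, with `save_cache` called once all files are processed.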

### Conclusion
With the removal of MathJax from my site, it is now completely JavaScript free,
and contains virtually the same HTML that it did beforehand. This, I hope,
makes it work better on devices where computational power is more limited.
I also hope that it illustrates a general principle: it's entirely possible,
and practical, to render LaTeX on the back end for a static site.