Add math rendering draft.
parent
6a2fec8ef4
commit
cb65e89e53
280
content/blog/backend_math_rendering.md
Normal file
|
@ -0,0 +1,280 @@
|
||||||
|
---
title: Rendering Mathematics On The Back End
date: 2020-07-15T15:27:19-07:00
draft: true
tags: ["Website", "Nix", "Ruby", "KaTeX", "Hugo"]
---
|
||||||
|
|
||||||
|
Due to something of a streak of bad luck when it came to computers, I spent a
significant amount of time using a Linux-based Chromebook, and then a
Pinebook Pro. It was, in some ways, enlightening. The things that I used to take
for granted with a 'powerful' machine became a rare luxury: StackOverflow,
and other relatively static websites, took upwards of ten seconds to finish
loading. On Slack, each of my keypresses could take longer than 500ms to
appear on the screen, and sometimes, it would take several seconds. Some
websites would present me with a white screen, and remain that way for much
longer than I had time to wait. It was awful.
|
||||||
|
|
||||||
|
At one point, I installed uMatrix, and made it the default policy to block
all JavaScript. For the most part, this worked well. Of course, I had to
enable JavaScript for applications that needed to be interactive, like
Slack and Discord, but I was still able to browse the majority
of the websites I normally visit. This went on until I started working
on the [compiler series]({{< relref "00_compiler_intro.md" >}}) again,
and discovered that the LaTeX math on my page, which was required
for displaying things like inference rules, didn't work without
JavaScript. I was left with two options:
|
||||||
|
|
||||||
|
* Allow JavaScript, and continue using MathJax to render my math.
* Make it so that the mathematics is rendered on the back end.
|
||||||
|
|
||||||
|
I've [previously written about math rendering]({{< relref "math_rendering_is_wrong.md" >}}),
and made the observation that MathJax's output for LaTeX is __identical__
on every computer. From the MathJax 2.6 change log:
|
||||||
|
|
||||||
|
> _Improved CommonHTML output_. The CommonHTML output now provides the same layout quality and MathML support as the HTML-CSS and SVG output. It is on average 40% faster than the other outputs and the markup it produces are identical on all browsers and thus can also be pre-generated on the server via MathJax-node.
|
||||||
|
|
||||||
|
It seems absurd, then, to offload this kind of work onto the users, to
be done over and over again. As should be clear from the title of
this post, this made me settle on the second option: it was
__obviously within reach__, especially for a statically-generated website
like mine, to render math on the back end.
|
||||||
|
|
||||||
|
I settled on the following architecture:
|
||||||
|
|
||||||
|
* As before, I would generate my pages using Hugo.
* I would use the KaTeX NPM package to render math.
* To build the website no matter what computer I was on, I would use Nix.
|
||||||
|
|
||||||
|
It so happens that Nix isn't really required for using my approach in general.
I will give my setup here, but feel free to skip ahead.
|
||||||
|
|
||||||
|
### Setting Up A Nix Build
|
||||||
|
My `default.nix` file looks like this:
|
||||||
|
|
||||||
|
```Nix {linenos=table}
{ stdenv, hugo, fetchgit, pkgs, nodejs, ruby }:

let
  url = "https://dev.danilafe.com/Web-Projects/blog-static.git";
  rev = "<commit>";
  sha256 = "<hash>";
  requiredPackages = import ./required-packages.nix {
    inherit pkgs nodejs;
  };
in
  stdenv.mkDerivation {
    name = "blog-static";
    version = rev;
    src = fetchgit {
      inherit url rev sha256;
    };
    builder = ./builder.sh;
    converter = ./convert.rb;
    buildInputs = [
      hugo
      requiredPackages.katex
      (ruby.withPackages (ps: [ ps.nokogiri ]))
    ];
  }
```
|
||||||
|
|
||||||
|
I'm using `node2nix` to generate the `required-packages.nix` file, which allows me,
even from a sandboxed Nix build, to download and install `npm` packages. This is needed
so that I have access to the `katex` binary at build time. I fed the following JSON file
to `node2nix`:
|
||||||
|
|
||||||
|
```JSON {linenos=table}
[
  "katex"
]
```
|
||||||
|
|
||||||
|
The Ruby script I wrote for this (more on that soon) required the `nokogiri` gem, which
I used for traversing the HTML generated for my site. Hugo was obviously required to
generate the HTML.
|
||||||
|
|
||||||
|
### Converting LaTeX To HTML
|
||||||
|
After my first post complaining about the state of mathematics on the web, I received
the following email (which the author allowed me to share):
|
||||||
|
|
||||||
|
> Sorry for having a random stranger email you, but in your blog post
[(link)](https://danilafe.com/blog/math_rendering_is_wrong) you seem to focus on MathJax's
difficulty in rendering things server-side, while quietly ignoring that KaTeX's front
page advertises server-side rendering. Their documentation [(link)](https://katex.org/docs/options.html)
even shows (at least as of the time this email was sent) that it renders both HTML
(to be arranged nicely with their CSS) for visuals and MathML for accessibility.
|
||||||
|
|
||||||
|
This is a great point, and KaTeX is indeed usable for server-side rendering. But I've
seen few people who actually use it that way. Unfortunately, as I pointed out in my previous post on the subject,
few tools remain that actually take your HTML page and substitute
rendered math for the LaTeX in it.
|
||||||
|
|
||||||
|
> [In MathJax,] The bigger issue, though, was that the `page2html`
program, which rendered all the mathematics in a single HTML page,
was gone. I found `tex2html` and `tex2htmlcss`, which could only
render equations without the surrounding HTML. I also found `mjpage`,
which replaced mathematical expressions in a page with their SVG forms.
|
||||||
|
|
||||||
|
This is still the case, in both MathJax and KaTeX. The ability
to render math in one step is the main selling point of front-end LaTeX renderers:
all you have to do is drop in a file from a CDN, and voila, you have your
math. There are no such easy answers for back-end rendering.
|
||||||
|
|
||||||
|
So what _do_ I do? Well, there are two types of math on my website: inline math and display math.
On the command line ([here are the docs](https://katex.org/docs/cli.html)),
the distinction is made using the `--display-mode` argument. So, the general algorithm
is to replace the code inside each `$$...$$` with its display-rendered version,
and the code inside each `\(...\)` with its inline-rendered version. I came up with
the following Ruby function:
|
||||||
|
|
||||||
|
```Ruby {linenos=table}
require 'open3'

# Render `string` by piping it through `command`, reusing the cached
# result if the same equation has been rendered before.
def render_cached(cache, command, string, render_comment = nil)
  cache.fetch(string) do |new|
    puts " Rendering #{render_comment || new}"
    cache[string] = Open3.popen3(command) do |i, o, e, t|
      i.write new
      i.close
      o.read.force_encoding(Encoding::UTF_8).strip
    end
  end
end
```
|
||||||
|
|
||||||
|
Here, the `cache` argument is used to prevent re-running the `katex` command
on an equation that was already rendered before (the output is the same, after all).
The `command` is the specific shell command that we want to invoke; this would
be either `katex` or `katex -d` (`-d` being short for `--display-mode`). The `string` is the math equation to render,
and the `render_comment` is the string to print to the console instead of the equation
(so that long display math equations are not printed to standard output).
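
The caching behavior can be seen in isolation with a small sketch. Here `fetch_or_render` is a hypothetical stand-in for `render_cached`, and the block passed to it plays the role of the `katex` command, so nothing needs to be installed to try it:

```ruby
# Hypothetical stand-in for render_cached: the block is only called
# when the equation is not in the cache yet.
def fetch_or_render(cache, string)
  cache.fetch(string) do |missing|
    cache[string] = yield(missing)
  end
end

cache = {}
calls = 0
3.times do
  fetch_or_render(cache, 'x^2') do |tex|
    calls += 1 # counts how often the "renderer" actually runs
    "<span>#{tex}</span>"
  end
end
puts "renderer ran #{calls} time(s) for 3 lookups"
```

`Hash#fetch` only runs its block on a miss, and the block stores the result, so repeated equations cost a single render.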
|
||||||
|
|
||||||
|
Then, given a substring of the HTML file, we use regular expressions
to find the `\(...\)` and `$$...$$`s, and use the `render_cached` method
on the LaTeX code inside.
|
||||||
|
|
||||||
|
```Ruby {linenos=table}
def perform_katex_sub(inline_cache, display_cache, content)
  # Inline math: \( ... \)
  rendered = content.gsub(/\\\(((?:[^\\]|\\[^\)])*)\\\)/) do |match|
    render_cached(inline_cache, "katex", $~[1])
  end
  # Display math: $$ ... $$ (a lone $ inside the equation is allowed)
  rendered = rendered.gsub(/\$\$((?:[^\$]|\$[^\$])*)\$\$/) do |match|
    render_cached(display_cache, "katex -d", $~[1], "display")
  end
  return rendered
end
```
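
To see what the two patterns capture without invoking `katex`, here is a minimal sketch. The `mark_math` helper and the sample string are hypothetical; the helper wraps each match in a visible marker instead of rendering it. The display pattern is written with an escaped `\$` in its second alternative, on the assumption that a lone `$` inside display math should be allowed:

```ruby
# Patterns for \( ... \) and $$ ... $$; group 1 is the LaTeX inside.
INLINE_MATH = /\\\(((?:[^\\]|\\[^\)])*)\\\)/
DISPLAY_MATH = /\$\$((?:[^\$]|\$[^\$])*)\$\$/

# Hypothetical helper: mark the matches instead of rendering them.
def mark_math(content)
  content
    .gsub(INLINE_MATH) { "<inline:#{$~[1]}>" }
    .gsub(DISPLAY_MATH) { "<display:#{$~[1]}>" }
end

sample = 'Euler: \(e^{i\pi} + 1 = 0\), and $$\sum_{k=1}^{n} k$$ done.'
puts mark_math(sample)
```

Backslash commands such as `\pi` pass through the inline pattern because only the two-character sequence `\)` ends the match.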
|
||||||
|
|
||||||
|
There's a bit of a trick to the final layer of this script. We want to be
really careful about where we replace LaTeX, and where we don't. In
particular, we _don't_ want to go into the `code` tags. Otherwise,
I wouldn't be able to talk about LaTeX code! Thus, we can't just
search-and-replace over the entire HTML document; we need to be
context-aware. This is where `nokogiri` comes in. We parse the HTML, and iterate
over all of the 'text' nodes, calling `perform_katex_sub` on all
of those that _aren't_ inside code tags.
|
||||||
|
|
||||||
|
Fortunately, this is pretty easy to specify thanks to something called XPath.
This was my first time encountering it, but it seems extremely useful: it's
a sort of language for selecting XML nodes. First, you provide an 'axis',
which specifies where to look for nodes relative to the current one
(at the start of an expression, that is the root of the document). Writing `/`
uses the default `child` axis, which looks at the immediate children
(this would be the `html` tag in a properly formatted document, I would imagine).
Writing `//` is shorthand that goes through the `descendant-or-self` axis, and looks at all
the transitive children. That is, it will look at the
children of the root, then their children, and so on. There's also the `self` axis,
which looks at the node itself.
|
||||||
|
|
||||||
|
After you provide an axis, you need to specify the type of node that you want to
select. We can write `code`, for instance, to pick only the `<code>....</code>` tags
from the axis we've chosen. We can also use `*` to select any node, and we can
use `text()` to select text nodes, such as the `Hello` inside of `<b>Hello</b>`.
|
||||||
|
|
||||||
|
We can also apply some more conditions to the nodes we pick using `[]`.
For us, the relevant feature here is `not(...)`, which allows us to
select nodes that do __not__ match a particular condition. This is all
we need to know.
|
||||||
|
|
||||||
|
We write:
|
||||||
|
|
||||||
|
* `//`, starting to search for nodes everywhere, not just the root of the document.
* `*`, to match _any_ node. We want to replace math inside of `div`s, `span`s, `nav`s,
  all of the `h`s, and so on.
* `[not(self::code)]`, cutting out all the `code` tags.
* `/`, now selecting the nodes that are immediate children of the nodes we've selected.
* `text()`, giving us the text nodes among those children.
|
||||||
|
|
||||||
|
All in all:
|
||||||
|
|
||||||
|
```
//*[not(self::code)]/text()
```
|
||||||
|
|
||||||
|
Finally, we use this XPath from `nokogiri`:
|
||||||
|
|
||||||
|
```Ruby {linenos=table}
require 'nokogiri'

files = ARGV[0..-1]
inline_cache, display_cache = {}, {}

files.each do |file|
  puts "Rendering file: #{file}"
  document = Nokogiri::HTML.parse(File.open(file))
  # Visit every text node whose parent is not a <code> element.
  document.search('//*[not(self::code)]/text()').each do |t|
    t.replace(perform_katex_sub(inline_cache, display_cache, t.content))
  end
  File.write(file, document.to_html)
end
```
|
||||||
|
|
||||||
|
I named this script `convert.rb`; it's used from inside of the Nix expression
and its builder, which we will cover below.
|
||||||
|
|
||||||
|
### Tying it All Together
|
||||||
|
Finally, I wanted an end-to-end script to generate HTML pages and render the LaTeX in them.
I used Nix for this, but the script below will largely be compatible with a non-Nix system.
I came up with the following, commenting on Nix-specific commands:
|
||||||
|
|
||||||
|
```Bash {linenos=table}
source $stdenv/setup # Nix-specific; set up paths.

# Build site with Hugo
# The cp is Nix-specific; it copies the blog source into the current directory.
cp -r $src/* .
hugo --baseUrl="https://danilafe.com"

# Render math in HTML files.
# $converter is Nix-specific; you can just use convert.rb.
find public/ -regex "public/.*\.html" | xargs ruby $converter

# Output result
# $out is Nix-specific; you can replace it with your destination folder.
mkdir $out
cp -r public/* $out/
```
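
Outside of Nix, the `find ... | xargs` step could also be done from Ruby itself. The following is only a sketch under that assumption: `html_files` is a hypothetical helper, and the temporary directory stands in for Hugo's `public/` folder.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: every .html file below `root`, in a stable order.
def html_files(root)
  Dir.glob(File.join(root, '**', '*.html')).sort
end

# Demonstration on a throwaway directory standing in for `public/`.
Dir.mktmpdir do |public_dir|
  FileUtils.mkdir_p(File.join(public_dir, 'blog', 'post'))
  FileUtils.touch(File.join(public_dir, 'index.html'))
  FileUtils.touch(File.join(public_dir, 'blog', 'post', 'index.html'))
  FileUtils.touch(File.join(public_dir, 'index.xml')) # not an HTML file
  puts html_files(public_dir)
end
```

The resulting list could then be used in place of `ARGV` in a script like `convert.rb`.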
|
||||||
|
|
||||||
|
This is it! Using the two scripts, `convert.rb` and `builder.sh`, I
was able to generate my blog with the math rendered on the back end.
Please note, though, that I had to add the KaTeX CSS to my website's
`<head>`.
|
||||||
|
|
||||||
|
### Caveats
|
||||||
|
The main caveat of my approach is performance. For every piece of
mathematics that I render, I invoke the `katex` command. This incurs
the penalty of Node's startup time, every time, and makes my approach
take a few dozen seconds to run on my relatively small site. The
better approach would be to use a NodeJS script, rather than a Ruby one,
to perform the conversion. KaTeX also provides an API, so such a NodeJS
script can find the files, parse the HTML, and perform the substitutions.
I did quite like using `nokogiri` here, though, and I hope that an equivalently
pleasant solution exists in JavaScript.
|
||||||
|
|
||||||
|
Re-rendering the whole website is also pretty wasteful. I rarely change the
mathematics on more than one page at a time, but every time I do so, I have
to re-run the script, and therefore re-render every page. This makes sense
for me, since I use Nix, and my builds are pretty much always performed
from scratch. On the other hand, for others, this may not be the best solution.
|
||||||
|
|
||||||
|
### Conclusion
|
||||||
|
With the removal of MathJax from my site, it is now completely JavaScript free,
and contains virtually the same HTML that it did beforehand. This, I hope,
makes it work better on devices where computational power is more limited.
I also hope that it illustrates a general principle: it's entirely possible,
and practical, to render LaTeX on the back end for a static site.
|