---
title: Math Rendering is Wrong
date: 2020-03-24T16:40:27-07:00
tags: ["Website"]
---
Since I first started working on my website at age fourteen, the site
has gone through many revisions, and hopefully changed for the better.
This blog was originally dynamically served using a Python/Flask backend,
having a custom login system and post "editor" (just an input box).
One of the stranger things about my website, though, was how I displayed
content.

It was clear to me, even at my young age, that writing raw HTML was
suboptimal. Somehow (perhaps through GitHub) I heard about Markdown,
and realized that a human-readable markup language was probably a much
better way to go. What remained was, of course, rendering the content.
The easiest way I found was to just stick a JavaScript script, calling
out to [marked](https://github.com/markedjs/marked), to run on page
load and convert all the markup into pretty HTML.

This rendering would happen on every page load. Every time I navigated
between pages on my site, for a second or two, I'd see the raw, unrendered
Markdown that I had written, which would then disappear and be replaced
with a proper view of the page's content. The rendering wasn't error-proof,
either. If my connection was particularly slow (which it was, thanks, Comcast),
or if I forgot to disable uMatrix, I would be left having to sift through
the frequent occurrences of `#`, `_`, `*`... Eventually I realized my mistake,
and switched to rendering Markdown on the backend. Now, my content
would appear to the user already formatted, and they wouldn't have to
wait for a JavaScript program to finish to read what I had written.
All was well.

Sometimes, I look back on the early iterations of my site, and smile
at the silly mistakes I'd made. Yet I can't innocently make fun of
my original Markdown rendering solution, because in a way, it lives on.
### The State of Mathematics on the Web
When I search for "render math on website" on Google,
the following two links are at the top of my search:
* [MathJax | Beautiful math in all browsers](https://www.mathjax.org)
* [KaTeX - the fastest math typesetting library for the web](https://katex.org)

Indeed, these are the two most popular math rendering solutions
(in my experience). Yet both of these solutions share something
in common with each other and with the early iterations of my website -
they use client-side rendering for static content. In my opinion, this is absurd.

Every time that you visit a website that uses MathJax or KaTeX, any mathematical
notation arrives at your machine in the form of LaTeX markup. For instance,
\(e^{-\frac{x}{2}}\) looks like `e^{-\frac{x}{2}}`. The rendering
software (MathJax or KaTeX) then takes this markup and converts it into
HTML and CSS that your browser can display. Just like my old website,
all of this happens __every time the page is loaded__. This isn't an uncommon
thing, either: websites like
[Mathematics StackExchange](https://math.stackexchange.com) and
[Chegg](https://www.chegg.com) use MathJax, and many more can be found
on [this list](https://docs.mathjax.org/en/v2.7-latest/misc/mathjax-in-use.html).
According to [this page](https://katex.org/users.html), Facebook Messenger,
[Khan Academy](https://www.khanacademy.org/), [Gitter](https://gitter.im/)
and [GitLab](https://about.gitlab.com/) use KaTeX. Some of these websites
don't even load their content without JavaScript enabled, much less render
math.

A skeptic might say that it is not possible to render LaTeX to HTML and
CSS ahead of time. This might even have been true in the past. A user
on the
["can we replace client-side MathJax with server-side MathJax"](https://meta.mathoverflow.net/questions/2360/can-we-replace-client-side-mathjax-with-server-side-mathjax) thread on MathOverflow Meta in 2015 points out that MathJax doesn't support
server-side HTML output:
> I found [[the comment]](https://math.meta.stackexchange.com/questions/16809/a-mathjax-alternative-from-khan-academy#comment62132_16817) from a MathJax developer confirming that HTML+CSS doesn't work yet with the server-side version . . .

The comment is even older, from 2014:

> . . . [MathJax does not support] HTML-CSS at the moment . . .

It's over, go home everyone. We are asking for the impossible.

Or are we? Version 2.6 of MathJax has the following comment in its change log:

> _Improved CommonHTML output_. The CommonHTML output now provides the same layout quality and MathML support as the HTML-CSS and SVG output. It is on average 40% faster than the other outputs and the markup it produces are identical on all browsers and thus can also be pre-generated on the server via MathJax-node.

Further, the [HTML Support](http://docs.mathjax.org/en/latest/output/html.html)
page from MathJax's docs states:

> [CommonHTML] is MathJax's primary output mode since MathJax version 2.6. Its major advantage is its quality, consistency, and the fact that its output is independent of the browser, operating system, and user environment

So not only is it explicitly possible to have a server-generated math output,
but the algorithm that would be used for generating such output is already
in use on the client-side! If you look hard enough, you may even find a few
resources for using this algorithm server-side, but many of those suffer from
another problem...
### Images Won't Cut It
It's tempting to convert mathematics to an image, such as a PNG or an SVG file.
This approach is taken on [this blog](https://blog.oniuo.com/post/math-jax-ssr-example/) and this [Advanced Web Machinery](https://advancedweb.hu/mathjax-processing-on-the-server-side/) article. [Wolfram MathWorld](http://mathworld.wolfram.com/Convergent.html) also renders its mathematics to images. However, in my opinion,
this is __not the right approach__. It is inconvenient for me as a user,
and, I suspect, for those in need of assistive technologies. Here are
the issues I see with rendering mathematics to images:
* Images are impossible to use with copy/paste. I am unable to select a word,
number, or symbol in a rendered image. I do this on occasion, and this is
not a contrived issue.
* Images are not nearly as responsive, and are difficult to style.
Line breaking, fonts, and even colors are difficult to change when using images.
They stick out like a sore thumb when used for inline math, and can
look very strange (or disappear) if a user extension manipulates colors on the
page.
* Images are completely opaque to users in need of screen readers. While
some readers support
{{< sidenote "right" "mathml-note" "MathML," >}}
MathML is an XML-based markup for mathematics, meant to serve as a low-level
output target for other math processors. While it is supported in Firefox,
it requires a polyfill in Chrome, which brings us back to front-end JavaScript.
{{< /sidenote >}} images are much harder to work with. The sites
that I linked that use images do not have captions with their math,
making it completely inaccessible.
* Using images instead of a JavaScript-based renderer reminds me of
the [false dichotomy fallacy](https://www.logicallyfallacious.com/cgi-bin/uy/webpages.cgi?/logicalfallacies/False-Dilemma). Why can't we keep
rendering to HTML and CSS, as we do now, but do it on the server?
### Where Have the Resources Gone?
If you look up "mathjax setup" on Google, you are greeted with dozens
of links telling you how to get the rendering working client-side. It's
convenient: a single JavaScript `<script>` tag, and maybe a bit of code
at the bottom of the page with some settings.

If you look up "mathjax render server-side", the resources are far more
scarce. In fact, the first two image-based solutions I presented to you
came up as results after such a search. One more website looks promising:
[Antoine Amarilli's blog post about server-side rendering](https://a3nm.net/blog/selfhost_mathjax.html).
The page promises a self-hosted way of generating HTML using `mathjax-node`,
and even provides a very compelling sample output. I decided to try
out this approach, but to no avail. First of all, the MathJax infrastructure
has changed:
> As mathjax has reorganized their repositories, to make the following work, you will probably need to install manually mathjax-node-cli, as well as maybe installing mathjax-node and possibly mathjax-node-page. Again, I haven't tried it. Thanks again to Ted for pointing this out!

This was indeed the case. The bigger issue, though, was that the `page2html`
program, which rendered all the mathematics in a single HTML page,
was gone. I found `tex2html` and `tex2htmlcss`, which could only
render equations without the surrounding HTML. I also found `mjpage`,
which replaced mathematical expressions in a page with their SVG forms.
This actually looked quite good - the SVGs even had labels with their
original LaTeX code. However, those labels were hard to read (likely
especially so for people using screen readers), and the SVG images
otherwise maintained most of the issues I described above. Additionally,
{{< sidenote "right" "sluggish-note" "the page behaved significantly more sluggishly after the initial render than the JavaScript-based alternative." >}}
This is purely anecdotal, and would require a more thorough analysis to
generalize.
{{< /sidenote >}}

In short, it's much harder to find resources for server-side LaTeX rendering.
It is especially hard to find such resources that work with more
than static pages. I could probably use a regular expression to extract
math I need to render from HTML, call `tex2htmlcss` on it, and splice it back
into the page. But how could a WordPress user render their math? What
about someone writing their blog engine like I did in my youth? Somehow,
those of us wanting to give our users a better experience are left
fumbling for an alternative, more or less without outside help...
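
To make the regular expression idea above a little more concrete, here is a minimal sketch in Python of the extract, render, and splice steps. It is not a tested pipeline. The function names `render_math_in_html` and `render_with_tex2htmlcss` are hypothetical, I am assuming the page marks math with `\(...\)` and `\[...\]`, and I am assuming that `tex2htmlcss` is on the `PATH`, takes the TeX source as its argument, and accepts an `--inline` flag (check `tex2htmlcss --help` before relying on either).

```python
import html
import re
import subprocess
from typing import Callable

# Matches \( ... \) (inline) or \[ ... \] (display) math.
# DOTALL lets a display equation span several lines.
MATH_PATTERN = re.compile(r"\\\((.+?)\\\)|\\\[(.+?)\\\]", re.DOTALL)


def render_math_in_html(page: str, render: Callable[[str, bool], str]) -> str:
    """Replace each delimited TeX expression in `page` with render(tex, inline)."""

    def substitute(match: re.Match) -> str:
        inline_tex, display_tex = match.group(1), match.group(2)
        inline = inline_tex is not None
        tex = inline_tex if inline else display_tex
        # The input is HTML, so < and & inside the math arrive entity-encoded.
        return render(html.unescape(tex), inline)

    # With a function as the replacement, re.sub inserts its return value
    # verbatim, so backslashes in the rendered markup are left alone.
    return MATH_PATTERN.sub(substitute, page)


def render_with_tex2htmlcss(tex: str, inline: bool) -> str:
    """Hypothetical renderer that shells out to tex2htmlcss (assumed on PATH)."""
    command = ["tex2htmlcss"]
    if inline:
        command.append("--inline")  # assumed flag, not verified here
    command.append(tex)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `render_math_in_html(page, render_with_tex2htmlcss)` on each generated page would then produce HTML with the math already rendered. Passing the renderer in as a parameter keeps the splicing logic independent of any one tool. The sketch has clear limits: the pattern will also match delimiters inside `<code>` or `<pre>` blocks, and it starts one process per equation, which may be slow on math-heavy pages.
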
### Conclusion
The majority of websites today use client-side, JavaScript-based rendering
techniques for mathematics. The work that could be done by a server
is outsourced to thousands of browsers, which have to run the same code
to get __identical results__. Somehow, this is the best solution -
the most accessible alternatives use images, which are a downgrade
to the user experience. I wish for server-side rendering to become
more common, and better documented.