blog-static/content/blog/math_rendering_is_wrong.md
Danila Fedorin 3cb66a606d
Make MathJax post public
2020-03-24 16:41:11 -07:00


title date tags
Math Rendering is Wrong 2020-03-24T16:40:27-07:00
Website

Since I first started working on my website at age fourteen, the site has gone through many revisions, and hopefully changed for the better. This blog was originally dynamically served using a Python/Flask backend, with a custom login system and post "editor" (just an input box). One of the stranger things about my website, though, was how I displayed content.

It was clear to me, even at my young age, that writing raw HTML was suboptimal. Somehow (perhaps through GitHub) I heard about Markdown, and realized that a human-readable markup language was probably a much better way to go. What remained was, of course, rendering the content. The easiest way I found was to just stick a JavaScript script, calling out to marked, to run on page load and convert all the markup into pretty HTML.

This rendering would happen on every page load. Every time I navigated between pages on my site, for a second or two, I'd see the raw, unrendered Markdown that I had written, which would then disappear and be replaced with a proper view of the page's content. The rendering wasn't error-proof, either. If my connection was particularly slow (which it was, thanks, Comcast), or if I forgot to disable uMatrix, I would be left having to sift through the frequent occurrences of #, _, *... Eventually I realized my mistake, and switched to rendering Markdown on the backend. Now, my content would appear to the user already formatted, and they wouldn't have to wait for a JavaScript program to finish to read what I had written. All was well.

Sometimes, I look back on the early iterations of my site, and smile at the silly mistakes I'd made. Yet I can't innocently make fun of my original Markdown rendering solution, because in a way, it lives on.

The State of Mathematics on the Web

When I search for "render math on website" on Google, the following two links are at the top of my search:

Indeed, these are the two most popular math rendering solutions (in my experience). Yet both of these solutions share something in common with each other and with the early iterations of my website - they use client-side rendering for static content. In my opinion, this is absurd. Every time that you visit a website that uses MathJax or KaTeX, any mathematical notation arrives at your machine in the form of LaTeX markup. For instance, \(e^{-\frac{x}{2}}\) looks like e^{-\frac{x}{2}}. The rendering software (MathJax or KaTeX) then takes this markup and converts it into HTML and CSS that your browser can display. Just like my old website, all of this happens every time the page is loaded. This isn't an uncommon thing, either: websites like Mathematics StackExchange and Chegg use MathJax, and many more can be found on this list. According to this page, Facebook Messenger, Khan Academy, Gitter and GitLab use KaTeX. Some of these websites don't even load their content with JavaScript disabled, much less render math.

A skeptic might say that it is not possible to render LaTeX to HTML and CSS ahead of time. This might even have been true in the past. A user on the "can we replace client-side MathJax with server-side MathJax" thread on MathOverflow Meta in 2015 points out that MathJax doesn't support server-side HTML output:

I found [the comment] from a MathJax developer confirming that HTML+CSS doesn't work yet with the server-side version . . .

The comment is even older, from 2014:

. . . [MathJax does not support] HTML-CSS at the moment . . .

It's over, go home everyone. We are asking for the impossible.

Or are we? Version 2.6 of MathJax has the following comment in its change log:

Improved CommonHTML output. The CommonHTML output now provides the same layout quality and MathML support as the HTML-CSS and SVG output. It is on average 40% faster than the other outputs and the markup it produces are identical on all browsers and thus can also be pre-generated on the server via MathJax-node.

Further, the HTML Support page from MathJax's docs states:

[CommonHTML] is MathJax's primary output mode since MathJax version 2.6. Its major advantage is its quality, consistency, and the fact that its output is independent of the browser, operating system, and user environment

So not only is it explicitly possible to have a server-generated math output, but the algorithm that would be used for generating such output is already in use on the client-side! If you look hard enough, you may even find a few resources for using this algorithm server-side, but many of those suffer from another problem...

Images Won't Cut It

It's tempting to convert mathematics to an image, such as a PNG or an SVG file. This approach is taken on this blog and this Advanced Web Machinery article. Wolfram MathWorld also renders its mathematics to images. However, in my opinion, this is not the right approach. It is inconvenient for me as a user, and, I suspect, for those in need of assistive technologies. Here are the issues I see with rendering mathematics to images:

  • Images are impossible to use with copy/paste. I am unable to select a word, number, or symbol in a rendered image. I do this on occasion, and this is not a contrived issue.
  • Images are not nearly as responsive, and are difficult to style. Line breaking, fonts, and even colors are difficult to change when using images. They stick out like a sore thumb when used for inline math, and can look very strange (or disappear) if a user extension manipulates colors on the page.
  • Images are completely opaque to users in need of screen readers. While some readers support {{< sidenote "right" "mathml-note" "MathML," >}} MathML is an XML-based markup for mathematics, meant to serve as a low-level output target for other math processors. While it is supported in Firefox, it requires a polyfill in Chrome, and brings us back to front-end JavaScript. {{< /sidenote >}} images are much harder to work with. The sites that I linked that use images do not have captions with their math, making it completely inaccessible.
  • Using images instead of a JavaScript-based renderer reminds me of the false dichotomy fallacy. Why can't we keep doing what we already do, but on the server?

Where Have the Resources Gone?

If you look up "mathjax setup" on Google, you are greeted with dozens of links telling you how to get the rendering working client-side. It's convenient: a single <script> tag, and maybe a bit of code at the bottom of the page with some settings.

If you look up "mathjax render server-side", the resources are far more scarce. In fact, the first two image-based solutions I presented to you came up as results after such a search. One more website looks promising: Antoine Amarilli's blog post about server-side rendering. The page promises a self-hosted way of generating HTML using mathjax-node, and even provides a very compelling sample output. I decided to try out this approach, but to no avail. First of all, the MathJax infrastructure has changed:

As mathjax has reorganized their repositories, to make the following work, you will probably need to install manually mathjax-node-cli, as well as maybe installing mathjax-node and possibly mathjax-node-page. Again, I haven't tried it. Thanks again to Ted for pointing this out!

This was indeed the case. The bigger issue, though, was that the page2html program, which rendered all the mathematics in a single HTML page, was gone. I found tex2html and tex2htmlcss, which could only render equations without the surrounding HTML. I also found mjpage, which replaced mathematical expressions in a page with their SVG forms. This actually looked quite good - the SVGs even had labels with their original LaTeX code. However, those labels were hard to read (likely especially so for people using screen readers), and the SVG images otherwise retained most of the issues I described above. Additionally, {{< sidenote "right" "sluggish-note" "the page behaved significantly more sluggishly after the initial render than the JavaScript-based alternative." >}} This is purely anecdotal, and would require a more thorough analysis to generalize. {{< /sidenote >}}

In short, it's much harder to find resources for server-side LaTeX rendering. It is especially hard to find such resources that work with more than static pages. I could probably use a regular expression to extract the math I need to render from HTML, call tex2htmlcss on it, and splice it back into the page. But how could a WordPress user render their math? What about someone writing their own blog engine like I did in my youth? Somehow, those of us wanting to give our users a better experience are left fumbling for an alternative, more or less without outside help...
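To make the regular-expression idea above concrete, here is a rough Python sketch of the extract, render, and splice steps. It is an illustration, not a tested pipeline: the names prerender_math, render_with_cli and fake_render are made up for this example, it only handles inline math written as \( ... \), and render_with_cli assumes that the command takes the TeX source as its only argument and prints markup to standard output, which I have not verified for tex2htmlcss.

```python
import re
import subprocess
from typing import Callable

# Matches inline math written as \( ... \). The match is non-greedy so
# that two expressions on one line are not merged into a single match.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def render_with_cli(tex: str, command: str = "tex2htmlcss") -> str:
    """Render one expression by shelling out to a command-line tool.

    Assumption: the command takes the TeX source as its only argument
    and prints markup to standard output. Check the tool's own help
    text before relying on this.
    """
    result = subprocess.run(
        [command, tex], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def prerender_math(html: str, render: Callable[[str], str]) -> str:
    """Replace every \\( ... \\) span in `html` with render(tex)."""
    # A function replacement is used so that backslashes in the rendered
    # markup are inserted literally instead of being read as escapes.
    return INLINE_MATH.sub(lambda m: render(m.group(1).strip()), html)


def fake_render(tex: str) -> str:
    # Stand-in renderer so the example runs without MathJax installed.
    return f'<span class="math">{tex}</span>'


page = r"<p>For instance, \(e^{-\frac{x}{2}}\) decays quickly.</p>"
print(prerender_math(page, fake_render))
```

Passing the renderer in as a function keeps the splicing logic independent of whichever tool ends up producing the markup. A regular expression like this will also happily match delimiters inside code samples or attributes, so a real version would likely need an HTML parser.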

Conclusion

The majority of websites today use client-side, JavaScript-based rendering techniques for mathematics. The work that could be done by a server is outsourced to thousands of browsers, which have to run the same code to get identical results. Somehow, this is the best solution - the most accessible alternatives use images, which are a downgrade to the user experience. I wish for server-side rendering to become more common, and better documented.
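To put rough numbers on that duplicated work, here is a small Python sketch that counts how often a renderer runs when each distinct expression is rendered once and reused. Everything in it is hypothetical: the page-view count, the three expressions, and the stand-in fake_render function are invented for illustration and say nothing about the real cost of MathJax or KaTeX.

```python
from typing import Callable, Dict, Tuple


def make_cached_renderer(
    render: Callable[[str], str],
) -> Tuple[Callable[[str], str], Dict[str, int]]:
    """Wrap `render` so each distinct expression is rendered only once."""
    cache: Dict[str, str] = {}
    calls = {"count": 0}  # how many times the real renderer actually ran

    def cached(tex: str) -> str:
        if tex not in cache:
            calls["count"] += 1
            cache[tex] = render(tex)
        return cache[tex]

    return cached, calls


def fake_render(tex: str) -> str:
    # Stand-in for a real math renderer (an assumption of this sketch).
    return f'<span class="math">{tex}</span>'


expressions = [r"e^{-\frac{x}{2}}", r"x^2", r"\sqrt{y}"]
page_views = 1000

cached, calls = make_cached_renderer(fake_render)
for _ in range(page_views):
    for tex in expressions:
        cached(tex)

# Client-side, every view renders every expression again.
client_side_renders = page_views * len(expressions)
print(client_side_renders, calls["count"])  # 3000 3
```

The same input always produces the same markup, so the server only has to do the work once per expression rather than once per visitor.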