Convert mathematical formulae to MathML #108

Open
skalee opened this issue Jun 26, 2020 · 6 comments
@skalee
Contributor

skalee commented Jun 26, 2020

Some concepts may contain mathematical symbols and formulas in their designations, descriptions, or notes. Formulas can be expressed in LaTeX math, AsciiMath, or MathML. Concepts should preferably follow the AsciiDoc STEM syntax, using the stem, asciimath, and latexmath macros.

Available converters

Several programs come in handy:

AsciiMath gem

A handy gem which converts AsciiMath to MathML. Asciidoctor relies on it when processing stem macros (as an optional dependency). It does the job pretty well, but it does not convert LaTeX math strings, and there is no corresponding gem for LaTeX math.

LaTeXML

A toolset for processing LaTeX documents. Most importantly, it contains the latexmlmath program, which converts LaTeX math formulas to MathML. Sadly, this program fails to recognize some symbols, e.g. \backepsilon. Perhaps this can be fixed with proper configuration.

Example: latexmlmath '\sqrt{b^2-4ac}'

Pandoc

Pandoc can convert LaTeX math to MathML, though the formula must be wrapped in a Markdown document. We can craft a minimal Markdown document and then extract the MathML formula from the generated HTML.

Example: echo '$$\sqrt{b^2-4ac}$$' | pandoc --mathml -f markdown -t html
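
The extraction step can be sketched in Ruby as below. This is only a minimal sketch: extract_mathml and latex_to_mathml_via_pandoc are hypothetical helper names, the regular expression assumes that the generated HTML contains exactly one well-formed <math> element, and the Pandoc flags are the ones from the example above.

```ruby
require "open3"

# Pulls the first <math>...</math> element out of an HTML fragment.
# Assumption: the fragment holds a single formula, so a non-greedy
# regular expression is enough and no HTML parser is needed.
def extract_mathml(html)
  html[%r{<math\b.*?</math>}m]
end

# Hypothetical helper: wraps the formula in $$...$$ and pipes it through
# the same Pandoc invocation as in the example above.
# Requires pandoc to be available on PATH.
def latex_to_mathml_via_pandoc(latex)
  html, status = Open3.capture2(
    "pandoc", "--mathml", "-f", "markdown", "-t", "html",
    stdin_data: "$$#{latex}$$\n"
  )
  raise "pandoc failed" unless status.success?

  extract_mathml(html)
end
```

Usage would look like latex_to_mathml_via_pandoc('\sqrt{b^2-4ac}'). The formula is passed on standard input, so no shell escaping of dollar signs or backslashes is needed.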

MathJax

MathJax converts both AsciiMath and LaTeX math to MathML. It is designed primarily to run in the browser, but it works in Node.js too. The problem is that it is poorly documented, and API docs are non-existent. There are some usage examples in https://github.com/mathjax/MathJax-demos-node which present working solutions. The following two snippets use programs from that repository:

Example: node -r esm component/tex2mml \\sqrt{b^2-4ac} (LaTeX math -> MathML)
Example: node -r esm component/am2mml 'sqrt(b^2-4ac)' (AsciiMath -> MathML)

Performance considerations

Executing a program once per formula may noticeably slow down site generation. LaTeXML, Pandoc, and MathJax have been benchmarked with hyperfine:

hyperfine -m 100 'latexmlmath \\sqrt{b^2-4ac}'
Benchmark #1: latexmlmath \\sqrt{b^2-4ac}
  Time (mean ± σ):      1.504 s ±  0.022 s    [User: 1.383 s, System: 0.108 s]
  Range (min … max):    1.481 s …  1.597 s    100 runs
hyperfine -m 100 'echo \$\$\\sqrt{b^2-4ac}\$\$ | pandoc  --mathml -f markdown -t html'
Benchmark #1: echo \$\$\\sqrt{b^2-4ac}\$\$ | pandoc  --mathml -f markdown -t html
  Time (mean ± σ):      39.9 ms ±   1.7 ms    [User: 12.5 ms, System: 15.7 ms]
  Range (min … max):    37.2 ms …  53.6 ms    100 runs
hyperfine -m 100 'node -r esm component/tex2mml \\sqrt{b^2-4ac}'
Benchmark #1: node -r esm component/tex2mml \\sqrt{b^2-4ac}
  Time (mean ± σ):     646.5 ms ±  14.6 ms    [User: 626.3 ms, System: 90.1 ms]
  Range (min … max):   627.1 ms … 713.7 ms    100 runs

Integration considerations

We can call any of these programs from Ruby by spawning a subshell. However, that will be very time-consuming for MathJax (about 0.65 s per call in the benchmark above), and especially for LaTeXML (about 1.5 s per call).

  • AsciiMath is already a gem, and that's perfect.
  • LaTeXML probably can be turned into a gem with native extensions, but this requires some work.
  • There is a gem, Pandoc-Ruby, which runs Pandoc in a subshell. Because Pandoc performance is quite good, this is an acceptable solution. Integrating Pandoc as a native extension is probably difficult.
  • MathJax can be run in Node.js. Perhaps it can also be run in a JavaScript engine like therubyracer or therubyrhino. Alternatively, we can write a crawler in JavaScript which post-processes the generated files. Either way, we eliminate the performance penalty of loading Node.js and its dependencies for every formula.
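
Calling a converter from Ruby could look roughly like the sketch below. The names convert_with and ConversionError are hypothetical, and the sketch assumes that the converter takes the formula as its last command-line argument and prints MathML to standard output, as latexmlmath does in the example earlier. It also assumes that a non-zero exit status signals failure, which has not been verified for any of the programs above.

```ruby
require "open3"

# Hypothetical error class raised when the converter exits with a failure.
class ConversionError < StandardError; end

# Runs an external converter with the formula appended as the last
# argument and returns whatever the program printed to stdout.
# The command is given as an array, so Ruby spawns the program directly
# without a shell, and backslashes in the formula need no extra escaping.
def convert_with(command, formula)
  stdout, stderr, status = Open3.capture3(*command, formula)
  raise ConversionError, stderr unless status.success?

  stdout
end
```

For example, convert_with(["latexmlmath"], '\sqrt{b^2-4ac}') would run the same command as the LaTeXML example, provided latexmlmath is installed.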

Final considerations

We would love to integrate LaTeXML, as we take part in its development; however, this seems to be the most difficult of all the options above. We need to turn it into a gem and resolve the issues with unrecognized symbols. Perhaps in the long run… unless we have a gem already?

@ronaldtse
Member

LaTeXML probably can be turned into a gem with native extensions, but this requires some work.

You can do this, and if it works, it will be useful in Metanorma as well.

Metanorma relies on a separate LaTeXML installation provided by package managers: CPAN in the Docker image, and the Snap or Chocolatey package in other situations.

@ronaldtse
Member

@skalee for LaTeX math, ONLY LaTeXML is deterministically accurate and correct (i.e. it always arrives at the correct structure), even though it is slower than others. It is also necessary to use the same processor being used in Metanorma because the terminology site software is part of our standardization suite.

@skalee
Contributor Author

skalee commented Jun 27, 2020

@skalee for LaTeX math, ONLY LaTeXML is deterministically accurate and correct (i.e. it always arrives at the correct structure), even though it is slower than others. It is also necessary to use the same processor being used in Metanorma because the terminology site software is part of our standardization suite.

Okay, these are strong arguments. I'll experiment with LaTeXML then.

Regarding bridging LaTeXML as a native extension: initially I thought that LaTeXML was written in C, but now I see it's in Perl. This makes everything more difficult. Resources on the topic are scarce, if there are any at all. We're literally entering uncharted waters and I doubt we'll succeed, especially since I don't know Perl at all. Nevertheless, I'll be happy to try. (update: this is very old, but looks promising: ruby-perl)

However, we can still call LaTeXML from a subshell, and we can avoid repeated calls by caching the results. This should improve performance greatly, especially if we use a disk cache in order to persist the results between builds. At the moment I'm pretty convinced we'll end up with subshell calls.
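
A disk cache of this kind might be sketched as follows. FormulaCache is a hypothetical class, not existing code: it stores one file per formula, named after the SHA-256 digest of the input, and only invokes the converter block on a cache miss.

```ruby
require "digest"
require "fileutils"

# Hypothetical disk-backed cache: one file per formula, named after the
# SHA-256 digest of the formula source, so results survive between builds.
class FormulaCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Returns the cached MathML for the formula. On a cache miss, yields the
  # formula to the block, stores the block's result on disk, and returns it.
  def fetch(formula)
    path = File.join(@dir, "#{Digest::SHA256.hexdigest(formula)}.mml")
    return File.read(path) if File.exist?(path)

    yield(formula).tap { |mathml| File.write(path, mathml) }
  end
end
```

The block passed to fetch would be the place for the slow subshell call, so each distinct formula is converted only once per cache directory. Note that this sketch does not invalidate entries when the converter version changes.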

Having said that, I still don't know what to do with missing entities like \backepsilon. The following formula is taken directly from concept 259 "isomorphism".

latexmlmath '[A,B \textit{ isomorphic}] \Leftrightarrow [\exists f : A \rightarrow B, g : B \rightarrow A \backepsilon f \circ g = Id_A, g \circ f = Id_B]'

On my computer, it ends up with one error (Error:undefined:\backepsilon The token T_CS[\backepsilon] is not defined) and one warning (Warning:not_parsed:UNKNOWN.ATOM.CLOSE>METARELOP MathParser failed to match rule 'Anything'). The produced MathML is as follows (note the merror element):

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="[A,B\textit{ isomorphic}]\Leftrightarrow[\exists f:A\rightarrow B,g:B%&#10;\rightarrow A\backepsilon f\circ g=Id_{A},g\circ f=Id_{B}]" display="block">
  <mrow>
    <mrow>
      <mo stretchy="false">[</mo>
      <mi>A</mi>
      <mo>,</mo>
      <mi>B</mi>
      <mtext mathvariant="italic"> isomorphic</mtext>
      <mo stretchy="false">]</mo>
    </mrow>
    <mo>⇔</mo>
    <mrow>
      <mo stretchy="false">[</mo>
      <mo>∃</mo>
      <mi>f</mi>
      <mo>:</mo>
      <mi>A</mi>
      <mo>→</mo>
      <mi>B</mi>
      <mo>,</mo>
      <mi>g</mi>
      <mo>:</mo>
      <mi>B</mi>
      <mo>→</mo>
      <mi>A</mi>
      <merror class="ltx_ERROR undefined undefined">
        <mtext>\backepsilon</mtext>
      </merror>
      <mi>f</mi>
      <mo>∘</mo>
      <mi>g</mi>
      <mo>=</mo>
      <mi>I</mi>
      <msub>
        <mi>d</mi>
        <mi>A</mi>
      </msub>
      <mo>,</mo>
      <mi>g</mi>
      <mo>∘</mo>
      <mi>f</mi>
      <mo>=</mo>
      <mi>I</mi>
      <msub>
        <mi>d</mi>
        <mi>B</mi>
      </msub>
      <mo stretchy="false">]</mo>
    </mrow>
  </mrow>
</math>

You can copy-paste it into the MathJax demo.
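
Since the failure shows up in the output as an merror element, a build could check for it, for instance with the naive sketch below. The function name undefined_tokens is hypothetical, and the regular expression assumes the exact merror/mtext nesting seen in the output above; a real implementation would more likely use an XML parser.

```ruby
# Returns the text of every <merror><mtext>...</mtext></merror> fragment
# found in a MathML string, i.e. the tokens the converter could not handle.
# Assumption: merror elements wrap a single mtext child, as in the
# latexmlmath output quoted above.
def undefined_tokens(mathml)
  mathml.scan(%r{<merror\b[^>]*>\s*<mtext>(.*?)</mtext>\s*</merror>}m).flatten
end
```

An empty array means that no merror element was found. For the output above, the result would contain the single string \backepsilon.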

@skalee
Contributor Author

skalee commented Jul 6, 2020

@ronaldtse I still have trouble with LaTeXML. Does anyone know how to fix the error produced by the following command (Error:undefined:\backepsilon)?

latexmlmath '[A,B \textit{ isomorphic}] \Leftrightarrow [\exists f : A \rightarrow B, g : B \rightarrow A \backepsilon f \circ g = Id_A, g \circ f = Id_B]'

@ronaldtse
Member

@skalee Please check the usage of latexmlmath in the metanorma gem. \backepsilon is recognized there.

@ronaldtse
Member

We now have the plurimath gem that can do all of the above conversions. Thanks @suleman-uzair!

@ronaldtse ronaldtse moved this from 🆕 New to 📋 Backlog in Geolexica Jul 26, 2022