\mhchem conversion: MathML to OMML/MathML equation leads to additional width in equation #197

Open
frederik opened this issue Aug 29, 2022 · 4 comments

Comments

@frederik

Hello,

I am currently trying to export \mhchem equations to DocX using Pandoc. Since \mhchem is not supported natively, I converted the TeX to MathML and then tried to convert the MathML to OMML for Word.

Using https://johnmacfarlane.net/texmath.html I could see that even in the MathML to MathML conversion the zero-width mpadded element is dropped. This would explain the additional horizontal space in the screenshot.

[Screenshot: comparison]

Looking at the OMML that Word produces directly (I am attaching the DocX file), I can see that Word added a <m:zeroWid m:val="1"/> element, whereas the <m:phant> element in the OMML produced by TeXMath is not zero-width. It therefore pushes the 2 down correctly but introduces white space.
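
For illustration, here is a minimal Python sketch of what a zero-width phantom might look like in OMML. Only the <m:phant> and <m:zeroWid m:val="1"/> elements come from the report above. The surrounding layout (<m:phantPr>, <m:show>, <m:e>, <m:r>, <m:t>) and the helper names `m` and `zero_width_phantom` are assumptions based on the Office Open XML math schema; they are not copied from the attached file or from TeXMath's writer.

```python
import xml.etree.ElementTree as ET

# OMML namespace used for math inside DOCX files.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M_NS)


def m(tag):
    """Return the namespace-qualified name for an OMML tag (hypothetical helper)."""
    return f"{{{M_NS}}}{tag}"


def zero_width_phantom(text):
    """Build an <m:phant> that hides its content and takes up no width.

    The property layout is an assumption based on the OMML schema.
    """
    phant = ET.Element(m("phant"))
    props = ET.SubElement(phant, m("phantPr"))
    # Hide the content, since it is a phantom.
    ET.SubElement(props, m("show"), {m("val"): "0"})
    # Collapse the width, as in the element Word is reported to add.
    ET.SubElement(props, m("zeroWid"), {m("val"): "1"})
    # The phantom's content goes into the <m:e> child.
    base = ET.SubElement(phant, m("e"))
    run = ET.SubElement(base, m("r"))
    ET.SubElement(run, m("t")).text = text
    return phant


if __name__ == "__main__":
    print(ET.tostring(zero_width_phantom("A"), encoding="unicode"))
```

Without the <m:zeroWid> property, the phantom keeps the width of its content, which would match the extra horizontal space described above.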

Source Equation:

\ce{ H2O }

Resulting MathML (using MathJax):

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow data-mjx-texclass="ORD">
    <mrow data-mjx-texclass="ORD">
      <mi data-mjx-auto-op="false" mathvariant="normal">H</mi>
    </mrow>
    <msub>
      <mrow data-mjx-texclass="ORD">
        <mrow data-mjx-texclass="ORD">
          <mpadded width="0">
            <mphantom>
              <mi>A</mi>
            </mphantom>
          </mpadded>
        </mrow>
      </mrow>
      <mrow data-mjx-texclass="ORD">
        <mrow data-mjx-texclass="ORD">
          <mpadded height="0">
            <mn>2</mn>
          </mpadded>
        </mrow>
      </mrow>
    </msub>
    <mrow data-mjx-texclass="ORD">
      <mi data-mjx-auto-op="false" mathvariant="normal">O</mi>
    </mrow>
  </mrow>
</math>

DocX file as produced by Word:
HA2O.docx

I think that using MathML directly could open up a whole other world of equations for researchers, since we would not depend on all packages being available through Pandoc directly. I'd be happy to help and to provide further examples - unfortunately, I do not understand the code base enough to provide a fix or analysis.

Kindly
Frederik

@jgm
Owner

jgm commented Aug 29, 2022

Wow, this is quite elaborate MathML for something so simple! Why not

<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
  <mstyle mathvariant="normal">
    <msub>
      <mi>H</mi>
      <mn>2</mn>
    </msub>
    <mi>O</mi>
  </mstyle>
</math>

Our MathML reader does not yet implement mpadded, which is one problem.

The other is the treatment of mathvariant="normal". It should produce roman, not italics.
Not sure yet if it's a MathML reader or OOXML writer issue or both.

PS. Have you tried my mhchem Lua filter? That may be a simpler approach.

@frederik
Author

Yes, I had a look, but this would add support for one package, whereas if we could support generic MathML we'd reach more use cases (and would have 100% compatibility between what users see as an SVG in the editor and what is exported with their paper - at least in theory).

MathML is a supported format for a number of discipline-specific editors. It might also allow us to take complex TeX that uses a lot of packages and condense it to a publication format that can be used in PDF, EPUB, or websites.

Back then I followed an issue on mhchem and for some reason concluded that support would also only be partial. I will take another look.

The complex MathML is unfortunately the way that MathJax 3 produces it. But yours of course is a lot nicer.

How do you feel about adding support for mpadded? Is this something you see being added here? I will definitely take another look at the Lua filter in any case.

@jgm
Owner

jgm commented Aug 30, 2022

mpadded support would probably require adding something new to the types for equations.
And then the trick would be figuring out equivalents in all the other formats. I'm really not sure what the TeX equivalent would be, for example. But suppose we did have a \padded command that did the same thing. You wouldn't WANT the MathML you give above to be converted to something like

H\phantom{\padded[0pt]{A}}_2 O

but rather to

H_2 O

Currently we handle mpadded by ignoring it and just processing what's inside. We could add a special case that checks to see if we have a width of 0 specified, in which case we could ignore the contents and insert a zero-width space, or nothing. That wouldn't be general support for mpadded, but it would handle this specific case.
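
As a rough illustration of that special case, the following Python sketch pre-processes MathML before conversion: any mpadded with a zero width has its contents dropped and is replaced by an empty mrow. This is not TeXMath code. The function names `is_zero_width` and `drop_zero_width_padding` are hypothetical, and using an empty mrow (rather than removing the element) is an assumption made here so that parents such as msub keep their two children.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements without a prefix.
ET.register_namespace("", MATHML_NS)


def is_zero_width(elem):
    """True for an <mpadded> whose width attribute is a zero length."""
    if elem.tag != f"{{{MATHML_NS}}}mpadded":
        return False
    width = elem.get("width", "").strip()
    # Accept "0", "0em", "0.0pt"; reject relative forms such as "+0em".
    return re.fullmatch(r"0+(\.0*)?[a-z%]*", width) is not None


def drop_zero_width_padding(root):
    """Replace every zero-width <mpadded> by an empty <mrow>, in place."""
    # Iterate over a snapshot because the tree is modified during the loop.
    for elem in list(root.iter()):
        if is_zero_width(elem):
            tail = elem.tail
            elem.clear()  # drops children, text, attributes and tail
            elem.tail = tail
            elem.tag = f"{{{MATHML_NS}}}mrow"
    return root


if __name__ == "__main__":
    # Trimmed-down version of the MathJax output from the report.
    source = (
        '<math xmlns="http://www.w3.org/1998/Math/MathML">'
        '<mi mathvariant="normal">H</mi>'
        '<msub>'
        '<mpadded width="0"><mphantom><mi>A</mi></mphantom></mpadded>'
        '<mpadded height="0"><mn>2</mn></mpadded>'
        '</msub>'
        '<mi mathvariant="normal">O</mi>'
        '</math>'
    )
    cleaned = drop_zero_width_padding(ET.fromstring(source))
    print(ET.tostring(cleaned, encoding="unicode"))
```

The mpadded with height="0" is left untouched, so a reader that ignores mpadded and processes its contents still sees the subscript 2. Like the special case described above, this only covers zero widths and is not general mpadded support.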

@jgm
Owner

jgm commented Aug 30, 2022

The font issue is related to #149. MathML has no way to specify "upright" or "roman" font; it just has an option for "no special font adjustment." Maybe we should always use roman style for mathvariant="normal". But that would violate the documentation's expectation that <mi mathvariant="normal"> is the same as plain <mi> (i.e. the default).
