MathML and Unicode invisible operators #2172

Closed
Omikhleia opened this issue Nov 18, 2024 · 1 comment · Fixed by #2177

@Omikhleia
Member

Unicode defines a few "invisible" operators:

  • U+2061 Function application (⁡)
  • U+2062 Invisible times (⁢)
  • U+2063 Invisible separator (⁣)
  • U+2064 Invisible plus (&InvisiblePlus;)
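
For reference, here is a minimal Python sketch that looks these four code points up in the Unicode Character Database bundled with the interpreter. The helper name `describe` is only illustrative. It shows that all four are format characters (general category `Cf`), not whitespace, so naive whitespace trimming will not remove them.

```python
import unicodedata

# The four invisible operators listed above.
INVISIBLE_OPERATORS = ["\u2061", "\u2062", "\u2063", "\u2064"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in INVISIBLE_OPERATORS:
    # Each line shows the code point, its formal name, and its category.
    print(*describe(ch))
```
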

MathML Core doesn't mention anything special about these, apart from listing them in its appendices.

MathML4, however, has a few words to say in §3.1.1: "... they usually render invisibly ... but may influence visual spacing."
Note the ill-defined "usually" and "may" in that specification. After so many years, there's still an unaddressed blind spot.
Despite that, there are a few test cases in Joe Javawaski's Browser Test and the MathML3 Test Suite that use these operators, as well as other sources...

Consider $f(x)$, and $\cos \theta$. In MathML, this could be written as:

  <mi>f</mi>
  <mo>&ApplyFunction;</mo>
  <mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow>
  ...
  <mi>cos</mi>
  <mo>&ApplyFunction;</mo>
  <mi>&theta;</mi>

In the first case, the function application operator should not be rendered, but in the second case, one expects it to be rendered as spacing, unless other provisions are made.

What do I mean by "other provisions"? Let's first check the invisible times operator, for $a b$ and $\cos \theta \cos \phi$:

    <mi>a</mi>
    <mo>&InvisibleTimes;</mo>
    <mi>b</mi>
    ...
    <mi>cos</mi>
    <mo>&ApplyFunction;</mo>
    <mi>&theta;</mi>
    <mo>&InvisibleTimes;</mo>
    <mi>cos</mi>
    <mo>&ApplyFunction;</mo>
    <mi>&phi;</mi>

In the first case, the invisible times operator should not be rendered (implicit multiplication), but in the second case, it should be rendered as spacing between the two cosine functions.

Do you start seeing the problem?

  • Trying various MathML renderers (native or not), I can't make sense of how they handle these operators, or their absence. All I can say at a glance is that interpretations are inconsistent...
  • The specification is so vague that people often tweak the MathML lspace and rspace attributes to get the desired effect...
  • The MathML specification describes an algorithm for spacing, but it's not clear how these operators fit in, and more generally it does not seem to explain what we see in the wild.
  • SILE, anyhow, doesn't implement the MathML spacing algorithm, and relies on TeX's spacing rules (based on atom types).
  • To make it worse, math fonts may or may not have glyphs for these operators, and even if they do, they may not be designed to be invisible (e.g. having some width): I haven't found a consistent specification for this matter either.

Earlier I mentioned "other provisions". It would seem (without looking at the code), for instance, that MathJax does some magic distinguishing between one-letter and multi-letter <mi> identifiers, adding spacing in the latter case. It's my guess based on observation, at best. It's as if the multi-letter <mi> identifiers are treated as "operators" with some spacing.
It's not totally insane, if it's what it does: it's how I implemented the cos, sin, etc. functions in SILE in #2167 for the TeX-like syntax (as mo with the operator atom type -- note this is also what TeX does with \mathop in its implementation of these functions).
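
To make that guess concrete, here is a rough Python sketch of the heuristic. It is an assumption-laden toy, not MathJax's or SILE's actual code: invisible operators are dropped, multi-letter <mi> elements become TeX "op" atoms, everything else is "ord" (with parentheses as "open"/"close"), and spacing comes from a small subset of the TeX inter-atom spacing table. The names `atom_type`, `atoms` and `render` are made up for the illustration, and <mrow> is simply flattened.

```python
import xml.etree.ElementTree as ET

INVISIBLE = {"\u2061", "\u2062", "\u2063", "\u2064"}

# Subset of the TeX inter-atom spacing table: 1 = thin space, 0 = no space.
SPACING = {
    ("ord", "ord"): 0, ("ord", "op"): 1, ("ord", "open"): 0, ("ord", "close"): 0,
    ("op", "ord"): 1, ("op", "op"): 1, ("op", "open"): 0, ("op", "close"): 0,
    ("open", "ord"): 0, ("open", "op"): 0, ("open", "open"): 0, ("open", "close"): 0,
    ("close", "ord"): 0, ("close", "op"): 1, ("close", "open"): 0, ("close", "close"): 0,
}


def atom_type(elem):
    """Guess a TeX atom type for an <mi> or <mo>; None means 'drop it'."""
    text = (elem.text or "").strip()
    if elem.tag == "mo":
        if text in INVISIBLE:
            return None  # assumption: invisible operators produce nothing
        if text == "(":
            return "open"
        if text == ")":
            return "close"
        return "ord"  # simplification: other operators are not modelled here
    # Assumption: multi-letter identifiers (cos, sin, ...) act like \mathop.
    return "op" if len(text) > 1 else "ord"


def atoms(xml):
    """Flatten a MathML fragment into a list of (text, atom type) pairs."""
    root = ET.fromstring(f"<math>{xml}</math>")
    result = []
    for elem in root.iter():
        if elem.tag in ("mi", "mo"):
            kind = atom_type(elem)
            if kind is not None:
                result.append(((elem.text or "").strip(), kind))
    return result


def render(xml):
    """Render as plain text, using one ASCII space for a thin space."""
    items = atoms(xml)
    parts = []
    for i, (text, kind) in enumerate(items):
        if i > 0 and SPACING[(items[i - 1][1], kind)]:
            parts.append(" ")
        parts.append(text)
    return "".join(parts)


print(render("<mi>f</mi><mo>\u2061</mo><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow>"))  # f(x)
print(render("<mi>cos</mi><mo>\u2061</mo><mi>\u03b8</mi>"))  # cos θ
print(render("<mi>a</mi><mo>\u2062</mo><mi>b</mi>"))  # ab
```

Under these assumptions the four examples above come out as expected without the invisible operators contributing anything themselves, which is consistent with the guess but proves nothing about what any real renderer does.
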

All of this is nice and dandy, but it doesn't tell us what to do for MathML documents.

  • Invisible separator (a.k.a. invisible comma) can likely be completely ignored.
  • Invisible plus is a mystery to me, I've no idea what it's supposed to do and where...
  • Invisible times and invisible function application are the most problematic:
    • Sometimes they should be ignored?
    • Sometimes they should be rendered as spacing?
    • Or alter the previous element?
    • Explicit lspace and rspace on them may need to be honored, even though SILE currently ignores them on other operators (using, as stated above, their atom type to determine spacing).
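
Just to pin the options down, one possible policy could be sketched like this in Python. It is purely hypothetical and not what SILE does: an invisible operator is ignored unless it carries an explicit lspace or rspace, in which case that spacing is honored. The function names `em` and `invisible_policy` are invented for the sketch, and only lengths in em are parsed.

```python
import re
import xml.etree.ElementTree as ET

INVISIBLE = {
    "\u2061": "function application",
    "\u2062": "invisible times",
    "\u2063": "invisible separator",
    "\u2064": "invisible plus",
}


def em(value):
    """Parse a length such as '0.5em' into ems; None if absent or unsupported."""
    match = re.fullmatch(r"\s*(-?\d*\.?\d+)em\s*", value or "")
    return float(match.group(1)) if match else None


def invisible_policy(mo_xml):
    """Return (decision, total explicit spacing in em) for a single <mo>."""
    mo = ET.fromstring(mo_xml)
    text = (mo.text or "").strip()
    if text not in INVISIBLE:
        return ("visible", 0.0)  # not our concern here
    lspace, rspace = em(mo.get("lspace")), em(mo.get("rspace"))
    if lspace is not None or rspace is not None:
        # Hypothetical choice: explicit attributes win over everything else.
        return ("explicit", (lspace or 0.0) + (rspace or 0.0))
    # Hypothetical choice: otherwise leave spacing to the neighbouring atoms.
    return ("ignore", 0.0)


print(invisible_policy("<mo>\u2062</mo>"))
print(invisible_policy('<mo rspace="0.5em">\u2062</mo>'))
```

Whether "ignore by default" is the right call for invisible times and function application is exactly the open question above.
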

My head hurts 🐰

@Omikhleia added this to Math Nov 18, 2024
@github-project-automation bot moved this to To do in Math Nov 18, 2024
@Omikhleia added bug Software bug issue question Ask for advice or investigate solutions labels Nov 18, 2024
@Omikhleia
Member Author

And I've just scratched the surface.
One might also mention the indirect use of U+200B, the zero-width space, in this extract of the MathML Test Suite ("Torture Tests", complex1, simplified here for the sake of brevity and your own sanity):

  <mo>&nabla;</mo>
  <mtext>&#x200B;</mtext>
  <mo>&times;</mo>
  <mi>B</mi>

The intent of this dubious use of U+200B in an mtext element might have been to cancel some spacing introduced by the "nabla" as operator, instead of making it an identifier (?)

Those folks are really pushing the envelope, bordering on the absurd, but it's all in the name of testing a totally insane and complex standard, so it's all good, right?

@Omikhleia moved this from To do to In progress in Math Nov 23, 2024
@Omikhleia self-assigned this Nov 23, 2024
@alerque added this to the v0.15.7 milestone Nov 23, 2024
@github-project-automation bot moved this from In progress to Done in Math Nov 23, 2024