Supporting math #83
Comments
Thanks! To the best of my recollection, I don't think anybody's asked for support, so not really a priority. As a couple of notes should anybody want to take a look:
|
I had a quick look at this but couldn't work out how to convert the Mammoth XML element to a DOM node (and back). |
There's also an issue for MathJax support of OMML as an input format, though I'm not sure how it would be represented in the HTML. |
There's a related request for equation support in |
@hubgit It would be pretty straightforward to walk the returned For MathJax it should be completely straightforward to implement an input jax by using omml2mathml and then piping that to the MathML input jax, but that might feel a bit indirect. I am not super familiar with MathJax's internals; it might be easy to port omml2mathml to address the internal API directly. As for Python, I guess the best option is a direct port. That shouldn't take more than a day's work. (DOMs in Python tend to be quite painful, but it's not like our usage is highly complex.) |
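For anyone stuck on the "Mammoth XML element to DOM node" step, one option is to serialise the element tree to an XML string first and parse that afterwards. The following is only a minimal sketch: the node shape (`name`, `attributes`, `children`, and `value` on text nodes) and the `{uri}local` naming are assumptions based on the code posted in this thread, and `serializeNode`, `qualifiedName` and `escapeXml` are hypothetical helper names, not part of Mammoth.

```javascript
// Namespace URI for Office math, as declared by xmlns:m elsewhere in this thread.
var MATH_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math";

// Escape characters that are not allowed in XML text or attribute values.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Assumption: unknown namespaces show up as "{uri}local"; rewrite to "m:local".
// Names that already carry a prefix are left alone.
function qualifiedName(name) {
    return name.replace(/^\{[^}]*\}/, "m:");
}

// Serialise a node and its descendants; string building keeps tag-name case.
function serializeNode(node, isRoot) {
    if (!node.name) {
        return escapeXml(node.value); // text node
    }
    var tag = qualifiedName(node.name);
    var attrs = isRoot ? ' xmlns:m="' + MATH_NS + '"' : "";
    Object.keys(node.attributes || {}).forEach(function (key) {
        attrs += " " + qualifiedName(key) + '="' + escapeXml(node.attributes[key]) + '"';
    });
    var inner = (node.children || []).map(function (child) {
        return serializeNode(child, false);
    }).join("");
    return "<" + tag + attrs + ">" + inner + "</" + tag + ">";
}
```

The resulting string could then be handed to an XML parser (for example xmldom's DOMParser, as used further down this thread) before calling omml2mathml. Whether omml2mathml needs anything beyond the `m` namespace declaration is not something this sketch verifies.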
Any news on this? I may take a stab at it. |
@jmealo Please do, I haven't tried yet. |
For the record, it would be quite useful for our purposes (education) if Mammoth could convert equations to MathML or LaTeX. |
Any update on this? Is it supported now?
It would help if you could implement an option that lets Mammoth mark the position of any elements it doesn't understand yet, such as equations. This would allow us to trace insertion points in the converted HTML and do the conversion externally. I've just completed that for OMML - converting it via a PowerShell script to MathML. All that I need now for a successful fully automated solution is to know where which equation sits within the HTML. Presently Mammoth is not capable of giving that clue. How about:
which could give me in the HTML |
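To illustrate how such position markers could be used once they exist, here is a rough sketch of the external post-processing step. The marker format `<!--unhandled:N-->` is purely hypothetical (Mammoth does not emit anything like it today), and `insertEquations` is a made-up helper name.

```javascript
// Hypothetical marker: <!--unhandled:N-->, where N is the zero-based index
// of the element that was skipped during conversion.
var MARKER = /<!--unhandled:(\d+)-->/g;

// Replace each marker with the externally converted markup for that index.
function insertEquations(html, equations) {
    return html.replace(MARKER, function (match, index) {
        var replacement = equations[Number(index)];
        // Leave the marker in place when no conversion is available.
        return replacement === undefined ? match : replacement;
    });
}
```

Here `equations` would be the list of MathML strings produced by the external OMML conversion, in document order.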
Hi. I am currently working for a publisher to generate the HTML from docx having a lot of Math equations. I added the below code to the object xmlElementReaders in lib/docx/body-reader.js `"m:oMath": function (element) {
|
@lkkkeith Your comment inspired me, and I implemented it along the lines you suggested. Here is my approach. First, install three libraries:
npm install --save mathjax omml2mathml xmldom
Then require them in node_modules\mammoth\lib\docx\body-reader.js:
var omml2mathml = require('omml2mathml');
var xmldom = require('xmldom');
require('mathjax/es5/mml-svg')
Then change your code above to:
"oMath": function (element) {
var om = transform(element)
function transform(data) {
var el = _transform(data)
el.setAttribute('xmlns:wpc', 'http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas')
el.setAttribute('xmlns:mo', 'http://schemas.microsoft.com/office/mac/office/2008/main')
el.setAttribute('xmlns:mc', 'http://schemas.openxmlformats.org/markup-compatibility/2006')
el.setAttribute('xmlns:mv', 'urn:schemas-microsoft-com:mac:vml')
el.setAttribute('xmlns:o', 'urn:schemas-microsoft-com:office:office')
el.setAttribute('xmlns:r', 'http://schemas.openxmlformats.org/officeDocument/2006/relationships')
el.setAttribute('xmlns:m', 'http://schemas.openxmlformats.org/officeDocument/2006/math')
el.setAttribute('xmlns:v', 'urn:schemas-microsoft-com:vml')
el.setAttribute('xmlns:wp14', 'http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing')
el.setAttribute('xmlns:wp', 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing')
el.setAttribute('xmlns:w10', 'urn:schemas-microsoft-com:office:word')
el.setAttribute('xmlns:w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')
el.setAttribute('xmlns:w14', 'http://schemas.microsoft.com/office/word/2010/wordml')
el.setAttribute('xmlns:w15', 'http://schemas.microsoft.com/office/word/2012/wordml')
el.setAttribute('xmlns:wpg', 'http://schemas.microsoft.com/office/word/2010/wordprocessingGroup')
el.setAttribute('xmlns:wpi', 'http://schemas.microsoft.com/office/word/2010/wordprocessingInk')
el.setAttribute('xmlns:wne', 'http://schemas.microsoft.com/office/word/2006/wordml')
el.setAttribute('xmlns:wps', 'http://schemas.microsoft.com/office/word/2010/wordprocessingShape')
return el
function _transform(data) {
var name = data.name
if (!name) {
return document.createTextNode(data.value)
}
var tagName = name.replace(/\{.*\}/, 'm:')
var el = document.createElement(tagName)
var children = data.children
if (children) {
children.forEach(element => {
el.appendChild(_transform(element))
});
}
return el
}
}
var doc = new xmldom.DOMParser().parseFromString(om.outerHTML)
var math = omml2mathml(doc)
var abc = MathJax.mathml2svg(math.outerHTML)
abc = abc.outerHTML
var svg = abc.match(/\<svg.*\<\/svg\>/)[0]
var math2 = abc.match(/\<math.*\<\/math\>/)[0].replace(/\sdisplay=".+?"/, '').replace(/\</g, "«").replace(/\>/g, "»").replace(/"/g, "¨")
var style = ''
svg.replace(/style="(.+?)"\>/, function (match, $1) {
style = $1
})
svg = "data:image/svg+xml,"+window.encodeURIComponent(svg)
var img = `<img align="middle" class="Wirisformula" src="${svg}" data-mathml="${math2}" alt="1 half" role="math" style="${style}">`
//
// Then update the document to include the adjusted CSS for the
// content of the new equation.
//
MathJax.startup.document.clear();
MathJax.startup.document.updateDocument();
return elementResult(new documents.Text(img));
},
The readXmlElement function also needs to be changed to the following:
function readXmlElement(element) {
if (element.type === "element") {
var handler = xmlElementReaders[element.name];
if (handler) {
return handler(element);
}else if(/math/.test(element.name)){
return xmlElementReaders.oMath(element);
} else if (!Object.prototype.hasOwnProperty.call(ignoreElements, element.name)) {
var message = warning("An unrecognised element was ignored: " + element.name);
return emptyResultWithMessages([message]);
}
}
return emptyResult();
}
In the final result, the escaped "&lt;" and "&gt;" around the generated img tags need to be replaced with "<" and ">":
mammoth.convertToHtml({ arrayBuffer: arrayBuffer }).then(function (resultObject) {
var html = resultObject.value.replace(/&lt;(img[^]+?)&gt;/g, '<$1>')
console.log(html)
})
If the formula was inserted from WPS, the situation is different. Include these scripts:
<script src="./UDOC.js"></script>
<script src="./FromWMF.js"></script>
<script src="./ToContext2D.js"></script>
Finally, the conversion call needs to be changed to:
mammoth.convertToHtml({ arrayBuffer: arrayBuffer }).then(function (resultObject) {
var html = resultObject.value.replace(/&lt;(img[^]+?)&gt;/g, '<$1>')
var newHtml = html.replace(/<img[^\\>]*?"(data:image\/x-wmf;.*?)"[^\\>]*?\/>/g, function (match, $1) {
return transformWMF($1)
})
console.log(newHtml)
})
function transformWMF(src) {
var base64 = src.replace(/.*;base64,/, '')
var rawData = window.atob(base64)
const outputArray = new Uint8Array(rawData.length);
for (let i = rawData.length - 1; i >= 0; --i) {
outputArray[i] = rawData.charCodeAt(i);
}
var pNum = 0; // number of the page, that you want to render
var scale = 1; // the scale of the document
var wrt = new ToContext2D(pNum, scale);
FromWMF.Parse(outputArray, wrt);
var canvas = wrt.canvas
var { width, height } = canvas
var ctx = canvas.getContext('2d');
var { data } = ctx.getImageData(0, 0, width, height)
var len = data.length
var row_len = width * 4
var col_len = height
var arr = []
for (var i = 0; i < col_len; i++) {
var per_arr = data.slice(i * row_len, (i + 1) * row_len)
arr.push(per_arr)
}
var canvas2 = document.createElement('canvas');
canvas2.width = width
canvas2.height = height
var ctx2 = canvas2.getContext('2d');
var imageData = ctx2.createImageData(width, height)
var n = row_len * col_len
var arr2 = new Uint8ClampedArray(n)
var curr_row = 0;
var len = arr.length
for (var i = len - 1; i >= 0; i--) {
var curr_row = arr[i]
for (var j = 0; j < curr_row.length; j++) {
arr2[(len - 1 - i) * row_len + j] = curr_row[j]
}
}
var imageData = new ImageData(arr2, width, height)
ctx2.putImageData(imageData, 0, 0)
var dataurl = canvas2.toDataURL()
var img = new Image()
img.src = dataurl
img.width = width
img.height = height
return img.outerHTML
}
@darobin @mwilliamson Thank you very much. |
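For readers following the second half of transformWMF: the loops that copy the rows in reverse order amount to a vertical flip of the RGBA pixel data. A compact sketch of just that step, assuming four bytes per pixel as returned by getImageData (`flipRows` is a hypothetical helper name, not part of any library used above):

```javascript
// Return a vertically flipped copy of RGBA pixel data.
function flipRows(data, width, height) {
    var rowLength = width * 4; // four bytes (R, G, B, A) per pixel
    var flipped = new Uint8ClampedArray(data.length);
    for (var row = 0; row < height; row++) {
        var source = data.subarray(row * rowLength, (row + 1) * rowLength);
        // Source row 0 lands on the last row, row 1 on the second to last, and so on.
        flipped.set(source, (height - 1 - row) * rowLength);
    }
    return flipped;
}
```

The result can be wrapped with new ImageData(flipped, width, height) and drawn with putImageData, as in the code above.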
@liyongleihf2006 By the way, do you have any issues parsing equations with alignment? The result is like this: I think the <m:aln/> is ignored in the library omml2mathml |
@mwilliamson Is there any mechanism for funding of a particular feature? If so, I think I could arrange for support for this feature. |
I've been giving this a try and have had partial success. The code already posted in this thread has been extremely helpful. I have a working solution for my own needs, but it would not be suitable for this library yet. In my case, I only want the conversion to MathML -- I did not need the conversion to an image, as described above. I've got some code that is working, but which unfortunately results in escaped MathML (ie, with But in case it is helpful, here are the changes I'm using: master...brockfanning:math-support And here is the application code that unescapes the MathML afterwards:
The reason that my MathML output is escaped is that I'm using the |
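For anyone who ends up with escaped MathML in Mammoth's output in the same way, an unescaping step might look roughly like this. This is a sketch only and not the application code referred to above; `unescapeMathml` and `decodeEntities` are hypothetical names, and it assumes the escaped markup starts with `&lt;math` and ends with `&lt;/math&gt;`.

```javascript
// Decode the entities produced by HTML text escaping.
function decodeEntities(text) {
    return text
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, '"')
        .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

// Unescape only the escaped <math>...</math> runs; other text is left as-is.
function unescapeMathml(html) {
    // Non-greedy, so each equation is handled separately.
    return html.replace(/&lt;math[\s\S]*?&lt;\/math&gt;/g, function (match) {
        return decodeEntities(match);
    });
}
```

Restricting the replacement to the math runs keeps ordinary escaped text, such as a literal "&lt;" in a sentence, untouched.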
I found that the previous method has a problem: subscripts in equations are not shown. When I checked the code, I found that the DOM element converts tag names to lowercase, for example m:sSub -> m:ssub, and omml2mathml can't match names in that form. By the way, I use the SVG directly rather than the img, which makes it easy to change the text color. But remember that it should be transformed into safe HTML.
'm:oMathPara': function (element) {
var xmlDOM = document.implementation.createDocument(null, null);
var om = transform(element, xmlDOM);
function transform(data) {
var el = _transform(data);
el.setAttribute('xmlns:wpc', 'http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas');
el.setAttribute('xmlns:mo', 'http://schemas.microsoft.com/office/mac/office/2008/main');
el.setAttribute('xmlns:mc', 'http://schemas.openxmlformats.org/markup-compatibility/2006');
el.setAttribute('xmlns:mv', 'urn:schemas-microsoft-com:mac:vml');
el.setAttribute('xmlns:o', 'urn:schemas-microsoft-com:office:office');
el.setAttribute('xmlns:r', 'http://schemas.openxmlformats.org/officeDocument/2006/relationships');
el.setAttribute('xmlns:m', 'http://schemas.openxmlformats.org/officeDocument/2006/math');
el.setAttribute('xmlns:v', 'urn:schemas-microsoft-com:vml');
el.setAttribute('xmlns:wp14', 'http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing');
el.setAttribute('xmlns:wp', 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing');
el.setAttribute('xmlns:w10', 'urn:schemas-microsoft-com:office:word');
el.setAttribute('xmlns:w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');
el.setAttribute('xmlns:w14', 'http://schemas.microsoft.com/office/word/2010/wordml');
el.setAttribute('xmlns:w15', 'http://schemas.microsoft.com/office/word/2012/wordml');
el.setAttribute('xmlns:wpg', 'http://schemas.microsoft.com/office/word/2010/wordprocessingGroup');
el.setAttribute('xmlns:wpi', 'http://schemas.microsoft.com/office/word/2010/wordprocessingInk');
el.setAttribute('xmlns:wne', 'http://schemas.microsoft.com/office/word/2006/wordml');
el.setAttribute('xmlns:wps', 'http://schemas.microsoft.com/office/word/2010/wordprocessingShape');
return el;
function _transform(data) {
var name = data.name;
if (!name) {
return xmlDOM.createTextNode(data.value);
}
var tagName = name.replace(/\{.*\}/, 'm:');
var el = xmlDOM.createElement(tagName);
var children = data.children;
if (children) {
children.forEach(element => {
el.appendChild(_transform(element));
});
}
return el;
}
} |
Hey, can you tell me whether the same is available in Python?
I don't know if you're planning to support math ever, but just in case I thought I'd give you a heads up that I have a JS converter for the Office math markup to MathML, so it could just be reused in Mammoth: omml2mathml.
I don't have time to dig into the Mammoth code to integrate this (in case you're interested) but I'm happy to help if someone takes it on.