Merge branch 'master' into image-titles
tripodsan authored Sep 5, 2023
2 parents caf2307 + 318145a commit 2356fe3
Showing 18 changed files with 401 additions and 182 deletions.
1 change: 1 addition & 0 deletions .eslintrc.json
@@ -69,6 +69,7 @@
"exports": true,
"module": true,
"require": false,
"TextDecoder": false,
"Uint8Array": false
}
}
4 changes: 3 additions & 1 deletion .github/workflows/tests.yml
@@ -8,7 +8,7 @@ jobs:

strategy:
matrix:
node-version: ["6", "8", "10", "12", "14", "16"]
node-version: ["12", "14", "16"]

steps:

@@ -22,3 +22,5 @@ jobs:
- run: npm install

- run: npm test

- run: npm run check-typescript
16 changes: 15 additions & 1 deletion NEWS
@@ -1,7 +1,21 @@
# 1.5.2
# 1.6.0

* Add transformDocument to the TypeScript declarations.

* Support merged paragraphs when revisions are tracked.

* Use xmldom instead of sax to parse XML documents. This should remove the need
to polyfill stream in the browser.

* Adjust the internal implementation to remove the use of Buffer on the critical
path, and provide APIs to read images and documents with embedded style maps
without using Buffer. This should remove the need to polyfill Buffer in the
browser. Since TextDecoder is now used, the minimum version of node.js is now
v12.

* Remove the use of the util module. This should remove the need to polyfill
util in the browser.

# 1.5.1

* Fix: npm 7 changed the behaviour of prepublish, causing the browser build not
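The 1.6.0 notes above say that TextDecoder is now used in place of Buffer. A minimal sketch of that idea, assuming input bytes are available as a `Uint8Array`; the helper name `decodeUtf8` is hypothetical and is not taken from mammoth's source:

```javascript
// Decode UTF-8 bytes into a string without using Buffer.
// TextDecoder is a global in browsers and in node.js v12.
function decodeUtf8(bytes) {
    return new TextDecoder("utf-8").decode(bytes);
}

// The UTF-8 encoding of "<w:t>é</w:t>" (é is the two bytes 195, 169).
var bytes = new Uint8Array([60, 119, 58, 116, 62, 195, 169, 60, 47, 119, 58, 116, 62]);
var xml = decodeUtf8(bytes);
console.log(xml);
```

Because nothing here touches Buffer, the same code should run unchanged in a browser bundle with no polyfill configured.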
27 changes: 17 additions & 10 deletions README.md
@@ -119,17 +119,12 @@ For instance:

### Library

In node.js, mammoth can be required in the usual way:
In node.js and the browser, mammoth can be required in the usual way:

```javascript
var mammoth = require("mammoth");
```

This also works in the browser if node.js core modules
such as `Buffer` and `Stream`, are polyfilled.
Some bundlers, such as Webpack before version 5, will automatically polyfill these modules,
while others, such as Webpack from version 5, require the polyfills to be explicitly configured.

Alternatively, you may use the standalone JavaScript file `mammoth.browser.js`,
which includes both mammoth and its dependencies.
This uses any loaded module system.
@@ -449,6 +444,7 @@ it will use the embedded style map.
* `styleMap`: the style map to embed.

* Returns a promise.
Call `toArrayBuffer()` on the value inside the promise to get an `ArrayBuffer` representing the new document.
Call `toBuffer()` on the value inside the promise to get a `Buffer` representing the new document.

For instance:
@@ -479,11 +475,22 @@ This creates an `<img>` element for each image in the original docx.
This argument is the image element being converted,
and has the following properties:

* `read([encoding])`: read the image file with the specified encoding.
If no encoding is specified, a `Buffer` is returned.

* `contentType`: the content type of the image, such as `image/png`.

* `readAsArrayBuffer()`: read the image file as an `ArrayBuffer`.
Returns a promise of an `ArrayBuffer`.

* `readAsBuffer()`: read the image file as a `Buffer`.
Returns a promise of a `Buffer`.
This is not supported in browsers unless a `Buffer` polyfill has been used.

* `readAsBase64String()`: read the image file as a base64-encoded string.
Returns a promise of a `string`.

* `read([encoding])` (deprecated): read the image file with the specified encoding.
If an encoding is specified, a promise of a `string` is returned.
If no encoding is specified, a promise of a `Buffer` is returned.
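
A minimal sketch of how the read methods listed above might be used. The `image` object here is a hand-written stand-in with placeholder bytes, and `describeImage` is a hypothetical helper; neither comes from mammoth itself.

```javascript
// Stand-in for the image object passed to `func` (an assumption for illustration).
// The three bytes are placeholder data, not a real PNG.
var image = {
    contentType: "image/png",
    readAsArrayBuffer: function() {
        return Promise.resolve(new Uint8Array([1, 2, 3]).buffer);
    },
    readAsBase64String: function() {
        return Promise.resolve("AQID"); // base64 of the bytes 1, 2, 3
    }
};

// Hypothetical helper: read the image both ways and summarise it.
function describeImage(image) {
    return Promise.all([
        image.readAsArrayBuffer(),
        image.readAsBase64String()
    ]).then(function(results) {
        return {
            byteLength: results[0].byteLength,
            src: "data:" + image.contentType + ";base64," + results[1]
        };
    });
}

describeImage(image).then(function(description) {
    console.log(description.byteLength, description.src);
});
```

Neither call depends on `Buffer`, so this style should suit browser code where no polyfill is present.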

`func` should return an object (or a promise of an object) of attributes for the `<img>` element.
At a minimum, this should include the `src` attribute.
If any alt text is found for the image,
@@ -493,7 +500,7 @@ For instance, the following replicates the default image conversion:

```javascript
mammoth.images.imgElement(function(image) {
return image.read("base64").then(function(imageBuffer) {
return image.readAsBase64String().then(function(imageBuffer) {
return {
src: "data:" + image.contentType + ";base64," + imageBuffer
};
37 changes: 29 additions & 8 deletions lib/documents.js
@@ -55,12 +55,12 @@ function Run(children, properties) {
children: children,
styleId: properties.styleId || null,
styleName: properties.styleName || null,
isBold: properties.isBold,
isUnderline: properties.isUnderline,
isItalic: properties.isItalic,
isStrikethrough: properties.isStrikethrough,
isAllCaps: properties.isAllCaps,
isSmallCaps: properties.isSmallCaps,
isBold: !!properties.isBold,
isUnderline: !!properties.isUnderline,
isItalic: !!properties.isItalic,
isStrikethrough: !!properties.isStrikethrough,
isAllCaps: !!properties.isAllCaps,
isSmallCaps: !!properties.isSmallCaps,
verticalAlignment: properties.verticalAlignment || verticalAlignment.baseline,
font: properties.font || null,
fontSize: properties.fontSize || null
@@ -151,7 +151,28 @@ function noteKey(noteType, id) {
function Image(options) {
return {
type: types.image,
read: options.readImage,
// `read` is retained for backwards compatibility, but other read
// methods should be preferred.
read: function(encoding) {
if (encoding) {
return options.readImage(encoding);
} else {
return options.readImage().then(function(arrayBuffer) {
return Buffer.from(arrayBuffer);
});
}
},
readAsArrayBuffer: function() {
return options.readImage();
},
readAsBase64String: function() {
return options.readImage("base64");
},
readAsBuffer: function() {
return options.readImage().then(function(arrayBuffer) {
return Buffer.from(arrayBuffer);
});
},
title: options.title,
altText: options.altText,
contentType: options.contentType
@@ -204,7 +225,7 @@ function BookmarkStart(options) {
exports.document = exports.Document = Document;
exports.paragraph = exports.Paragraph = Paragraph;
exports.run = exports.Run = Run;
exports.Text = Text;
exports.text = exports.Text = Text;
exports.tab = exports.Tab = Tab;
exports.Hyperlink = Hyperlink;
exports.noteReference = exports.NoteReference = NoteReference;
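The `!!` prefix in the `Run` hunk above coerces each formatting flag to a strict boolean, so an omitted property becomes `false` rather than `undefined`. A small sketch of the effect; `runFlags` is a hypothetical, cut-down stand-in for `Run`, not the library function:

```javascript
// Cut-down stand-in for Run: only two of the formatting flags are shown.
function runFlags(properties) {
    return {
        isBold: !!properties.isBold,
        isItalic: !!properties.isItalic
    };
}

// isItalic is not supplied, so it is coerced from undefined to false.
var flags = runFlags({isBold: true});
console.log(flags.isBold, flags.isItalic);
```

One consequence is that two runs built with and without an explicit `false` flag compare as structurally equal.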
105 changes: 61 additions & 44 deletions lib/docx/body-reader.js
@@ -23,6 +23,12 @@ function createBodyReader(options) {
function BodyReader(options) {
var complexFieldStack = [];
var currentInstrText = [];

// When a paragraph is marked as deleted, its contents should be combined
// with the following paragraph. See 17.13.5.15 del (Deleted Paragraph) of
// ECMA-376 4th edition Part 1.
var deletedParagraphContents = [];

var relationships = options.relationships;
var contentTypes = options.contentTypes;
var docxFile = options.docxFile;
@@ -48,6 +54,19 @@ function BodyReader(options) {
return emptyResult();
}

function readParagraphProperties(element) {
return readParagraphStyle(element).map(function(style) {
return {
type: "paragraphProperties",
styleId: style.styleId,
styleName: style.name,
alignment: element.firstOrEmpty("w:jc").attributes["w:val"],
numbering: readNumberingProperties(style.styleId, element.firstOrEmpty("w:numPr"), numbering),
indent: readParagraphIndent(element.firstOrEmpty("w:ind"))
};
});
}

function readParagraphIndent(element) {
return {
start: element.attributes["w:start"] || element.attributes["w:left"],
@@ -213,43 +232,46 @@ function BodyReader(options) {

var xmlElementReaders = {
"w:p": function(element) {
return readXmlElements(element.children)
.map(function(children) {
var properties = _.find(children, isParagraphProperties);
return new documents.Paragraph(
children.filter(negate(isParagraphProperties)),
properties
);
})
.insertExtra();
},
"w:pPr": function(element) {
return readParagraphStyle(element).map(function(style) {
return {
type: "paragraphProperties",
styleId: style.styleId,
styleName: style.name,
alignment: element.firstOrEmpty("w:jc").attributes["w:val"],
numbering: readNumberingProperties(style.styleId, element.firstOrEmpty("w:numPr"), numbering),
indent: readParagraphIndent(element.firstOrEmpty("w:ind"))
};
});
var paragraphPropertiesElement = element.firstOrEmpty("w:pPr");

var isDeleted = !!paragraphPropertiesElement
.firstOrEmpty("w:rPr")
.first("w:del");

if (isDeleted) {
element.children.forEach(function(child) {
deletedParagraphContents.push(child);
});
return emptyResult();
} else {
var childrenXml = element.children;
if (deletedParagraphContents.length > 0) {
childrenXml = deletedParagraphContents.concat(childrenXml);
deletedParagraphContents = [];
}
return ReadResult.map(
readParagraphProperties(paragraphPropertiesElement),
readXmlElements(childrenXml),
function(properties, children) {
return new documents.Paragraph(children, properties);
}
).insertExtra();
}
},
"w:r": function(element) {
return readXmlElements(element.children)
.map(function(children) {
var properties = _.find(children, isRunProperties);
children = children.filter(negate(isRunProperties));

return ReadResult.map(
readRunProperties(element.firstOrEmpty("w:rPr")),
readXmlElements(element.children),
function(properties, children) {
var hyperlinkOptions = currentHyperlinkOptions();
if (hyperlinkOptions !== null) {
children = [new documents.Hyperlink(children, hyperlinkOptions)];
}

return new documents.Run(children, properties);
});
}
);
},
"w:rPr": readRunProperties,
"w:fldChar": readFldChar,
"w:instrText": readInstrText,
"w:t": function(element) {
@@ -569,27 +591,14 @@ var ignoreElements = {
"w:del": true,
"w:footnoteRef": true,
"w:endnoteRef": true,
"w:pPr": true,
"w:rPr": true,
"w:tblPr": true,
"w:tblGrid": true,
"w:trPr": true,
"w:tcPr": true
};

function isParagraphProperties(element) {
return element.type === "paragraphProperties";
}

function isRunProperties(element) {
return element.type === "runProperties";
}

function negate(predicate) {
return function(value) {
return !predicate(value);
};
}


function emptyResultWithMessages(messages) {
return new ReadResult(null, null, messages);
}
@@ -608,7 +617,7 @@ function elementResultWithMessages(element, messages) {

function ReadResult(element, extra, messages) {
this.value = element || [];
this.extra = extra;
this.extra = extra || [];
this._result = new Result({
element: this.value,
extra: extra
@@ -643,6 +652,14 @@ ReadResult.prototype.flatMap = function(func) {
return new ReadResult(result.value.element, joinElements(this.extra, result.value.extra), result.messages);
};

ReadResult.map = function(first, second, func) {
return new ReadResult(
func(first.value, second.value),
joinElements(first.extra, second.extra),
first.messages.concat(second.messages)
);
};

function combineResults(results) {
var result = Result.combine(_.pluck(results, "_result"));
return new ReadResult(
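The deleted-paragraph handling in the `w:p` reader above can be sketched on plain objects. This is a simplified model under stated assumptions: each paragraph is `{deleted, children}` rather than an XML element, and `mergeDeletedParagraphs` is a hypothetical name, not a function in body-reader.js.

```javascript
// Each paragraph is modelled as {deleted: boolean, children: string[]},
// a simplified stand-in for the XML elements the body reader handles.
function mergeDeletedParagraphs(paragraphs) {
    var pending = [];
    var result = [];
    paragraphs.forEach(function(paragraph) {
        if (paragraph.deleted) {
            // Hold the contents back until the next surviving paragraph.
            pending = pending.concat(paragraph.children);
        } else {
            // Prepend any held-back contents, then reset the queue.
            result.push(pending.concat(paragraph.children));
            pending = [];
        }
    });
    // Contents still pending at the end are dropped in this sketch.
    return result;
}

var merged = mergeDeletedParagraphs([
    {deleted: true, children: ["Hello, "]},
    {deleted: false, children: ["world"]},
    {deleted: false, children: ["Next"]}
]);
console.log(JSON.stringify(merged));
```

The first two inputs collapse into a single paragraph, mirroring how a tracked deletion of a paragraph mark joins that paragraph to the one that follows.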
2 changes: 1 addition & 1 deletion lib/images.js
@@ -23,7 +23,7 @@ function imgElement(func) {
exports.inline = exports.imgElement;

exports.dataUri = imgElement(function(element) {
return element.read("base64").then(function(imageBuffer) {
return element.readAsBase64String().then(function(imageBuffer) {
return {
src: "data:" + element.contentType + ";base64," + imageBuffer
};
8 changes: 7 additions & 1 deletion lib/index.d.ts
@@ -1,7 +1,10 @@
interface Mammoth {
convertToHtml: (input: Input, options?: Options) => Promise<Result>;
extractRawText: (input: Input) => Promise<Result>;
embedStyleMap: (input: Input, styleMap: string) => Promise<{toBuffer: () => Buffer}>;
embedStyleMap: (input: Input, styleMap: string) => Promise<{
toArrayBuffer: () => ArrayBuffer,
toBuffer: () => Buffer,
}>;
images: Images;
}

@@ -39,6 +42,9 @@ interface ImageConverter {

interface Image {
contentType: string;
readAsArrayBuffer: () => Promise<ArrayBuffer>;
readAsBase64String: () => Promise<string>;
readAsBuffer: () => Promise<Buffer>;
read: ImageRead;
}

9 changes: 6 additions & 3 deletions lib/index.js
@@ -92,12 +92,15 @@ function embedStyleMap(input, styleMap) {
return docxStyleMap.writeStyleMap(docxFile, styleMap);
})
.then(function(docxFile) {
return docxFile.toBuffer();
return docxFile.toArrayBuffer();
})
.then(function(buffer) {
.then(function(arrayBuffer) {
return {
toArrayBuffer: function() {
return arrayBuffer;
},
toBuffer: function() {
return buffer;
return Buffer.from(arrayBuffer);
}
};
});
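A sketch of how a caller might consume the value that `embedStyleMap` now resolves to. The `wrapDocument` helper and the four placeholder bytes are assumptions that stand in for a real call to mammoth, so that the example is self-contained:

```javascript
// Hypothetical stand-in for the object resolved by embedStyleMap:
// it exposes the same two accessors as the hunk above.
function wrapDocument(arrayBuffer) {
    return {
        toArrayBuffer: function() {
            return arrayBuffer;
        },
        toBuffer: function() {
            // Only valid where Buffer exists (node.js, or a polyfilled browser).
            return Buffer.from(arrayBuffer);
        }
    };
}

// Placeholder bytes: the "PK\x03\x04" signature that begins a zip file.
var result = wrapDocument(new Uint8Array([80, 75, 3, 4]).buffer);

// Browser-friendly path: view the ArrayBuffer through a typed array.
var bytes = new Uint8Array(result.toArrayBuffer());
console.log(bytes.length, bytes[0]);

// node.js path: obtain a Buffer, for example to pass to fs.writeFile.
var buffer = result.toBuffer();
console.log(buffer.length);
```

Keeping `toBuffer()` alongside `toArrayBuffer()` lets existing node.js callers continue to work while browser code avoids `Buffer` entirely.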
2 changes: 0 additions & 2 deletions lib/unzip.js
@@ -1,5 +1,3 @@
exports.openZip = openZip;

var fs = require("fs");

var promises = require("./promises");