Commit

- Resolving #111;
- Adding `XBrowserNode` to deal with browser-generated nodes with extra properties;
- Additional comments and notes at TODO and README files.
leonelsanchesdasilva committed Nov 2, 2024
1 parent aa9a2a3 commit a8d533a
Showing 9 changed files with 128 additions and 39 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -83,7 +83,7 @@ const xslt = new Xslt(options);
```

- `cData` (`boolean`, default `true`): resolves CDATA elements in the output. Content under CDATA is resolved as text. This overrides `escape` for CDATA content.
- `escape` (`boolean`, default `true`): replaces symbols like `<`, `>`, `&` and `"` by the corresponding [XML entities](https://www.tutorialspoint.com/xml/xml_character_entities.htm).
- `escape` (`boolean`, default `true`): replaces symbols like `<`, `>`, `&` and `"` by the corresponding [HTML/XML entities](https://www.tutorialspoint.com/xml/xml_character_entities.htm). Can be overridden by `disable-output-escaping`, which also does the opposite, unescaping `&lt;` and `&gt;` to `<` and `>`, respectively.
- `selfClosingTags` (`boolean`, default `true`): Self-closes tags that don't have inner elements, if `true`. For instance, `<test></test>` becomes `<test />`.
- `outputMethod` (`string`, default `xml`): Specifies the default output method. if `<xsl:output>` is declared in your XSLT file, this will be overridden.
- `parameters` (`array`, default `[]`): external parameters that you want to use.
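
As a rough illustration of how `selfClosingTags` and `outputMethod` interact, the sketch below mirrors the empty-element branch visible in `xmlTextRecursive` later in this commit. `renderEmptyElement` and `OutputOptionsSketch` are hypothetical names used only for this example; they are not part of the library's API, and the real serializer also handles attributes and namespaces.

```typescript
// Hypothetical, trimmed-down options type; the library's real type is `XmlOutputOptions`.
interface OutputOptionsSketch {
    selfClosingTags: boolean;
    outputMethod: string;
}

// Serializes an element that has no children and no attributes.
function renderEmptyElement(name: string, options: OutputOptionsSketch): string {
    // In HTML output, `hr` and `link` are self-closed regardless of the option.
    const alwaysSelfClosed = options.outputMethod === 'html' && ['hr', 'link'].includes(name);
    if (options.selfClosingTags || alwaysSelfClosed) {
        return `<${name}/>`;
    }
    return `<${name}></${name}>`;
}

console.log(renderEmptyElement('test', { selfClosingTags: true, outputMethod: 'xml' }));  // <test/>
console.log(renderEmptyElement('test', { selfClosingTags: false, outputMethod: 'xml' })); // <test></test>
console.log(renderEmptyElement('hr', { selfClosingTags: false, outputMethod: 'html' }));  // <hr/>
```

The exact whitespace before `/>` in the library's final output may differ from this sketch.') throw new Error('explicit close failed');
if (renderEmptyElement('hr', { selfClosingTags: false, outputMethod: 'html' }) !== '<hr/>') throw new Error('html hr failed');
if (renderEmptyElement('div', { selfClosingTags: false, outputMethod: 'html' }) !== '<div></div>') throw new Error('html div failed');
</test>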
9 changes: 6 additions & 3 deletions TODO.md
@@ -1,11 +1,14 @@
XSLT-processor TODO
=====

* Rethink match algorithm, as described in https://github.com/DesignLiquido/xslt-processor/pull/62#issuecomment-1636684453;
* Rethink match algorithm, as described in https://github.com/DesignLiquido/xslt-processor/pull/62#issuecomment-1636684453. There's a good number of issues open about this problem:
* https://github.com/DesignLiquido/xslt-processor/issues/108
* https://github.com/DesignLiquido/xslt-processor/issues/109
* https://github.com/DesignLiquido/xslt-processor/issues/110
* XSLT validation, besides the version number;
* XSL:number
* `attribute-set`, `decimal-format`, etc. (check `src/xslt.ts`)
* `/html/body//ul/li|html/body//ol/li` has `/html/body//ul/li` evaluated by this XPath implementation as "absolute", and `/html/body//ol/li` as "relative". Both should be evaluated as "absolute".
* Implement `<xsl:import>` with correct template precedence.
* `/html/body//ul/li|html/body//ol/li` has `/html/body//ul/li` evaluated by this XPath implementation as "absolute", and `/html/body//ol/li` as "relative". Both should be evaluated as "absolute". One idea is to rewrite the XPath logic entirely, since it is nearly impossible to debug it.
* Implement `<xsl:import>` with correct template precedence.

Help is much appreciated. It seems to currently work for most of our purposes, but fixes and additions are always welcome!
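
The expected behavior for the union expression in the TODO item can be pinned down with a small helper. This is only a sketch of the desired classification, not the library's XPath implementation: `classifyUnionBranches` is a hypothetical name, the example assumes both branches are meant to start with `/`, and splitting on every `|` is naive (it would break on a `|` inside a string literal in a predicate).

```typescript
type PathKind = 'absolute' | 'relative';

interface ClassifiedBranch {
    branch: string;
    kind: PathKind;
}

// Naively splits a union expression on every `|` and classifies each
// branch by whether it starts with `/`.
function classifyUnionBranches(expression: string): ClassifiedBranch[] {
    return expression
        .split('|')
        .map((branch) => branch.trim())
        .filter((branch) => branch.length > 0)
        .map((branch) => {
            const kind: PathKind = branch.startsWith('/') ? 'absolute' : 'relative';
            return { branch, kind };
        });
}

// Both branches should come out as "absolute".
console.log(classifyUnionBranches('/html/body//ul/li|/html/body//ol/li'));
```

A regression test for the rewritten match algorithm could assert this result for every branch of a union.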
1 change: 1 addition & 0 deletions package.json
@@ -48,6 +48,7 @@
"@rollup/plugin-typescript": "^11.1.1",
"@types/he": "^1.2.0",
"@types/jest": "^29.5.12",
"@types/node-fetch": "^2.6.11",
"@typescript-eslint/eslint-plugin": "^8.4.0",
"@typescript-eslint/parser": "^8.4.0",
"babel-jest": "^29.7.0",
1 change: 1 addition & 0 deletions src/dom/index.ts
@@ -3,4 +3,5 @@ export * from './xdocument';
export * from './xml-functions';
export * from './xml-output-options';
export * from './xml-parser';
export * from './xbrowser-node';
export * from './xnode';
10 changes: 10 additions & 0 deletions src/dom/xbrowser-node.ts
@@ -0,0 +1,10 @@
import { XNode } from "./xnode";

/**
* Special XNode class, that retains properties from browsers like
* IE, Opera, Safari, etc.
*/
export class XBrowserNode extends XNode {
innerText?: string;
textContent?: string;
}
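
To see why a dedicated subclass helps, here is a minimal sketch of the pattern this commit applies in `xmlValue`: the node is narrowed to a type with optional browser properties instead of being typed as `any`. `NodeSketch`, `BrowserNodeSketch` and `browserText` are stand-ins invented for this example, not the library's `XNode` and `XBrowserNode`.

```typescript
// Stand-in for XNode: only the fields this example needs.
class NodeSketch {
    constructor(public nodeName: string, public nodeValue: string = '') {}
}

// Stand-in for XBrowserNode: optional properties that browser DOM nodes may carry.
class BrowserNodeSketch extends NodeSketch {
    innerText?: string;
    textContent?: string;
}

// Returns the browser-provided text if the node has any, otherwise undefined.
function browserText(node: NodeSketch): string | undefined {
    const browserNode = node as BrowserNodeSketch;
    if (browserNode.innerText !== undefined) {
        return browserNode.innerText;
    }
    if (browserNode.textContent !== undefined) {
        return browserNode.textContent;
    }
    return undefined;
}

const plain = new NodeSketch('p', 'hello');
const fromBrowser = new BrowserNodeSketch('p');
fromBrowser.textContent = 'hello from the browser';

console.log(browserText(plain));       // undefined
console.log(browserText(fromBrowser)); // hello from the browser
```

One consequence of comparing with `!== undefined` rather than `!= undefined`: a property holding `null` counts as present under the strict comparison, whereas the loose comparison treats `null` as absent.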
104 changes: 72 additions & 32 deletions src/dom/xml-functions.ts
@@ -14,7 +14,7 @@ import { domGetAttributeValue } from './functions';
import { XNode } from './xnode';
import { XDocument } from './xdocument';
import { XmlOutputOptions } from './xml-output-options';

import { XBrowserNode } from './xbrowser-node';

/**
* Returns the text value of a node; for nodes without children this
@@ -25,15 +25,15 @@ import { XmlOutputOptions } from './xml-output-options';
* @param disallowBrowserSpecificOptimization A boolean, to avoid browser optimization.
* @returns The XML value as a string.
*/
export function xmlValue(node: XNode | any, disallowBrowserSpecificOptimization: boolean = false): string {
export function xmlValue(node: XNode, disallowBrowserSpecificOptimization: boolean = false): string {
if (!node) {
return '';
}

let ret = '';
switch (node.nodeType) {
case DOM_DOCUMENT_TYPE_NODE:
return `<!DOCTYPE ${node.nodeValue}>`
return `<!DOCTYPE ${node.nodeValue}>`;

case DOM_TEXT_NODE:
case DOM_CDATA_SECTION_NODE:
case DOM_ATTRIBUTE_NODE:
@@ -44,19 +44,22 @@ export function xmlValue(node: XNode | any, disallowBrowserSpecificOptimization:
if (!disallowBrowserSpecificOptimization) {
// Only returns something if node has either `innerText` or `textContent` (not an XNode).
// IE, Safari, Opera, and friends (`innerText`)
const innerText = node.innerText;
if (innerText != undefined) {
const browserNode = node as XBrowserNode;
const innerText = browserNode.innerText;
if (innerText !== undefined) {
return innerText;
}

// Firefox (`textContent`)
const textContent = node.textContent;
if (textContent != undefined) {
const textContent = browserNode.textContent;
if (textContent !== undefined) {
return textContent;
}

}

if (node.transformedChildNodes.length > 0) {
const transformedTextNodes = node.transformedChildNodes.filter((n: XNode) => n.nodeType !== DOM_ATTRIBUTE_NODE);
const transformedTextNodes = node.transformedChildNodes.filter(
(n: XNode) => n.nodeType !== DOM_ATTRIBUTE_NODE
);
for (let i = 0; i < transformedTextNodes.length; ++i) {
ret += xmlValue(transformedTextNodes[i]);
}
@@ -71,8 +74,15 @@ export function xmlValue(node: XNode | any, disallowBrowserSpecificOptimization:
}
}

// TODO: Give a better name to this.
export function xmlValue2(node: any, disallowBrowserSpecificOptimization: boolean = false) {
/**
* The older version to obtain an XML value from a node.
* For now, this form is only used to get text from attribute nodes,
* and it should be removed in future versions.
* @param node The attribute node.
* @param disallowBrowserSpecificOptimization A boolean, to avoid browser optimization.
* @returns The XML value as a string.
*/
export function xmlValueLegacyBehavior(node: XNode, disallowBrowserSpecificOptimization: boolean = false) {
if (!node) {
return '';
}
@@ -91,13 +101,14 @@ export function xmlValue2(node: any, disallowBrowserSpecificOptimization: boolea
case DOM_ELEMENT_NODE:
if (!disallowBrowserSpecificOptimization) {
// IE, Safari, Opera, and friends
const innerText = node.innerText;
if (innerText != undefined) {
const browserNode = node as XBrowserNode;
const innerText = browserNode.innerText;
if (innerText !== undefined) {
return innerText;
}

// Firefox
const textContent = node.textContent;
if (textContent != undefined) {
const textContent = browserNode.textContent;
if (textContent !== undefined) {
return textContent;
}

}
@@ -121,17 +132,28 @@ export function xmlValue2(node: any, disallowBrowserSpecificOptimization: boolea
* @returns The XML string.
* @see xmlTransformedText
*/
export function xmlText(node: XNode, options: XmlOutputOptions = {
cData: true,
escape: true,
selfClosingTags: true,
outputMethod: 'xml'
}) {
export function xmlText(
node: XNode,
options: XmlOutputOptions = {
cData: true,
escape: true,
selfClosingTags: true,
outputMethod: 'xml'
}
) {
const buffer: string[] = [];
xmlTextRecursive(node, buffer, options);
return buffer.join('');
}

/**
* The recursive logic to transform a node into XML text.
* It can be considered legacy, since it does not work with transformed nodes, and
* probably will be removed in the future.
* @param {XNode} node The node.
* @param {string[]} buffer The buffer, that will represent the transformed XML text.
* @param {XmlOutputOptions} options XML output options.
*/
function xmlTextRecursive(node: XNode, buffer: string[], options: XmlOutputOptions) {
if (node.nodeType == DOM_TEXT_NODE) {
buffer.push(xmlEscapeText(node.nodeValue));
@@ -158,7 +180,10 @@ function xmlTextRecursive(node: XNode, buffer: string[], options: XmlOutputOptio
}

if (node.childNodes.length === 0) {
if (options.selfClosingTags || (options.outputMethod === 'html' && ['hr', 'link'].includes(node.nodeName))) {
if (
options.selfClosingTags ||
(options.outputMethod === 'html' && ['hr', 'link'].includes(node.nodeName))
) {
buffer.push('/>');
} else {
buffer.push(`></${xmlFullNodeName(node)}>`);
@@ -197,15 +222,20 @@ export function xmlTransformedText(
return buffer.join('');
}

function xmlTransformedTextRecursive(node: XNode, buffer: any[], options: XmlOutputOptions) {
/**
* The recursive logic to transform a node into XML text.
* @param {XNode} node The node.
* @param {string[]} buffer The buffer, that will represent the transformed XML text.
* @param {XmlOutputOptions} options XML output options.
*/
function xmlTransformedTextRecursive(node: XNode, buffer: string[], options: XmlOutputOptions) {
if (node.visited) return;
const nodeType = node.transformedNodeType || node.nodeType;
const nodeValue = node.transformedNodeValue || node.nodeValue;
if (nodeType === DOM_TEXT_NODE) {
if (node.transformedNodeValue && node.transformedNodeValue.trim() !== '') {
const finalText = node.escape && options.escape?
xmlEscapeText(node.transformedNodeValue) :
node.transformedNodeValue;
const finalText =
node.escape && options.escape
? xmlEscapeText(node.transformedNodeValue)
: xmlUnescapeText(node.transformedNodeValue);
buffer.push(finalText);
}
} else if (nodeType === DOM_CDATA_SECTION_NODE) {
@@ -246,9 +276,9 @@ function xmlTransformedTextRecursive(node: XNode, buffer: any[], options: XmlOut
function xmlElementLogicTrivial(node: XNode, buffer: string[], options: XmlOutputOptions) {
buffer.push(`<${xmlFullNodeName(node)}`);

let attributes = node.transformedChildNodes.filter(n => n.nodeType === DOM_ATTRIBUTE_NODE);
let attributes = node.transformedChildNodes.filter((n) => n.nodeType === DOM_ATTRIBUTE_NODE);
if (attributes.length === 0) {
attributes = node.childNodes.filter(n => n.nodeType === DOM_ATTRIBUTE_NODE);
attributes = node.childNodes.filter((n) => n.nodeType === DOM_ATTRIBUTE_NODE);

}

for (let i = 0; i < attributes.length; ++i) {
@@ -262,9 +292,9 @@ function xmlElementLogicTrivial(node: XNode, buffer: string[], options: XmlOutpu
}
}

let childNodes = node.transformedChildNodes.filter(n => n.nodeType !== DOM_ATTRIBUTE_NODE);
let childNodes = node.transformedChildNodes.filter((n) => n.nodeType !== DOM_ATTRIBUTE_NODE);
if (childNodes.length === 0) {
childNodes = node.childNodes.filter(n => n.nodeType !== DOM_ATTRIBUTE_NODE);
childNodes = node.childNodes.filter((n) => n.nodeType !== DOM_ATTRIBUTE_NODE);

}

childNodes = childNodes.sort((a, b) => a.siblingPosition - b.siblingPosition);
@@ -317,7 +347,17 @@ function xmlFullNodeName(node: XNode): string {
}

/**
* Escape XML special markup chracters: tag delimiter < > and entity
* Replaces HTML/XML entities with their literal characters.
* Currently, only the tag delimiters (`&lt;` and `&gt;`) are handled.
* @param text The text to be transformed.
* @returns The unescaped text.
*/
export function xmlUnescapeText(text: string): string {
return `${text}`.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

/**
* Escape XML special markup characters: tag delimiter <, >, and entity
* reference start delimiter &. The escaped string can be used in XML
* text portions (i.e. between tags).
* @param s The string to be escaped.
@@ -332,8 +372,8 @@ export function xmlEscapeText(s: string): string {
}

/**
* Escape XML special markup characters: tag delimiter < > entity
* reference start delimiter & and quotes ". The escaped string can be
* Escape XML special markup characters: tag delimiters < and >, entity
* reference start delimiter &, and double quotes ("). The escaped string can be
* used in double quoted XML attribute value portions (i.e. in
* attributes within start tags).
* @param s The string to be escaped.
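
The interplay between escaping and the new `xmlUnescapeText` can be tried in isolation. In the sketch below, `unescapeTagDelimiters` copies the replacements shown in this diff, while `escapeTextSketch` is an assumption (the body of `xmlEscapeText` is not visible in this commit) that escapes `&`, `<` and `>` as its doc comment describes.

```typescript
// Assumed behavior of the escaping step; `&` is replaced first so that the
// entities produced for `<` and `>` are not escaped a second time.
function escapeTextSketch(text: string): string {
    return `${text}`.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Same replacements as `xmlUnescapeText` in the diff: only tag delimiters.
function unescapeTagDelimiters(text: string): string {
    return `${text}`.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

console.log(unescapeTagDelimiters('&lt;!DOCTYPE html&gt;'));   // <!DOCTYPE html>
console.log(escapeTextSketch('a < b && c > d'));               // a &lt; b &amp;&amp; c &gt; d
console.log(unescapeTagDelimiters(escapeTextSketch('a & b'))); // a &amp; b
```

The last call shows that the two functions are not inverses of each other: `&amp;` stays escaped because only tag delimiters are unescaped.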
4 changes: 2 additions & 2 deletions src/xslt/xslt.ts
@@ -24,7 +24,7 @@ import {
xmlGetAttribute,
xmlTransformedText,
xmlValue,
xmlValue2
xmlValueLegacyBehavior
} from '../dom';
import { ExprContext, XPath } from '../xpath';

@@ -366,7 +366,7 @@ export class Xslt {

const documentFragment = domCreateDocumentFragment(this.outputDocument);
await this.xsltChildNodes(context, template, documentFragment);
const value = xmlValue2(documentFragment);
const value = xmlValueLegacyBehavior(documentFragment);

if (output && output.nodeType === DOM_DOCUMENT_FRAGMENT_NODE) {
domSetTransformedAttribute(output, name, value);
28 changes: 27 additions & 1 deletion tests/xslt/xslt.test.tsx
@@ -199,7 +199,16 @@ describe('xslt', () => {
});

describe('xsl:text', () => {
it('disable-output-escaping', async () => {
// Apparently, this is not how `disable-output-escaping` works.
// From initial research, a `<!DOCTYPE html>` written literally in
// the XSLT gives an error like:
// `Unable to generate the XML document using the provided XML/XSL input.
// org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 70;
// A DOCTYPE is not allowed in content.`
// All examples of `disable-output-escaping` usage show
// the opposite: `&lt;!DOCTYPE html&gt;` becomes `<!DOCTYPE html>`.
// This test is kept here for historical purposes.
it.skip('disable-output-escaping', async () => {
const xml = `<anything></anything>`;
const xslt = `<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" />
@@ -216,6 +225,23 @@ describe('xslt', () => {
assert.equal(html, '<!DOCTYPE html>');
});

it('disable-output-escaping, XML/HTML entities', async () => {
const xml = `<anything></anything>`;
const xslt = `<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" />
<xsl:template match="/">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html&gt;</xsl:text>
</xsl:template>
</xsl:stylesheet>`;

const xsltClass = new Xslt();
const xmlParser = new XmlParser();
const parsedXml = xmlParser.xmlParse(xml);
const parsedXslt = xmlParser.xmlParse(xslt);
const html = await xsltClass.xsltProcess(parsedXml, parsedXslt);
assert.equal(html, '<!DOCTYPE html>');
});

it('CDATA as JavaScript', async () => {
const xml = `<XampleXml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
8 changes: 8 additions & 0 deletions yarn.lock
@@ -1886,6 +1886,14 @@
resolved "https://registry.yarnpkg.com/@types/json-schema/-/json-schema-7.0.15.tgz#596a1747233694d50f6ad8a7869fcb6f56cf5841"
integrity sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA==

"@types/node-fetch@^2.6.11":
version "2.6.11"
resolved "https://registry.yarnpkg.com/@types/node-fetch/-/node-fetch-2.6.11.tgz#9b39b78665dae0e82a08f02f4967d62c66f95d24"
integrity sha512-24xFj9R5+rfQJLRyM56qh+wnVSYhyXC2tkoBndtY0U+vubqNsYXGjufB2nn8Q6gt0LrARwL6UBtMCSVCwl4B1g==
dependencies:
"@types/node" "*"
form-data "^4.0.0"

"@types/node@*":
version "22.5.4"
resolved "https://registry.yarnpkg.com/@types/node/-/node-22.5.4.tgz#83f7d1f65bc2ed223bdbf57c7884f1d5a4fa84e8"
