XmlSplit

Abstract

This utility helps you deal with (very) large XML input files by splitting them into smaller chunks, each a valid XML document, which can then be processed sequentially (in memory) using e.g. libxmljs.

Motivation

Performance.

There are other (very useful) libraries available, but because of the XML parsing they do behind the scenes, their performance is not quite good enough for some applications.

To handle the XML parsing part, XmlSplit uses plain JavaScript strings and string methods (.slice, .split) instead of a full XML parser, which keeps the overhead low.
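
To illustrate the idea, here is a minimal sketch of string-based splitting. It assumes the whole document fits in one string, the tag name is known, and items are neither nested nor self-closing. The helpers findOpenTag and splitByTag are hypothetical names used only for this illustration; they are not part of XmlSplit, and the library's actual implementation may differ.

```javascript
// Hypothetical helpers for illustration only, not XmlSplit's implementation.

// Find the next start tag named tagName at or after position `from`.
// Checks the character after the name so that "<product" does not
// match "<product_export".
function findOpenTag(xml, tagName, from) {
    var needle = '<' + tagName
    var pos = xml.indexOf(needle, from)
    while (pos !== -1) {
        var next = xml.charAt(pos + needle.length)
        if (next === '>' || next === ' ' || next === '\t' ||
            next === '\n' || next === '\r') {
            return pos
        }
        pos = xml.indexOf(needle, pos + needle.length)
    }
    return -1
}

// Split an in-memory XML string into one document per tagName element.
function splitByTag(xml, tagName) {
    var close = '</' + tagName + '>'
    var first = findOpenTag(xml, tagName, 0)
    var last = xml.lastIndexOf(close)
    if (first === -1 || last === -1) return []

    var header = xml.slice(0, first)            // declaration + root start tag
    var footer = xml.slice(last + close.length) // root end tag
    var documents = []
    var pos = first
    while (pos !== -1) {
        var end = xml.indexOf(close, pos)
        if (end === -1) break
        end += close.length
        // Wrap each item in the original header and footer.
        documents.push(header + xml.slice(pos, end) + footer)
        pos = findOpenTag(xml, tagName, end)
    }
    return documents
}
```

Only indexOf, lastIndexOf, charAt and slice are used, so no XML tree is ever built. The trade-off is that the sketch trusts the input to be well-formed.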

API

Initialize a new object

var xmlsplit = new XmlSplit(batchSize=1, tagName=<autodetect>)

and use the stream.Transform API to pass your XML data through.

By default, it splits children of the root, but the tag name can be specified with the constructor's second argument.

The optional batchSize argument sets the number of items in each XML document.
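
As a rough sketch of how batching and chunked input could fit together, the following code collects items across chunk boundaries and emits one document per batchSize items. It is not XmlSplit's actual implementation: createBatcher and createBatchStream are hypothetical names, the tag name must be given explicitly (no autodetection), items are assumed to be neither nested nor self-closing, tagName is assumed to contain no regular-expression metacharacters, and chunks are assumed not to split multi-byte characters.

```javascript
var Transform = require('stream').Transform

// Hypothetical sketch, NOT XmlSplit's actual implementation.
function createBatcher(batchSize, tagName) {
    batchSize = batchSize || 1
    var openTag = new RegExp('<' + tagName + '[\\s>]')
    var closeTag = '</' + tagName + '>'
    var buffer = ''     // input not yet consumed
    var header = null   // everything before the first item
    var footer = ''     // closing tag of the root element
    var items = []      // items collected for the current batch

    function flushBatch() {
        var doc = header + '\n' + items.join('\n') + '\n' + footer + '\n'
        items = []
        return doc
    }

    return {
        // Feed a chunk of text; returns the documents completed by it.
        write: function (text) {
            var docs = []
            buffer += text
            while (true) {
                var start = buffer.search(openTag)
                if (start === -1) break
                var end = buffer.indexOf(closeTag, start)
                if (end === -1) break // item incomplete, wait for more input
                end += closeTag.length
                if (header === null) {
                    header = buffer.slice(0, start).replace(/\s+$/, '')
                    // Derive the footer from the last start tag in the header.
                    var root = /<([A-Za-z_][^\s>]*)[^>]*>$/.exec(header)
                    footer = root ? '</' + root[1] + '>' : ''
                }
                items.push(buffer.slice(start, end))
                buffer = buffer.slice(end)
                if (items.length === batchSize) docs.push(flushBatch())
            }
            return docs
        },
        // Signal end of input; returns the last, possibly smaller, batch.
        end: function () {
            return items.length > 0 ? [flushBatch()] : []
        }
    }
}

// Wrap the batcher in a stream.Transform so it can be used with .pipe().
function createBatchStream(batchSize, tagName) {
    var batcher = createBatcher(batchSize, tagName)
    return new Transform({
        transform: function (chunk, encoding, done) {
            batcher.write(chunk.toString()).forEach(function (doc) {
                this.push(doc)
            }, this)
            done()
        },
        flush: function (done) {
            batcher.end().forEach(function (doc) {
                this.push(doc)
            }, this)
            done()
        }
    })
}
```

With a batch size of 2 and three items in the input, this sketch yields one document holding the first two items and a final document holding the third. Keeping the batching logic in a plain object separate from the stream wrapper makes it easy to exercise without any I/O.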

The awesome Stream Handbook covers the basics of Node.js streams and is a "must read".

Example

An example XML input file could look something like

<?xml version = '1.0' encoding = 'UTF-8'?>
<product_export  date="2015-06-19">
    <product id="1"> ... </product>
    <product id="2"> ... </product>
    ...
</product_export>

Using this code snippet:

var fs = require('fs')
var XmlSplit = require('./lib/xmlsplit.js')

var xmlsplit = new XmlSplit()
var inputStream = fs.createReadStream() // from somewhere: pass the path to your XML file here

inputStream.pipe(xmlsplit).on('data', function(data) {
    var xmlDocument = data.toString()
    // do something with xmlDocument ..
})

You will get a complete, valid XML document on each 'data' event:

<?xml version = '1.0' encoding = 'UTF-8'?>
<product_export  date="2015-06-19">
    <product id="1"> ... </product>
</product_export>

<?xml version = '1.0' encoding = 'UTF-8'?>
<product_export  date="2015-06-19">
    <product id="2"> ... </product>
</product_export>

License

See LICENSE.txt
