Split large XML files into smaller chunks using the Node.js Stream API


remuslazar/node-xmlsplit


XmlSplit

Abstract

This utility helps you deal with (very) large XML input files by splitting them into smaller chunks, each a valid XML file, which can then be processed sequentially (in memory) using e.g. libxmljs.

Motivation

Performance.

There are other (very useful) libs available, but because of the XML parsing they do behind the scenes, their performance is not quite good enough for some applications.

To handle the XML parsing part, XmlSplit uses plain JavaScript strings and string methods (.slice, .split) instead of a full XML parser, which is what keeps it fast.
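To make the string-based approach concrete, here is a minimal sketch that splits an in-memory XML string using only indexOf, lastIndexOf and slice. It is not XmlSplit's actual implementation: the names splitXmlString and findOpenTag are hypothetical, and the sketch assumes that items are not nested inside items of the same name, are not self-closing, and that the tag text does not appear inside comments or CDATA.

```javascript
// Hypothetical helpers for illustration only; not part of XmlSplit.

// Find the next "<tagName" that is a real tag start, so that "<product"
// does not match "<product_export".
function findOpenTag(xml, tagName, from) {
  var needle = '<' + tagName
  var pos = xml.indexOf(needle, from)
  while (pos !== -1) {
    var next = xml.charAt(pos + needle.length)
    if (next === '>' || next === '/' || /\s/.test(next)) return pos
    pos = xml.indexOf(needle, pos + 1)
  }
  return -1
}

// Split xml into documents holding at most batchSize items each.
function splitXmlString(xml, tagName, batchSize) {
  var closeTag = '</' + tagName + '>'
  var first = findOpenTag(xml, tagName, 0)
  if (first === -1) return []

  // Everything before the first item (declaration, root open tag) is reused
  // as the header, everything after the last item as the footer.
  var header = xml.slice(0, first)
  var footer = xml.slice(xml.lastIndexOf(closeTag) + closeTag.length)

  // Collect each item as a plain substring.
  var items = []
  var pos = first
  while (pos !== -1) {
    var end = xml.indexOf(closeTag, pos)
    if (end === -1) break
    end += closeTag.length
    items.push(xml.slice(pos, end))
    pos = findOpenTag(xml, tagName, end)
  }

  // Wrap every batch of items in the shared header and footer.
  var docs = []
  for (var i = 0; i < items.length; i += batchSize) {
    docs.push(header + items.slice(i, i + batchSize).join('') + footer)
  }
  return docs
}
```

Calling splitXmlString(xml, 'product', 2) on an export with three product elements would return two documents, the first with two items and the second with one. Unlike this sketch, a streaming splitter cannot see the whole input at once, so it has to buffer partial items between chunks.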

API

Initialize a new object

var xmlsplit = new XmlSplit(batchSize=1, tagName=<autodetect>)

and use the stream.Transform API to pass your XML data through.

By default, it splits children of the root, but the tag name can be specified with the constructor's second argument.

The optional batchSize argument sets the number of items in each XML document.

The awesome Stream Handbook covers the basics of Node.js streams and is a "must read".
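For readers new to stream.Transform, the sketch below shows the general pattern such a splitter can follow: buffer incoming chunks as text, and push each item as soon as its closing tag has arrived. The class name ElementEmitter is hypothetical and this is not XmlSplit's code. It emits bare items without the surrounding root element, uses readableObjectMode so that each pushed string is read back as one unit, and assumes items are not nested or self-closing and that tagName contains no regular expression metacharacters.

```javascript
var Transform = require('stream').Transform
var StringDecoder = require('string_decoder').StringDecoder

// Hypothetical class for illustration only; not XmlSplit's implementation.
class ElementEmitter extends Transform {
  constructor(tagName) {
    // Object mode on the readable side: one pushed string per read.
    super({ readableObjectMode: true })
    this.openRe = new RegExp('<' + tagName + '[\\s>]')
    this.closeTag = '</' + tagName + '>'
    // The decoder keeps multi-byte characters intact across chunk boundaries.
    this.decoder = new StringDecoder('utf8')
    this.buffer = ''
  }

  _transform(chunk, encoding, done) {
    this.buffer += this.decoder.write(chunk)
    var end
    // Emit every item whose closing tag is already in the buffer.
    while ((end = this.buffer.indexOf(this.closeTag)) !== -1) {
      end += this.closeTag.length
      var start = this.buffer.search(this.openRe)
      if (start !== -1 && start < end) this.push(this.buffer.slice(start, end))
      // Keep only the unprocessed tail for the next chunk.
      this.buffer = this.buffer.slice(end)
    }
    done()
  }
}
```

An instance can be used like any other transform, for example inputStream.pipe(new ElementEmitter('product')).on('data', ...). The point to notice is that an item split across two chunks is held in the buffer until its closing tag arrives.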

Example

An example XML input file could look something like

<?xml version = '1.0' encoding = 'UTF-8'?>
<product_export  date="2015-06-19">
    <product id="1"> ... </product>
    <product id="2"> ... </product>
    ...
</product_export>

Using this code snippet:

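var fs = require('fs')
var XmlSplit = require('./lib/xmlsplit.js')

var xmlsplit = new XmlSplit()
// 'input.xml' is a placeholder path; any readable stream works as the source
var inputStream = fs.createReadStream('input.xml')

inputStream.pipe(xmlsplit).on('data', function(data) {
    // each 'data' event carries one chunk document
    var xmlDocument = data.toString()
    // do something with xmlDocument ..
})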

You will get a complete, valid XML document on each 'data' event:

<?xml version = '1.0' encoding = 'UTF-8'?>
<product_export  date="2015-06-19">
    <product id="1"> ... </product>
</product_export>
<?xml version = '1.0' encoding = 'UTF-8'?>
<product_export  date="2015-06-19">
    <product id="2"> ... </product>
</product_export>

License

See LICENSE.txt
