This repository has been archived by the owner on Sep 25, 2022. It is now read-only.

Alfresco Bulk Import Tool

What Is It?

A high performance bulk import tool for the open source Alfresco Document Management System.

"'High Performance', you say?"

Why yes. Alfresco's built-in mechanisms for moving large amounts of content into the repository (the various file-server protocols, the venerable ACP mechanism, the mind-bogglingly inefficient CMIS standard, etc.) all suffer from a variety of limitations that make them a lot slower than the core Alfresco repository. This tool cuts out virtually all of that nonsense, attempts to maximise "mechanical sympathy" (which, for Alfresco, basically means treating your database nicely), and makes one or two large and opinionated assumptions that allow it to be a lot faster than anything else out there.

In terms of benchmarks, the old v1.x versions of the tool have regularly demonstrated sustained ingestion rates of over 500 documents per second in production environments. In testing, the v2.x version has been shown to be up to 4x faster than v1.x in specific circumstances, notably for streaming imports.

Documentation

Resources

Older resources (less relevant for v2.0+):

What's New?

Contributing

Please see Contributing.

Attributions

Commercial Support

This extension is not supported by Alfresco Software Inc., although a fork of an early, pre-release version of this tool has been included in Alfresco Enterprise since v4.0, and has (at times) been supported by Alfresco support.

Please note that the embedded fork has never been rebased against upstream, meaning that it is ancient: equivalent to v1.0-RC1 (circa mid-2010). It also introduced a number of serious bugs (e.g. an incorrect "source striping" algorithm and no support for Alfresco clusters) that the original edition never had. The embedded fork has also been independently measured to be around 25% slower than the original edition available here.

tl;dr: use of the embedded fork is STRONGLY discouraged!

License

Copyright Peter Monks 2007. Licensed under the Apache 2.0 License.