Have you ever had to update thousands of nodes in Alfresco with a JavaScript script, for example to post-process documents imported by BFSIT? Or to traverse a large folder tree and run some logic on each document? Then you know that repo-side scripts run slowly and in a single transaction: you may wait a day for the script to finish, and if it fails, nothing is saved. This makes repo-side JavaScript almost impossible to use for bulk processing.
The Alfresco JavaScript Batch Executer tool aims to solve this problem by processing items in multiple threads and committing them in separate transactions. It can do the job 10 times faster, while clearly showing progress and allowing you to cancel running jobs.
To install the tool in your Alfresco instance, simply download the JAR file from this project and put it in the alfresco/WEB-INF/lib folder.
If you use Maven to build your AMP or Alfresco WAR, you can add the following dependency:

    <dependency>
        <groupId>nl.ciber.alfresco</groupId>
        <artifactId>batch-executer</artifactId>
        <version>0.9</version>
    </dependency>

from the following repository:

    <repositories>
        <repository>
            <id>batch-executer-mvn-repo</id>
            <url>https://raw.github.com/ciber/alfresco-js-batch-executer/mvn-repo/</url>
        </repository>
    </repositories>

Once you install the tool you will have a new root object available in your scripts: batchExecuter.
Here is an example usage which will set an author for all documents in Alfresco:

    batchExecuter.processFolderRecursively({
        root: companyhome,
        onNode: function(node) {
            if (node.isDocument) {
                node.properties['cm:author'] = "Ciber NL";
                node.save();
            }
        }
    });

This simple script will traverse all folders and documents in Alfresco recursively and set cm:author value to "Ciber NL" for each document. 4 threads will be processing nodes, and each batch of 200 documents will be committed in a separate transaction.
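To make the batching concrete, here is a minimal sketch in plain JavaScript of how a list of items splits into batches of a given size. It only illustrates the arithmetic: splitIntoBatches is a hypothetical helper, not part of batchExecuter, and the tool's internal implementation may differ.

```javascript
// Hypothetical helper, not part of batchExecuter: groups items into
// batches of at most batchSize elements.
function splitIntoBatches(items, batchSize) {
    var batches = [];
    for (var i = 0; i < items.length; i += batchSize) {
        batches.push(items.slice(i, i + batchSize));
    }
    return batches;
}

// 450 items with the default batch size of 200 give three batches
// (200, 200 and 50 items), which means three separate transactions.
var items = [];
for (var n = 0; n < 450; n++) {
    items.push("item-" + n);
}
var batches = splitIntoBatches(items, 200);
```

Because each batch is committed on its own, batches that have already been committed stay saved even if a later batch fails.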
Here is another example, which processes a CSV file efficiently:

    batchExecuter.processArray({
        items: companyhome.childByNamePath("groups.csv").content.split("\n"),
        batchSize: 50,
        threads: 2,
        onNode: function(row) {
            // split row string into columns and create a group with given name, for example
        }
    });

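The body of onNode could look like the sketch below. It assumes that each row has the form shortName,displayName, which the example above does not specify, and parseGroupRow is a hypothetical helper. The call to groups.createRootGroup relies on the Alfresco groups root object; treat it as an assumption and check it against your Alfresco version. The typeof guard lets the parsing part run outside Alfresco as well.

```javascript
// Hypothetical helper: turns "shortName,displayName" into an object.
// Returns null for blank rows (for example a trailing newline in the file).
function parseGroupRow(row) {
    var trimmed = String(row).replace(/^\s+|\s+$/g, "");
    if (trimmed.length === 0) {
        return null;
    }
    var columns = trimmed.split(",");
    var shortName = columns[0].replace(/^\s+|\s+$/g, "");
    // Fall back to the short name when no display name column is given.
    var displayName = columns.length > 1
        ? columns[1].replace(/^\s+|\s+$/g, "")
        : shortName;
    return { shortName: shortName, displayName: displayName };
}

// Only runs inside Alfresco, where batchExecuter and groups are defined.
if (typeof batchExecuter !== "undefined" && typeof groups !== "undefined") {
    batchExecuter.processArray({
        items: companyhome.childByNamePath("groups.csv").content.split("\n"),
        batchSize: 50,
        threads: 2,
        onNode: function(row) {
            var group = parseGroupRow(row);
            if (group !== null) {
                // Assumed API: creates a top-level group.
                groups.createRootGroup(group.shortName, group.displayName);
            }
        }
    });
}
```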
You can monitor the progress in log files and control running jobs using a webscript page:
http://localhost:8080/alfresco/s/ciber/batch-executer/jobs
Batch executer supports the following functions for bulk processing:
- processFolderRecursively(parametersObject) - processes a folder recursively. Parameter root specifies where to start.
- processArray(parametersObject) - processes an array of items: it may be nodes or primitive JavaScript objects or anything. Parameter items contains the array.
The following parameters are supported when calling these functions.
| Name | Description |
|---|---|
| root | The folder to process, mandatory when calling processFolderRecursively function, ignored otherwise. The folder is traversed in depth-first-search manner and all nodes are fed to the processing function, including the root folder itself and any sub-folders and documents. Only cm:contains associations are used to fetch children. |
| items | The array of items to process, mandatory when calling processArray function, ignored otherwise. Each item is fed to processing function onNode or onBatch and does not necessarily have to be a node. It may be any JavaScript object. |
| batchSize | The size of a batch to use when processing. Optional, default value is 200. Each batch is committed in separate transaction. |
| threads | The number of processing threads. Optional, default value is 4. |
| disableRules | May be used to disable Alfresco rules when processing takes place. Optional, false by default. |
| onNode | A JavaScript function which will be executed on each item found by batchExecuter. It receives one parameter: the item, it may be a document, folder, a string from items array etc. Mandatory unless onBatch function is supplied. |
| onBatch | A JavaScript function which will be executed on each batch of items to process. It receives one parameter: a JavaScript array of items in the batch. This function can be used to further improve performance by grouping some logic in batches. For example if you have to check for each document if another one exists with the same name, then you can make one query with all names included by OR instead of executing one search query for each node. This can improve performance but complicates the implementation of course. onBatch parameter is mandatory unless onNode function is present. |
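The name check described for onBatch could be sketched as follows. buildNameQuery is a hypothetical helper that joins all names in a batch into one Lucene query with OR. The use of search.luceneSearch, logger.log and the exact query syntax are assumptions to verify against your Alfresco version.

```javascript
// Hypothetical helper: builds one Lucene query matching any of the names.
function buildNameQuery(names) {
    var clauses = [];
    for (var i = 0; i < names.length; i++) {
        // Escape backslashes and double quotes inside the quoted phrase.
        var escaped = String(names[i]).replace(/(["\\])/g, "\\$1");
        clauses.push('@cm\\:name:"' + escaped + '"');
    }
    return clauses.join(" OR ");
}

// Only runs inside Alfresco, where batchExecuter and search are defined.
if (typeof batchExecuter !== "undefined" && typeof search !== "undefined") {
    batchExecuter.processFolderRecursively({
        root: companyhome,
        onBatch: function(batch) {
            var names = [];
            for (var i = 0; i < batch.length; i++) {
                if (batch[i].isDocument) {
                    names.push(batch[i].name);
                }
            }
            if (names.length > 0) {
                // One search for the whole batch instead of one per node.
                // The result also contains the batch's own documents.
                var matches = search.luceneSearch(buildNameQuery(names));
                logger.log("Batch of " + batch.length + " items, " +
                    matches.length + " nodes found by name");
            }
        }
    });
}
```

The trade-off is the one noted in the table: a single query per batch saves round trips, but the callback then has to match the results back to individual items itself.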
Found a bug or have an idea for a new feature? Please create an issue here on GitHub!